CIT Celebrates 25 Years of NIH’s Biowulf Supercomputer
The name of NIH’s supercomputer, Biowulf, was inspired by the title hero of the epic poem “Beowulf,” one of the most important works of Old English literature. Beowulf becomes known for his great deeds, like slaying monsters. Biowulf, the supercomputer, isn’t quite as old as the poem, but the high-performance computing resource is celebrating its 25th anniversary this year, and it has an impressive history of slaying research questions and enabling scientific research.
In honor of Biowulf’s anniversary, the Center for Information Technology’s (CIT) High-Performance Computing (HPC) team, which manages Biowulf and provides consulting and scientific support services, hosted a series of seminars throughout the year. The series featured scientists discussing how they use Biowulf in their labs to enable and enhance their research.
The Birth of Biowulf
Biowulf’s story began in the 1990s, when the size of datasets expanded in fields like genomics, biochemistry and microbiology. Biowulf was created and came online in 1999 in response to the need to analyze such large amounts of data.
That early version of Biowulf seems humble now: a cluster of 40 “boxes on shelves” with 80 compute cores and two file servers. The fledgling supercomputer had just two applications, 14 users, two dedicated staff members and one citation in a scientific paper.
System Upgrades
Over the years, Biowulf has undergone several expansions. Today, the system has more than 100,000 processing cores, is used in more than 650 labs and has over 2,400 active users. It also has 40 large-memory nodes for memory-intensive projects and 1,050 graphics processing units (GPUs) to handle imaging applications.
In addition, Biowulf’s data storage capacity has increased by an astounding 1,000% to 60 petabytes. Between 1999 and 2023, more than 5,000 scientific papers were published citing Biowulf usage. In fact, in 2024, 10.8% of all published papers at NIH acknowledged the use of Biowulf.
A Research Powerhouse
Biowulf has proven itself to be a powerful research tool. It has been ranked among the most powerful supercomputers in the world by the TOP500 project and is the world’s most powerful supercomputer dedicated to advancing biomedical research.
During the pandemic, the HPC team prioritized COVID-19-related research on Biowulf. These projects used over 87 million CPU hours with over 2 million jobs run, and the system was cited in more than 50 published, peer-reviewed papers.
In March 2022, the Telomere-to-Telomere (T2T) Consortium, an open, global team of scientists led by National Human Genome Research Institute (NHGRI) researchers, reported that it had published the first complete human genome sequence with no gaps. It was a landmark achievement, and Biowulf played a critical role by enabling geneticists to sequence and study areas of the human chromosome that contain highly repetitive DNA that had long been a mystery.
“Biowulf played two critical roles in that project,” noted Dr. Adam Phillippy, head of the Genome Informatics Section at NHGRI, during a roundtable discussion at CIT’s October Town Hall. “First, it was just the compute power we had at our fingertips…It was not uncommon for my group to run 30 million CPU hours a year on Biowulf when we first joined in 2015, and the ability to run that huge amount of CPU was really critical for us in improving the efficiency and the accuracy of our methods.”
Biowulf’s second critical role came through its Globus share functionality. Globus is a service on Biowulf that simplifies moving, syncing, and sharing large amounts of data.
Phillippy described how this was important to the T2T project. “We opened our Biowulf partitions as a global share for T2T consortium…and it served as the central data hub for that project. We ended up publishing 8 or 10 companion papers in 2022, when the genome was finished, and that was all enabled by this collaborative analysis and collaborative science with Biowulf and Globus serving as the central data hub for that whole project. It was really transformational in the way we were able to do our work and played a critical role in enabling the success of the T2T project and finishing the human genome.”
In 2019, an independent assessment of NIH high-performance computing found that “Biowulf stands out as one of the most scientifically impactful and successful cross-IC efforts at the NIH.” The system directly supports almost 70% of research projects at the principal investigator, laboratory and institute/center level.
Recruiting Tool
Statistics show Biowulf attracts talent. Early-career recruits beginning tenure-track work at NIH are drawn to the resource and its ability to analyze vast datasets.
Dr. Andy Baxevanis, director of computational biology in the NIH Office of Intramural Research, indicated that Biowulf is seen as a competitive differentiator when trying to attract highly sought-after investigators with significant high-performance computing needs. The availability of Biowulf has attracted a number of recruits through the Stadtman Investigator program, the Intramural Research Program’s premier faculty recruitment effort, which is intended to bring preeminent researchers to NIH.
“We’re extremely encouraged by the number of researchers who want to use Biowulf,” said Baxevanis. “High-performance computing is a critical element of modern-day biomedical research, and Biowulf is uniquely positioned to help [NIH investigators] tackle crucial research questions that were previously beyond our reach.”
Expert Staff
One of Biowulf’s unique assets is the staff that manages the system. The HPC team not only supports the system’s hardware and software but also helps researchers get the most out of the system by providing classes, seminars and walk-in consultations.
The team has the technical and scientific knowledge to handle various concerns, from scripting problems to node allocation to strategies for a particular project. They are dedicated staff and the secret sauce that makes Biowulf a world-class tool for biomedical research.
Looking Ahead
Biowulf is poised to continue empowering research for the NIH intramural research community and to develop new offerings. One example, requested by several HPC customers, is support for personally identifiable information and personal health information on the HPC systems.
For details about Biowulf, including how to get an account, see: https://hpc.nih.gov/docs/accounts.html.