NIH Record
Vol. LXIV, No. 6
March 16, 2012

More Algorithms Needed
MIT’s Berger Outlines Scope of ‘Big Data’ Problem

This year’s Margaret Pittman Lecture, given on Feb. 1 by MIT computational biologist Dr. Bonnie Berger, featured a classic good news/bad news scenario: while massive amounts of new sequencing data are being generated worldwide, computing power is not advancing rapidly enough to digest it.

MIT’s Dr. Bonnie Berger gives Pittman Lecture.

“We are currently generating massive data sets and are badly in need of new algorithms” to sort the data meaningfully, said Berger, whose topic was “Computational Biology in the 21st Century: Making Sense Out of Massive Data.”

“Sequencing data is growing astronomically,” she said. “The good news is that there are lots of data in which to find patterns…The bad news is that we face a computationally intractable problem due to the enormous amount of data. There has been an exponential explosion in the amount of sequencing data.”

During the 1990s, she explained, computing power adhered to Moore’s Law [computing speed doubles roughly every 2 years], which was sufficient to keep up with the accumulation of sequencing data. But in the past decade or so, sequencing has outpaced computing speed.

“It’s tempting to think that cloud computing will solve the problem,” Berger said, “but that’s not the case. Computing power per dollar has not kept up with sequencing speed. We need fundamentally better algorithms and we need them quickly.”

Berger said the hunt for new algorithms to tame the data beast has yielded unexpected biological insights; it’s almost as if the view from 50,000 feet has brought new patterns and relationships into focus.

Berger said the hunt for new algorithms to tame the data beast has yielded unexpected biological insights.

Photos: Bill Branson

She outlined three strategies she and her colleagues have adopted in the face of a data immensity problem that NIH director Dr. Francis Collins, who introduced Berger, said has even gained White House attention. “Dr. Berger’s work is very timely for us at NIH,” he said, “since we too are struggling with ‘Big Data.’ Even the White House is asking how we can manage such large quantities of biological data.”

  • Large-scale genomics: Just as digital music files can be compressed in order to store and share them more easily, genomic information can be compressed. But it has to be “compression we can use,” said Berger, who added that “compression doesn’t solve all of our problems…eventually we have to look at [all of] it…Much data is similar, so how do we take advantage of the redundancy? By compressive genomics.” Using a strategy she called “approximate succinct data structures,” researchers focus only on the non-redundant data rather than the full set.
  • Medical genomics: By applying sophisticated algorithms that tease out signal from noise, Berger and colleagues can map what they call a “transcriptomic landscape” from a large compendium of disparate gene expression studies. Clinical applications include the ability to better identify the tissue of origin of metastatic cancer, classify tumors of unknown origin, find marker genes specific to diseases and stratify tumor grade.
  • Network biology: The way biological molecules “talk” to one another tends to be conserved across species. By modeling protein-protein interactions in many organisms, researchers can search for conserved network structure. The IsoRank algorithm developed in Berger’s lab looks for structural similarities. Her IsoBase database illustrates functional relationships. Berger, a professor of applied mathematics and computer science, thinks that “the high level spectral techniques that inform IsoRank and IsoBase will allow biologists to bring their systems-level knowledge of model organisms to next inform our understanding of widely diverse species across the kingdom of life.”
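For readers who want a feel for how a network-comparison score of this kind can be computed, the following is a minimal sketch, not the Berger lab's implementation. It assumes the commonly described idea behind IsoRank: a pair of proteins from two networks scores highly if their neighbors also pair up well, and the scores are found by repeated averaging (a power iteration). The function name `isorank_scores`, the damping parameter `alpha`, the uniform prior and the two toy three-node networks are all hypothetical choices made for illustration.

```python
def isorank_scores(g1, g2, prior=None, alpha=0.8, iterations=50):
    """Toy IsoRank-style similarity between the nodes of two networks.

    g1, g2: dicts mapping each node to the set of its neighbours
    (undirected graphs). Returns a dict {(node1, node2): score}.
    All parameter names and defaults here are illustrative assumptions.
    """
    nodes1, nodes2 = sorted(g1), sorted(g2)
    n_pairs = len(nodes1) * len(nodes2)
    if prior is None:
        # Assumption: with no other evidence, every pairing starts out equal.
        prior = {(i, j): 1.0 / n_pairs for i in nodes1 for j in nodes2}
    scores = dict(prior)

    for _ in range(iterations):
        updated = {}
        for i in nodes1:
            for j in nodes2:
                # Each neighbouring pair (u, v) shares its score equally
                # among all the pairs it is adjacent to.
                support = sum(
                    scores[(u, v)] / (len(g1[u]) * len(g2[v]))
                    for u in g1[i]
                    for v in g2[j]
                )
                updated[(i, j)] = alpha * support + (1 - alpha) * prior[(i, j)]
        # Renormalise so the scores always sum to 1.
        total = sum(updated.values())
        scores = {pair: value / total for pair, value in updated.items()}
    return scores


# Two hypothetical three-protein chains: a-b-c and x-y-z.
network_a = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
network_b = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}

scores = isorank_scores(network_a, network_b)
best_pair = max(scores, key=scores.get)
print(best_pair)
```

In this toy case the two middle proteins, `b` and `y`, end up as the best-scoring pair because they occupy the same position in their respective chains. A realistic analysis would replace the uniform prior with measured evidence such as sequence similarity and would add a separate step to turn the scores into an actual alignment.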

The annual Pittman Lecture, established in 1994, honors Margaret Pittman, an outstanding scientist who is thought to be the first woman to head an NIH laboratory.

Rich McManus