Algorithm Sifts Through ‘Sea of Mutations’

By Eric Bock

Dr. Mona Singh

Photo: Chia-Chi Charlie Chang

An algorithm can search through a “sea of mutations” in cancer genomes to find those that play a role in driving cancer initiation and progression, said Dr. Mona Singh at an NIH Director’s Lecture held recently in Lipsett Amphitheater.

“There are lots of mutations per cancer genome, and yet only a few of these mutations within an individual are relevant for his or her cancer,” said Singh, professor of computer science at Princeton University’s Lewis Sigler Institute for Integrative Genomics. “The mutations that you see across individuals can vary quite a bit.”

Cancer is a disease in which cells acquire genetic mutations that allow them to divide uncontrollably. These mutations can affect proteins, each of which plays a critical role in the body.

Thanks to The Cancer Genome Atlas and other cancer genomics initiatives, researchers can access large datasets featuring data from thousands of tumor samples and several cancer types. Many researchers identify cancer genes by using programs that search for genes that mutate at a higher frequency than others do.

While these frequency-based methods are powerful, Singh believes they are insufficient because there are many cancer-relevant genes that are mutated at lower frequencies across tumors. “It makes sense to not just look at cancer data by itself as these frequency-based methods do, but instead, look at it within the context of other types of data that have been collected over the years about proteins and genomes,” she argued.

Her group has built an algorithm to search through mutational patterns that involve how proteins interact with DNA, RNA and other molecules. The program incorporates cross-genomic and population information. The algorithm tries to identify whether mutations disrupt protein interactions.

In one test, her team used the algorithm to search through 11,000 tumor samples across 33 types of cancer. They uncovered several known cancer genes. They also identified several genes that are thought to be cancer genes because they have many mutations within sites where proteins interact with each other or with other biomolecules.

Singh and her group have developed another resource called the InteracDome, “which may have applicability in many other disease types because it allows you to see where mutations are hitting interaction sites.”

Proteins work together within large networks. Some proteins, for instance, respond to DNA damage, while others play a role in cell growth. Mutations in any of the proteins associated with a particular function can alter the function. Further, proteins that are associated with the same function tend to be near each other in the network.

To find cancer-relevant mutations by leveraging these networks, Singh helped build a framework that considers somatic mutation data across individual tumors within the context of protein interaction networks. The idea is to find small subnetworks where many patients have mutations in at least one component protein—even if none of the proteins individually are frequently mutated across all the individuals’ tumors. She also developed a network approach that additionally leverages known cancer genes.
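The subnetwork idea can be illustrated with a small toy example. The sketch below is not the framework Singh's group built. It is a simple greedy search, under the assumption that "coverage" means the number of patients with a mutation in at least one protein of the subnetwork. The gene names, patient IDs and the `grow_subnetwork` helper are hypothetical.

```python
def coverage(genes, mutations):
    """Count patients with a mutation in at least one gene of the set."""
    genes = set(genes)
    return sum(1 for mutated in mutations.values() if genes & mutated)

def grow_subnetwork(edges, mutations, max_size=3):
    """Greedily grow a connected subnetwork that covers many patients.
    (Illustrative heuristic only, not the published method.)"""
    # Build an adjacency map from the interaction edges.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    # Seed with the single gene that covers the most patients.
    seed = max(sorted(neighbors), key=lambda g: coverage({g}, mutations))
    sub = {seed}
    while len(sub) < max_size:
        # Candidates are proteins that interact with the current subnetwork.
        frontier = sorted(set().union(*(neighbors[g] for g in sub)) - sub)
        if not frontier:
            break
        best = max(frontier, key=lambda g: coverage(sub | {g}, mutations))
        if coverage(sub | {best}, mutations) == coverage(sub, mutations):
            break  # no neighbor adds newly covered patients
        sub.add(best)
    return sub

# Toy data: each patient maps to the set of genes mutated in their tumor.
mutations = {
    "P1": {"A"}, "P2": {"B"}, "P3": {"C"},
    "P4": {"A"}, "P5": {"D"}, "P6": {"D"},
}
edges = [("A", "B"), ("B", "C"), ("C", "D")]

sub = grow_subnetwork(edges, mutations)
print(sorted(sub), coverage(sub, mutations), "of", len(mutations), "patients")
```

In the toy data no single gene is mutated in more than two of the six patients, yet the three interacting proteins A, B and C together cover four of them, which is the kind of signal a frequency-based method looking at one gene at a time would miss.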

Singh and her team search for mutations relevant to patients’ cancer.

Photo: Chia-Chi Charlie Chang

To test her method, Singh’s team searched a dataset featuring 278 individuals with a type of brain cancer called glioblastoma. Her approach showed “there’s clear benefit to interpreting new potential disease genes in the context of previously known disease genes.”

New research shouldn’t be considered in isolation, she argued. “We should use this prior knowledge about proteins that are known to be cancer-relevant to somehow guide our network-based processes.”

April 3, 2020

Vol. LXXII, No. 7

The NIH Record