NHGRI director Dr. Eric Green
It is somehow heartening to hear the numbing array of G’s, C’s, T’s and A’s that constitute our genomes described as “ridiculously large amounts of information,” and to hear the latest generation of DNA sequencing technologies dubbed “fancy, shmancy.” Anything that makes the topic seem less daunting is appreciated.
While non-genome geeks may believe that the Human Genome Project ended in October 2004 when the finished sequence was published [April 2003 was the actual project finish date], Green asserts that the project itself “is not the end of genomics, but the beginning. There are many new frontiers to be built on the Human Genome Project foundation. Now the challenge is the application of genomics to human health. How do we realize the promise of genomic medicine?”
Green noted that we don’t yet have the entire genome, either. “The centromeres and other parts of chromosomes still can’t be fully recovered,” he said. “It is hard and expensive to completely finish a genome sequence, but it’s only a very small percentage that is missing. Even now, we continue to fill in parts and learn a lot about the human genome.”
Green said several major steps lie ahead for genomics. “The Human Genome Project covered mapping and sequencing,” he said. “Completely interpreting the human genome sequence may take decades. It is a huge undertaking.”
Key questions include: What parts of the genome are functional, and which are not?
“About 5 percent of the human genome is evolutionarily constrained across mammalian species and presumed to be functional,” said Green. “The problem is, we don’t know where [the roughly 150 million functional base pairs] are.”
Within that 5 percent, about 1.5 percent of the genome encodes genes, which are thought to number around 20,000 and to be involved in production of far more than that number of proteins. Green said we have a good inventory of those gene sequences at present. But that leaves roughly 3.5 percent of the genome as non-coding functional sequence. These regions include gene regulatory elements, chromosomal functional elements and undiscovered functional elements not yet described in any textbooks.
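For readers who want to check the arithmetic behind these percentages, here is a minimal sketch. It assumes a genome of about 3 billion base pairs, a figure the article does not state directly but which follows from its "5 percent" and "roughly 150 million base pairs" numbers. The names `GENOME_BP`, `PERCENT_OF_GENOME` and `base_pairs` are made up for illustration.

```python
# Assumption: a genome of about 3 billion base pairs, the size implied by the
# article's "5 percent" and "roughly 150 million base pairs" figures.
GENOME_BP = 3_000_000_000

# Percentages of the whole genome, as reported in the article.
PERCENT_OF_GENOME = {
    "constrained (presumed functional)": 5.0,
    "protein-coding": 1.5,
}

# The non-coding functional share is what remains of the 5 percent
# once the protein-coding 1.5 percent is subtracted.
PERCENT_OF_GENOME["non-coding functional"] = (
    PERCENT_OF_GENOME["constrained (presumed functional)"]
    - PERCENT_OF_GENOME["protein-coding"]
)


def base_pairs(percent: float, genome_bp: int = GENOME_BP) -> int:
    """Convert a percentage of the genome into an approximate base-pair count."""
    return round(genome_bp * percent / 100)


if __name__ == "__main__":
    for label, pct in PERCENT_OF_GENOME.items():
        print(f"{label}: {pct}% is about {base_pairs(pct):,} base pairs")
```

Under that assumption, the non-coding functional portion comes to roughly twice the size of the protein-coding portion, which is one way to see why Green calls a better inventory of it a major priority.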
“For example, there’s a whole RNA world out there that’s functionally important but not by coding for protein,” he said. “We have a poor inventory of these elements, but it is a major priority in genomics now to develop one.”
That knowledge gap is being addressed by another major effort—comparative sequence analyses. Green said that evolution can serve as a “consultant” to genomics by pointing to areas of the human genome that are highly conserved with other mammals.
“We know that highly conserved regions of the genome are most often functionally important, but this is not always the case,” he cautioned. A major effort to “skim read” the genomes of a large set of mammals as a means to find the most-conserved parts of the human genome has recently been completed.
Another challenge is to find out what the important stretches of sequence actually do, and a project called ENCODE (Encyclopedia of DNA Elements) is doing just that across the human genome, said Green. “Richer and more detailed views of the human genome are now emerging,” he said.
Green said intra-species sequence comparisons are also essential. “Which differences among us are relevant?” he asked. All humans are roughly 99.7 percent identical at the level of their DNA sequence, which leaves 3 million to 5 million base pairs differing among individuals. “Most [variants] are innocent,” Green noted, “and have no phenotypic consequence, but some are metaphorical bombs.”
To find variants that contribute to disease, the HapMap project and numerous efforts using genome-wide association studies (GWAS) were pursued. “The number of variants that have been found to confer risk has been exploding yearly,” Green reported. “It’s been a remarkably successful effort.”
Green said that GWAS results increasingly show that “regions conferring risk very often [perhaps 70-90 percent of the time] reflect non-coding parts of the genome…the non-coding functional landscape is of great interest in many labs now.”
The final point of Green’s talk (which is archived at www.videocast.nih.gov and would make an excellent introduction for the budding genomicist in your family, along with the course notes and syllabus available at www.genome.gov/COURSE2010/) is that DNA sequencing has become dramatically cheaper over the years. Whereas it cost upwards of $1 billion to sequence the human genome the first time around, the goal now is to achieve the same result for $1,000. Major progress has been made en route to that goal, Green noted.
As sequencing becomes cheaper and easier—yielding tsunamis of data—the new challenge becomes analyzing data, not generating it. “It’s like trying to get a drink of water out of a fire hose,” Green said. “It’s overwhelming, but it’s also exhilarating.”
He concluded, “Genomics has not gotten dull since the end of the Human Genome Project. It has been a spectacular 7 years since 2003. The genomic revolution continues.”