In two companion papers, researchers led by the Whitehead Institute Center for Genome Research reported important findings that set the stage for the next steps in the Human Genome Project--mapping and identifying all the genes that predispose us to common diseases.
The studies--one by Whitehead Fellow Mark Daly, Professor of Biology Eric Lander and colleagues, and the other by Whitehead research scientist John Rioux and colleagues--provide the impetus for building a "haplotype" map of the genome: a map that will make it easier, faster and perhaps cheaper to find disease-causing or disease-predisposing genes. The findings were published in accompanying papers in the October issue of Nature Genetics.
Haplotypes are ancestral segments of chromosomes containing many single-letter genetic variations that are inherited together as a set, or block. They can be used to decipher the genetic differences that make some people more susceptible to disease than others. Haplotypes became an important concept when scientists began to realize that single nucleotide polymorphisms (SNPs)--the single-letter DNA differences between individuals that account for most genetic variation and thus underlie disease susceptibility--travel together in large blocks.
If this is the case for the entire genome, a haplotype map would make finding disease genes a manageable task. Instead of searching through a giant haystack of millions of SNPs, scientists would be searching through bundles of 10,000 to 50,000 bases each.
The Whitehead studies provide a strong case for building a haplotype map. One study suggests that large segments of the genome may be very modular, with genetic variations traveling together as large blocks that come in very few varieties. The other study identifies a common haplotype harboring a gene for susceptibility to Crohn's disease, a chronic inflammatory bowel disease (IBD) that affects more than one million Americans. This study served as both a clue to and an example of how haplotype maps can be used to identify genes for common diseases.
Crohn's disease is a so-called "complex" disorder with a tendency to cluster in families, suggesting that several genes play an important role but that environment is also a key component. Scientists had previously identified a gene on chromosome 16 as a culprit, but this gene could only account for a fraction of the IBD cases.
In this study, Whitehead researchers identified a neighborhood on chromosome 5 wherein lies another gene, IBD5, involved in the disease. The gene lies in a region surrounded by a cluster of interleukins--genes that are involved in immune function and regulation.
"We were very excited to identify this region--it made perfect sense given the inflammatory nature of these diseases," said Rioux, first author on the Crohn's disease study. "This region may also be important in other inflammatory diseases besides IBD, such as lupus and asthma."
Researchers believe that in Crohn's patients, faulty responses to microbes that live in the digestive system may somehow trigger the immune system to attack the lining of the digestive tract, causing it to decay and become inflamed. "Finding a gene in a region known to be important in immunity may help us understand the disease mechanism and design better therapies," said Rioux.
Rioux and his colleagues identified all the SNPs in a large region of chromosome 5 implicated by their previous research. When they compared these SNPs in individuals affected by Crohn's disease and in those who were not, they found an entire block of variation, or haplotype, that correlated with disease. The many SNPs that uniquely mark that haplotype, individually or in combination, are candidates for causing the disease.
Of these unique SNPs, none caused changes in the amino acid sequence of the proteins encoded by the known genes. This could mean the disease-causing SNP lies in a regulatory region of a known gene and controls that gene's level of expression, or there may be an as-yet-unidentified gene in the region that is mutated. The researchers will now turn to molecular biology to identify the culprit.
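The comparison between affected and unaffected individuals can be illustrated with a small sketch. It is not the researchers' actual analysis: the haplotype labels, the sample lists and the helper `haplotype_odds_ratio` are all hypothetical, and a real study would use far larger samples and formal statistical tests.

```python
# Toy case-control comparison. All labels and counts are invented for
# illustration; they are not data from the Crohn's disease study.
cases = ["H1", "H1", "H2", "H1", "H3", "H1"]      # haplotypes seen in patients
controls = ["H2", "H1", "H3", "H2", "H2", "H3"]   # haplotypes seen in unaffected people


def haplotype_odds_ratio(cases, controls, haplotype):
    """Odds of carrying `haplotype` among cases relative to controls."""
    a = sum(h == haplotype for h in cases)       # cases carrying it
    b = len(cases) - a                           # cases not carrying it
    c = sum(h == haplotype for h in controls)    # controls carrying it
    d = len(controls) - c                        # controls not carrying it
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined when a cell of the table is zero")
    return (a * d) / (b * c)


ratio = haplotype_odds_ratio(cases, controls, "H1")
print(f"Odds ratio for H1: {ratio}")
```

An odds ratio well above 1 in such a table would flag the haplotype as correlated with disease. It would not say which SNP within the block is responsible.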
The tools and approach used to localize the IBD gene will be broadly applicable to many complex diseases such as asthma, diabetes, heart disease and psychiatric illness. "The simple patterns in human variation we describe in this paper exist in the general population and aren't specific to any disease or any particular ethnic background," said Daly, first author on the paper on the haplotype structure.
It was while using SNPs to dissect the region of chromosome 5 containing the IBD5 gene that the researchers noticed that SNPs travel together in large blocks. This suggested that researchers won't have to search through every single SNP in an area of the genome to find the one responsible for disease. Instead, they could simply look at a handful of key SNPs and know the identity of tens or hundreds of neighboring SNPs. "This is the first time that we see a way to study the whole genome comprehensively," Daly said.
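The idea of reading a handful of key SNPs to learn the rest can be sketched in a few lines. This is a minimal illustration under assumed conditions: the three block patterns are invented, and `find_tag_positions` and `infer_block` are hypothetical helpers using a simple greedy choice, not the method used in the papers.

```python
# Hypothetical haplotype patterns for one block (not real sequence data).
BLOCK_HAPLOTYPES = [
    "ACGTTAGC",
    "ATGTCAGG",
    "GCGACAGC",
]


def find_tag_positions(haplotypes):
    """Greedily choose SNP positions until each haplotype has a unique signature."""
    if len(set(haplotypes)) != len(haplotypes):
        raise ValueError("haplotype patterns must be distinct")

    def signatures(positions):
        return {tuple(h[p] for p in positions) for h in haplotypes}

    chosen = []
    while len(signatures(chosen)) < len(haplotypes):
        candidates = [p for p in range(len(haplotypes[0])) if p not in chosen]
        # Add the position that separates the most haplotypes so far.
        best = max(candidates, key=lambda p: len(signatures(chosen + [p])))
        chosen.append(best)
    return sorted(chosen)


def infer_block(haplotypes, observed):
    """Return the full pattern matching the observed {position: allele} pairs."""
    matches = [h for h in haplotypes
               if all(h[p] == allele for p, allele in observed.items())]
    if len(matches) != 1:
        raise ValueError("observed alleles do not identify a single haplotype")
    return matches[0]


tags = find_tag_positions(BLOCK_HAPLOTYPES)
full = infer_block(BLOCK_HAPLOTYPES, {0: "A", 1: "T"})
print(tags, full)
```

In this toy block, two of the eight positions are enough to tell the three patterns apart, so typing those two reveals the alleles at the other six.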
Daly and his colleagues also found that these haplotypes (a given set of SNPs) exist regionally in only two to four distinct sequence patterns. Basically, if a researcher is looking at a particular block of the genome, there will frequently be fewer than five flavors of variation of the sequence in that region across entire populations. They also observed that the blocks are separated by regions where considerable shuffling has occurred over generations, so each individual may have a unique combination of these blocks. Such shuffling or recombination occurs naturally in cells when DNA sequences on maternal and paternal chromosomes are exchanged during the formation of egg or sperm.
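A short sketch can make the block picture concrete. It assumes a toy sample of six chromosomes and two blocks whose boundaries are already known. The sequences, the `BLOCKS` coordinates and the helper `block_patterns` are hypothetical.

```python
from collections import Counter

# Hypothetical chromosomes with 8 SNP sites each; sites 0-3 and 4-7 are
# assumed to form two blocks separated by a recombination hotspot.
CHROMOSOMES = [
    "ACGTTTAG",
    "ACGTCCGA",
    "GTAATTAG",
    "GTAACCGA",
    "ACGTTTAG",
    "GTAACCGA",
]
BLOCKS = [(0, 4), (4, 8)]  # half-open (start, end) site ranges


def block_patterns(chromosomes, start, end):
    """Count the distinct allele patterns observed within one block."""
    return Counter(c[start:end] for c in chromosomes)


for start, end in BLOCKS:
    patterns = block_patterns(CHROMOSOMES, start, end)
    print(f"sites {start}-{end - 1}: {len(patterns)} patterns {dict(patterns)}")

# Shuffling between blocks yields more whole-chromosome combinations
# than there are patterns inside any single block.
print("distinct whole chromosomes:", len(set(CHROMOSOMES)))
```

Each toy block shows only two patterns, yet recombination between the blocks produces four distinct chromosomes. This mirrors how a few varieties per block can still combine into many individual genomes.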
"Understanding human variation at this level will have a big impact on medical genetics in the future. The length and complexity of these blocks is going to vary in different parts of the genome. We now need to characterize the whole genome--create haplotype maps--so this type of work can be done easily for any disease, anywhere in the genome," Daly said.
Once the architecture of the blocks and the existing haplotypes are mapped, a researcher studying a particular disease will be able to pick a few SNPs from every block in the genome and study this set in patients. Typing large populations of patients for all available SNPs is still a costly and time-consuming process. A haplotype-based analysis, by contrast, will help scientists more rapidly identify the key SNPs that correlate with the disease of interest.
A version of this article appeared in MIT Tech Talk on October 24, 2001.