Inside every human cell, 2 meters of DNA is crammed into a nucleus that is only one-hundredth of a millimeter in diameter.
To fit inside that tiny space, the genome must fold into a complex structure known as chromatin, made up of DNA and proteins. The structure of that chromatin, in turn, helps to determine which of the genes will be expressed in a given cell. Neurons, skin cells, and immune cells each express different genes depending on which of their genes are accessible to be transcribed.
Deciphering those structures experimentally is a time-consuming process, making it difficult to compare the 3D genome structures found in different cell types. MIT Professor Bin Zhang is taking a computational approach to this challenge, using computer simulations and generative artificial intelligence to determine these structures.
“Regulation of gene expression relies on the 3D genome structure, so the hope is that if we can fully understand those structures, then we could understand where this cellular diversity comes from,” says Zhang, an associate professor of chemistry.
From the farm to the lab
Zhang first became interested in chemistry when his brother, who was four years older, bought some lab equipment and started performing experiments at home.
“He would bring test tubes and some reagents home and do the experiment there. I didn’t really know what he was doing back then, but I was really fascinated with all the bright colors and the smoke and the odors that could come from the reactions. That really captivated my attention,” Zhang says.
His brother later became the first person from Zhang’s rural village to go to college. That was the first time Zhang had an inkling that it might be possible to pursue a future other than following in the footsteps of his parents, who were farmers in China’s Anhui province.
“Growing up, I would have never imagined doing science or working as a faculty member in America,” Zhang says. “When my brother went to college, that really opened up my perspective, and I realized I didn’t have to follow my parents’ path and become a farmer. That led me to think that I could go to college and study more chemistry.”
Zhang attended the University of Science and Technology of China in Hefei, where he majored in chemical physics. He enjoyed his studies and discovered computational chemistry, which became his new fascination.
“Computational chemistry combines chemistry with other subjects I love — math and physics — and brings a sense of rigor and reasoning to the otherwise more empirical rules,” he says. “I could use programming to solve interesting chemistry problems and test my own ideas very quickly.”
After graduating from college, he decided to continue his studies in the United States, which he recalled thinking was “the pinnacle of academics.” At Caltech, he worked with Thomas Miller, a professor of chemistry who used computational methods to understand molecular processes such as protein folding.
For his PhD research, Zhang studied a transmembrane protein that acts as a channel to allow other proteins to pass through the cell membrane. This protein, called the translocon, can also open a side gate within the membrane, so that proteins that are meant to be embedded in the membrane can exit directly into it.
“It’s really a remarkable protein, but it wasn’t clear how it worked,” Zhang says. “I built a computational model to understand the molecular mechanisms that dictate what are the molecular features that allow certain proteins to go into the membrane, while other proteins get secreted.”
Turning to the genome
After finishing grad school, Zhang’s research focus shifted from proteins to the genome. At Rice University, he did a postdoc with Peter Wolynes, a professor of chemistry who had made many key discoveries in the dynamics of protein folding. Around the time that Zhang joined the lab, Wolynes turned his attention to the structure of the genome, and Zhang decided to do the same.
Unlike proteins, which tend to have highly structured regions that can be studied using X-ray crystallography or cryo-EM, DNA is a very globular molecule that doesn’t lend itself to those types of analysis.
A few years earlier, in 2009, researchers at the Broad Institute, the University of Massachusetts Medical School, MIT, and Harvard University had developed a technique for studying the genome’s structure by cross-linking DNA in a cell’s nucleus. Researchers can then determine which segments are located near each other by shredding the DNA into many tiny pieces and sequencing them.
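The proximity readout described above is commonly summarized as a contact matrix: the genome is divided into fixed-size bins, and each sequenced pair of cross-linked fragments adds one count to the pair of bins it connects. The following is a minimal sketch of that bookkeeping, not the published analysis pipeline. The function name, the bin size, and the toy read pairs are all hypothetical and chosen only for illustration.

```python
def contact_matrix(read_pairs, genome_length, bin_size):
    """Count how often pairs of genomic bins were captured together.

    read_pairs: iterable of (pos_a, pos_b) genomic coordinates, one tuple
    per sequenced pair of cross-linked fragments (hypothetical input format).
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    counts = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in read_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        counts[i][j] += 1
        if i != j:
            counts[j][i] += 1  # mirror the count so the matrix stays symmetric
    return counts


# Toy data (made up): positions on a 1,000-base "genome" split into 4 bins.
pairs = [(120, 180), (150, 950), (910, 130), (400, 460), (420, 880)]
matrix = contact_matrix(pairs, genome_length=1000, bin_size=250)
for row in matrix:
    print(row)
```

In this toy example, the first and last bins share two contacts even though they sit at opposite ends of the sequence, which is the kind of signal that suggests two distant segments are folded close together in 3D space.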
Zhang and Wolynes used data generated by this technique, known as Hi-C, to explore the question of whether DNA forms knots when it’s condensed in the nucleus, similar to how a strand of Christmas lights may become tangled when crammed into a box for storage.
“If DNA was just like a regular polymer, you would expect that it will become tangled and form knots. But that could be very detrimental for biology, because the genome is not just sitting there passively. It has to go through cell division, and also all this molecular machinery has to interact with the genome and transcribe it into RNA, and having knots will create a lot of unnecessary barriers,” Zhang says.
They found that, unlike Christmas lights, DNA does not form any knots even when packed into the cell nucleus, and they built a computational model allowing them to test hypotheses for how the genome is able to avoid those entanglements.
Since joining the MIT faculty in 2016, Zhang has continued developing models of how the genome behaves in 3D space, using molecular dynamics simulations. In one area of research, his lab is studying how differences between the genome structures of neurons and other brain cells give rise to their unique functions, and they are also exploring how misfolding of the genome may lead to diseases such as Alzheimer’s.
When it comes to connecting genome structure and function, Zhang believes that generative AI methods will also be essential. In a recent study, he and his students reported a new computational model, ChromoGen, that uses generative AI to predict the 3D structures of genomic regions, based on their DNA sequences.
“I think that in the future, we will have both components: generative AI and also theoretical chemistry-based approaches,” he says. “They nicely complement each other and allow us to both build accurate 3D structures and understand how those structures arise from the underlying physical forces.”