Microbes mediate the global marine cycles of elements, modulating atmospheric carbon dioxide and helping to maintain the oxygen we all breathe, yet there is much about them scientists still don’t understand. Now, an award from the Simons Foundation will give researchers from MIT's Darwin Project access to bigger, better computing resources to model these communities and probe how they work.
The simulations of plankton populations made by Darwin Project researchers have become increasingly computationally demanding. MIT Professor Michael "Mick" Follows and Principal Research Engineer Christopher Hill, both affiliates of the Darwin Project, were therefore delighted to learn of their recent Simons Foundation award, providing them with enhanced compute infrastructure to help execute the simulations of ocean circulation, biogeochemical cycles, and microbial population dynamics that are the bread and butter of their research.
The Darwin Project, an alliance between oceanographers and microbiologists in the MIT Department of Earth, Atmospheric and Planetary Sciences (EAPS) and the Parsons Lab in the MIT Department of Civil and Environmental Engineering, was conceived as an initiative to “advance the development and application of novel models of marine microbes and microbial communities, identifying the relationships of individuals and communities to their environment, connecting cellular-scale processes to global microbial community structure" with the goal of coupling “state of the art physical models of global ocean circulation with biogeochemistry and genome-informed models of microbial processes."
As model complexity and resolution have increased over the decade since the project’s inception in 2007, computational demands have ballooned. The drivers are increased fidelity and algorithmic sophistication in both the biological and fluid dynamical component models, along with forays into new statistical analysis approaches that leverage big-data innovations to analyze the simulations and field data.
"The award allows us to grow our in-house computational and data infrastructure to accelerate and facilitate these new modeling capabilities," says Hill, who specializes in Earth and planetary computational science.
The boost in computational infrastructure the award provides will advance several linked areas of research: modeling marine microbial systems in more detail, enhancing the fidelity of the modeled fluid dynamical environment, supporting state-of-the-art data analytics including machine learning techniques, and accelerating and extending genomic data processing capabilities.
High diversity is a ubiquitous aspect of marine microbial communities that is not fully understood and, to date, is rarely resolved in simulations. Darwin Project researchers have broken new ground and continue to push the envelope in modeling in this area: In addition to resolving a much larger number of phenotypes and interactions than has typically been attempted by other investigators, the Darwin Project team has also been increasing the fidelity of the underlying physiological sub-models which define traits and trade-offs.
"One thing we are doing is implementing simplified metabolic models which resolve additional constraints [electron and energy conservation] and higher fidelity [dynamic representations of macro-molecular and elemental composition]," says Follows. "These advances require more state variables per phenotype. We have also an explicit radiative transfer model that allows us to better exploit satellite remote sensing data but both come at a greater computational expense.” Darwin researchers are also expanding their models to resolve not only phototrophic and grazer communities in the surface ocean, but also heterotrophic and chemo-autotrophic populations throughout the water column.
Follows and Hill believe these advances will provide better fidelity to real world observations, a more dynamic and fundamental description of marine microbial communities and biogeochemical cycles, and the potential to examine the underlying drivers and significance of diversity in the system.
"Much of the biological action in the surface ocean occurs at scales currently unresolved in most biogeochemical simulations,” Follows explains. “Numerical models and recent observations show that the sub-mesoscale motions in the ocean have a profound impact on the supply of resources to the surface and the dispersal and communication between different populations. The integral impact of this, and how to properly parameterize it, is not yet clear, but one approach, that is within reach, is to resolve these scales of motion nested within global simulations."
Hill and Follows hope such advances will allow them to examine both local and regionally integrated effects of fine-scale physical drivers. "We have already completed a full annual cycle numerical simulation that resolves physical processes down to kilometer scales globally," says Hill. “Such simulations provide a basis for driving targeted modeling of, for example, the role of fronts that may involve fully non-hydrostatic dynamics and that could help explain in-situ measurements that suggest enhanced growth rates under such conditions.” Such work is strongly complementary to another Simons Foundation sponsored project, the Simons Collaboration on Ocean Processes and Ecology (SCOPE). As an initiative to advance our understanding of the biology, ecology, and biogeochemistry of microbial processes that dominate Earth’s largest biome — the global ocean — SCOPE seeks to measure, model, and conduct experiments at a model ecosystem site located 100 km north of the Hawaiian island of Oahu that is representative of a large portion of the North Pacific Ocean.
The team has also already implemented algorithms to enable explicit modeling of the relevant fluid dynamics, but here too, the approaches are computationally demanding. "The improved facilities this award provides will enable these extremely demanding experiments to proceed," says Follows.
Enhanced computer resources will also allow Darwin Project researchers to more effectively utilize data analytics. "We are adopting multiple statistical approaches for classifying fluid dynamical and ecosystem features in observations and in simulations which we plan to apply to biogeochemical problems," says Hill. “One current direction, which employs random forest classification to identify features corresponding to training sets, is showing particular promise for objectively quantifying links between biogeochemical event occurrence and physical environment phenomena.”
These methods will provide useful analysis tools for the simulations, and the pair also see them as a bridge to real-world interpretations of, for example, metagenomics surveys in the ocean. Follows and Hill see this direction as a route by which to bring simulations and observations closer in new and meaningful ways. The growth in computational infrastructure that the Simons award allows creates the potential for much larger queries across more realistic datasets.
The Darwin Project is part of a long and fruitful collaboration with Institute Professor Sally "Penny" Chisholm of MIT’s Department of Civil and Environmental Engineering. Steady growth in the large-scale metagenomic and single-cell genomic data generated by the Chisholm Lab is also driving the need for additional computational processing resources.
With the new Simons-supported enhancements in computational infrastructure, Darwin Project collaborators in the Chisholm Lab will be able to tackle assembly from larger metagenomic libraries and single-cell genome phylogenies using maximum likelihood and/or Bayesian algorithms. Currently, some large metagenomics assembly activities require compute resources with more memory than the team has had readily available. "Single-cell genome phylogeny activities are computationally demanding and require dedicating compute resources for weeks or months at a time," Hill explains. “This creates a bottleneck for other work. To accelerate work in these areas, additional compute resources, some with larger memory than current resources and some with GPU accelerators, are going to be hugely beneficial. The new systems will permit larger metagenomics library assembly than is currently possible."