Getting metabolism right

Analysis of 89 models of metabolic processes finds flaws in 44 of them — but suggests corrections.


Press Contact

Abby Abazorius
Email: abbya@mit.edu
Phone: 617-253-2709
MIT News Office


Metabolic networks are mathematical models of every possible sequence of chemical reactions available to a cell or organism, and they’re used to design microbes for manufacturing processes or to study disease. Based on both genetic analysis and empirical study, they can take years to assemble.

Unfortunately, a new analytic tool developed at MIT suggests that many of those models may be wrong. Fortunately, the same tool may make it fairly straightforward to repair them.

“They have all these models in this repository hosted at [the University of California at] San Diego,” says Bonnie Berger, a professor of applied mathematics and computer science at MIT and one of the tool’s developers, “and it turns out that many of them were computed with floating-point arithmetic” — an approximate numerical representation that most computer systems use to increase efficiency. “We were able to prove that you need to compute them in exact arithmetic,” Berger says. “When we computed them in exact arithmetic, we found that many of the models that were believed to be realistic don’t produce any growth under any circumstances.”

Berger and colleagues describe their new tool, and the analyses they performed with it, in the latest issue of Nature Communications. First author on the paper is Leonid Chindelevitch, who was a graduate student in Berger’s group when the work was done and is now a postdoc at the Harvard School of Public Health. He and Berger are joined by Aviv Regev, an associate professor of biology at MIT, and Jason Trigg, another of Berger’s former students.

Floating-point arithmetic is kind of like scientific notation for computers. It represents a number as a fixed number of significant digits multiplied by a base — like 2 or 10 — raised to a particular power. Though it sacrifices some accuracy relative to exact arithmetic, it generally makes up for it with gains in computational efficiency.
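The practical difference is easy to see in a few lines of Python (a toy illustration, not the paper's code): repeating addition of 0.1 drifts in floating point, because 0.1 has no exact binary representation, while the standard-library `Fraction` type keeps the books exactly balanced.

```python
from fractions import Fraction

# Floating-point arithmetic is approximate: 0.1 cannot be stored
# exactly in binary, so repeated addition accumulates error.
total_float = sum([0.1] * 10)
print(total_float == 1.0)   # False: the sum is 0.9999999999999999

# Exact rational arithmetic represents 1/10 exactly, so the same
# computation balances perfectly.
total_exact = sum([Fraction(1, 10)] * 10)
print(total_exact == 1)     # True
```

In a model with thousands of reactions, many such tiny discrepancies can add up to the difference between a network that appears to grow and one that provably cannot.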

Indeed, in order to perform an exact-arithmetic analysis of a data structure as huge and complex as a metabolic network, Berger and Chindelevitch had to find a way to simplify the problem — without sacrificing any precision.

Pruning the network

Metabolic networks, Chindelevitch says, “describe the set of all reactions that are available to a particular organism that we might be interested in. So if we’re interested in yeast or E. coli or the tuberculosis bacterium, this is a way to put together everything we know about what this organism can do to transform some substances into some other substances. Usually it will get nutrients from the environment, and then it will transform them by its own internal mechanisms to produce whatever it is that it wants to produce — ethanol, different cellular components for itself, and so on.”

The network thus represents every sequence of chemical reactions catalyzed by enzymes encoded in an organism’s DNA that could lead from particular nutrients to particular chemical products. Every node of the network represents an intermediary stage in some chain of reactions.

To simplify such networks enough to enable exact arithmetical analysis, Chindelevitch and Berger developed an algorithm that first identifies all the sequences of reactions that, for one reason or another, can’t occur within the context of the model; it then deletes these. Next, it identifies clusters of reactions that always work in concert: Whatever their intermediate products may be, they effectively perform a single reaction. The algorithm then collapses those clusters into a single reaction.
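As a rough sketch of the collapsing step (hypothetical code, not the authors' implementation), consider a chain A → B → C whose intermediate metabolite B is touched by exactly two reactions. At steady state the two fluxes are forced into a fixed ratio, so the pair can be merged into one net reaction with exact arithmetic:

```python
from fractions import Fraction

# Toy network: R1: A -> B, R2: B -> C. Metabolite B is an internal
# intermediate touched by exactly two reactions, so the pair always
# fires in lockstep and can be collapsed into a single reaction A -> C.
r1 = {"A": Fraction(-1), "B": Fraction(1)}
r2 = {"B": Fraction(-1), "C": Fraction(1)}

def collapse(ra, rb, shared):
    """Merge two reactions coupled through the metabolite `shared`,
    scaling so that the intermediate cancels exactly."""
    scale = -ra[shared] / rb[shared]   # flux ratio forced by steady state
    merged = dict(ra)
    for met, coef in rb.items():
        merged[met] = merged.get(met, Fraction(0)) + scale * coef
    # Drop metabolites whose net coefficient is exactly zero.
    return {m: c for m, c in merged.items() if c != 0}

print(collapse(r1, r2, "B"))   # net reaction A -> C
```

With exact rationals, the intermediate's coefficient cancels to exactly zero rather than to a tiny residual, which is what makes this simplification safe.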

Most crucially, Chindelevitch and Berger were able to mathematically prove that these modifications wouldn’t affect the outcome of the analysis.

“What the exact-arithmetic approach allows you to do is respect the key assumption of the model, which is that at steady state, every metabolite is neither produced in excess nor depleted in excess,” Chindelevitch says. “The production balances the consumption for every substance.”
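The steady-state condition is the linear system S·v = 0, where S is the stoichiometric matrix (rows are metabolites, columns are reactions) and v is the vector of reaction fluxes. A minimal sketch of an exact balance check, using Python's built-in rationals (an illustration, not the paper's tool):

```python
from fractions import Fraction

# Stoichiometric matrix S for a toy cycle: R1: A -> B, R2: B -> A.
S = [
    [Fraction(-1), Fraction(1)],   # metabolite A
    [Fraction(1), Fraction(-1)],   # metabolite B
]

def is_steady_state(S, v):
    """Every metabolite's net production must be exactly zero."""
    return all(sum(row[j] * v[j] for j in range(len(v))) == 0 for row in S)

print(is_steady_state(S, [Fraction(2), Fraction(2)]))  # True: balanced
print(is_steady_state(S, [Fraction(2), Fraction(1)]))  # False: B accumulates
```

In floating point, "equal to zero" must be replaced by "smaller than some tolerance," and it is exactly that tolerance that can mask an infeasible model.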

When Chindelevitch and Berger applied their analysis to 89 metabolic-network models in the San Diego repository, they found that 44 of them contained errors or omissions: If the products of all the reactions in the networks were in equilibrium, the organisms modeled would be unable to grow.

Patching it up

By adapting algorithms used in the field of compressed sensing, however, Chindelevitch and Berger are also able to identify likely locations of network errors.

Compressed sensing exploits the observation that some complex signals — such as audio recordings or digital images — that are computationally intensive to acquire can, upon acquisition, be compressed. That’s because they can be converted into a different mathematical representation in which they appear much simpler than they did originally. An audio signal that initially consists of 44,000 samples for every second of its duration, for example, might be representable as the weighted sum of a much smaller number of its constituent frequencies.
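To see the sparsity that compressed sensing exploits, here is a small illustrative example (not from the paper): a signal built from just two tones, whose discrete Fourier transform concentrates nearly all of its weight in a handful of coefficients.

```python
import cmath
import math

N = 64
# Signal built from only two frequencies (bins 3 and 10) out of N.
signal = [math.sin(2 * math.pi * 3 * n / N)
          + 0.5 * math.sin(2 * math.pi * 10 * n / N)
          for n in range(N)]

def dft(x):
    """Naive discrete Fourier transform, O(n^2), for illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

coeffs = dft(signal)
# Only the two tones (plus their conjugate-mirror bins) carry
# non-negligible weight; every other coefficient is numerical noise.
big = [k for k, c in enumerate(coeffs) if abs(c) > 1.0]
print(big)  # [3, 10, 54, 61]
```

The 64-sample signal is fully described by a few frequency weights, just as a metabolic network's imbalance can be attributed to a few offending links.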

Compressed sensing performs the initial sampling in a clever way that allows it to build up the simpler representation from scratch, without having to pass through the more complex representation first. In the same way that compressed sensing can decompose an audio signal into the constituent frequencies with the heaviest weights, Chindelevitch and Berger’s algorithm can isolate just those links in a metabolic network that contribute most to its chemical imbalance.

“We’re hoping that this work will provide an impetus to reanalyze a lot of the existing metabolic-network model reconstructions and hopefully spur some collaborations where we actually perform this analysis and suggest corrections to the model before it is published,” Chindelevitch says.

“This is not an area where one would expect there to be a problem,” says Desmond Lun, chair of the Department of Computer Science at Rutgers University, who studies computational biology. “I think [the MIT researchers’ work] will change people’s attitudes in the sense that it raises an issue that most people would have thought was not an issue, and I think it will make us a lot more careful.”

“Computers operate with limited precision because there are only so many digits that you can store — even though, I must say, they store a lot of digits,” Lun explains. “Through software, you can be more or less careful about how much precision you lose in that way. There are very, very good packages out there that try to minimize that problem. And mostly, I would have thought, and I think most people would have thought, that that would be sufficient for these metabolic models.”

Errors in the models may have gone unnoticed because analyses performed on them often comported well with empirical evidence. But “those floating-point errors vary from package to package,” Lun says. “Certainly, it would be very concerning to find that because somebody used this software package, they got these great results, and then if I used a different software package, I would not.”


Topics: Compressed sensing, Computational biology, Metabolism, Synthetic biology, School of Engineering, School of Science, Computer Science and Artificial Intelligence Laboratory (CSAIL), Biology, Electrical Engineering & Computer Science (eecs), Mathematics, Research, Algorithms, Computer science and technology

Comments

I think there is something strange in this study, because I was able to compute an exact viable flux state for the iAF1260 model: http://nbviewer.ipython.org/gi...

Both the maize iRS1563 and arabidopsis iRS1597 models can produce biomass and all biomass precursors, including energy metabolites. You can confirm this using either a GAMS input file or COBRA. We are puzzled by the authors’ claim.

The proposed procedure for eliminating thermodynamically infeasible loops does not seem to take into account system-wide imperatives. For example, the arbitrary removal of a reaction participating in a loop may lead to the removal of an essential function. A specific constraint that prevents this from happening appears missing.

I read this paper with great interest. I do not understand why one needs a precision of 100 significant digits when performing FBA. It seems like overkill. A simple rescaling of variables could remedy any problems. In any case, after three years of performing FBA calculations I have not encountered any of the stated problems. One simply needs to check that all biomass precursors are unblocked and use an appropriately low feasibility tolerance.

The authors perhaps are not aware of the fact that M. genitalium does not produce amino acids, it needs to be supplemented with all amino acids either as monomers or dipeptides. Therefore, it is not surprising that the model does not grow on minimal glucose medium. It would be odd if it did.

All models in BIGG solve exactly, see

https://twitter.com/ucsd_sbrg/...

It appears that this paper has errors in it:

The conclusions regarding feasibility of COBRA models by Chindelevitch et al. are incorrect. One reason is because they did not take account of the fact that molecules, with suffix _b in the SBML files from the BIGG database, actually correspond to dummy molecules inserted to denote a reaction that exchanges mass with the environment. Such _boundary molecules must be removed to allow mass to flow into and out of any COBRA model.
- Alberto Noronha, Eugen Bauer, Ines Thiele, Ronan Fleming, Luxembourg Centre for Systems Biomedicine, University of Luxembourg.


Hi all,

Some of the previous commenters here and I have thought about this issue for many years. It's true that a model *could* be constructed to be infeasible (e.g., put a very tiny objective coefficient for a biomass component that cannot be synthesized or is blocked). I think the rebuttal from the Palsson group has shown that this does not happen in practice, at least for models from the BiGG resource.

That said, there is relevant and important work going on in the space of solving genome scale models exactly. It is good to have more people thinking about these issues.

For anyone interested, please see "Obtaining exact Solutions to genome-scale constraint based models" from http://www.nature.com/ncomms/j... for a brief technical explanation of the issues that can arise with ME-Models (and see http://www.plosone.org/article... and http://www.ncbi.nlm.nih.gov/pu... and http://www.ncbi.nlm.nih.gov/pu... for what a ME model is).

With these models, Qsopt_ex does the trick, but it's terribly slow. We later moved to 80-bit precision with SoPlex (specially compiled), which was faster but doesn't always give perfect answers. Groups at Stanford (Michael Saunders and colleagues/SNOPT) and in Germany at ZIB (SoPlex/iterative refinement) are working on a more sustainable solution for ME-Models.

I'd love to discuss these issues with the authors of this paper, but we have to start with the understanding that we've been getting metabolism right all along :)

Sincerely,

Joshua Lerman

http://scholar.google.com/cita...

We have addressed all the relevant comments in this thread in a detailed commentary on our website, mongoose.csail.mit.edu.

Leonid Chindelevitch, on behalf of the authors.
