Getting to the heart of causality is central to understanding the world around us. What causes one variable — be it a biological species, a voting region, a company stock, or a local climate — to shift from one state to another can inform how we might shape that variable in the future.
But tracing an effect to its root cause can quickly become intractable in real-world systems, where many variables can converge, confound, and cloud over any causal links.
Now, a team of MIT engineers hopes to provide some clarity in the pursuit of causality. They developed a method that can be applied to a wide range of situations to identify those variables that likely influence other variables in a complex system.
The method, in the form of an algorithm, takes in data that have been collected over time, such as the changing populations of different species in a marine environment. From those data, the method measures the interactions between every variable in a system and estimates the degree to which a change in one variable (say, the number of sardines in a region over time) can predict the state of another (such as the population of anchovy in the same region).
The engineers then generate a “causality map” that links variables that likely have some sort of cause-and-effect relationship. The algorithm determines the specific nature of that relationship, such as whether two variables are synergistic — meaning one variable only influences another if it is paired with a second variable — or redundant, such that a change in one variable can have exactly the same, and therefore redundant, effect as another variable.
The new algorithm can also make an estimate of “causal leakage,” or the degree to which a system’s behavior cannot be explained through the variables that are available; some unknown influence must be at play, and therefore, more variables must be considered.
“The significance of our method lies in its versatility across disciplines,” says Álvaro Martínez-Sánchez, a graduate student in MIT’s Department of Aeronautics and Astronautics (AeroAstro). “It can be applied to better understand the evolution of species in an ecosystem, the communication of neurons in the brain, and the interplay of climatological variables between regions, to name a few examples.”
For their part, the engineers plan to use the algorithm to help solve problems in aerospace, such as identifying features in aircraft design that can reduce a plane’s fuel consumption.
“We hope by embedding causality into models, it will help us better understand the relationship between design variables of an aircraft and how it relates to efficiency,” says Adrián Lozano-Durán, an associate professor in AeroAstro.
The engineers, along with MIT postdoc Gonzalo Arranz, have published their results in a study appearing today in Nature Communications.
Seeing connections
In recent years, a number of computational methods have been developed to take in data about complex systems and identify causal links between variables in the system, based on certain mathematical descriptions that should represent causality.
“Different methods use different mathematical definitions to determine causality,” Lozano-Durán notes. “There are many possible definitions that all sound ok, but they may fail under some conditions.”
In particular, he says that existing methods are not designed to tell the difference between certain types of causality. Namely, they don’t distinguish between a “unique” causality, in which one variable has a unique effect on another, apart from every other variable, from a “synergistic” or a “redundant” link. An example of a synergistic causality would be if one variable (say, the action of drug A) had no effect on another variable (a person’s blood pressure), unless the first variable was paired with a second (drug B).
An example of redundant causality would be if one variable (a student’s work habits) affect another variable (their chance of getting good grades), but that effect has the same impact as another variable (the amount of sleep the student gets).
“Other methods rely on the intensity of the variables to measure causality,” adds Arranz. “Therefore, they may miss links between variables whose intensity is not strong yet they are important.”
Messaging rates
In their new approach, the engineers took a page from information theory — the science of how messages are communicated through a network, based on a theory formulated by the late MIT professor emeritus Claude Shannon. The team developed an algorithm to evaluate any complex system of variables as a messaging network.
“We treat the system as a network, and variables transfer information to each other in a way that can be measured,” Lozano-Durán explains. “If one variable is sending messages to another, that implies it must have some influence. That’s the idea of using information propagation to measure causality.”
The new algorithm evaluates multiple variables simultaneously, rather than taking on one pair of variables at a time, as other methods do. The algorithm defines information as the likelihood that a change in one variable will also see a change in another. This likelihood — and therefore, the information that is exchanged between variables — can get stronger or weaker as the algorithm evaluates more data of the system over time.
In the end, the method generates a map of causality that shows which variables in the network are strongly linked. From the rate and pattern of these links, the researchers can then distinguish which variables have a unique, synergistic, or redundant relationship. By this same approach, the algorithm can also estimate the amount of “causality leak” in the system, meaning the degree to which a system’s behavior cannot be predicted based on the information available.
“Part of our method detects if there’s something missing,” Lozano-Durán says. “We don’t know what is missing, but we know we need to include more variables to explain what is happening.”
The team applied the algorithm to a number of benchmark cases that are typically used to test causal inference. These cases range from observations of predator-prey interactions over time, to measurements of air temperature and pressure in different geographic regions, and the co-evolution of multiple species in a marine environment. The algorithm successfully identified causal links in every case, compared with most methods that can only handle some cases.
The method, which the team coined SURD, for Synergistic-Unique-Redundant Decomposition of causality, is available online for others to test on their own systems.
“SURD has the potential to drive progress across multiple scientific and engineering fields, such as climate research, neuroscience, economics, epidemiology, social sciences, and fluid dynamics, among others areas,” Martínez-Sánchez says.
This research was supported, in part, by the National Science Foundation.
 
        
               
        
 
        
 
        
 
 
 
 
 
