The vast amount of data generated daily across society is widely touted as a game-changer for research, technological innovation, and even policy making. But “big data will not change the world unless it’s collected and synthesized into tools that have a public benefit,” said Sarah Williams, an assistant professor of urban planning at MIT, in a panel discussion on the future of cities, at a conference convened last week by the Institute for Data, Systems and Society (IDSS).
The ways in which data can be used to produce change were a common theme among speakers at the IDSS celebration, which focused on how the deluge of data being gathered in the big data era can be used to tackle society’s most pressing challenges. The two-day event brought together experts from a variety of fields, including energy, health care, finance, urban planning, engineering, computer science, and political science. The lineup even featured one speaker who, MIT President L. Rafael Reif joked, “knows who will win the election in November.” That would be Nate Silver, founder and editor-in-chief of the political poll analysis website FiveThirtyEight.
The event participants had much to celebrate. Launched in July 2015, IDSS reached a number of milestones in its first year, including the introduction of a new undergraduate minor in statistics and data science, a new doctoral program in social and engineering systems, a professional education course in data science, and a center focused on statistics and data sciences.
The all-star speaker lineup at the event was a testament to IDSS’s ability to bring together “data scientists and systems engineers with experts in economics, finance, urban planning, energy, public health, political science, social networks, and more,” Reif said. He added that IDSS is “a unit that can magnify individual talents through collaborations, a unit that aspires to generate groundbreaking ways to understand society’s most difficult problems and lead us to badly needed solutions.”
At IDSS, researchers are focused on taking “an analytical, data-driven approach to problems,” said Munther Dahleh, director of IDSS and the William A. Coolidge Professor of Electrical Engineering and Computer Science. “We collect the data, we develop the models, and from these models we develop insights, policies, and decisions.”
Data in the political process
The event opened with a panel discussion focused on the future of voting and elections. Charles Stewart, the Kenan Sahin Distinguished Professor in the MIT Department of Political Science, set the stage by noting the increasing role of data in the political process. Stewart, who co-directs the Caltech/MIT Voting Technology Project, described how data is collected from voter registration files, campaigns and politicians, public opinion polls, campaign contribution records, and more. He added that many citizens might be surprised to learn that the identity of anyone who has registered to vote is public record, while the data and computer code in voting machines are not always available to the public or election officials.
“Interest in election data is not simply about choosing the best candidates or policies,” Stewart explained. “It’s also about who controls the data and how it is used.”
MIT alumna Kassia DeVorsey ’04, who worked for the Obama campaign and is now the chief analytics officer at the Messina Group and founder of Minerva Insights, explained that while previously only presidential campaigns invested in gathering and analyzing data, nowadays, “if you’re running for mayor in a small town, you’re thinking strategically about ‘how can I use data to best run my campaign.’” She noted that the voter-information data compiled by the Obama campaign was the team’s most valuable resource in trying to address and influence the electorate.
During his talk, Silver explained that FiveThirtyEight is empirically minded and draws from publicly available information to generate probabilistic election forecasts. As for the 2016 presidential election, the high number of undecided voters has introduced more volatility, according to Silver. “This year, even relatively minor events have produced a shift. Therefore the debates … can matter quite a bit,” he said.
Silver said that while the polls show Democratic presidential nominee Hillary Clinton is the favored candidate, the race has tightened and Republican nominee Donald Trump does have a chance to win. Based on the high level of uncertainty surrounding this year’s election, Silver said he and his colleagues are “urging caution. … You can build models and you can do the data science, but sometimes the conclusion can be: Be careful.”
Regarding the role of gender in the presidential election, DeVorsey described how during the 2008 election, the Obama campaign asked voters oblique questions about race, to try to gauge whether polling was capturing how racism might impact the election outcome. The Clinton campaign is probably trying a similar tactic, she suggested. Meanwhile, Silver questioned whether Clinton’s high unfavorability rating can be explained without reference to her gender, adding that he thinks “the sexism question is, frankly, badly understudied.”
Data-driven policy and financial risk
Beyond the use of data in elections, Alberto Abadie, a professor of economics at MIT, and Enrico Giovannini, a professor at the University of Rome Tor Vergata, explored how data can be used to drive policy. Abadie questioned whether automatic policymaking might be possible in the future, thanks to insights from data collection.
Giovannini urged the audience to use data to help transform policies, in order to improve people’s well-being and encourage sustainable development. “We produce statistics because we believe facts can improve decision-making on many levels,” he explained. Giovannini also cautioned against potential pitfalls of relying too heavily on data, adding that policymakers need to use data not just to understand problems but also to develop solutions.
Another difficulty of data collection, raised by Bengt Holmstrom, the Paul A. Samuelson Professor of Economics, lies in financial risk, particularly in money markets. While there have been calls for increased transparency following the 2008 financial crisis, Holmstrom argued that in money markets, more transparency can lead to less liquidity. Unlike the stock market, “money markets are fundamentally information-sparse and opaque,” Holmstrom explained. In terms of managing systemic risk in money markets, he said, “transparency is not likely to be the way unless you think that maybe will regulate the markets to be less liquid.”
One area where speakers called for greater transparency in the use of data is urban planning. A panel moderated by Williams examined how data can be used to make cities better places for people to live.
Panelists described how data can be used to alleviate congestion and noise, and also examined the ethical and privacy implications for residents in places where governments are collecting and analyzing data.
During her talk, Williams displayed data visualizations her group created to illustrate the cost of incarceration in Brownsville, Brooklyn. The images exposed systemic issues in the neighborhood, including areas lacking services that could alleviate mass incarceration. The goal of her research, Williams explained, is to transform data sets “into visualizations that I hope expose urban policy issues.”
In addition to a panel discussion on social networks, the event featured a panel discussion on the future of the electric grid, moderated by Robert Armstrong, director of the MIT Energy Initiative and the Chevron Professor of Chemical Engineering; and a session on how data can be used to analyze our health, moderated by professor of computer science and engineering Peter Szolovits.