The inaugural MIT Prize for Open Data, which included a $2,500 cash prize, was recently awarded to 10 individual and group research projects. Presented jointly by the School of Science and the MIT Libraries, the prize recognizes MIT-affiliated researchers who make their data openly accessible and reusable by others. The prize winners and 16 honorable mention recipients were honored at the Open Data @ MIT event held Oct. 28 at Hayden Library.
“By making data open, researchers create opportunities for novel uses of their data and for new insights to be gleaned,” says Chris Bourg, director of MIT Libraries. “Open data accelerates scholarly progress and discovery, advances equity in scholarly participation, and increases transparency, replicability, and trust in science.”
Recognizing shared values
Spearheaded by Bourg and Rebecca Saxe, associate dean of the School of Science and John W. Jarve (1978) Professor of Brain and Cognitive Sciences, the MIT Prize for Open Data was launched to highlight the value of open data at MIT and to encourage the next generation of researchers. Nominations were solicited from across the Institute, with a focus on trainees: research technicians, undergraduate or graduate students, or postdocs.
“By launching an MIT-wide prize and event, we aimed to create visibility for the scholars who create, use, and advocate for open data,” says Saxe. “Highlighting this research and creating opportunities for networking would also help open-data advocates across campus find each other.”
Recognizing researchers who share data was also one of the recommendations of the Ad Hoc Task Force on Open Access to MIT’s Research, which Bourg co-chaired with Hal Abelson, Class of 1922 Professor, Department of Electrical Engineering and Computer Science. An annual award was one of the strategies put forth by the task force to further the Institute’s mission to disseminate the fruits of its research and scholarship as widely as possible.
Winners and honorable mentions were chosen from more than 70 nominees, representing all five schools, the MIT Schwarzman College of Computing, and several research centers across MIT. A committee composed of faculty, staff, and a graduate student made the selections:
- Yunsie Chung, graduate student in the Department of Chemical Engineering, won for SolProp, the largest open-source dataset with temperature-dependent solubility values of organic compounds.
- Matthew Groh, graduate student, MIT Media Lab, accepted on behalf of the team behind the Fitzpatrick 17k dataset, an open dataset consisting of nearly 17,000 images of skin disease alongside skin disease and skin tone annotations.
- Tom Pollard, research scientist at the Institute for Medical Engineering and Science, accepted on behalf of the PhysioNet team. This data-sharing platform enables thousands of clinical and machine-learning research studies each year and allows researchers to share sensitive resources that would not be possible through typical data sharing platforms.
- Joseph Replogle, graduate student with the Whitehead Institute for Biomedical Research, was recognized for the Genome-wide Perturb-seq dataset, the largest publicly available, single-cell transcriptional dataset collected to date.
- Pedro Reynolds-Cuéllar, graduate student with the MIT Media Lab/Art, Culture, and Technology, and Diana Duarte, co-founder at Diversa, won for Retos, an open-data platform for detailed documentation and sharing of local innovations from under-resourced settings.
- Maanas Sharma, an undergraduate student, led States of Emergency, a nationwide project analyzing and grading the responses of prison systems to Covid-19 using data scraped from public databases and manually collected data.
- Djuna von Maydell, graduate student in the Department of Brain and Cognitive Sciences, created the first publicly available dataset of single-cell gene expression from postmortem human brain tissue of patients who are carriers of APOE4, the major Alzheimer’s disease risk gene.
- Raechel Walker, graduate researcher in the MIT Media Lab, and her collaborators created a Data Activism Curriculum for high school students through the Mayor’s Summer Youth Employment Program in Cambridge, Massachusetts. Students learned how to use data science to recognize, mitigate, and advocate for people who are disproportionately impacted by systemic inequality.
- Suyeol Yun, graduate student in the Department of Political Science, was recognized for DeepWTO, a project creating open data for use in legal natural language processing research using cases from the World Trade Organization.
- Jonathan Zheng, graduate student in the Department of Chemical Engineering, won for an open IUPAC dataset for acid dissociation constants, or “pKas,” physicochemical properties that govern how acidic a chemical is in a solution.
A full list of winners and honorable mentions is available on the Open Data @ MIT website.
A campus-wide celebration
Awards were presented at a celebratory event held in the Nexus in Hayden Library during International Open Access Week. School of Science Dean Nergis Mavalvala kicked off the program by describing the long and proud history of open scholarship at MIT, citing the Institute-wide faculty open access policy and the launch of the open-source digital repository DSpace. “When I was a graduate student, we were trying to figure out how to share our theses during the days of the nascent internet,” she said, “With DSpace, MIT was figuring it out for us.”
The centerpiece of the program was a series of five-minute presentations from the prize winners on their research. Presenters detailed the ways they created, used, or advocated for open data, and the value that openness brings to their respective fields. Winner Djuna von Maydell, a graduate student in Professor Li-Huei Tsai’s lab who studies the genetic causes of neurodegeneration, underscored why it is important to share data, particularly data obtained from postmortem human brains.
“This is data generated from human brains, so every data point stems from a living, breathing human being, who presumably made this donation in the hope that we would use it to advance knowledge and uncover truth,” von Maydell said. “To maximize the probability of that happening, we have to make it available to the scientific community.”
MIT community members who would like to learn more about making their research data open can consult MIT Libraries’ Data Services team.