MIT and Harvard release de-identified learning data from open online courses

Dataset contains the original learning data from the 16 HarvardX and MITx courses offered in 2012-13.

News Office

May 30, 2014


A research team from Harvard University and MIT has released its third and final promised deliverable — the de-identified learning data — relating to an initial study of online learning based on each institution’s first-year courses on the edX platform.

Specifically, the dataset contains the original learning data from the 16 HarvardX and MITx courses offered in 2012-13 that formed the basis of the first HarvardX and MITx working papers (released in January) and underpin a suite of powerful open-source interactive visualization tools (released in February).

The dataset underwent a careful de-identification process that removed personally identifiable information using best practices, including aggregation, anonymization via random identifiers, and blurring to reduce the individuality of sensitive data fields, among other techniques.
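For readers unfamiliar with these terms, the three named techniques can be illustrated with a minimal Python sketch. This is not the research team's actual procedure, which is described in the de-identification documents listed at the end of this article. The field names (`user`, `yob`, `country`), the five-year bucket width, the minimum group size of two, and all function names are assumptions chosen only for illustration.

```python
import secrets
from collections import Counter


def assign_random_ids(user_ids):
    """Anonymization: map each original identifier to a random token.

    The token is generated randomly, so it carries no information
    about the original identifier. The same user always gets the same token.
    """
    mapping = {}
    used = set()
    for uid in user_ids:
        if uid not in mapping:
            token = secrets.token_hex(8)
            while token in used:  # regenerate on the (unlikely) collision
                token = secrets.token_hex(8)
            used.add(token)
            mapping[uid] = token
    return mapping


def blur_year(year, width=5):
    """Blurring: coarsen an exact year into a bucket of `width` years."""
    start = year - (year % width)
    return f"{start}-{start + width - 1}"


def aggregate_rare(values, min_count=2, placeholder="Other"):
    """Aggregation: fold values seen fewer than `min_count` times into one group."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else placeholder for v in values]


def deidentify(records):
    """Apply all three steps to a list of hypothetical person-course records."""
    ids = assign_random_ids(r["user"] for r in records)
    countries = aggregate_rare([r["country"] for r in records])
    return [
        {"user": ids[r["user"]], "yob": blur_year(r["yob"]), "country": c}
        for r, c in zip(records, countries)
    ]


# Hypothetical sample rows: one row per person per course.
records = [
    {"user": "alice@example.com", "yob": 1987, "country": "US"},
    {"user": "bob@example.com", "yob": 1990, "country": "US"},
    {"user": "alice@example.com", "yob": 1987, "country": "US"},
    {"user": "carol@example.com", "yob": 1972, "country": "FR"},
]

deidentified = deidentify(records)
```

In this sketch, a learner who appears in more than one course keeps the same random token across rows, exact birth years become five-year ranges, and a country that appears only once is folded into a catch-all group so that a single row cannot be singled out by that field.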

“We are excited to be able to present the data behind the reports we released in January. This step opens the door to more sophisticated analyses that build on what we have already done,” says co-lead researcher Isaac Chuang, a professor in MIT’s electrical engineering and computer science and physics departments. “MITx and HarvardX are committed to upholding learner privacy as well as advancing learning research. These data are a public good.”

Harvard’s Andrew Ho, Chuang’s co-lead, adds that the release of the data fulfills an intention stated at the launch of edX by Harvard and MIT in May 2012: to share best practices to improve teaching and learning both on campus and online.

Ho and Chuang anticipate that the data will offer insight to other educational researchers. Moreover, the methods used to protect learner privacy comply with FERPA (Family Educational Rights and Privacy Act) regulations, which govern the release of such data. The practice should inform the release of future datasets from edX and offer lessons more broadly.

“Learning data from open online courses hold great promise for research, but good research must be replicable by others,” says Ho, an associate professor at the Harvard Graduate School of Education and co-chair of the HarvardX Research Committee. “By sharing these de-identified data, we hope to show that we can protect information about individuals while still enabling replicable research about what works in online learning.”

The MIT Office of Digital Learning, HarvardX, and MIT’s Institutional Research group in the Office of the Provost contributed to the release of the dataset.

To learn more:

Person-Course De-identification Process
The HarvardX-MITx Person-Course Dataset AY2013 supplemental document
Person-course dataset AY2013
“HarvardX and MITx: The first year of open online courses,” by Andrew Ho, Justin Reich, Sergiy Nesterko, Daniel Seaton, Tommy Mullaney, Jim Waldo, and Isaac Chuang
Insights (HarvardX, MITx)
