• Oblong's collaborative-conferencing system, Mezzanine, being used in a conference room. (Courtesy of Oblong Industries)
• A Wii-like wand used to manipulate data with gestural control in the Mezzanine system. (Courtesy of Oblong Industries)

Manual control

Oblong Industries brings gesture-control technology from Hollywood to corporate conference rooms.

When you imagine the future of gesture-control interfaces, you might think of the popular science-fiction films “Minority Report” (2002) or “Iron Man” (2008). In those films, the protagonists use their hands or wireless gloves to seamlessly scroll through and manipulate visual data on a wall-sized, panoramic screen.

We’re not quite there yet. But the brain behind those Hollywood interfaces, MIT alumnus John Underkoffler ’88, SM ’91, PhD ’99 — who served as scientific advisor for both films — has been bringing a more practical version of that technology to conference rooms of Fortune 500 and other companies for the past year.  

Underkoffler’s company, Oblong Industries, has developed a platform called g-speak, based on MIT research, and a collaborative-conferencing system called Mezzanine that allows multiple users to simultaneously share and control digital content across multiple screens, from any device, using gesture control.

Overall, the major benefit of such a system lies in boosting productivity during meetings, says Underkoffler, Oblong’s CEO. This is especially true for clients who tend to pool resources into brainstorming and whose meeting rooms may remain open all day, every day.

“If you can make those meetings synthetically productive (not just times for people to check in, produce status reports, or check email surreptitiously under the table), that can be an electrifying force for the enterprise,” he says.

Mezzanine surrounds a conference room with multiple screens, as well as the “brains” of the system (a small server) that controls and syncs everything. Several Wii-like wands, with six degrees of freedom, allow users to manipulate content — such as text, photos, videos, maps, charts, spreadsheets, and PDFs — depending on certain gestures they make with the wand.

That system is built on g-speak, a type of operating system — or a so-called “spatial operating environment” — used by developers to create their own programs that run like Mezzanine.

“G-speak programs run in a distributed way across multiple machines and allow concurrent interactions for multiple people,” Underkoffler says. “This shift in thinking, as if from single sequential notes to chords and harmonies, is powerful.”

Oblong’s clients include Boeing, Saudi Aramco, SAP, General Electric, and IBM, as well as government agencies and academic institutions, such as Harvard University’s Graduate School of Design. Architects and real estate firms are also using the system for structural design.

Putting pixels in the room

G-speak has its roots in a 1999 MIT Media Lab project — co-invented by Underkoffler in Professor Hiroshi Ishii’s Tangible Media Group — called “Luminous Room,” which enabled all surfaces to hold data that could be manipulated with gestures. “It literally put pixels in the room with you,” Underkoffler says.

The group designed light bulbs, called “I/O Bulbs,” that not only projected information, but also collected information from the surface they projected onto. That meant the team could make any projected surface a veritable computer screen, and the data could interact with, and be controlled by, physical objects.

They also assigned pixels three-dimensional coordinates. Imagine, for example, if you sat down in a chair at a table, and tried to describe where the front, left corner of that table was located in physical space. “You’d say that corner is this far off the floor, this far to the right of my chair, and this much in front of me, among other things,” Underkoffler explains. “We started doing that with pixels.”
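
The idea of giving pixels room coordinates can be illustrated with a minimal sketch. This is not Oblong's or the Media Lab's code; the `Screen` class, its fields, and the axis convention (x to the right, y up, z forward, in meters) are all assumptions made for illustration. It describes a flat display by where its top-left corner sits in the room and which way its rows and columns run, then converts a pixel index into a position in the room.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class Screen:
    """A flat display placed in a room (hypothetical model for illustration)."""
    origin: Vec3     # room position of the screen's top-left corner, in meters
    right: Vec3      # unit vector pointing along a pixel row
    down: Vec3       # unit vector pointing along a pixel column
    width_m: float   # physical width of the screen
    height_m: float  # physical height of the screen
    width_px: int    # horizontal resolution
    height_px: int   # vertical resolution

    def pixel_to_room(self, col: float, row: float) -> Vec3:
        """Return the room coordinates of pixel (col, row)."""
        dx = col / self.width_px * self.width_m    # meters along the row
        dy = row / self.height_px * self.height_m  # meters down the column
        return tuple(
            o + dx * r + dy * d
            for o, r, d in zip(self.origin, self.right, self.down)
        )


# Assumed example: a 2 m x 1 m wall screen whose top edge is 2 m off the
# floor and which hangs 3 m in front of the room's origin.
wall = Screen(
    origin=(0.0, 2.0, 3.0),
    right=(1.0, 0.0, 0.0),
    down=(0.0, -1.0, 0.0),
    width_m=2.0,
    height_m=1.0,
    width_px=1920,
    height_px=1080,
)
print(wall.pixel_to_room(960, 540))  # (1.0, 1.5, 3.0)
```

In this sketch the center pixel ends up 1 m to the right, 1.5 m off the floor, and 3 m in front of the origin, which is the same kind of description Underkoffler gives for the corner of the table.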

One application for urban planners involved placing small building models onto an I/O Bulb-projected table, “and the pixels surrounded the model,” Underkoffler says. This provided three-dimensional spatial information, from which the program cast accurate, digital shadows from the models onto the table. (Changing the time on a digital clock changed the direction of the shadows.)
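
A rough sketch of how a clock time could drive a shadow's direction follows. It is not the original program: `toy_sun_position` is a deliberately crude, made-up sun model, and the axis convention (+x east, +y north, table at height zero) is an assumption. Only the geometry in `shadow_tip` is standard: a point at a given height casts a shadow of length height / tan(elevation), pointing away from the sun.

```python
import math


def toy_sun_position(hour):
    """Toy model (an assumption, not real solar geometry).

    The sun is due east at 06:00, due south at noon, due west at 18:00,
    and peaks at 60 degrees above the horizon.
    Returns (azimuth_deg, elevation_deg), azimuth clockwise from north.
    """
    if not 6.0 < hour < 18.0:
        raise ValueError("toy model only covers hours between 6 and 18")
    fraction = (hour - 6.0) / 12.0
    azimuth_deg = 90.0 + 180.0 * fraction
    elevation_deg = 60.0 * math.sin(math.pi * fraction)
    return azimuth_deg, elevation_deg


def shadow_tip(base_xy, height, azimuth_deg, elevation_deg):
    """Project the top of a vertical edge onto the table plane.

    base_xy is where the edge meets the table; the returned point is where
    the shadow of its top lands.
    """
    azimuth = math.radians(azimuth_deg)
    elevation = math.radians(elevation_deg)
    length = height / math.tan(elevation)
    x, y = base_xy
    # The shadow points directly away from the sun's compass direction.
    return (x - length * math.sin(azimuth), y - length * math.cos(azimuth))


# Changing the clock changes where the shadow of a 10-unit-tall model falls.
for hour in (9.0, 12.0, 15.0):
    print(hour, shadow_tip((0.0, 0.0), 10.0, *toy_sun_position(hour)))
```

With this toy model, a morning hour throws the shadow toward the northwest and an afternoon hour toward the northeast, which mirrors the behavior the article describes.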

In another application, the researchers used a glass vase to manipulate digital text and image boxes that were projected onto a whiteboard. The digital boxes were linked to the vase in a circle via digital “springs.” When the vase moved, all the graphics followed. When the vase rotated, the graphics bunched together and “self-stored” into the vase; when the vase rotated again, the graphics reappeared in their first form.
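
The digital “springs” can be approximated with a small damped-spring simulation. This is only a sketch under assumptions: the function names, the stiffness and damping values, the time step, and the choice to model “self-storing” as shrinking the circle's radius to zero are all invented here, not taken from the Luminous Room software. Each box is pulled toward its own anchor point on a circle around the vase.

```python
import math


def anchors(vase_xy, count, radius):
    """Evenly spaced rest positions on a circle around the vase."""
    cx, cy = vase_xy
    return [
        (cx + radius * math.cos(2 * math.pi * i / count),
         cy + radius * math.sin(2 * math.pi * i / count))
        for i in range(count)
    ]


def step(boxes, velocities, targets, stiffness=40.0, damping=8.0, dt=1 / 60):
    """Advance every box one time step (semi-implicit Euler, unit mass)."""
    for i, ((px, py), (vx, vy), (tx, ty)) in enumerate(
        zip(boxes, velocities, targets)
    ):
        # Spring force pulls toward the anchor; damping resists motion.
        vx += (stiffness * (tx - px) - damping * vx) * dt
        vy += (stiffness * (ty - py) - damping * vy) * dt
        velocities[i] = (vx, vy)
        boxes[i] = (px + vx * dt, py + vy * dt)


def settle(boxes, velocities, vase_xy, radius, steps=600):
    """Simulate about ten seconds and return the anchors used."""
    targets = anchors(vase_xy, len(boxes), radius)
    for _ in range(steps):
        step(boxes, velocities, targets)
    return targets


# Four boxes start in a ring around a vase at the origin.
boxes = anchors((0.0, 0.0), 4, 1.0)
velocities = [(0.0, 0.0)] * 4

# Moving the vase drags the ring of boxes along with it.
moved = settle(boxes, velocities, (5.0, 3.0), 1.0)

# Assumed "self-store" behavior: radius 0 collapses the boxes into the vase.
stored = settle(boxes, velocities, (5.0, 3.0), 0.0)
```

Because the boxes are tied to the vase only through springs, they lag slightly and then catch up, rather than snapping to the new position.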

These initial concepts — using the whole room as a digital workplace — became the foundation for g-speak. “I really wanted to get the ideas out into the world in a form that everyone could use,” Underkoffler says. “Generally, that means commercial form, but the world of movies came calling first.”

“The world’s largest focus group”

Underkoffler was recruited as scientific advisor for Steven Spielberg’s “Minority Report” after meeting the film’s crew, who were searching for novel technology ideas at the Media Lab. Later, in 2003, Underkoffler reprised his behind-the-scenes gig for Ang Lee’s “Hulk,” and, in 2008, for Jon Favreau’s “Iron Man,” which both depicted similar technologies.

Seeing this technology on the big screen inspired Underkoffler to refine his MIT technology, launch Oblong in 2006, and build early g-speak prototypes — glove-based systems that eventually ended up with the company’s first customer, Boeing.

Having tens of millions of viewers see the technology on the big screen also offered a couple of surprising perks for Oblong, which today is headquartered in Los Angeles, with nine other offices and demo rooms in cities including Boston, New York, and London. “It might have been the world’s largest focus group,” Underkoffler says.

Those enthused by the technology, for instance, started getting in touch with Underkoffler to see if the technology was real. Additionally, being part of a big-screen production helped Underkoffler and Oblong better explain their own technology to clients, Underkoffler says. In such spectacular science-fiction films, technology competes for viewer attention and, yet, it needs to be simplified so viewers can understand it clearly.

“When you take technology from a lab like at MIT, and you need to show it in a film, the process of refining and simplifying those ideas so they’re instantly legible on screen is really close to the refinement you need to undertake if you’re turning that lab work into a product,” he says. “It was enormously valuable to us to strip away everything in the system that wasn’t necessary and leave a really compact core of user-interface ideas we have today.”

After years of writing custom projects for clients on g-speak, Oblong turned the most-requested features of these jobs — such as having cross-platform and multiple-user capabilities — into Mezzanine. “It was the first killer application we could write on top of g-speak,” he says. “Building a universal, shared-pixel workspace has enormous value no matter what your business is.”

Today, Oblong is shooting for greater ubiquity of its technology. But how far away are we from a consumer model of Mezzanine? It could take years, Underkoffler admits: “But we really hope to radically tilt the whole landscape of how we think about computers and user interface.”

Topics: Innovation and Entrepreneurship (I&E), Startups, Alumni/ae, Media Lab, Tangible Media Group, School of Architecture + Planning, Computer science and technology, Gestural interfaces, Film and Television
