Speech, vision and security technologies are as natural as Oxygen

When it was founded two years ago, MIT's Project Oxygen Alliance set out to create a new form of computing and communication: human-centered, ubiquitous and transparent. At the second annual meeting of the alliance on June 12-13, researchers showcased the technology advances that this ambitious goal has inspired.

Begun as a partnership among the Laboratory for Computer Science, the Artificial Intelligence Laboratory and six corporations, with support from the Defense Advanced Research Projects Agency, the Project Oxygen Alliance seeks to make computation and communication as abundant and natural to use as oxygen in the air. The goal is to free people from computer jargon, keyboards, mice and other specialized devices, allowing them to meet their computation and communication needs anytime and anywhere.

MIT researchers have been busy creating speech and vision technologies that let people communicate naturally with computers just as they would with other people; developing decentralized networks and robust software/hardware architectures that adapt to mobile users, currently available resources or varying operating conditions; and devising security and privacy mechanisms that safeguard personal information and resources.

"The theme of the second year of the Oxygen alliance is integration," said Professor Victor Zue, director of the Lab for Computer Science. "For example, speech and vision techniques are used jointly to recognize a person and to provide speech and gesture understanding. Similarly, wireless location support, ad hoc networks and novel security protocols are utilized to provide mobile and secure information delivery."

New technologies that were demonstrated include:

  • Multilingual conversational systems that can recognize, understand and respond to naturally spoken requests. These systems can be rapidly configured to handle complex dialogues, letting users obtain information such as the weather in Tokyo or traffic conditions and hotels in Boston.
  • An integrated vision and speech system that uses cameras and microphone arrays to track a speaker's location and arm position, extract the speaker's voice from background noise, and respond to a combination of pointing gestures and spoken commands such as "move that one over here" or "show me the video on that screen."
  • Systems that integrate software services to accomplish user-defined tasks. For example, a smart room equipped with embedded speech, video and motion detectors automatically records and recalls key meeting events, monitoring and responding to visual and auditory cues that flow naturally from normal interactions among group members.
  • A computer-aided design tool that understands simple mechanical devices as they are sketched on whiteboards or tablets. Freed from mice, menus and icons, users can draw, simulate, modify and test design elements just as they would when working alongside an expert designer.
  • Systems that let users access computers, printers and remote services by describing what they want to do rather than by remembering computer-coded addresses for the devices and their locations. Low-cost, ceiling-mounted beacons enable mobile users to determine where they are indoors, without having to reveal their own location. The systems respond to user commands such as "print this picture on the nearest color printer."
  • A secure, self-configuring, decentralized wireless network that lets mobile users communicate spontaneously using handheld devices and share information with one another over multiple network protocols, without requiring additional access points or intervention from service providers.
  • Hardware and software architectures that determine and implement the best allocation of resources for streaming multimedia applications. These architectures optimize the use of computer circuitry and power, thereby boosting the performance and lowering the cost of wireless handheld devices that link mobile users to Oxygen networks.

A version of this article appeared in MIT Tech Talk on July 17, 2002.