
Reach in and touch objects in videos with “Interactive Dynamic Video”

Technique from Computer Science and Artificial Intelligence Lab could improve augmented reality and reduce the need for CGI green-screens.
Press Inquiries

Press Contact:

Adam Conner-Simons
Phone: 617-324-9135
MIT Computer Science & Artificial Intelligence Lab
Caption:
To simulate objects, researchers analyzed video clips to find “vibration modes” at different frequencies that each represent distinct ways that an object can move. By identifying these modes’ shapes, the researchers can begin to predict how these objects will move in new situations.
Credits:
Image: Abe Davis/MIT CSAIL
Caption:
Using traditional cameras and algorithms, IDV looks at the tiny, almost invisible vibrations of an object to create video simulations that users can virtually interact with.
Credits:
Image: Abe Davis/MIT CSAIL

We learn a lot about objects by manipulating them: poking, pushing, prodding, and then seeing how they react.

We obviously can’t do that with videos — just try touching that cat video on your phone and see what happens. But is it crazy to think that we could take that video and simulate how the cat moves, without ever interacting with the real one?

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have recently done just that, developing an imaging technique called Interactive Dynamic Video (IDV) that lets you reach in and “touch” objects in videos. Using traditional cameras and algorithms, IDV looks at the tiny, almost invisible vibrations of an object to create video simulations that users can virtually interact with.

Interactive Dynamic Video demonstration from the MIT Computer Science and Artificial Intelligence Laboratory

“This technique lets us capture the physical behavior of objects, which gives us a way to play with them in virtual space,” says CSAIL PhD student Abe Davis, who will be publishing the work this month for his final dissertation. “By making videos interactive, we can predict how objects will respond to unknown forces and explore new ways to engage with videos.”

Davis says that IDV has many possible uses, from filmmakers producing new kinds of visual effects to architects determining whether buildings are structurally sound. For example, he shows that, whereas the popular Pokémon Go app can drop virtual characters into real-world environments, IDV can go a step further by enabling virtual objects (including Pokémon) to interact with their environments in specific, realistic ways, like bouncing off the leaves of a nearby bush.

He outlined the technique in a paper he published earlier this year with PhD student Justin G. Chen and professor Fredo Durand.

How it works

The most common way to simulate objects’ motions is by building a 3-D model. Unfortunately, 3-D modeling is expensive, and can be almost impossible for many objects. While algorithms exist to track motions in video and magnify them, none can reliably simulate objects in unknown environments. Davis’ work shows that even five seconds of video can contain enough information to create realistic simulations.

To simulate the objects, the team analyzed video clips to find “vibration modes” at different frequencies that each represent distinct ways that an object can move. By identifying these modes’ shapes, the researchers can begin to predict how these objects will move in new situations.
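
To make the idea of “vibration modes” concrete, here is a minimal sketch in Python. It is not the researchers’ code. It assumes the motion of a few points on an object has already been tracked over a five-second clip, and it uses made-up data: two hypothetical modes at 2 Hz and 5 Hz, sampled at an assumed 30 frames per second. The function names (dft, find_modes) are illustrative. The sketch looks for frequencies where the motion is strongest across all points, then reads off how much each point moves at that frequency, which serves as a rough stand-in for a mode’s shape.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform; returns bins 0..N//2."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n // 2 + 1)
    ]

def find_modes(displacements, sample_rate, num_modes):
    """displacements[p][t] is the motion of tracked point p at frame t.

    Returns (frequency_hz, shape) pairs, where shape[p] is the complex
    Fourier coefficient of point p at that frequency.
    """
    n = len(displacements[0])
    spectra = [dft(sig) for sig in displacements]
    # Total power per frequency bin, summed over all tracked points.
    power = [sum(abs(s[k]) ** 2 for s in spectra) for k in range(len(spectra[0]))]
    # Local maxima of the power spectrum (skipping the DC bin), strongest first.
    peaks = [k for k in range(1, len(power) - 1)
             if power[k] > power[k - 1] and power[k] > power[k + 1]]
    peaks.sort(key=lambda k: power[k], reverse=True)
    return [(k * sample_rate / n, [s[k] for s in spectra])
            for k in sorted(peaks[:num_modes])]

# Synthetic stand-in for tracked motion (all values are assumptions).
SAMPLE_RATE = 30   # frames per second
NUM_FRAMES = 150   # five seconds of video
SHAPE_A = [1.0, 0.5, -0.5, -1.0]   # hypothetical 2 Hz mode: ends move in opposition
SHAPE_B = [0.2, 0.8, 0.8, 0.2]     # hypothetical 5 Hz mode: middle moves most

displacements = [
    [a * math.sin(2 * math.pi * 2 * t / SAMPLE_RATE)
     + b * math.sin(2 * math.pi * 5 * t / SAMPLE_RATE)
     for t in range(NUM_FRAMES)]
    for a, b in zip(SHAPE_A, SHAPE_B)
]

detected_modes = find_modes(displacements, SAMPLE_RATE, num_modes=2)
for freq, shape in detected_modes:
    print(f"{freq:.1f} Hz:", [round(abs(c), 1) for c in shape])
```

Real video is far messier than this toy signal: motion has to be estimated from pixels first, and noise makes the peaks harder to pick out.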

“Computer graphics allows us to use 3-D models to build interactive simulations, but the techniques can be complicated,” says Doug James, a professor of computer science at Stanford University who was not involved in the research. “Davis and his colleagues have provided a simple and clever way to extract a useful dynamics model from very tiny vibrations in video, and shown how to use it to animate an image.”

Davis used IDV on videos of a variety of objects, including a bridge, a jungle gym, and a ukulele. With a few mouse-clicks, he showed that he can push and pull the image, bending and moving it in different directions. He even demonstrated how he can make his own hand appear to telekinetically control the leaves of a bush.

“If you want to model how an object behaves and responds to different forces, we show that you can observe the object respond to existing forces and assume that it will respond in a consistent way to new ones,” says Davis, who also found that the technique even works on some existing videos on YouTube.
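
The assumption Davis describes, that an object responds to new forces the way it responded to old ones, can be illustrated with a small Python sketch. This is one common way to model such behavior, not necessarily the method used in the research: each mode is treated as a damped spring that is set ringing by a force, and the object’s motion is the sum of its modes. The frequencies, damping ratios, mode shapes, and the poke function below are all hypothetical values chosen for illustration.

```python
import math

def simulate_mode(freq_hz, damping_ratio, force, dt, steps):
    """Integrate q'' + 2*zeta*omega*q' + omega^2*q = force(t)
    with semi-implicit Euler, starting from rest."""
    omega = 2 * math.pi * freq_hz
    q, v = 0.0, 0.0
    history = []
    for i in range(steps):
        t = i * dt
        a = force(t) - 2 * damping_ratio * omega * v - omega ** 2 * q
        v += a * dt   # update velocity first...
        q += v * dt   # ...then position, using the new velocity
        history.append(q)
    return history

def poke(t):
    """Hypothetical user interaction: a constant push for the first 0.1 s."""
    return 1.0 if t < 0.1 else 0.0

# (frequency in Hz, damping ratio, per-point shape); all values are assumptions.
modes = [
    (2.0, 0.03, [1.0, 0.5, -0.5, -1.0]),
    (5.0, 0.03, [0.2, 0.8, 0.8, 0.2]),
]
dt, steps = 0.001, 3000   # three seconds at 1 ms per step

responses = [simulate_mode(f, z, poke, dt, steps) for f, z, _ in modes]

# Motion of each point is the sum of every mode's response, scaled by its shape.
displacement = [
    [sum(shape[p] * resp[i] for (_, _, shape), resp in zip(modes, responses))
     for i in range(steps)]
    for p in range(4)
]

early = max(abs(x) for x in displacement[0][:500])
late = max(abs(x) for x in displacement[0][-500:])
print(f"peak motion of point 0: first 0.5 s = {early:.5f}, last 0.5 s = {late:.5f}")
```

In this toy version the poked object wobbles and then settles, because the damping gradually removes energy from each mode once the push ends.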

Applications

Researchers say that the tool has many potential uses in engineering, entertainment, and more.

For example, in movies it can be difficult and expensive to get CGI characters to realistically interact with their real-world environments. Doing so requires filmmakers to use green-screens and create detailed models of virtual objects that can be synchronized with live performances.

But with IDV, a videographer could take video of an existing real-world environment and make some minor edits like masking, matting, and shading to achieve a similar effect in much less time — and at a fraction of the cost.

Engineers could also use the system to simulate how an old building or bridge would respond to strong winds or an earthquake.

“The ability to put real-world objects into virtual models is valuable for not just the obvious entertainment applications, but also for being able to test the stress in a safe virtual environment, in a way that doesn’t harm the real-world counterpart,” says Davis.

He says that he is also eager to see other applications emerge, from studying sports film to creating new forms of virtual reality.

“When you look at VR companies like Oculus, they are often simulating virtual objects in real spaces,” he says. “This sort of work turns that on its head, allowing us to see how far we can go in terms of capturing and manipulating real objects in virtual space.”

This work was supported by the National Science Foundation and the Qatar Computing Research Institute. Chen also received support from Shell Research through the MIT Energy Initiative.

Press Mentions

The Guardian

MIT researchers have developed a system that allows users to interact with video simulations, writes Joanna Goodman for The Guardian. The system “uses video to virtualize physical content so that it can interact with virtual content, so that when you see – on your smartphone – a Pokémon interact with a flexible object, you also see that object react.”

Scientific American

A new imaging technique developed by MIT researchers creates video simulations that people can interact with, writes Charles Choi for Scientific American. “In addition to fueling game development, these advances could help simulate how real bridges and buildings might respond to potentially disastrous situations,” Choi explains. 

BBC News

BBC News reports that CSAIL researchers have created an algorithm that can manipulate still objects in photographs and videos. The technique doesn’t require any special cameras, which makes it great for improving the realism in augmented reality games like Pokémon Go.

Popular Science

CSAIL researchers have created a tool that allows people to interact with videos, writes Mary Beth Griggs for Popular Science. The technique could “make augmented reality animations integrate even more with the 'reality' part of augmented reality, help engineers model how structures will react when different forces are applied, or as a less expensive way to create special effects.”

NBC News

Alyssa Newcomb writes for NBC News that MIT researchers have developed a system that allows users to interact with virtual objects. Newcomb explains that the “technology could be used to make movies or even by engineers wanting to find out how an old bridge may respond to inclement weather.”
