For several years now, the research groups of MIT professors of computer science and engineering William Freeman and Frédo Durand have been investigating techniques for amplifying movements captured by video but indiscernible to the human eye. Versions of their algorithms can make the human pulse visible and even recover intelligible speech from the vibrations of objects filmed through soundproof glass.
Earlier this month, at the Computer Vision and Pattern Recognition conference, Freeman, Durand, and colleagues at the Qatar Computing Research Institute (QCRI) presented a new version of the algorithm that can amplify small motions even when they’re contained within objects executing large motions. So, for instance, it could make visible the precise sequence of muscle contractions in the arms of a baseball player swinging the bat, or in the legs of a soccer player taking a corner kick.
“The previous version of the algorithm assumed everything was small in the video,” Durand says. “Now we want to be able to magnify small motions that are hidden within large motions. The basic idea is to try to cancel the large motion and go back to the previous situation.”
Canceling the large motion means determining which pixels of successive frames of video belong to a moving object and which belong to the background. As Durand explains, that problem becomes particularly acute at the object’s boundaries.
If a digital camera captures an image of, say, a red object against a blue background, some of its photosensors will register red light, and some will register blue. But the sensors corresponding to the object’s boundaries may in fact receive light from both foreground and background, so they’ll register varying shades of purple.
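The blending at the boundary can be illustrated with a minimal Python sketch. It assumes a standard linear compositing model, in which a sensor's recorded color is a mix of foreground and background weighted by the fraction of the sensor the object covers. The function names and color values are illustrative, not taken from the researchers' work. The sketch also hints at why these pixels matter: nudging the edge by a twentieth of a pixel changes the recorded shade of purple.

```python
# Sketch of a boundary photosensor mixing foreground and background light.
# Assumption: recorded color = alpha * foreground + (1 - alpha) * background,
# where alpha is the fraction of the sensor covered by the object.

RED = (255.0, 0.0, 0.0)   # foreground object (illustrative RGB value)
BLUE = (0.0, 0.0, 255.0)  # background (illustrative RGB value)


def coverage(edge_x, pixel_left, pixel_width=1.0):
    """Fraction of a pixel covered by an object occupying everything left of edge_x."""
    covered = (edge_x - pixel_left) / pixel_width
    return min(max(covered, 0.0), 1.0)


def sensor_color(alpha, fg=RED, bg=BLUE):
    """Blend foreground and background by coverage alpha (0 = pure background)."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


if __name__ == "__main__":
    # An object edge falling 30% of the way into the pixel that starts at x = 10...
    a = coverage(edge_x=10.3, pixel_left=10.0)
    print(a, sensor_color(a))
    # ...moved by a twentieth of a pixel produces a slightly redder purple.
    b = coverage(edge_x=10.35, pixel_left=10.0)
    print(b, sensor_color(b))
```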
Ordinarily, an algorithm separating foreground from background could get away with keeping those borderline pixels: A human viewer probably wouldn't notice a tiny fringe of purple around a red object. But the purpose of the MIT researchers' motion amplification algorithm is precisely to detect variations invisible to the naked eye. Shifts in color along an object's boundaries could therefore be mistaken for motions requiring magnification.
So Durand, Freeman, and Mohamed Elgharib and Mohamed Hefeeda of QCRI instead assign each boundary pixel a weight, corresponding to the likelihood that it belongs to the foreground object. In the example of the red object against a blue background, that weight would simply depend on whether the shade of purple is bluer or redder. Then, on the basis of the pixels’ weights, the algorithm randomly discards some and keeps others. On average, it will make the right decision, and it will disrupt any patterns of color change that could be mistaken for motion.
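The article does not say exactly how the weights are computed. The following Python sketch assumes the two-color case described above: a pixel's weight is estimated by projecting its color onto the line between the foreground and background colors, and each pixel is then kept as foreground with probability equal to that weight. The helper names are hypothetical.

```python
import random


def foreground_weight(color, fg, bg):
    """Estimate how foreground-like a mixed pixel is, as a value in [0, 1].

    Assumption: color is roughly w*fg + (1 - w)*bg, so w is the projection of
    (color - bg) onto (fg - bg), clamped to [0, 1].
    """
    d = [f - b for f, b in zip(fg, bg)]
    den = sum(di * di for di in d)
    if den == 0:
        raise ValueError("foreground and background colors must differ")
    num = sum((c - b) * di for c, b, di in zip(color, bg, d))
    return min(max(num / den, 0.0), 1.0)


def stochastic_keep(weights, rng):
    """Keep each boundary pixel as foreground with probability equal to its weight."""
    return [rng.random() < w for w in weights]


if __name__ == "__main__":
    red, blue = (255.0, 0.0, 0.0), (0.0, 0.0, 255.0)
    w = foreground_weight((76.5, 0.0, 178.5), red, blue)  # a bluish purple
    kept = stochastic_keep([w] * 10_000, random.Random(42))
    print(w, sum(kept) / len(kept))
```

Any single pixel's fate is random, but across many pixels the fraction kept tracks the weight. That matches the article's point that the procedure is right on average while breaking up smooth, frame-to-frame color drifts a magnifier might otherwise amplify.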
The problem of identifying the same object from frame to frame, Durand says, is related to the problem of image stabilization, which attempts to remove camera jitter from video. Identifying the motion of a single object, however, is more difficult than determining the motion of the image as a whole.
The MIT and QCRI researchers make a few assumptions to render the problem more tractable. First, they assume a correlation between the direction and rate of motion of adjacent pixels. Second, they assume “smoothness” — that the direction and rate of motion will be consistent over time. Finally, they assume that pixels’ trajectories across frames can be captured by linear mathematical relationships, which enables their algorithm to analyze pixels individually.
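One common way to encode the first two assumptions, offered here as an illustration rather than as the researchers' own formulation, is as penalties on a candidate motion estimate. Large differences between adjacent pixels' velocities, or between one pixel's velocity in consecutive frames, make the estimate less plausible. In the hypothetical Python sketch below, flow[t][i] is the velocity of pixel i at frame t along a single image row, and the weights lam_space and lam_time are arbitrary.

```python
# Hypothetical penalty encoding two assumptions:
#   spatial term: adjacent pixels should move alike
#   temporal term: a pixel's motion should change little from frame to frame


def smoothness_penalty(flow, lam_space=1.0, lam_time=1.0):
    spatial = sum(
        (row[i + 1] - row[i]) ** 2
        for row in flow
        for i in range(len(row) - 1)
    )
    temporal = sum(
        (flow[t + 1][i] - flow[t][i]) ** 2
        for t in range(len(flow) - 1)
        for i in range(len(flow[t]))
    )
    return lam_space * spatial + lam_time * temporal


# Every pixel moves 2 px/frame in every frame: perfectly consistent.
coherent = [[2.0] * 4 for _ in range(3)]
# Same range of speeds, scrambled across pixels and frames.
jittery = [
    [2.0, -1.0, 3.0, 0.0],
    [0.0, 2.0, -1.0, 3.0],
    [3.0, 0.0, 2.0, -1.0],
]

if __name__ == "__main__":
    print(smoothness_penalty(coherent), smoothness_penalty(jittery))
```

A complete tracker would balance a penalty like this against a term measuring how well the motion explains the observed pixel colors; that term is left out here.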
Then, rather than looking for correlations between one frame and the next, their algorithm considers five frames at a time, using consistencies across frames to resolve ambiguities between adjacent frames.
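Why a five-frame window helps can be sketched in Python for a single tracked point. The window size follows the article. Fitting a straight-line (constant-velocity) trajectory by least squares is an assumed stand-in for the researchers' actual procedure, and the tracking errors are made up for illustration.

```python
# Sketch: the five-frame window follows the article; the constant-velocity
# least-squares fit is an assumption, not the published method.

WINDOW = 5  # frames considered at once


def fit_linear_track(positions):
    """Least-squares fit of x(t) = x0 + v*t for t = 0..n-1; returns (x0, v)."""
    n = len(positions)
    if n < 2:
        raise ValueError("need at least two frames")
    t_mean = (n - 1) / 2
    x_mean = sum(positions) / n
    num = sum((t - t_mean) * (x - x_mean) for t, x in enumerate(positions))
    den = sum((t - t_mean) ** 2 for t in range(n))
    v = num / den
    return x_mean - v * t_mean, v


def two_frame_velocity(positions, t):
    """Velocity estimated from frames t and t+1 alone."""
    return positions[t + 1] - positions[t]


# A point truly moving 1 px/frame, observed with +/-0.3 px tracking errors.
true_positions = [float(t) for t in range(WINDOW)]
errors = [0.3, -0.3, 0.3, -0.3, 0.3]
observed = [x + e for x, e in zip(true_positions, errors)]

if __name__ == "__main__":
    print(two_frame_velocity(observed, 0))  # about 0.4, far from the true 1.0
    print(fit_linear_track(observed)[1])    # about 1.0
```

A single pair of frames is at the mercy of whatever error those two frames happen to carry. The fit across the window lets the other frames outvote it.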
Once the algorithm has identified the pixels corresponding to a single moving object, it corrects for the object's motion and performs the same motion magnification procedure that previous versions did. Finally, it reinserts the magnified motions into the original video stream.
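The overall sequence of canceling the large motion, magnifying what remains, and putting it back can be sketched in simplified form. This Python example assumes the tracked horizontal position of one point on the object is already known in each frame, models the large motion as a straight line, and "magnifies" by scaling the leftover motion by (1 + alpha). The researchers' magnification works on video pixels rather than on a single coordinate, so this shows only the structure of the pipeline.

```python
# Simplified, position-based sketch of cancel -> magnify -> reinsert.
# Assumptions: one tracked coordinate per frame, a straight-line large motion,
# and magnification as a plain scale factor on the residual.


def fit_line(ys):
    """Least-squares line y = y0 + slope*t for t = 0..n-1; returns (y0, slope)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two frames")
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return y_mean - slope * t_mean, slope


def magnify_hidden_motion(track, alpha):
    x0, v = fit_line(track)
    large = [x0 + v * t for t in range(len(track))]           # large motion
    small = [x - g for x, g in zip(track, large)]            # left after canceling
    return [g + (1 + alpha) * s for g, s in zip(large, small)]  # reinsert, magnified


if __name__ == "__main__":
    # 10 px/frame swing plus a +/-0.1 px tremor invisible at normal scale.
    track = [0.1, 9.9, 20.0, 29.9, 40.1]
    print(magnify_hidden_motion(track, alpha=9.0))
```

With alpha set to 9, the 0.1-pixel tremor grows to a full pixel while the 10-pixel-per-frame swing is left unchanged.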