Deep learning with point clouds

Research aims to make it easier for self-driving cars, robotics, and other applications to understand the 3D world.

If you’ve ever seen a self-driving car in the wild, you might wonder about that spinning cylinder on top of it. 

It’s a “lidar sensor,” and it’s what allows the car to navigate the world. By sending out pulses of infrared light and measuring the time it takes for them to bounce off objects, the sensor creates a “point cloud” that builds a 3D snapshot of the car’s surroundings. 
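The time-of-flight idea can be sketched in a few lines of Python. This is an illustrative sketch, not the firmware of any real sensor: the function names and the azimuth/elevation beam convention are assumptions made for the example. The one physical fact it relies on is that the pulse travels to the object and back, so the distance is half the round trip multiplied by the speed of light.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def range_from_round_trip(round_trip_s):
    """Distance in meters to the reflecting surface.

    The pulse covers the distance twice (out and back), hence the division by 2.
    """
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def to_point(round_trip_s, azimuth_rad, elevation_rad):
    """Convert one return into an (x, y, z) point in the sensor's own frame.

    Assumed convention (hypothetical): azimuth is measured in the horizontal
    plane from the x-axis, elevation is measured up from that plane.
    """
    r = range_from_round_trip(round_trip_s)
    horizontal = r * math.cos(elevation_rad)
    return (
        horizontal * math.cos(azimuth_rad),
        horizontal * math.sin(azimuth_rad),
        r * math.sin(elevation_rad),
    )


# A point cloud is then just the list of points from many beams.
returns = [(2.0e-7, 0.0, 0.0), (1.0e-7, math.pi / 2, 0.0)]
cloud = [to_point(t, az, el) for t, az, el in returns]
```

A round trip of 200 nanoseconds corresponds to an object roughly 30 meters away, which gives a sense of the timing precision such a sensor needs.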

Making sense of raw point-cloud data is difficult. Before the age of machine learning, it traditionally required highly trained engineers to tediously specify by hand which qualities they wanted to capture. But in a new series of papers out of MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), researchers show that they can use deep learning to automatically process point clouds for a wide range of 3D-imaging applications.

“In computer vision and machine learning today, 90 percent of the advances deal only with two-dimensional images,” says MIT Professor Justin Solomon, who was senior author of the new series of papers spearheaded by PhD student Yue Wang. “Our work aims to address a fundamental need to better represent the 3D world, with application not just in autonomous driving, but any field that requires understanding 3D shapes.” 

Most previous approaches haven’t been especially successful at capturing the patterns needed to extract meaningful information from a set of 3D points in space. But in one of the team’s papers, the researchers showed that “EdgeConv,” a method that analyzes point clouds with a type of neural network called a dynamic graph convolutional neural network, allowed them to classify and segment individual objects.

“By building ‘graphs’ of neighboring points, the algorithm can capture hierarchical patterns and therefore infer multiple types of generic information that can be used by a myriad of downstream tasks,” says Wadim Kehl, a machine learning scientist at Toyota Research Institute who was not involved in the work. 
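To make the idea of building graphs of neighboring points concrete, here is a minimal Python sketch of one graph-based layer. It is not the team's implementation: `toy_edge_function` is a hypothetical stand-in for the learned neural network that would process each edge, and the brute-force neighbor search is chosen for readability rather than speed. What the sketch does show is the general pattern: connect each point to its k nearest neighbors, compute a feature for each edge from the center point and the offset to its neighbor, then take a channel-wise maximum over the edges.

```python
import math


def k_nearest_neighbors(points, k):
    """For each point, return the indices of its k closest other points."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(others[:k])
    return neighbors


def toy_edge_function(center, offset):
    """Hypothetical stand-in for a learned edge function.

    A real network would apply trained weights here; this version simply
    concatenates the center point with the offset to the neighbor.
    """
    return list(center) + list(offset)


def edge_conv_layer(points, k):
    """One graph layer: per-edge features, then a channel-wise max per point."""
    graph = k_nearest_neighbors(points, k)
    output = []
    for i, neighbor_ids in enumerate(graph):
        center = points[i]
        edge_features = [
            toy_edge_function(center, [b - a for a, b in zip(center, points[j])])
            for j in neighbor_ids
        ]
        # Max over neighbors, computed separately for each feature channel.
        output.append([max(channel) for channel in zip(*edge_features)])
    return output


points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (5, 5, 5)]
features = edge_conv_layer(points, k=2)
```

Because the maximum ignores the order in which neighbors are listed, the output does not depend on how the points happen to be ordered in the input. In a dynamic variant, the neighbor graph would be rebuilt from each layer's output features rather than fixed once from the input coordinates.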

In addition to developing EdgeConv, the team also explored other specific aspects of point-cloud processing. For example, one challenge is that most sensors change perspectives as they move around the 3D world; every time we take a new scan of the same object, its position may be different from the last time we saw it. To merge multiple point clouds into a single detailed view of the world, you need to align multiple 3D points in a process called “registration.”

Registration is vital for many forms of imaging, from satellite data to medical procedures. For example, when a doctor has to take multiple magnetic resonance imaging scans of a patient over time, registration is what makes it possible to align the scans to see what’s changed. 

“Registration is what allows us to integrate 3D data from different sources into a common coordinate system,” says Wang. “Without it, we wouldn’t actually be able to get as meaningful information from all these methods that have been developed.”
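A small worked example may help show what bringing two scans into a common coordinate system involves. The Python sketch below is a deliberately simplified illustration, not the algorithm from the papers: it works in 2D rather than 3D, and it assumes we already know which point in one scan corresponds to which point in the other, which is the hard part in practice. Under those assumptions, the best rotation and translation in the least-squares sense have a closed form. The function names are made up for the example.

```python
import math


def centroid(points):
    """Mean position of a list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def estimate_rigid_transform_2d(source, target):
    """Least-squares rotation angle and translation mapping source[i] to target[i].

    Assumes the two lists are the same length and already in correspondence.
    """
    cs, ct = centroid(source), centroid(target)
    dot = cross = 0.0
    for (sx, sy), (tx, ty) in zip(source, target):
        ax, ay = sx - cs[0], sy - cs[1]  # source point relative to its centroid
        bx, by = tx - ct[0], ty - ct[1]  # target point relative to its centroid
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Choose the translation so the rotated source centroid lands on the target centroid.
    shift = (ct[0] - (c * cs[0] - s * cs[1]), ct[1] - (s * cs[0] + c * cs[1]))
    return theta, shift


def apply_transform(points, theta, shift):
    """Rotate each point by theta about the origin, then translate by shift."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]


# Simulate a second scan of the same shape from a different pose, then recover the pose.
scan_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0)]
scan_b = apply_transform(scan_a, 0.5, (3.0, -2.0))
theta, shift = estimate_rigid_transform_2d(scan_a, scan_b)
aligned = apply_transform(scan_a, theta, shift)
```

With real sensor data the correspondences are unknown and the scans are noisy, so the difficulty shifts from solving for the transform to deciding which points should be matched in the first place.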

Solomon and Wang’s second paper demonstrates a new registration algorithm called “Deep Closest Point” (DCP) that was shown to better find a point cloud’s distinguishing patterns, points, and edges (known as “local features”) in order to align it with other point clouds. This is especially important for such tasks as enabling self-driving cars to situate themselves in a scene (“localization”), as well as for robotic hands to locate and grasp individual objects.

One limitation of DCP is that it assumes we can see an entire shape instead of just one side. This means it can’t handle the more difficult task of aligning partial views of shapes (known as “partial-to-partial registration”). As a result, in a third paper the researchers presented an improved algorithm for this task that they call the Partial Registration Network (PRNet). 

Solomon says that existing 3D data tends to be “quite messy and unstructured compared to 2D images and photographs.” His team sought to figure out how to get meaningful information out of all that disorganized 3D data without the controlled environment that a lot of machine learning technologies now require.

A key observation behind the success of DCP and PRNet is that context is critical to point-cloud processing. The geometric features on point cloud A that suggest the best ways to align it to point cloud B may be different from the features needed to align it to point cloud C. For example, in partial registration, an interesting part of a shape in one point cloud may not be visible in the other, making it useless for registration.

Wang says that the team’s tools have already been deployed by many researchers in the computer vision community and beyond. Even physicists are using them for an application the CSAIL team had never considered: particle physics.

Moving forward, the researchers hope to use the algorithms on real-world data, including data gathered from self-driving cars. Wang says they also plan to explore the potential of training their systems using self-supervised learning, to minimize the amount of human annotation needed.

Solomon and Wang were the sole authors of the DCP and PRNet papers. Their co-authors on the EdgeConv paper were research assistant Yongbin Sun and Professor Sanjay Sarma of MIT, alongside postdoc Ziwei Liu of the University of California at Berkeley and Professor Michael M. Bronstein of Imperial College London.

The projects were supported, in part, by the U.S. Air Force, the U.S. Army Research Office, Amazon, Google Research, IBM, the National Science Foundation, the Skoltech-MIT Next Generation Program, and the Toyota Research Institute.
