MIT News - Compression

The faster-than-fast Fourier transform

Larry Hardesty, MIT News Office — Wed, 18 Jan 2012 05:00:00 -0500

The Fourier transform is one of the most fundamental concepts in the information sciences. It’s a method for representing an irregular signal — such as the voltage fluctuations in the wire that connects an MP3 player to a loudspeaker — as a combination of pure frequencies. It’s universal in signal processing, but it can also be used to compress image and audio files, solve differential equations and price stock options, among other things.

The reason the Fourier transform is so prevalent is an algorithm called the fast Fourier transform (FFT), devised in the mid-1960s, which made it practical to calculate Fourier transforms on the fly. Ever since the FFT was proposed, however, people have wondered whether an even faster algorithm could be found.

At the Symposium on Discrete Algorithms (SODA) this week, a group of MIT researchers will present a new algorithm that, in a large range of practically important cases, improves on the fast Fourier transform. Under some circumstances, the improvement can be dramatic — a tenfold increase in speed. The new algorithm could be particularly useful for image compression, enabling, say, smartphones to wirelessly transmit large video files without draining their batteries or consuming their monthly bandwidth allotments.

Like the FFT, the new algorithm works on digital signals. A digital signal is just a series of numbers — discrete samples of an analog signal, such as the sound of a musical instrument. The FFT takes a digital signal containing a certain number of samples and expresses it as the weighted sum of an equivalent number of frequencies.

“Weighted” means that some of those frequencies count more toward the total than others. Indeed, many of the frequencies may have such low weights that they can be safely disregarded. That’s why the Fourier transform is useful for compression. An eight-by-eight block of pixels can be thought of as a 64-sample signal, and thus as the sum of 64 different frequencies. But as the researchers point out in their new paper, empirical studies show that on average, 57 of those frequencies can be discarded with minimal loss of image quality.
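To make the 64-frequency claim concrete, here is a minimal Python sketch (ours, not the researchers’) that takes the 2-D DFT of an 8-by-8 block, keeps only the 7 heaviest of its 64 weights, and rebuilds the block from them. The pixel values are hypothetical, and the naive DFT loops favor readability over speed. Real image codecs such as JPEG use the closely related discrete cosine transform rather than the DFT.

```python
import cmath
import math

N = 8  # the block is N x N pixels, so it has N * N = 64 frequency weights

def dft2(block):
    """Naive 2-D DFT: the weight of every frequency pair (u, v) in the block."""
    return [[sum(block[r][c] * cmath.exp(-2j * math.pi * (u * r + v * c) / N)
                 for r in range(N) for c in range(N))
             for v in range(N)] for u in range(N)]

def idft2(weights):
    """Inverse 2-D DFT: rebuild each pixel as the weighted sum of the 64 frequencies."""
    return [[(sum(weights[u][v] * cmath.exp(2j * math.pi * (u * r + v * c) / N)
                  for u in range(N) for v in range(N)) / (N * N)).real
             for c in range(N)] for r in range(N)]

def keep_largest(weights, keep):
    """Zero out every weight except the `keep` largest in magnitude."""
    ranked = sorted(((abs(weights[u][v]), u, v) for u in range(N) for v in range(N)),
                    reverse=True)
    kept = {(u, v) for _, u, v in ranked[:keep]}
    return [[weights[u][v] if (u, v) in kept else 0 for v in range(N)] for u in range(N)]

# Hypothetical smooth block: a flat gray level plus one gentle left-to-right ripple.
block = [[100 + 20 * math.cos(2 * math.pi * c / N) for c in range(N)] for r in range(N)]
weights = dft2(block)
approx = idft2(keep_largest(weights, keep=7))  # discard 57 of the 64 weights
max_error = max(abs(block[r][c] - approx[r][c]) for r in range(N) for c in range(N))
print(max_error)  # essentially zero for this toy block
```

This toy block contains only three truly nonzero weights, so dropping the other 57 costs essentially nothing. On a real photograph the discarded weights are small but not zero, which is where the small loss of image quality comes from.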

Heavyweight division

Signals whose Fourier transforms include a relatively small number of heavily weighted frequencies are called “sparse.” The new algorithm determines the weights of a signal’s most heavily weighted frequencies; the sparser the signal, the greater the speedup the algorithm provides. Indeed, if the signal is sparse enough, the algorithm can simply sample it randomly rather than reading it in its entirety.
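One rough way to see sparsity in code is to count how many DFT weights come within 1% of the largest one. The signals and the 1% cutoff for “heavy” below are hypothetical choices for illustration. Three pure tones produce exactly three heavy frequencies, while a single click spreads its weight evenly over all 64.

```python
import cmath
import math

def dft(x):
    """Naive DFT: the weight of each of the n frequencies in an n-sample signal."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

def heavy_frequencies(x, fraction=0.01):
    """Frequencies whose weight is at least `fraction` of the largest weight."""
    weights = dft(x)
    peak = max(abs(w) for w in weights)
    return [f for f, w in enumerate(weights) if abs(w) >= fraction * peak]

n = 64
# Hypothetical sparse signal: three pure tones (frequencies 5, 12, 30) out of 64 possible.
sparse = [cmath.exp(2j * math.pi * 5 * t / n)
          + 0.5 * cmath.exp(2j * math.pi * 12 * t / n)
          + 0.25 * cmath.exp(2j * math.pi * 30 * t / n) for t in range(n)]
# A single click: its weight is spread evenly over every frequency.
click = [1.0] + [0.0] * (n - 1)

print(heavy_frequencies(sparse))      # [5, 12, 30]
print(len(heavy_frequencies(click)))  # 64
```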

“In nature, most of the normal signals are sparse,” says Dina Katabi, one of the developers of the new algorithm. Consider, for instance, a recording of a piece of chamber music: The composite signal consists of only a few instruments each playing only one note at a time. A recording, on the other hand, of all possible instruments each playing all possible notes at once wouldn’t be sparse — but neither would it be a signal that anyone cares about.

The new algorithm — which associate professor Katabi and professor Piotr Indyk, both of MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), developed together with their students Eric Price and Haitham Hassanieh — relies on two key ideas. The first is to divide a signal into narrower slices of bandwidth, sized so that a slice will generally contain only one frequency with a heavy weight.

In signal processing, the basic tool for isolating particular frequencies is a filter. But filters tend to have blurry boundaries: One range of frequencies will pass through the filter more or less intact; frequencies just outside that range will be somewhat attenuated; frequencies farther outside that range will be attenuated still more; and so on, until you reach the frequencies that are filtered out almost perfectly.

If it so happens that the one frequency with a heavy weight is at the edge of the filter, however, it could end up so attenuated that it can’t be identified. So the researchers’ first contribution was to find a computationally efficient way to combine filters so that they overlap, ensuring that no frequencies inside the target range will be unduly attenuated, but that the boundaries between slices of spectrum are still fairly sharp.
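The researchers’ overlapping filters are not reproduced here. As a simpler stand-in that shows the same bucketing idea, the sketch below uses plain aliasing. It keeps every (n/B)-th sample and takes a B-point DFT, which folds frequency f into bucket f mod B, so each bucket holds the summed weight of the frequencies that land in it. It also reads only B of the n samples. The function name, signal and choice of B are assumptions for illustration. Unlike a well-designed filter, this folding offers no protection when two heavy frequencies share a bucket.

```python
import cmath
import math

def dft(x):
    """Naive DFT of a short signal."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

def bucketize(x, num_buckets):
    """Fold an n-point spectrum into num_buckets bins using only num_buckets samples.

    Keeping every stride-th sample (stride = n / num_buckets) aliases frequency f
    into bin f % num_buckets; scaling the short DFT by stride makes each bin equal
    to the summed weights of the frequencies that fall into it.
    """
    n = len(x)
    if n % num_buckets:
        raise ValueError("num_buckets must divide the signal length")
    stride = n // num_buckets
    return [w * stride for w in dft(x[::stride])]

n = 64
# Hypothetical signal: weights 64, 32 and 16 at frequencies 5, 12 and 30.
signal = [cmath.exp(2j * math.pi * 5 * t / n)
          + 0.5 * cmath.exp(2j * math.pi * 12 * t / n)
          + 0.25 * cmath.exp(2j * math.pi * 30 * t / n) for t in range(n)]
buckets = bucketize(signal, 8)  # 5 -> bin 5, 12 -> bin 4, 30 -> bin 6
print([round(abs(b), 6) for b in buckets])  # [0.0, 0.0, 0.0, 0.0, 32.0, 64.0, 16.0, 0.0]
```

Here frequencies 5, 12 and 30 land in different bins, so each bin isolates one heavy weight. Had two of them landed in the same bin, their weights would simply add, and neither could be read off.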

Zeroing in

Once they’ve isolated a slice of spectrum, however, the researchers still have to identify the most heavily weighted frequency in that slice. In the SODA paper, they do this by repeatedly cutting the slice of spectrum into smaller pieces and keeping only those in which most of the signal power is concentrated. But in an as-yet-unpublished paper, they describe a much more efficient technique, which borrows a signal-processing strategy from 4G cellular networks. Frequencies are generally represented as up-and-down squiggles, but they can also be thought of as oscillations; by sampling the same slice of bandwidth at different times, the researchers can determine where the dominant frequency is in its oscillatory cycle.
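The timing idea can be shown in its simplest, noise-free form. Suppose a slice of spectrum holds a single frequency f. Samples taken one step apart then differ only by a rotation of 2πf/n, so the angle between them pins down f. This is a minimal sketch under that one-frequency assumption, not the method in the researchers’ unpublished paper. `locate_single_tone` and its sampling interface are hypothetical.

```python
import cmath
import math

def locate_single_tone(sample_at, n):
    """Return (frequency, complex amplitude) of a signal assumed to hold exactly one tone.

    sample_at(t) returns sample t of a length-n signal a * exp(2*pi*i*f*t/n).
    Samples one step apart differ only by a rotation of 2*pi*f/n radians, the
    distance the tone travels in its cycle, so the angle of their ratio gives f.
    """
    x0, x1 = sample_at(0), sample_at(1)
    angle = cmath.phase(x1 / x0)  # in (-pi, pi]
    f = round(angle * n / (2 * math.pi)) % n
    return f, x0

n = 1024

def tone(t):
    """Hypothetical input: amplitude 2.5 at frequency 337."""
    return 2.5 * cmath.exp(2j * math.pi * 337 * t / n)

print(locate_single_tone(tone, n))  # (337, (2.5+0j)) from just two samples
```

With noise, or with more than one frequency in the slice, the angle gets corrupted. A practical version would need more samples and some way to check its guess.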

Two University of Michigan researchers — Anna Gilbert, a professor of mathematics, and Martin Strauss, an associate professor of mathematics and of electrical engineering and computer science — had previously proposed an algorithm that improved on the FFT for very sparse signals. “Some of the previous work, including my own with Anna Gilbert and so on, would improve upon the fast Fourier transform algorithm, but only if the sparsity k” — the number of heavily weighted frequencies — “was considerably smaller than the input size n,” Strauss says. The MIT researchers’ algorithm, however, “greatly expands the number of circumstances where one can beat the traditional FFT,” Strauss says. “Even if that number k is starting to get close to n — to all of them being important — this algorithm still gives some improvement over FFT.”

Putting the squeeze on data

Larry Hardesty, MIT News Office — Mon, 21 Dec 2009 05:01:00 -0500

Data compression is one of the fundamental research areas in computer science, letting information systems do more with less. It’s the reason the iPod nano can hold thousands of songs instead of hundreds, and it’s what keeps transmitted images from choking the Internet. If every digital file is a string of bits — zeroes and ones — then compression is a way to represent the same information with fewer bits.

Most compression techniques trade space for time: while the compressed file takes up less memory, it has to be decoded before its contents are intelligible. In applications where memory is in short supply but data needs constant updating, it can be prohibitively time consuming to keep decompressing a file, modifying it, and then recompressing it. As a result, such applications — monitoring Internet traffic, for instance, or looking for patterns in huge collections of scientific data — often use a type of compression called linear compression. With linear compression, a computer program can modify the data in a compressed file without first decoding it.
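Linearity is what makes in-place updates possible. If the compressed data is y = Ax for some fixed matrix A, then changing one entry of x by delta changes y by delta times the matching column of A. The sketch below assumes a small random ±1 matrix purely for illustration; practical schemes choose A far more carefully.

```python
import random

def make_sketch_matrix(m, n, seed=0):
    """Hypothetical m x n matrix of random +1/-1 entries (m much smaller than n)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]

def compress(A, x):
    """Linear compression: the m numbers y = A x stand in for the n numbers x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def update(A, y, i, delta):
    """Apply the change x[i] += delta directly to y, without reconstructing x."""
    return [yj + delta * row[i] for yj, row in zip(y, A)]

n, m = 100, 10
A = make_sketch_matrix(m, n)
x = list(range(n))
y = compress(A, x)
y_updated = update(A, y, 7, 5)  # only uses column 7 of A
x[7] += 5
print(y_updated == compress(A, x))  # True: same result as recompressing from scratch
```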

Last year, Associate Professor Piotr Indyk of MIT's Computer Science and Artificial Intelligence Laboratory and his graduate student Radu Berinde introduced two different versions of a new linear-compression algorithm that perform as well as any yet invented — and for some applications, better. Both versions of the algorithm, however, had limitations: under certain extreme conditions, they’d just stop working. But this fall, at the Allerton Conference on Communication, Control, and Computing hosted by the University of Illinois at Urbana-Champaign, Indyk and Berinde presented a new version of the algorithm that combined the advantages of its predecessors and overcame their drawbacks.

“The two previous algorithms each had their own faults,” says Deanna Needell, a postdoc at Stanford who helped develop one of the other leading linear-compression algorithms. “And this tends to be the case in many situations in this area, that if you have two algorithms, one is good at one thing and the other is good at another. And [the new algorithm] sort of merged the two benefits. It’s like, here’s this algorithm that does both of the good things.”

Some compression techniques, like the zip algorithm commonly used for Internet downloads, are what’s called “lossless”: when you unzip the zipped version of a file, you recover every bit of the original. Other compression techniques are “lossy”: the MP3 version of a song, for example, takes up about a tenth as much space as the CD version, but it irreversibly discards a lot of subtle audio data.

Linear compression is lossy: expanding the compressed file doesn’t give you all the data in the original. But for many applications, that doesn’t matter. Take, for instance, Internet traffic monitoring. Packets of data traveling over the Internet pass through a succession of special-purpose computers called routers; each router examines the packet’s ultimate destination and tells it where to go next. There’s no way a router could store information about all the packets that pass through it in the course of a day, but with linear compression, it can store an approximation. Decoding the data can still disclose what Indyk calls the “heavy hitters” (the sites that are sending and receiving the most packets), which is what most researchers are interested in. In other applications, the heavy hitters might be the members of a large population whose blood tests positive for a disease, or the concentrations of particular molecules in a chemical sample.
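One widely used linear summary for finding heavy hitters is the Count-Min sketch of Cormode and Muthukrishnan. It is not Indyk and Berinde’s algorithm, but it shows how a small, router-sized table can approximate per-site packet counts. The table sizes, the hash construction, and the assumption that candidate site names are available at decoding time are all illustrative choices.

```python
import hashlib

class CountMin:
    """Count-Min sketch: a small linear summary of per-key packet counts.

    Each of `depth` rows hashes a key to one of `width` counters. A key's
    estimate is the minimum over rows, which never undercounts when all
    updates are non-negative.
    """

    def __init__(self, width=64, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, key):
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._bucket(row, key)] += count

    def estimate(self, key):
        return min(self.table[row][self._bucket(row, key)] for row in range(self.depth))

def heavy_hitters(sketch, candidates, threshold):
    """Candidate keys whose estimated count reaches `threshold`."""
    return [k for k in candidates if sketch.estimate(k) >= threshold]

# Hypothetical traffic: two busy sites and 200 sites that send one packet each.
sketch = CountMin()
sketch.add("site-a", 1000)
sketch.add("site-b", 500)
small_sites = [f"site-{i}" for i in range(200)]
for site in small_sites:
    sketch.add(site)
candidates = ["site-a", "site-b"] + small_sites
print(heavy_hitters(sketch, candidates, threshold=400))
```

Because the table is a linear function of the counts, two routers using the same sizes and hashes could add their tables entry by entry to summarize their combined traffic.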

Going the distance

According to Indyk, there are three principal criteria for evaluating the performance of a linear-compression algorithm. One is the degree of compression: how much smaller the compressed file is than the uncompressed data. The second is recovery time: how long it takes to decode the compressed data. (Indyk says that some of the early linear-compression algorithms would take “hours or even days” to reconstruct an image captured by a one-megapixel camera.) And since linear compression is lossy, the third is how accurately the algorithm can reconstruct the original file.

In the last seven years, Indyk says, the field has progressed to the point where linear-compression algorithms can perform well along any two of those three parameters at the expense of the third. Indyk and Berinde chose to trade some fidelity in reconstruction for efficient extraction and good compression. Indeed, Indyk and some of his other students have recently demonstrated that there’s a mathematical limit to how much space savings linear compression can afford — and his and Berinde’s algorithm reaches it.

The insight behind the MIT researchers’ algorithm is fairly technical, but Indyk tried to explain it in layman’s terms. If you take two very different files — strings of ones and zeroes — of similar size, “the difference between them has a geometric interpretation,” Indyk explains. That is, there’s a way to mathematically describe the difference between the files in terms of distance: one file can be thought of as being close to or far away from the other.
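For files treated as strings of bits, the geometric picture is easy to state. Each file is a point whose coordinates are its bits, and the distance between two files can be measured, for instance, as the number of positions where they differ (the Hamming distance). The sketch below compares that count with the ordinary straight-line (Euclidean) distance. The article does not say which measure Indyk had in mind, so treat the choice as an assumption.

```python
import math

def hamming(a, b):
    """Number of bit positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance between the bit strings viewed as 0/1 vectors."""
    return math.sqrt(sum((int(x) - int(y)) ** 2 for x, y in zip(a, b)))

file_1, file_2 = "10110", "10011"  # hypothetical tiny "files"
print(hamming(file_1, file_2), euclidean(file_1, file_2))  # 2 1.4142135623730951
```

For bit strings the two measures are tied together, since the Euclidean distance is the square root of the Hamming distance. For general numeric data they can rank pairs of files differently.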

With linear compression, there’s generally a trade-off between how fast the compression algorithm is and how much of the original file can be recovered. Slower but more accurate algorithms tend to preserve the geometric distance between files: if two uncompressed files are far apart, the compressed versions will be, too. With faster algorithms, on the other hand, the compressed files tend to be much closer to each other than the source files were.

Indyk and Berinde found a way to analyze the difference between compressed files using a different mathematical notion of geometric distance; under that analysis, some fast compression algorithms still preserve the distance between files. By taking advantage of this new perspective, the researchers were able to devise a decompression algorithm that recovers much more information from the original file without sacrificing any speed.
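The article does not name the alternative notion of distance. As an assumption, the sketch below uses the L1 distance (the sum of absolute differences), a measure commonly paired with sparse 0/1 measurement matrices in the sparse-recovery literature. Such a matrix is fast to apply because each column has only d ones. It can never stretch an L1 distance by more than a factor of d, and it preserves the factor exactly unless entries of opposite sign share a row. Matrix sizes and the test vector are hypothetical.

```python
import random

def sparse_binary_matrix(m, n, d, seed=0):
    """Hypothetical m x n 0/1 matrix with exactly d ones per column.

    Stored by column: entry j lists the d distinct rows that hold a 1 in column j.
    """
    rng = random.Random(seed)
    return [rng.sample(range(m), d) for _ in range(n)]

def measure(columns, m, x):
    """Compute y = A x, touching only the d nonzero entries of each used column."""
    y = [0.0] * m
    for j, rows in enumerate(columns):
        if x[j]:
            for r in rows:
                y[r] += x[j]
    return y

def l1(v):
    """L1 norm: the sum of absolute values."""
    return sum(abs(t) for t in v)

m, n, d = 50, 500, 4
A = sparse_binary_matrix(m, n, d)
# Hypothetical difference between two files: nonzero in only three places.
diff = [0.0] * n
diff[3], diff[100], diff[400] = 2.0, -1.5, 0.5
print(l1(measure(A, m, diff)) / l1(diff))  # at most d = 4
```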

Work like Indyk and Berinde’s holds out the hope that soon, linear-compression algorithms will no longer need to sacrifice performance along one of the three parameters that Indyk mentioned — compression, time and accuracy. “I think we’re actually pretty close to that,” says Needell. “We’re pretty much to the end: we’re almost there.” She adds, however, that “there’s plenty of other directions that the field will go in.” For instance, she says, many of the mathematical techniques that linear-compression algorithms rely on could be adapted to improve the “recommendation engines” on web sites like Netflix or Amazon, which try to predict which books or movies a customer might like on the basis of prior history.

Explained: The Discrete Fourier Transform

Larry Hardesty, MIT News Office — Wed, 25 Nov 2009 05:00:00 -0500

Science and technology journalists pride themselves on the ability to explain complicated ideas in accessible ways, but there are some technical principles that we encounter so often in our reporting that paraphrasing them or writing around them begins to feel like missing a big part of the story. So in a new series of articles called "Explained," MIT News Office staff will explain some of the core ideas in the areas they cover, as reference points for future reporting on MIT research.

In 1811, Joseph Fourier, the 43-year-old prefect of the French district of Isère, entered a competition in heat research sponsored by the French Academy of Sciences. The paper he submitted described a novel analytical technique that we today call the Fourier transform, and it won the competition; but the prize jury declined to publish it, criticizing the sloppiness of Fourier’s reasoning. According to Jean-Pierre Kahane, a French mathematician and current member of the academy, as late as the early 1970s, Fourier’s name still didn’t turn up in the major French encyclopedia, the Encyclopædia Universalis.

Now, however, his name is everywhere. The Fourier transform is a way to decompose a signal into its constituent frequencies, and versions of it are used to generate and filter cell-phone and Wi-Fi transmissions, to compress audio, image, and video files so that they take up less bandwidth, and to solve differential equations, among other things. It’s so ubiquitous that “you don’t really study the Fourier transform for what it is,” says Laurent Demanet, an assistant professor of applied mathematics at MIT. “You take a class in signal processing, and there it is. You don’t have any choice.”

The Fourier transform comes in three varieties: the plain old Fourier transform, the Fourier series, and the discrete Fourier transform. But it’s the discrete Fourier transform, or DFT, that accounts for the Fourier revival. In 1965, the mathematicians James Cooley and John Tukey described an algorithm called the fast Fourier transform, which made it much easier to calculate DFTs on a computer. All of a sudden, the DFT became a practical way to process digital signals.
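The gain from the fast Fourier transform is easiest to see side by side. Below, a direct DFT computes each of the n weights as a sum over all n samples. A textbook radix-2 Cooley-Tukey FFT instead splits the samples into even and odd positions, transforms each half recursively, and combines the halves. This minimal version assumes the length is a power of two.

```python
import cmath
import math

def dft(x):
    """Direct DFT: n weights, each a sum over n samples, so about n * n operations."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

def fft(x):
    """Radix-2 Cooley-Tukey FFT (len(x) a power of two): about n * log2(n) operations."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    twiddled = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + twiddled[k] for k in range(n // 2)] +
            [even[k] - twiddled[k] for k in range(n // 2)])

x = [math.sin(t) + 0.5 * t for t in range(16)]  # 16 hypothetical samples
print(max(abs(a - b) for a, b in zip(fft(x), dft(x))))  # tiny: both give the same weights
```

For n = 1,024 samples, the direct sum involves about a million (n squared) sample-frequency products, while the FFT needs on the order of n log2 n, roughly 10,000.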

To get a sense of what the DFT does, consider an MP3 player plugged into a loudspeaker. The MP3 player sends the speaker audio information as fluctuations in the voltage of an electrical signal. Those fluctuations cause the speaker drum to vibrate, which in turn causes air particles to move, producing sound.

An audio signal’s fluctuations over time can be depicted as a graph: the x-axis is time, and the y-axis is the voltage of the electrical signal, or perhaps the movement of the speaker drum or air particles. Either way, the signal ends up looking like an erratic wavelike squiggle. But when you listen to the sound produced from that squiggle, you can clearly distinguish all the instruments in a symphony orchestra, playing discrete notes at the same time.

That’s because the erratic squiggle is, effectively, the sum of a number of much more regular squiggles, which represent different frequencies of sound. “Frequency” just means the rate at which air molecules go back and forth, or a voltage fluctuates, and it can be represented as the rate at which a regular squiggle goes up and down. When you add two frequencies together, the resulting squiggle goes up where both the component frequencies go up, goes down where they both go down, and does something in between where they’re going in different directions.
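Adding frequencies really is adding samples point by point. The sketch below builds two pure tones at a hypothetical rate of 8,000 samples per second and sums them. Wherever both tones are rising from one sample to the next, the sum rises too, as the paragraph describes.

```python
import math

RATE = 8000  # hypothetical sampling rate, in samples per second

def tone(freq_hz, amplitude, n):
    """n samples of a pure tone: a regular squiggle rising and falling freq_hz times a second."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / RATE) for t in range(n)]

low = tone(440, 1.0, 64)
high = tone(660, 0.5, 64)
mixed = [a + b for a, b in zip(low, high)]  # the composite squiggle is a pointwise sum

# Wherever both components rise from one sample to the next, the sum rises too.
both_rising = [k for k in range(63) if low[k + 1] > low[k] and high[k + 1] > high[k]]
print(all(mixed[k + 1] > mixed[k] for k in both_rising))  # True
```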

The DFT does mathematically what the human ear does physically: decompose a signal into its component frequencies. Unlike the analog signal from, say, a record player, the digital signal from an MP3 player is just a series of numbers, each representing a point on a squiggle. Collect enough such points, and you produce a reasonable facsimile of a continuous signal: CD-quality digital audio recording, for instance, collects 44,100 samples a second. If you extract some number of consecutive values from a digital signal — 8, or 128, or 1,000 — the DFT represents them as the weighted sum of an equivalent number of frequencies. (“Weighted” just means that some of the frequencies count more than others toward the total.)
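In symbols, take n consecutive samples x[0], ..., x[n-1]. The DFT weight of frequency f is X[f] = sum over t of x[t]·e^(-2πi·f·t/n), and the inverse transform rebuilds every sample as the weighted sum of the n frequencies, divided by n. The direct sketch below applies both to eight hypothetical samples of one slow oscillation. Frequency 1 and its mirror image, frequency 7, carry almost all of the weight.

```python
import cmath
import math

def dft(x):
    """Weights X[f] of the n frequencies f = 0..n-1 in the n samples x."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

def inverse_dft(X):
    """Rebuild the samples as the weighted sum of the n frequencies."""
    n = len(X)
    return [sum(X[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]

samples = [0.0, 0.7, 1.0, 0.7, 0.0, -0.7, -1.0, -0.7]  # 8 consecutive values from a signal
weights = dft(samples)       # 8 samples in, 8 frequency weights out
rebuilt = inverse_dft(weights)
print(max(abs(r - s) for r, s in zip(rebuilt, samples)))  # close to zero
```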

The application of the DFT to wireless technologies is fairly straightforward: the ability to break a signal into its constituent frequencies lets cell-phone towers, for instance, disentangle transmissions from different users, allowing more of them to share the air.
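Here is a toy model of sharing the air. Each user is assigned a different frequency, all transmissions add together, and the receiver reads each user’s value back from the DFT at that user’s frequency. The block size, frequency assignments, and sent values are hypothetical; real cellular and Wi-Fi systems layer much more (coding, synchronization, channel estimation) on top of this idea.

```python
import cmath
import math

N = 32  # hypothetical number of samples per block

def transmit(assignments):
    """Everyone transmits at once; assignments maps frequency -> value that user sends."""
    return [sum(v * cmath.exp(2j * math.pi * f * t / N) for f, v in assignments.items())
            for t in range(N)]

def receive(signal, f):
    """Recover the value sent on frequency f: the DFT weight at f, divided by N."""
    return sum(signal[t] * cmath.exp(-2j * math.pi * f * t / N) for t in range(N)) / N

users = {3: 1 + 2j, 10: -0.5}  # hypothetical: user A on frequency 3, user B on frequency 10
air = transmit(users)          # one shared signal
print(abs(receive(air, 3) - users[3]) < 1e-9, abs(receive(air, 10) - users[10]) < 1e-9)  # True True
```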

The application to data compression is less intuitive. But if you extract an eight-by-eight block of pixels from an image, each row or column is simply a sequence of eight numbers — like a digital signal with eight samples. The whole block can thus be represented as the weighted sum of 64 frequencies. If there’s little variation in color across the block, the weights of most of those frequencies will be zero or near zero. Throwing out the frequencies with low weights allows the block to be represented with fewer bits but little loss of fidelity.
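The row-and-column view also describes one way to compute the 2-D transform. A 1-D DFT of each of the eight rows, followed by a 1-D DFT of each of the eight columns of the result, yields all 64 weights. The sketch below (hypothetical pixel values, naive DFT) checks the claim about low-variation blocks: a perfectly flat block has just one nonzero weight, and a block with one gentle ripple has three.

```python
import cmath
import math

def dft(x):
    """Naive 1-D DFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

def dft_rows_then_columns(block):
    """2-D DFT of a square block: a 1-D DFT of every row, then of every column."""
    size = len(block)
    rows = [dft(row) for row in block]                                     # rows[r][v]
    cols = [dft([rows[r][v] for r in range(size)]) for v in range(size)]   # cols[v][u]
    return [[cols[v][u] for v in range(size)] for u in range(size)]       # weights[u][v]

def significant(weights, tol=1e-6):
    """Count weights that are not (numerically) zero."""
    return sum(1 for row in weights for w in row if abs(w) > tol)

flat = [[128.0] * 8 for _ in range(8)]  # hypothetical block with no variation at all
# Hypothetical block with one gentle top-to-bottom ripple.
ripple = [[128 + 10 * math.cos(2 * math.pi * r / 8)] * 8 for r in range(8)]
print(significant(dft_rows_then_columns(flat)))    # 1
print(significant(dft_rows_then_columns(ripple)))  # 3
```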

Demanet points out that the DFT has plenty of other applications, in areas like spectroscopy, magnetic resonance imaging, and quantum computing. But ultimately, he says, “It’s hard to explain what sort of impact Fourier’s had,” because the Fourier transform is such a fundamental concept that by now, “it’s part of the language.”