MIT News - Linear algebra
https://news.mit.edu/topic/mitlinear-algebra-rss.xml
MIT news feed about: Linear algebra
Wed, 08 May 2019 12:30:01 -0400
Gil Strang is still going strong, online and in print
https://news.mit.edu/2019/gil-strang-still-going-strong-online-and-print-0508
After nearly 60 years of teaching at MIT, this math professor surpasses 10 million views on OCW, earns top reviews for his teaching style, and publishes his 12th book.
Wed, 08 May 2019 12:30:01 -0400
Sandi Miller | Department of Mathematics
<p>MIT’s class 18.06 (<a href="https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/">Linear Algebra</a>) has surpassed 10 million views on <a href="https://ocw.mit.edu/index.htm">OpenCourseWare</a> (OCW). That’s the kind of math that makes Professor <a href="http://math.mit.edu/directory/profile.php?pid=266">Gilbert Strang</a> one of the most recognized mathematicians in the world.</p>
<p>“That was a surprise to me,” says Strang. But not to those at OCW.</p>
<p>“He is a favorite; there is no way around it,” says OCW Director Curt Newton. Each month, OCW publishes a list of its most-visited courses, and Newton points out that Strang’s course has always been among the top 10 most-viewed since OCW launched. “He cracked the 10 million number,” he says. “It’s clear that Gil’s teaching has struck just the right chord with learners and educators around the world.”</p>
<p>Strang’s 18.06 lectures, posted between 2002 and 2011, also have more than 3.1 million YouTube views from math students in places like India, China, and Africa. “His lectures are just excellent,” explains math Professor Haynes Miller. To illustrate the videos’ massive popularity, Miller recounts a conversation, at the online Electronic Seminar on Mathematics Education, about revising a linear algebra course at the University of Illinois. “In the new version, they do almost no lecturing ... and one reason they feel that they can get away with that is that they can send students to Gil’s lectures on OCW.”</p><p><strong>A linear path to MIT</strong></p><p>Strang, the MathWorks Professor of Mathematics, received his BS from MIT in 1955. After earning a Rhodes Scholarship to Oxford University and a PhD from the University of California at Los Angeles in 1959, he returned to MIT to teach.</p>
<p>Strang began teaching linear algebra in the 1970s, during a time when engineers and scientists wrote large software packages using the finite element method to solve structural problems, computing forces and stresses in solid and fluid mechanics. Strang recalls his “Aha!” moment when he thought about the finite element method of solving partial differential equations using simple trial functions. With scientists generating a huge amount of data, from magnetic resonance scans producing millions of images to microarrays of entire genomes, the goal was to find structure and language to make sense of it all.</p>
<p>Once Strang realized that the tools of linear algebra were related to everything from pure math to the internet, he decided to change the way the subject was taught. The 18.06 class soon became popular with science and engineering students, at MIT and around the world. Now in its fifth edition, Strang’s textbook "Introduction to Linear Algebra" has been translated into French, German, Greek, Japanese, and Portuguese. More than 40 years later, about a third of MIT students take this course.</p><p>“I’m not teaching the math guys who jump over linear algebra,” he says. “18.06 is specifically for engineering and science and economics and management.”</p>
<p>Certainly one of the secrets to his OCW success is his teaching style. Strang has a quick smile and an encouraging manner. In his class, he says “please” and “thank you.” To gauge whether students are keeping up, he asks, “Am I OK?” or adds explanations and recaps. He strives for an interactive class by asking questions, and gives intuitions and pictures before presenting a formal proof. And the students seem delighted to see beautiful results emerge from seemingly simple constructions.</p>
<p>After a lifetime of teaching at MIT, he is still able to project energy and enthusiasm over his subject. In short, he’s a natural for video.</p>
<p>“My original motive for doing this was to encourage other faculty to do it, and maybe show them a new way to teach linear algebra,” he says. His first set of lectures was recorded in 1999 with support from the Lord Foundation of Massachusetts. The videos don’t feature fancy graphics or music, but are an homage to the power of old-school lectures with a chalkboard by a master teacher.</p>
<p>The most popular of Strang’s multiple 18.06 OCW versions is the enhanced <a href="https://ocw.mit.edu/courses/mathematics/18-06sc-linear-algebra-fall-2011/">18.06SC “OCW Scholar”</a> version, published in 2011. It adds problem-solving videos by grad students and postdocs patiently explaining a complex subject to a grateful audience, very much in the spirit of Strang’s lectures.</p>
<p>“This lecture series is one of the few that I like to watch for fun,” says one commenter. Adds another, “This teacher would be fun to sit down with and have a cup of coffee and conversation.” And a high school teacher says, “He is clear, interesting, and nonthreatening. I watch his linear algebra lessons and wish I could tell him how terrific he is.”</p><p><strong>A new book</strong></p>
<p>OCW <a href="https://ocw.mit.edu/courses/mathematics/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/" target="_blank">recently posted</a> 34 videos, along with an introduction, to his relatively new class <a href="http://math.mit.edu/classes/18.065/2019SP/">18.065</a> (Matrix Methods in Data Analysis, Signal Processing, and Machine Learning). To accompany the class, Strang recently released "<a href="http://math.mit.edu/~gs/learningfromdata/">Linear Algebra and Learning from Data</a>," his 12th textbook.</p>
<p>Strang is known for his clear yet lively writing, and early reviews confirm that this new book continues his style. Even the book’s cover is evocative. He chose a photo his son Robert took, on Inle Lake in Myanmar, of a man on a boat holding a fishing net encased in a bamboo cage. The man is lifting up what Strang says resembles a neural net.</p>
<p>The class was a chance for Strang to expand his linear algebra teachings into the area of deep learning. This class debuted in 2017 when Professor Raj Rao Nadakuditi of the University of Michigan spent his sabbatical teaching 18.065 at MIT. For the class, professor of applied mathematics <a href="http://math.mit.edu/directory/profile?pid=63">Alan Edelman</a> introduced the powerful language <a href="http://news.mit.edu/2018/mit-developed-julia-programming-language-debuts-juliacon-0827">Julia</a>, while Strang explained the four fundamental subspaces and the Singular Value Decomposition.</p>
<p>“This was linear algebra for signals and data, and it was alive,” says Strang. “More important, this was the student response, too.”</p>
<p>Last spring, he started assembling the handouts and online materials into a book. Now in its third year, the class is held in 2-190 and is filled to capacity. In the class and book, Strang starts with linear algebra and moves to optimization by gradient descent, and then to the structure and analysis of deep learning. His goal is to organize central methods and ideas of data science, and to show how the language of linear algebra expresses those ideas.</p>
<p>“The new textbook is just the beginning, as the course invites students to ask their own questions and write their own programs. Exams are outlawed. A key point of the course is that it ends with a project from each student — and those projects are wonderful.”</p>
<p>His students agree.</p>
<p>“Professor Strang structures the class so that ideas seem to flow from the students into proofs,” says senior and math major Jesse Michel. “There’s a nice balance between proofs and examples, so that you know the approaches work in general, while never losing sight of practice. Every class includes a cool math trick or joke that keeps the class laughing. Professor Strang’s energy and emphasis on the exciting points keeps the class on the edge of their seats.”</p>
<p><strong>Open means open</strong></p>
<p>Haynes Miller says that all MIT faculty are invited to contribute courses to OCW. There are about 2,450 courses on OCW currently, with over 100 having complete video lectures, and more going up as fast as OCW can post them.</p>
<p>“OCW began under foundation grants, but is now supported by the provost here at MIT, corporate sponsors, and user donations,” says Miller. “I feel that MIT faculty are extremely lucky to have OpenCourseWare as a publication venue for courseware we design.”</p>
Gil Strang teaches 18.065 (Matrix Methods in Data Analysis, Signal Processing, and Machine Learning). OCW recently posted 34 videos to 18.065, and he recently released "Linear Algebra and Learning from Data," his 12th textbook.
Photo: Sandi Miller
Explained: Matrices
https://news.mit.edu/2013/explained-matrices-1206
Concepts familiar from grade-school algebra have broad ramifications in computer science.
Fri, 06 Dec 2013 05:00:00 -0500
Larry Hardesty, MIT News Office
Among the most common tools in electrical engineering and computer science are rectangular grids of numbers known as matrices. The numbers in a matrix can represent data, and they can also represent mathematical equations. In many time-sensitive engineering applications, multiplying matrices can give quick but good approximations of much more complicated calculations.<br /><br />Matrices arose originally as a way to describe systems of linear equations, a type of problem familiar to anyone who took grade-school algebra. “<a href="/newsoffice/2010/explained-linear-0226.html" target="_self">Linear</a>” just means that the variables in the equations don’t have any exponents, so their graphs will always be straight lines.<br /><br />The equation x - 2y = 0, for instance, has an infinite number of solutions for both x and y, which can be depicted as a straight line that passes through the points (0,0), (2,1), (4,2), and so on. But if you combine it with the equation x - y = 1, then there’s only one solution: x = 2 and y = 1. The point (2,1) is also where the graphs of the two equations intersect.<br /><br />The matrix that depicts those two equations would be a two-by-two grid of numbers: The top row would be [1 -2], and the bottom row would be [1 -1], to correspond to the coefficients of the variables in the two equations.<br /><br />In a range of applications from image processing to genetic analysis, computers are often called upon to solve systems of linear equations, usually with many more than two variables. Even more frequently, they’re called upon to multiply matrices.<br /><br />Matrix multiplication can be thought of as solving linear equations for particular variables. 
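<br /><br />As a minimal sketch, the two-equation system above (x - 2y = 0 and x - y = 1) can be solved from its two-by-two matrix and then checked by multiplying that matrix by the solution. The helper names `solve_2x2` and `mat_vec` are illustrative assumptions, and Cramer's rule is used only because it is short for the two-variable case.

```python
def solve_2x2(a, b):
    """Solve the 2x2 system a * [x, y] = b with Cramer's rule."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = (b[0] * a22 - a12 * b[1]) / det
    y = (a11 * b[1] - b[0] * a21) / det
    return x, y


def mat_vec(a, v):
    """Multiply matrix a by column vector v, one row at a time."""
    return [sum(coef * value for coef, value in zip(row, v)) for row in a]


# Rows hold the coefficients of x - 2y = 0 and x - y = 1.
coefficients = [[1, -2], [1, -1]]
right_hand_side = [0, 1]

x, y = solve_2x2(coefficients, right_hand_side)
print(x, y)                            # the intersection point of the two lines
print(mat_vec(coefficients, [x, y]))   # reproduces the right-hand side
```

Multiplying the coefficient matrix by the solution gives back the right-hand sides of the two equations, which is the sense in which a matrix "represents" the equations.<br /><br />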
Suppose, for instance, that the expressions t + 2p + 3h; 4t + 5p + 6h; and 7t + 8p + 9h describe three different mathematical operations involving temperature, pressure, and humidity measurements. They could be represented as a matrix with three rows: [1 2 3], [4 5 6], and [7 8 9].<br /><br />Now suppose that, at two different times, you take temperature, pressure, and humidity readings outside your home. Those readings could be represented as a matrix as well, with the first set of readings in one column and the second in the other. Multiplying these matrices together means matching up rows from the first matrix — the one describing the equations — and columns from the second — the one representing the measurements — multiplying the corresponding terms, adding them all up, and entering the results in a new matrix. The numbers in the final matrix might, for instance, predict the trajectory of a low-pressure system.<br /><br />Of course, reducing the complex dynamics of weather-system models to a system of linear equations is itself a difficult task. But that points to one of the reasons that matrices are so common in computer science: They allow computers to, in effect, do a lot of the computational heavy lifting in advance. Creating a matrix that yields useful computational results may be difficult, but performing matrix multiplication generally isn’t.<br /><br />One of the areas of computer science in which matrix multiplication is particularly useful is graphics, since a digital image is basically a matrix to begin with: The rows and columns of the matrix correspond to rows and columns of pixels, and the numerical entries correspond to the pixels’ color values. 
Decoding digital video, for instance, requires matrix multiplication; earlier this year, MIT researchers were able to build one of the <a href="/newsoffice/2013/mit-researchers-build-quad-hd-tv-chip-0220.html" target="_self">first chips </a>to implement the new high-efficiency video-coding standard for ultrahigh-definition TVs, in part because of patterns they discerned in the matrices it employs. <br /><br />In the same way that matrix multiplication can help process digital video, it can help process digital sound. A digital audio signal is basically a sequence of numbers, representing the variation over time of the air pressure of an acoustic audio signal. Many techniques for filtering or compressing digital audio signals, such as the <a href="/newsoffice/2012/faster-fourier-transforms-0118.html" target="_self">Fourier transform</a>, rely on matrix multiplication.<br /><br />Another reason that matrices are so useful in computer science is that <a href="/newsoffice/2012/explained-graphs-computer-science-1217.html" target="_self">graphs</a> are. In this context, a graph is a mathematical construct consisting of nodes, usually depicted as circles, and edges, usually depicted as lines between them. Network diagrams and family trees are familiar examples of graphs, but in computer science they’re used to represent everything from <a href="/newsoffice/2012/making-web-applications-more-efficient-0831.html" target="_self">operations performed</a> during the execution of a computer program to the relationships characteristic of <a href="/newsoffice/2013/algorithm-extends-artificial-intelligence-technique-1114.html" target="_self">logistics problems</a>.<br /><br />Every graph can be represented as a matrix, however, where each column and each row represents a node, and the value at their intersection represents the strength of the connection between them (which might frequently be zero). 
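<br /><br />The matrix representation just described can be sketched in a few lines. This is an illustrative example only: the function name `adjacency_matrix`, the four-node graph, and its weights are assumptions, and the graph is treated as undirected.

```python
def adjacency_matrix(num_nodes, weighted_edges):
    """Return a square matrix whose entry [u][v] is the weight of edge u-v."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v, weight in weighted_edges:
        matrix[u][v] = weight
        matrix[v][u] = weight  # undirected: the connection runs both ways
    return matrix


# Hypothetical graph: four nodes, three weighted edges.
edges = [(0, 1, 3), (1, 2, 5), (0, 3, 2)]
adjacency = adjacency_matrix(4, edges)

for row in adjacency:
    print(row)
```

Pairs of nodes with no edge between them keep the value zero, which is why such matrices are often mostly zeros.<br /><br />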
Often, the most efficient way to analyze graphs is to convert them to matrices first, and the solutions to problems involving graphs are frequently solutions to systems of linear equations.
A matrix multiplication diagram.
Short algorithm, long-range consequences
https://news.mit.edu/2013/short-algorithm-long-range-consequences-0301
A new technique for solving ‘graph Laplacians’ is drastically simpler than its predecessors, with implications for a huge range of practical problems.
Fri, 01 Mar 2013 15:00:03 -0500
Larry Hardesty, MIT News Office
<p>In the last decade, theoretical computer science has seen remarkable progress on the problem of solving graph Laplacians, the esoteric name for a calculation with hordes of familiar applications in scheduling, image processing, online product recommendation, network analysis, and scientific computing, to name just a few. Only in 2004 did researchers first propose an algorithm that solved graph Laplacians in “nearly linear time,” meaning that the algorithm’s running time grew only slightly faster than in direct proportion to the size of the problem.<br />
</p>
<div class="video_captions" style="width: 368px; float: right; margin: 0 0 10px 10px;"><img src="/sites/default/files/images/inline/images/2013/laplacians.gif" style="border-width: 0px; border-style: solid; width: 368px;" /> <span class="image_caption">This animation shows two different "spanning trees" for a simple graph, a grid like those used in much scientific computing. The speedups promised by a new MIT algorithm require "low-stretch" spanning trees (green), in which the paths between neighboring nodes don't become excessively long (red).</span> <span class="image_credit">Images courtesy of the researchers</span></div>
<p>At this year’s ACM Symposium on the Theory of Computing, MIT researchers will present <a href="http://arxiv.org/pdf/1301.6628" target="_blank">a new algorithm</a> for solving graph Laplacians that is not only faster than its predecessors, but also drastically simpler. “The 2004 paper required fundamental innovations in multiple branches of mathematics and computer science, but it ended up being split into three papers that I think were 130 pages in aggregate,” says Jonathan Kelner, an associate professor of applied mathematics at MIT who led the new research. “We were able to replace it with something that would fit on a blackboard.”<br />
<br />
The MIT researchers — Kelner; Lorenzo Orecchia, an instructor in applied mathematics; and Kelner’s students Aaron Sidford and Zeyuan Zhu — believe that the simplicity of their algorithm should make it both faster and easier to implement in software than its predecessors. But just as important is the simplicity of their conceptual analysis, which, they argue, should make their result much easier to generalize to other contexts.<br />
<br />
<strong>Overcoming resistance</strong><br />
<br />
A graph Laplacian is a matrix — a big grid of numbers — that describes a <a href="/newsoffice/2012/explained-graphs-computer-science-1217.html" target="_blank">graph</a>, a mathematical abstraction common in computer science. A graph is any collection of nodes, usually depicted as circles, and edges, depicted as lines that connect the nodes. In a logistics problem, the nodes might represent tasks to be performed, while in an online recommendation engine, they might represent titles of movies.<br />
<br />
In many graphs, the edges are “weighted,” meaning that they have different numbers associated with them. Those numbers could represent the cost — in time, money or energy — of moving from one step to another in a complex logistical operation, or they could represent the strength of the correlations between the movie preferences of customers of an online video service.<br />
<br />
The Laplacian of a graph describes the weights of all the edges, but it can also be interpreted as a series of linear equations. Solving those equations is crucial to many techniques for analyzing graphs.<br />
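<br />
As a minimal sketch, the Laplacian of a small weighted graph can be assembled with the standard textbook definition: each diagonal entry is the total weight of the edges touching that node, and each off-diagonal entry is the negative of the weight of the edge between the two nodes. The article does not spell out this formula; the function name `graph_laplacian` and the three-node example are assumptions for illustration.<br />

```python
def graph_laplacian(num_nodes, weighted_edges):
    """Build the Laplacian: weighted degrees on the diagonal, -weight elsewhere."""
    lap = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v, weight in weighted_edges:
        lap[u][u] += weight
        lap[v][v] += weight
        lap[u][v] -= weight
        lap[v][u] -= weight
    return lap


# Hypothetical triangle with edge weights 1, 2 and 3.
triangle = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)]
laplacian = graph_laplacian(3, triangle)

for row in laplacian:
    print(row)
```

Every row sums to zero, so the equations the matrix encodes only determine values at the nodes up to a common constant.<br />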
<br />
One intuitive way to think about graph Laplacians is to imagine the graph as a big electrical circuit and the edges as resistors. The weights of the edges describe the resistance of the resistors; solving the Laplacian tells you how much current would flow between any two points in the graph.<br />
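<br />
The circuit picture can be made concrete with a small sketch. It assumes a hypothetical five-resistor network, enters each resistor into the Laplacian as a conductance (1 / resistance, the usual convention), fixes one node at zero volts, and solves the remaining equations by plain Gaussian elimination. None of this is the fast algorithm discussed in the article; the names `solve_linear` and `node_voltages` are illustrative.<br />

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting; returns x with a x = b."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def node_voltages(num_nodes, resistors, injected, ground):
    """Solve the Laplacian system for node voltages, with `ground` held at 0."""
    lap = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v, ohms in resistors:
        g = 1.0 / ohms  # conductance of the resistor
        lap[u][u] += g
        lap[v][v] += g
        lap[u][v] -= g
        lap[v][u] -= g
    keep = [i for i in range(num_nodes) if i != ground]
    reduced = [[lap[i][j] for j in keep] for i in keep]
    solved = solve_linear(reduced, [injected[i] for i in keep])
    volts = [0.0] * num_nodes
    for i, value in zip(keep, solved):
        volts[i] = value
    return volts


# Hypothetical bridge circuit: five 1-ohm resistors, 1 amp in at node 0, out at node 3.
resistors = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0)]
volts = node_voltages(4, resistors, injected=[1.0, 0.0, 0.0, -1.0], ground=3)

# Ohm's law gives the current on each resistor from the voltage difference.
currents = {(u, v): (volts[u] - volts[v]) / ohms for u, v, ohms in resistors}
print(volts)
print(currents)
```

In this symmetric example the current splits evenly between the two routes and nothing flows across the middle resistor.<br />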
<br />
Earlier approaches to solving graph Laplacians considered a series of ever-simpler approximations of the graph of interest. Solving the simplest provided a good approximation of the next simplest, which provided a good approximation of the next simplest, and so on. But the rules for constructing the sequence of graphs could get very complex, and proving that the solution of the simplest was a good approximation of the most complex required considerable mathematical ingenuity.<br />
<br />
<strong>Looping back</strong><br />
<br />
The MIT researchers’ approach is much more straightforward. The first thing they do is find a “spanning tree” for the graph. A tree is a particular kind of graph that has no closed loops. A family tree is a familiar example; there, a loop might mean that someone was both parent and sibling to the same person. A spanning tree of a graph is a tree that touches all of the graph’s nodes but dispenses with the edges that create loops. Efficient algorithms for constructing spanning trees are well established.<br />
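<br />
One well-established way to build a spanning tree is to scan the edges and keep each one that does not close a loop, tracking connected groups of nodes with a union-find structure. The sketch below does that for a hypothetical five-edge graph. It returns an arbitrary spanning tree, not the "low-stretch" tree the MIT speedups rely on, and the function name `spanning_tree` is an assumption.<br />

```python
def spanning_tree(num_nodes, edges):
    """Split edges into a spanning tree and the leftover loop-closing edges."""
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent links to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the chain as we go
            x = parent[x]
        return x

    tree, leftover = [], []
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            leftover.append((u, v))  # would create a loop
        else:
            parent[root_u] = root_v  # merge the two groups
            tree.append((u, v))
    return tree, leftover


# Hypothetical graph: four nodes, five edges.
graph_edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
tree, leftover = spanning_tree(4, graph_edges)
print(tree)
print(leftover)
```

A spanning tree of a connected graph with n nodes always has n - 1 edges; the leftover edges are the ones that would be added back one at a time.<br />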
<br />
The spanning tree in hand, the MIT algorithm then adds back just one of the missing edges, creating a loop. A loop means that two nodes are connected by two different paths; in the circuit analogy, the voltage would have to be the same across both paths. So the algorithm sticks in values for current flow that balance the loop. Then it adds back another missing edge and rebalances.<br />
<br />
In even a simple graph, values that balance one loop could imbalance another one. But the MIT researchers showed that, remarkably, this simple, repetitive process of adding edges and rebalancing will converge on the solution of the graph Laplacian. Nor did the demonstration of that convergence require sophisticated mathematics: “Once you find the right way of thinking about the problem, everything just falls into place,” Kelner explains.<br />
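<br />
The add-an-edge-and-rebalance loop can be illustrated with a simplified sketch. It assumes a hypothetical five-resistor circuit and a hand-picked spanning tree, routes one unit of current along the tree, and then repeatedly visits each off-tree edge, shifting current around the loop that edge closes until the voltage drops around that loop cancel. Visiting the edges in a fixed round-robin order is a simplification chosen here; the published algorithm's tree construction and edge-selection rule are not reproduced.<br />

```python
from collections import defaultdict


def tree_path(tree_adj, start, goal):
    """Return the nodes on the unique tree path from start to goal."""
    stack, seen = [(start, [start])], {start}
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nxt in tree_adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [nxt]))
    raise ValueError("nodes are not connected in the tree")


def balance_cycles(resistance, tree_edges, source, sink, sweeps=50):
    """Send 1 unit of current from source to sink, then balance every loop."""
    tree_adj = defaultdict(list)
    for u, v in tree_edges:
        tree_adj[u].append(v)
        tree_adj[v].append(u)
    flow = {edge: 0.0 for edge in resistance}  # positive means u -> v

    def push(u, v, amount):
        if (u, v) in flow:
            flow[(u, v)] += amount
        else:
            flow[(v, u)] -= amount

    def directed(u, v):
        return flow[(u, v)] if (u, v) in flow else -flow[(v, u)]

    def ohms(u, v):
        return resistance[(u, v)] if (u, v) in resistance else resistance[(v, u)]

    # Initial guess: all current follows the tree.
    path = tree_path(tree_adj, source, sink)
    for u, v in zip(path, path[1:]):
        push(u, v, 1.0)

    tree_set = set(tree_edges)
    off_tree = [edge for edge in resistance if edge not in tree_set]
    for _ in range(sweeps):
        for u, v in off_tree:
            back = tree_path(tree_adj, v, u)
            cycle = [(u, v)] + list(zip(back, back[1:]))
            drop = sum(ohms(a, b) * directed(a, b) for a, b in cycle)
            total = sum(ohms(a, b) for a, b in cycle)
            for a, b in cycle:
                push(a, b, -drop / total)  # cancel the net voltage around the loop
    return flow


# Hypothetical bridge circuit with 1-ohm resistors; edges are keyed with u < v.
resistance = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0}
flow = balance_cycles(resistance, [(0, 1), (1, 2), (1, 3)], source=0, sink=3)
print(flow)
```

Balancing one loop disturbs the other, as the article notes, but in this small example the repeated passes settle on an even split of current between the two routes with nothing crossing the middle resistor.<br />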
<br />
<strong>Paradigm shift</strong><br />
<br />
Daniel Spielman, a professor of applied mathematics and computer science at Yale University, was Kelner’s thesis advisor and one of two co-authors of the 2004 paper. According to Spielman, his algorithm solved Laplacians in nearly linear time “on problems of astronomical size that you will never ever encounter unless it’s a much bigger universe than we know. Jon and colleagues’ algorithm is actually a practical one.”<br />
<br />
Spielman points out that in 2010, researchers at Carnegie Mellon University also presented a practical algorithm for solving Laplacians. Theoretical analysis shows that the MIT algorithm should be somewhat faster, but “the strange reality of all these things is, you do a lot of analysis to make sure that everything works, but you sometimes get unusually lucky, or unusually unlucky, when you implement them. So we’ll have to wait to see which really is the case.”<br />
<br />
The real value of the MIT paper, Spielman says, is in its innovative theoretical approach. “My work and the work of the folks at Carnegie Mellon, we’re solving a problem in numeric linear algebra using techniques from the field of numerical linear algebra,” he says. “Jon’s paper is completely ignoring all of those techniques and really solving this problem using ideas from data structures and algorithm design. It’s substituting one whole set of ideas for another set of ideas, and I think that’s going to be a bit of a game-changer for the field. Because people will see there’s this set of ideas out there that might have application no one had ever imagined.”</p>
Image courtesy of the researchers
Unraveling the Matrix
https://news.mit.edu/2010/faster-fourier-0729
A new way of analyzing grids of numbers known as matrices could improve signal-processing applications and data-compression schemes.
Thu, 29 Jul 2010 04:00:00 -0400
Larry Hardesty, MIT News Office
Among the most common tools in electrical engineering and computer science are rectangular grids of numbers known as matrices. The numbers in a matrix can represent data: The rows, for instance, could represent temperature, air pressure and humidity, and the columns could represent different locations where those three measurements were taken. But matrices can also represent mathematical equations. If the expressions t + 2p + 3h and 4t + 5p + 6h described two different mathematical operations involving temperature, pressure and humidity measurements, they could be represented as a matrix with two rows, [1 2 3] and [4 5 6]. Multiplying the two matrices together means performing both mathematical operations on every column of the data matrix and entering the results in a new matrix. In many time-sensitive engineering applications, multiplying matrices can give quick but good approximations of much more complicated calculations. <br /><br />In a paper <a href="http://www.pnas.org/content/107/28/12413.abstract" target="_blank">published in the July 13 issue</a> of <em>Proceedings of the National Academy of Sciences</em>, MIT math professor Gilbert Strang describes a new way to split certain types of matrices into simpler matrices. The result could have implications for software that processes video or audio data, for compression software that squeezes down digital files so that they take up less space, or even for systems that control mechanical devices.<br /><br />Strang’s analysis applies to so-called banded matrices. Most of the numbers in a banded matrix are zeroes; the only exceptions fall along diagonal bands, at or near the central diagonal of the matrix. 
This may sound like an esoteric property, but it often has practical implications. Some applications that process video or audio signals, for instance, use banded matrices in which each band represents a different time slice of the signal. By analyzing local properties of the signal, the application could, for instance, sharpen frames of video, or look for redundant information that can be removed to save memory or bandwidth.<br /><br /><strong>Working backwards</strong><br /><br />Since most of the entries in a banded matrix — maybe 99 percent, Strang says — are zero, multiplying it by another matrix is a very efficient procedure: You can ignore all the zero entries. After a signal has been processed, however, it has to be converted back into its original form. That requires multiplying it by the “inverse” of the processing matrix: If multiplying matrix A by matrix B yields matrix C, multiplying C by the inverse of B yields A. <br /><br />But the fact that a matrix is banded doesn’t mean that its inverse is. In fact, Strang says, the inverse of a banded matrix is almost always “full,” meaning that almost all of its entries are nonzero. In a signal-processing application, all the speed advantages offered by banded matrices would be lost if restoring the signal required multiplying it by a full matrix. So engineers are interested in banded matrices with banded inverses, but which matrices those are is by no means obvious. <br /><br />In his <em>PNAS</em> paper, Strang describes a new technique for breaking a banded matrix up into simpler matrices — matrices with fewer bands. It’s easy to tell whether these simpler matrices have banded inverses, and if they do, their combination will, too. 
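<br /><br />Both halves of that observation can be checked on small examples. The sketch below uses exact fractions to invert a four-by-four tridiagonal matrix, whose inverse turns out to be full, and then a banded matrix built as the product of two block-diagonal factors, whose inverse stays banded. The matrices, the helper names `inverse`, `matmul` and `bandwidth`, and the choice of factors are illustrative assumptions, not taken from Strang's paper.

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a square matrix exactly with Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Multiply two matrices row by column."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def bandwidth(matrix):
    """Largest distance from the main diagonal of any nonzero entry."""
    return max((abs(i - j) for i, row in enumerate(matrix)
                for j, x in enumerate(row) if x != 0), default=0)


# A tridiagonal (banded) matrix whose inverse has no zero entries.
tridiagonal = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
print(bandwidth(tridiagonal), bandwidth(inverse(tridiagonal)))

# Two block-diagonal factors with 2x2 blocks, the second shifted by one row.
factor_1 = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
factor_2 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
banded = matmul(factor_1, factor_2)
print(bandwidth(banded), bandwidth(inverse(banded)))
```

Each factor is trivially invertible block by block, so the inverse of the product is just the product of the factors' inverses in reverse order, and it inherits their narrow bands.<br /><br />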
Strang’s technique thus allows engineers to determine whether some promising new signal-processing techniques will, in fact, be practical.<br /><br /><strong>Faster than Fourier?</strong><br /><br />One of the most common digital-signal-processing techniques is the <a href="/newsoffice/2009/explained-fourier.html">discrete Fourier transform (DFT)</a>, which breaks a signal into its component frequencies and can be represented as a matrix. Although the matrix for the Fourier transform is full, Strang says, “the great fact about the Fourier transform is that it happens to be possible, even though it’s full, to multiply fast and to invert it fast. That’s part of what makes Fourier wonderful.” Nonetheless, for some signal-processing applications, banded matrices could prove more efficient than the Fourier transform. If only parts of the signal are interesting, the bands provide a way to home in on them and ignore the rest. “Fourier transform looks at the whole signal at once,” Strang says. “And that’s not always great, because often the signal is boring for 99 percent of the time.” <br /><br />Richard Brualdi, the emeritus UWF Beckwith Bascom Professor of Mathematics at the University of Wisconsin-Madison, points out that a mathematical conjecture that Strang presents in the paper has already been proven by three other groups of researchers. “It’s a very interesting theorem,” says Brualdi. “It’s already generated a couple of papers, and it’ll probably generate some more.” Brualdi points out that large data sets, such as those generated by gene sequencing, medical imaging, or weather monitoring, often yield matrices with regular structures. Bandedness is one type of structure, but there are others, and Brualdi expects other mathematicians to apply techniques like Strang’s to other types of structured matrices. “Whether or not those things will work, I really don’t know,” Brualdi says. 
“But Gil’s already said that he’s going to look at a different structure in a future paper.”<br /><br />In a banded matrix, all the nonzero entries cluster around the diagonal.
Graphic: Christine Daniloff