A grand unified theory of AI

A new approach unites two prevailing but often opposed strains in the history of artificial-intelligence research.

In the 1950s and ’60s, artificial-intelligence researchers saw themselves as trying to uncover the rules of thought. But those rules turned out to be way more complicated than anyone had imagined. Since then, artificial-intelligence (AI) research has come to rely, instead, on probabilities — statistical patterns that computers can learn from large sets of training data.

The probabilistic approach has been responsible for most of the recent progress in artificial intelligence, such as voice recognition systems, or the system that recommends movies to Netflix subscribers. But Noah Goodman, an MIT research scientist whose department is Brain and Cognitive Sciences but whose lab is the Computer Science and Artificial Intelligence Laboratory, thinks that AI gave up too much when it gave up rules. By combining the old rule-based systems with insights from the new probabilistic systems, Goodman has found a way to model thought that could have broad implications for both AI and cognitive science.

Early AI researchers saw thinking as logical inference: if you know that birds can fly and are told that the waxwing is a bird, you can infer that waxwings can fly. One of AI’s first projects was the development of a mathematical language — much like a computer language — in which researchers could encode assertions like “birds can fly” and “waxwings are birds.” If the language was rigorous enough, computer algorithms would be able to comb through assertions written in it and calculate all the logically valid inferences. Once they’d developed such languages, AI researchers started using them to encode lots of commonsense assertions, which they stored in huge databases.

The problem with this approach is, roughly speaking, that not all birds can fly. And among birds that can’t fly, there’s a distinction between a robin in a cage and a robin with a broken wing, and another distinction between any kind of robin and a penguin. The mathematical languages that the early AI researchers developed were flexible enough to represent such conceptual distinctions, but writing down all the distinctions necessary for even the most rudimentary cognitive tasks proved much harder than anticipated.

Embracing uncertainty

In probabilistic AI, by contrast, a computer is fed lots of examples of something — like pictures of birds — and is left to infer, on its own, what those examples have in common. This approach works fairly well with concrete concepts like “bird,” but it has trouble with more abstract concepts — for example, flight, a capacity shared by birds, helicopters, kites and superheroes. You could show a probabilistic system lots of pictures of things in flight, but even if it figured out what they all had in common, it would be very likely to misidentify clouds, or the sun, or the antennas on top of buildings as instances of flight. And even flight is a concrete concept compared to, say, “grammar,” or “motherhood.”

As a research tool, Goodman has developed a computer programming language called Church — after the great American logician Alonzo Church — that, like the early AI languages, includes rules of inference. But those rules are probabilistic. Told that the cassowary is a bird, a program written in Church might conclude that cassowaries can probably fly. But if the program was then told that cassowaries can weigh almost 200 pounds, it might revise its initial probability estimate, concluding that, actually, cassowaries probably can’t fly.
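Church programs answer questions by conditioning a generative model on what is known. The Python sketch below imitates that idea with rejection sampling; it is not Church syntax, and the model and every probability in it (`P_FLIES`, `P_HEAVY_IF_FLIES`, `P_HEAVY_IF_FLIGHTLESS`) are hypothetical numbers chosen only to reproduce the cassowary story.

```python
import random

P_FLIES = 0.9                 # hypothetical prior: most birds fly
P_HEAVY_IF_FLIES = 0.001      # hypothetical: a ~200-pound flyer is very rare
P_HEAVY_IF_FLIGHTLESS = 0.2   # hypothetical: heavy flightless birds are common


def sample_bird(rng):
    """Generative model: first decide whether the bird flies, then its weight class."""
    flies = rng.random() < P_FLIES
    p_heavy = P_HEAVY_IF_FLIES if flies else P_HEAVY_IF_FLIGHTLESS
    heavy = rng.random() < p_heavy
    return flies, heavy


def query_flies(condition, n_accepted=5000, seed=0):
    """Estimate P(flies | condition) by discarding samples that violate the condition."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_accepted:
        flies, heavy = sample_bird(rng)
        if condition(heavy):
            accepted.append(flies)
    return sum(accepted) / len(accepted)


# Told only that the cassowary is a bird: every sample is a bird, so no extra condition.
print(round(query_flies(lambda heavy: True), 2))
# Then told it can weigh almost 200 pounds: keep only heavy samples.
print(round(query_flies(lambda heavy: heavy), 2))
```

With these made-up numbers the estimate falls from about 0.9 to roughly 0.04; the exact value by Bayes' rule is 0.9 × 0.001 / (0.9 × 0.001 + 0.1 × 0.2) ≈ 0.043. Rejection sampling is the simplest inference strategy, and it throws away most samples when the evidence is unlikely, which hints at why this style of inference can be computationally expensive.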

“With probabilistic reasoning, you get all that structure for free,” Goodman says. A Church program that has never encountered a flightless bird might, initially, set the probability that any bird can fly at 99.99 percent. But as it learns more about cassowaries — and penguins, and caged and broken-winged robins — it revises its probabilities accordingly. Ultimately, the probabilities represent all the conceptual distinctions that early AI researchers would have had to code by hand. But the system learns those distinctions itself, over time — much the way humans learn new concepts and revise old ones.
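The revision of a 99.99 percent default can be sketched with Beta-Bernoulli counting, a standard conjugate update used here as a stand-in rather than as a description of Church's internals. The `FlightBelief` class and its `prior_strength` parameter are hypothetical. Each kind of bird starts from the same strong default and is pulled away from it only by its own evidence, so distinctions between kinds come from data rather than hand-written rules.

```python
from collections import defaultdict


class FlightBelief:
    """Per-kind Beta-Bernoulli estimate of the probability that a bird flies."""

    def __init__(self, prior_fly=0.9999, prior_strength=10.0):
        # The prior behaves like `prior_strength` imaginary birds, almost all flyers.
        self.a0 = prior_fly * prior_strength
        self.b0 = (1.0 - prior_fly) * prior_strength
        self.flew = defaultdict(int)
        self.grounded = defaultdict(int)

    def observe(self, kind, flies):
        """Record one observed bird of the given kind."""
        if flies:
            self.flew[kind] += 1
        else:
            self.grounded[kind] += 1

    def p_flies(self, kind):
        """Posterior mean of P(flies) for this kind of bird."""
        a = self.a0 + self.flew[kind]
        b = self.b0 + self.grounded[kind]
        return a / (a + b)


belief = FlightBelief()
print(round(belief.p_flies("cassowary"), 4))  # untouched prior
for _ in range(20):
    belief.observe("penguin", flies=False)
    belief.observe("robin", flies=True)
print(round(belief.p_flies("penguin"), 3), round(belief.p_flies("robin"), 5))
```

After twenty flightless penguins the penguin estimate drops to about 0.33, while robins stay near 1 and an unseen kind such as the cassowary keeps the 0.9999 default. A larger `prior_strength` makes the default more stubborn; a smaller one lets evidence win sooner.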

“What’s brilliant about this is that it allows you to build a cognitive model in a fantastically much more straightforward and transparent way than you could do before,” says Nick Chater, a professor of cognitive and decision sciences at University College London. “You can imagine all the things that a human knows, and trying to list those would just be an endless task, and it might even be an infinite task. But the magic trick is saying, ‘No, no, just tell me a few things,’ and then the brain — or in this case the Church system, hopefully somewhat analogous to the way the mind does it — can churn out, using its probabilistic calculation, all the consequences and inferences. And also, when you give the system new information, it can figure out the consequences of that.”

Modeling minds

Programs that use probabilistic inference seem to be able to model a wider range of human cognitive capacities than traditional cognitive models can. At the 2008 conference of the Cognitive Science Society, for instance, Goodman and Charles Kemp, who was a PhD student in the Department of Brain and Cognitive Sciences (BCS) at the time, presented work in which they'd given human subjects a list of seven or eight employees at a fictitious company and told them which employees sent e-mail to which others. Then they gave the subjects a short list of employees at another fictitious company. Without any additional data, the subjects were asked to create a chart depicting who sent e-mail to whom at the second company.

If the e-mail patterns in the sample case formed a chain — Alice sent mail to Bob who sent mail to Carol, all the way to, say, Henry — the human subjects were very likely to predict that the e-mail patterns in the test case would also form a chain. If the e-mail patterns in the sample case formed a loop — Alice sent mail to Bob who sent mail to Carol, and so on, but Henry sent mail to Alice — the subjects predicted a loop in the test case, too.
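The chain-versus-loop judgment can be framed as Bayesian model selection between candidate structural forms. The sketch below is a deliberately tiny Python illustration, not the model Goodman and Kemp presented: it assumes the employees' order is already known, considers only the two forms "chain" and "loop", and uses a hypothetical noise rate `eps` for missing or spurious e-mails.

```python
import math


def chain_edges(n):
    """Chain form: employee i e-mails employee i + 1."""
    return {(i, i + 1) for i in range(n - 1)}


def loop_edges(n):
    """Loop form: a chain whose last employee also e-mails the first."""
    return chain_edges(n) | {(n - 1, 0)}


FORMS = {"chain": chain_edges, "loop": loop_edges}


def log_likelihood(observed, n, template, eps=0.05):
    """Each ordered pair is a link with prob. 1 - eps if the form predicts it, else eps."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            p_link = 1 - eps if (i, j) in template else eps
            total += math.log(p_link if (i, j) in observed else 1 - p_link)
    return total


def posterior_over_forms(observed, n):
    """Normalize the likelihoods under a uniform prior over forms."""
    logs = {name: log_likelihood(observed, n, make(n)) for name, make in FORMS.items()}
    top = max(logs.values())
    weights = {name: math.exp(v - top) for name, v in logs.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def predict_new_company(observed, n_sample, n_new):
    """Pick the most probable form for the sample company and apply it to the new one."""
    post = posterior_over_forms(observed, n_sample)
    best = max(post, key=post.get)
    return best, FORMS[best](n_new)


# Sample company: eight employees, Alice (0) through Henry (7).
print(predict_new_company(chain_edges(8), 8, 4))
print(predict_new_company(loop_edges(8), 8, 4))
```

A single link, Henry to Alice, separates the two hypotheses, and the winning form is then reused for a company of a different size. That transfer of an abstract form, rather than of specific links, is the kind of higher-level concept the article credits the probabilistic program with extracting.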

A program that used probabilistic inference, asked to perform the same task, behaved almost exactly like a human subject, inferring chains from chains and loops from loops. But conventional cognitive models predicted totally random e-mail patterns in the test case: they were unable to extract the higher-level concepts of loops and chains. With a range of collaborators in the Department of Brain and Cognitive Sciences, Goodman has conducted similar experiments in which subjects were asked to sort stylized drawings of bugs or trees into different categories, or to make inferences that required guessing what another person was thinking. In all these cases — several of which were also presented at the Cognitive Science Society’s conference — Church programs did a significantly better job of modeling human thought than traditional artificial-intelligence algorithms did.

Chater cautions that, while Church programs perform well on such targeted tasks, they’re currently too computationally intensive to serve as general-purpose mind simulators. “It’s a serious issue if you’re going to wheel it out to solve every problem under the sun,” Chater says. “But it’s just been built, and these things are always very poorly optimized when they’ve just been built.” And Chater emphasizes that getting the system to work at all is an achievement in itself: “It’s the kind of thing that somebody might produce as a theoretical suggestion, and you’d think, ‘Wow, that’s fantastically clever, but I’m sure you’ll never make it run, really.’ And the miracle is that it does run, and it works.”



Seems to be nothing.

You might look at the new cognitive math that I recently presented at ITTC in Madrid. It is quite simple, and beautifully powerful. It is a new TGS, much more advanced than everything done until now. I'm sure you'll enjoy it, and that it will really improve your way of thinking about AI. Mine is not a miracle; it is old tools combined into new math.

This is a really brilliant idea, and like many brilliant ideas, it is conceptually simple. I'm wondering whether anybody has suggested this approach before.

Seems like a rehash of the Probabilistic Logic Networks of Goertzel et al. from a few years back, implemented within OpenCog: http://www.opencog.org/

(or of Pei Wang's NARS)

Until you MIT guys realize how simple the AI problem is, you'll never solve it.

AI is simply pattern matching. There is nothing else to it. There are no mathematics behind it, or languages, or anything else.

Wow. So apparently, grep is AI?

Pardon me if I find your definition a little lacking.

Pattern matching at significant scales is anything but simple. I am sure you have a simple solution for P = NP, too.

Mantra - please be sarcasm...please be sarcasm...

Didn't we have probabilistic expert systems 20 years ago?

Input is fixated equations of Environmental elements / raw stats.

Environmental Input(EI) = Static

Preponderance of thought of computer sequential & parallel indeterminate by time and counter by EI.

Thus even though it appears to be Cognitive, it is not.

Regenerative modeling and EI plus a non-recursive answering system appear to be more effective. Wisdom capturing acts as an offset of what the outcome is.

Problem is, the offset makes the process sequential and thus still, "STATIC".

Unless perfect is still being considered a Cognitive trait.

Church? Are you kidding me?

a dollar short and two hundred and forty seven years late...

How does Church "decide" how to weight/vector the variables?

Here's an idea: use some of that interesting GPU architecture, and assign every concept a simplified symbolic shape and (when applicable) animation. Then run comparisons on those; Bayesian analysis would certainly be effective on 'core shapes' in terms of what to make heavy or light.

Is this an orthogonal view of fuzzy logic?

The proposal of combining logical rules with probabilistic methods is, as far as I'm concerned, not new. In any case, it is a step forward in putting uncertainty into the way machines can think. But this approach cannot address the problem of capturing the cognitive dimensions of human intelligence. Schematic conceptual structures (see Langacker, Talmy, Lakoff, Johnson-Laird, etc.) are responsible for metaphorical thought, which underlies most of our commonsense reasoning. How to deal with metaphorical language is, in my opinion, still an obstacle on the way to effective AI systems.

Yes, grep is a primitive form of AI.

It's hard to grasp, but all simple things are.

That's an implementation issue. What does the brain do? It takes an input (say, an image), converts it to signals, sends them to the brain, matches them against the stored images, and the brain fires a response.

Do you want to do that electronically? Make a pattern-matching chip that can simultaneously match an image against thousands of other images. When there is a match, let the part of the chip that finds a good match (say, over 90%) fire a signal.

The thing with the brain is that it is a truly parallel machine, unlike computers.

Another aspect of the brain is that it stores experiences, not images or sound. An experience contains images, sound, smell, taste and the brain's own thoughts at that particular moment. The brain does pattern matching on experiences in order to find the most appropriate response.

The idea that intelligence is simply probabilistic recall of knowledge is far too simplistic.

Real intelligence is smart application of knowledge and extrapolation of knowledge to new situations.

Probability folds in -- but I believe there are several more important factors:

#1 What advantage do I gain by taking an action?

#2 How much energy do I expend to take an action?

#3 What penalty do I get by taking an action?

Knowledge without action is useless.

Many inputs are shunted to ground because they are not a threat to the current action. I started thinking about this years ago when I saw a cattle egret not 2 feet from the side of I-95, staring into a puddle with cars whizzing by him a few feet away at 70 MPH. He couldn't care less. The cars weren't a threat. The bird had figured out that whatever he could get out of that puddle was worth the small probability of getting hit by a car. Put him in another context and he'll probably fly away when a car approaches. He had learned to ignore his peripheral vision, which was no doubt firing like crazy.

Intelligence takes context into account, extrapolation of existing knowledge, and feedback (the old "you learn from your mistakes").

Image processing needs to take into account "noise" which we all learn to ignore (this saves energy on the brain). That's why you "learn" to ignore your alarm clock, or train, or some other thing that's no threat to you. It's also why moms are generally the first to wake up when the baby cries.

I haven't figured out how to fold all this in. Perhaps Church is a component, but I can already see it's too computationally intensive, whereas we humans (and other beings) make every effort to minimize what our brain does (some to the extreme :-)

It seems that this whole debate has completely forgotten that the mind evolved, over evolutionary history, always as part of a complete organism that had to interact with and survive in the real world. In other words, minds have always been part of embodied systems. If we don't understand how the two relate, it's going to be just algorithms.

This is where I'd love for Steven Pinker or Richard Dawkins to come in and lay down some eloquent analyses of human cognition at work. So many theories... So many holes in them...

Metaphorical language is still a problem with Human reasoning. No reason to believe it will be programmable. It must be developed via experiential understanding, i.e. memory.

We need machines that mimic human senses and multiprocessor networks that mimic human parallel computing, with some subsets specialized for the same reasons that humans have such recognized areas of the brain.

This takes massive resources in terms of human specialists and a collaborative strategy, rather than the current competitive (i.e. market) context.

It may not be possible in the West, but if enough people with similar values and the specific goal of a functioning AI (Turing Machine) were combined with the resources and motivation (CHINA), it could be done.

Saying it's going to be "just algorithms" is not a valid objection.

It may be that whatever relationship mind and body have can be expressed by algorithms. It could be that the particular way the various parts of the brain are organized to produce consciousness is only a matter of historical contingency, and that there are other, better ways to design conscious beings. The human eye, for instance, works, but we can think of designs that would be more efficient: we do have a theory of optics, and an idea of what a good camera might look like.

However, you're exactly right: we don't yet have a comprehensive theory of consciousness. And until we do, what we can do is experiment with various approaches and see whether they help us understand the problem. It will be more a matter of trial and error.

Interesting article (single vantage history of AI… sort of). Most of what has been done in AI is tinkering. This is still a field that openly and proudly asserts the Turing Test as a valid metric. "I feel sure I would know intelligence if I saw it!" (Sound familiar?)

There are many interesting ideas in this set of comments. Though most of them seem to be making the same mistake… an anecdotal confusion leading to false classification strata.

Pattern is pattern. If one feels the need to build special computational models for low level vs. high level pattern, one really needs to rethink their strategy and base of understanding.

Pattern storage, and pattern matching are computationally intensive. Always will be. The only way around this is lossy compression. That means letting go of some details and holding on to others. That means developing a highly compressed algorithm for the compression of pattern towards saliency. That demands a general purpose saliency engine. And that means looking through disparate sets of pattern and finding meta-pattern and then re-compressing the original patterns in reference to the always evolving meta-saliency pattern.

The only thing one needs to remember about intelligence is that it is lossy, that lossy is good, that lossy compression must be done in layers that feed low level domain specific pattern into more and more general layers of meta pattern compression.

Do this in software or hardware and do it all day long as new experience bubbles up through the pattern compression hierarchy and you will have intelligence.

Does stochastic weighting have a place in this lossy compression hierarchy engine????? Sure! But it isn't the centerpiece.

MIT has had a history of overly top-down (tinker-ers) and overly bottom up (mathematics or nothing) approaches to problems. Mana awaits those who can see evolution in all systems and see evolution for the least energy engine it is.

Randall Reetz

How would you explain the word "interesting"?

Why is reading about AI more interesting to me than reading about food or the weather?

The ideal machine for this problem would be a Cray XMT.

It can have hundreds of processors all sharing a globally shared memory, with word level synchronization and remote memory reference latency hiding.

"Metaphorical language is still a problem with Human reasoning. No reason to believe it will be programmable."

As Michael Beck says below: "Intelligence takes context into account, extrapolation of existing knowledge, and feedback."

When metaphorical language is taken in context, many clues to its meaning can already be derived and coupled with existing knowledge. And if that isn't enough, we can ask for feedback to clarify its meaning. If no feedback is available, then the AI/NLP software can calculate the most probable interpretation.

"We need ... multiprocessor networks that mimic human parallel computing."

While the brain has an advantage with its massively parallel computing, it is possible that such advantage is more than offset by the many advantages of the digital computer, such as perfect memory, virtually unlimited memory storage, accuracy of calculations and analysis, the ability to work non-stop at peak performance 24/7/365, more efficient use of data, and more.

are falling into the same trap as people decades ago when they thought they were on the cusp of cracking it. Applicable AI is simple and is pretty much a function of computing power. Simulating a human mind, however, is FAR from simple and is more complex than our current knowledge encompasses. Given time to learn more about our brains, and given increases in computing power, which are growing at a fairly predictable rate, it is probably possible to estimate fairly accurately when AI will be able to simulate the human mind.

There were some medical diagnosis systems built in the '80s that trimmed the search space (and decided which question to ask next) based on statistics. It's like min/max searching a game space, where the min/max costs are calculated from large statistical samples instead of from, say, game-board analysis.

I think the weakness in these systems is that a single symptom can indicate a huge number of diseases, and a given disease can manifest a huge number of symptoms in a bunch of ways. And patients often have many overlapping diseases, and patients don't communicate very well and can easily be led to report almost any symptom.

So, although medical diagnosis seemed like a nice, simple domain at first, it turned out to be a huge ball of hair. As any doctor can tell you.
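The statistics-driven question selection described in this comment can be sketched with a naive Bayes model plus a greedy information-gain rule. This is a swapped-in technique, not a reconstruction of any specific 1980s system, and the diseases, symptoms, probabilities, and the `patient` record are all hypothetical. It also bakes in exactly the simplifications the commenter warns about: one disease per patient, and symptoms that are independent given the disease.

```python
import math

# Hypothetical diseases, symptoms, and probabilities, for illustration only.
PRIOR = {"flu": 0.5, "cold": 0.3, "allergy": 0.2}
LIKELIHOOD = {  # P(symptom present | disease)
    "fever": {"flu": 0.9, "cold": 0.2, "allergy": 0.01},
    "sneezing": {"flu": 0.3, "cold": 0.8, "allergy": 0.9},
    "itchy_eyes": {"flu": 0.05, "cold": 0.1, "allergy": 0.8},
}


def posterior(answers):
    """Naive Bayes: symptoms are assumed independent given a single disease."""
    scores = {}
    for disease, p in PRIOR.items():
        for symptom, present in answers.items():
            q = LIKELIHOOD[symptom][disease]
            p *= q if present else 1 - q
        scores[disease] = p
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()}


def entropy(dist):
    """Uncertainty of a distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def best_question(answers):
    """Return the unasked symptom whose answer should leave the least uncertainty."""
    current = posterior(answers)
    best, best_h = None, float("inf")
    for symptom in LIKELIHOOD:
        if symptom in answers:
            continue
        p_yes = sum(current[d] * LIKELIHOOD[symptom][d] for d in current)
        expected_h = (p_yes * entropy(posterior({**answers, symptom: True}))
                      + (1 - p_yes) * entropy(posterior({**answers, symptom: False})))
        if expected_h < best_h:
            best, best_h = symptom, expected_h
    return best


patient = {"fever": False, "sneezing": True, "itchy_eyes": True}  # hypothetical patient
answers = {}
for _ in range(len(LIKELIHOOD)):
    question = best_question(answers)
    answers[question] = patient[question]
    print("asked", question, "->", answers[question])
print(posterior(answers))
```

Relaxing either assumption, for example allowing several overlapping diseases at once, makes the hypothesis space grow combinatorially, which is one concrete way the "huge ball of hair" shows up.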

Jurgen Schmidhuber has given this question quite a bit of thought. Regardless of what you think of his pet neural net project, I think you'll agree that his definition of "interesting" is a pretty good one:

A situation is interesting if you expect that, by paying attention to it, your ability to represent a large amount of information with a small number of rules will improve rapidly for a long period of time.

So, noise is boring because finding rules is hopeless. A steady tone is boring because the single rule is too quickly discovered. Music is interesting because it seems complex but holds the promise that you will come to understand the simple rules behind that complexity as you listen.
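Schmidhuber's criterion can be illustrated, very crudely, by tracking how many bits per byte a fixed compressor needs as it sees longer stretches of a signal. In his formulation the observer's own model improves over time; here `zlib` simply gets more context, so the drop in bits per byte is only a hypothetical proxy for compression progress. The three byte streams are made-up stand-ins for noise, a steady tone, and a structured signal, and `random.Random.randbytes` requires Python 3.9 or later.

```python
import random
import zlib


def bits_per_byte(data):
    """Compressed size in bits divided by the number of input bytes."""
    return 8 * len(zlib.compress(data, 9)) / len(data)


def compression_progress(stream, checkpoints=(512, 1024, 2048, 4096, 8192)):
    """Drop in bits per byte from the first checkpoint to the last (a crude proxy)."""
    curve = [bits_per_byte(stream[:n]) for n in checkpoints]
    return curve[0] - curve[-1], curve


rng = random.Random(0)
noise = rng.randbytes(8192)        # no rules to find
tone = b"a" * 8192                 # one rule, found almost immediately
motif = rng.randbytes(512) * 16    # hidden structure revealed by listening longer

for name, stream in [("noise", noise), ("tone", tone), ("motif", motif)]:
    progress, curve = compression_progress(stream)
    print(name, round(progress, 2), [round(c, 2) for c in curve])
```

Noise stays near 8 bits per byte and the tone starts near zero, so neither shows much progress. The repeated motif looks like noise at first and then becomes cheap to encode, which is the pattern the definition calls interesting.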


People who think that being able to simulate the workings of a brain is the goal of AI are confusing the goal with the method.

The *goal* of generalized AI is to match or surpass the (worthwhile) output of the brain given the same available input, and simulating the way the brain works is probably the LEAST desirable approach, given the problems associated with the functioning of the brain.

I don't think this article merits its billing as "a grand unified theory of AI." There's no grand new insight to help us understand what is meant by "intelligence" and whether or not computers can simulate it. Feeding data to an algorithm and seeing what patterns it can recognize and inferences it can draw is not a simulation of intelligence (whether you’re using probability or pattern matching or a combination of the two). What is intelligence, and can it be modeled by a computer? That's the question a grand unified theory must answer. I can't answer the question, but I can offer a suggestion to define success. We will achieve AI when we can give a computer a problem to solve and it asks us questions about what is already known. It will ask us for the data it needs, rather than us stuffing it with the data we have in order to see if anything significant comes out the other end.

How mechanical is the brain? It's a lot of specialized hardware for converting sound, light, or touch to electrical signals that can be forced through a biological pipeline (like a graphics card, for example). The net effect is that we have a remarkable gift for processing input very fast. It's a distinct evolutionary advantage for evading predators and finding food.

But how logical and reasonable are we really? Run with it. If electricity were gone and we had no gas, no light, no food storage, no transport, and no communications, how feral would we become? Where would our logic be?

Understandably on an x86 architecture using the software available we can only approximate the perceived activity of the brain using mathematics/logic/probability and depending on how we mix these ingredients we get varied results. Sometimes we mix them according to success or failure tainted by the hardware we are testing on.

It could be the case that on the right hardware there is a very simple language that can quite easily describe "thinking".

We need better hardware to simulate organs.

We need hardware control algorithms to simulate organs' behaviour and reactions.

We need feedback mechanisms.

We need something that is a better model of an animal's body that we can play around with to see which part constitutes thought and how we can improve it (if possible).

Bottom line. No one really knows how it works, comments like "when will you learn" and "Its a case of" are no good because if you had concrete proof to back them up there would be no argument.

I don't see anything wrong in this article. I don't see anything wrong in the exploration of techniques and the sharing of results.
