The Computational Theory of Mind

Week 2 Reading for Principles of Intelligent Systems

 

From:

Pinker, Steven. (1997). How the Mind Works. Penguin Books, Australia. pp. 64-77.

 

[p64]

 

Thinking Machines

 

The traditional explanation of intelligence is that human flesh is suffused with a non‑material entity, the soul, usually envisioned as some kind of ghost or spirit. But the theory faces an insurmountable problem: How does the spook interact with solid matter? How does an ethereal nothing respond to flashes, pokes, and beeps and get arms and legs to move? Another problem is the overwhelming evidence that the mind is the activity of the brain. The supposedly immaterial soul, we now know, can be bisected with a knife, altered by chemicals, started or stopped by electricity, and extinguished by a sharp blow or by insufficient oxygen. Under a microscope, the brain has a breathtaking complexity of physical structure fully commensurate with the richness of the mind.

 

Another explanation is that mind comes from some extraordinary form of matter. Pinocchio was animated by a magical kind of wood found by Geppetto that talked, laughed, and moved on its own. Alas, no one has ever discovered such a wonder substance. At first one might think that the wonder substance is brain tissue. Darwin wrote that the brain "secretes" the mind, and recently the philosopher John Searle has argued that the physico‑chemical properties of brain tissue somehow produce the mind just as breast tissue produces milk and plant tissue produces sugar. But recall that the same kinds of membranes, pores, and chemicals are found in brain tissue throughout the animal kingdom, not to mention in brain tumors and cultures in dishes. All of these globs of neural tissue have the same physico‑chemical properties, but not all of

[p65] them accomplish humanlike intelligence. Of course, something about the tissue of the human brain is necessary for our intelligence, but the physical properties are not sufficient, just as the physical properties of bricks are not sufficient to explain architecture and the physical properties of oxide particles are not sufficient to explain music. Something in the patterning of neural tissue is crucial.

 

Intelligence has often been attributed to some kind of energy flow or force field. Orbs, luminous vapors, auras, vibrations, magnetic fields, and lines of force figure prominently in spiritualism, pseudoscience, and science fiction kitsch. The school of Gestalt psychology tried to explain illusions in terms of electromagnetic force fields on the surface of the brain, but the fields were never found. Occasionally the brain surface has been described as a continuous vibrating medium that supports holograms or other wave interference patterns, but that idea, too, has not panned out. The hydraulic model, with its psychic pressure building up, bursting out, or being diverted through alternative channels, lay at the center of Freud's theory and can be found in dozens of everyday metaphors: anger welling up, letting off steam, exploding under the pressure, blowing one's stack, venting one's feelings, bottling up rage. But even the hottest emotions do not literally correspond to a buildup and discharge of energy (in the physicist's sense) somewhere in the brain. In Chapter 6 I will try to persuade you that the brain does not actually operate by internal pressures but contrives them as a negotiating tactic, like a terrorist with explosives strapped to his body.

 

A problem with all these ideas is that even if we did discover some gel or vortex or vibration or orb that spoke and plotted mischief like Geppetto's log, or that, more generally, made decisions based on rational rules and pursued a goal in the face of obstacles, we would still be faced with the mystery of how it accomplished those feats.

 

No, intelligence does not come from a special kind of spirit or matter or energy but from a different commodity, information. Information is a correlation between two things that is produced by a lawful process (as opposed to coming about by sheer chance). We say that the rings in a stump carry information about the age of the tree because their number correlates with the tree's age (the older the tree, the more rings it has), and the correlation is not a coincidence but is caused by the way trees grow. Correlation is a mathematical and logical concept; it is not defined in terms of the stuff that the correlated entities are made of.

 

Information itself is nothing special; it is found wherever causes leave

[p66] effects. What is special is information processing. We can regard a piece of matter that carries information about some state of affairs as a symbol; it can "stand for" that state of affairs. But as a piece of matter, it can do other things as well ‑ physical things, whatever that kind of matter in that kind of state can do according to the laws of physics and chemistry. Tree rings carry information about age, but they also reflect light and absorb staining material. Footprints carry information about animal motions, but they also trap water and cause eddies in the wind.

 

Now here is an idea. Suppose one were to build a machine with parts that are affected by the physical properties of some symbol. Some lever or electric eye or tripwire or magnet is set in motion by the pigment absorbed by a tree ring, or the water trapped by a footprint, or the light reflected by a chalk mark, or the magnetic charge in a bit of oxide. And suppose that the machine then causes something to happen in some other pile of matter. It burns new marks onto a piece of wood, or stamps impressions into nearby dirt, or charges some other bit of oxide. Nothing special has happened so far; all I have described is a chain of physical events accomplished by a pointless contraption.

 

Here is the special step. Imagine that we now try to interpret the newly arranged piece of matter using the scheme according to which the original piece carried information. Say we count the newly burned wood rings and interpret them as the age of some tree at some time, even though they were not caused by the growth of any tree. And let's say that the machine was carefully designed so that the interpretation of its new markings made sense ‑ that is, so that they carried information about something in the world. For example, imagine a machine that scans the rings in a stump, burns one mark on a nearby plank for each ring, moves over to a smaller stump from a tree that was cut down at the same time, scans its rings, and sands off one mark in the plank for each ring. When we count the marks on the plank, we have the age of the first tree at the time that the second one was planted. We would have a kind of rational machine, a machine that produces true conclusions from true premises - not because of any special kind of matter or energy, or because of any part that was itself intelligent or rational. All we have is a carefully contrived chain of ordinary physical events, whose first link was a configuration of matter that carries information. Our rational machine owes its rationality to two properties glued together in the entity we call a symbol: a symbol carries information, and it causes things to happen. (Tree rings correlate with the age of the tree, and they can absorb the light beam of a scanner.)
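To make the arithmetic of this contraption concrete, here is a minimal sketch in Python; the ring counts are invented, and the code stands in for the scanner, the burner, and the sander only in the loosest way.

def rational_machine(rings_of_old_stump, rings_of_young_stump):
    # Burn one mark on the plank for each ring of the first stump,
    # then sand off one mark for each ring of the second stump.
    marks = 0
    for _ in range(rings_of_old_stump):
        marks += 1
    for _ in range(rings_of_young_stump):
        marks -= 1
    # The marks left over can be read as the age of the first tree
    # at the time the second one was planted.
    return marks

# If the older tree has 80 rings and the younger one 35, the plank
# ends up with 45 marks: the first tree's age when the second was planted.
print(rational_machine(80, 35))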

 

[p67]

 

When the caused things themselves carry information, we call the whole system an information processor, or a computer.

 

Now, this whole scheme might seem like an unrealizable hope. What guarantee is there that any collection of thingamabobs can be arranged to fall or swing or shine in just the right pattern so that when their effects are interpreted, the interpretation will make sense? (More precisely, so that it will make sense according to some prior law or relationship we find interesting; any heap of stuff can be given a contrived interpretation after the fact.) How confident can we be that some machine will make marks that actually correspond to some meaningful state of the world, like the age of a tree when another tree was planted, or the average age of the tree's offspring, or anything else, as opposed to being a meaningless pattern corresponding to nothing at all?

 

The guarantee comes from the work of the mathematician Alan Turing. He designed a hypothetical machine whose input symbols and output symbols could correspond, depending on the details of the machine, to any one of a vast number of sensible interpretations. The machine consists of a tape divided into squares, a read‑write head that can print or read a symbol on a square and move the tape in either direction, a pointer that can point to a fixed number of tickmarks on the machine, and a set of mechanical reflexes. Each reflex is triggered by the symbol being read and the current position of the pointer, and it prints a symbol on the tape, moves the tape, and/or shifts the pointer. The machine is allowed as much tape as it needs. This design is called a Turing machine.
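Purely as an illustration of these ingredients, here is a bare‑bones simulator in Python. It is not Turing's own formalism, and the sample machine at the end is invented; it merely adds one to a number written in unary (a row of 1s).

def run_turing_machine(rules, tape, state="start"):
    # The tape: squares indexed by position; unvisited squares are blank.
    squares = dict(enumerate(tape))
    head = 0
    while state != "halt":
        symbol = squares.get(head, " ")
        # The mechanical reflex triggered by the symbol being read and the
        # current pointer position ("state"): print, move, shift the pointer.
        write, move, state = rules[(state, symbol)]
        squares[head] = write
        head += 1 if move == "R" else -1
    return "".join(squares[i] for i in sorted(squares)).strip()

# Reflexes for a toy machine: skip past the existing 1s, then print one more.
rules = {
    ("start", "1"): ("1", "R", "start"),
    ("start", " "): ("1", "R", "halt"),
}
print(run_turing_machine(rules, "111"))   # prints 1111: three plus one, in unary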

 

What can this simple machine do? It can take in symbols standing for a number or a set of numbers, and print out symbols standing for new numbers that are the corresponding value for any mathematical function that can be solved by a step‑by‑step sequence of operations (addition, multiplication, exponentiation, factoring, and so on ‑ I am being imprecise to convey the importance of Turing's discovery without the technicalities). It can apply the rules of any useful logical system to derive true statements from other true statements. It can apply the rules of any grammar to derive well‑formed sentences. The equivalence among Turing machines, calculable mathematical functions, logics, and grammars led the logician Alonzo Church to conjecture that any well‑defined recipe or set of steps that is guaranteed to produce the solution to some problem in a finite amount of time (that is, any algorithm) can be implemented on a Turing machine.

 

What does this mean? It means that to the extent that the world

[p68] obeys mathematical equations that can be solved step by step, a machine can be built that simulates the world and makes predictions about it. To the extent that rational thought corresponds to the rules of logic, a machine can be built that carries out rational thought. To the extent that a language can be captured by a set of grammatical rules, a machine can be built that produces grammatical sentences. To the extent that thought consists of applying any set of well‑specified rules, a machine can be built that, in some sense, thinks.

 

Turing showed that rational machines ‑ machines that use the physical properties of symbols to crank out new symbols that make some kind of sense ‑ are buildable, indeed, easily buildable. The computer scientist Joseph Weizenbaum once showed how to build one out of a die, some rocks, and a roll of toilet paper. In fact, one doesn't even need a huge warehouse of these machines, one to do sums, another to do square roots, a third to print English sentences, and so on. One kind of Turing machine is called a universal Turing machine. It can take in a description of any other Turing machine printed on its tape and thereafter mimic that machine exactly. A single machine can be programmed to do anything that any set of rules can do.

 

Does this mean that the human brain is a Turing machine? Certainly not. There are no Turing machines in use anywhere, let alone in our heads. They are useless in practice: too clumsy, too hard to program, too big, and too slow. But it does not matter. Turing merely wanted to prove that some arrangement of gadgets could function as an intelligent symbol‑processor. Not long after his discovery, more practical symbol-processors were designed, some of which became IBM and Univac mainframes and, later, Macintoshes and PCs. But all of them were equivalent to Turing's universal machine. If we ignore size and speed, and give them as much memory storage as they need, we can program them to produce the same outputs in response to the same inputs.

 

Still other kinds of symbol‑processors have been proposed as models of the human mind. These models are often simulated on commercial computers, but that is just a convenience. The commercial computer is first programmed to emulate the hypothetical mental computer (creating what computer scientists call a virtual machine), in much the same way that a Macintosh can be programmed to emulate a PC. Only the virtual mental computer is taken seriously, not the silicon chips that emulate it. Then a program that is meant to model some sort of thinking (solving a problem, understanding a sentence) is run on the virtual men-

[p69] tal computer. A new way of understanding human intelligence has been born.

 

-

 

Let me show you how one of these models works. In an age when real computers are so sophisticated that they are almost as incomprehensible to laypeople as minds are, it is enlightening to see an example of computation in slow motion. Only then can one appreciate how simple devices can be wired together to make a symbol‑processor that shows real intelligence. A lurching Turing machine is a poor advertisement for the theory that the mind is a computer, so I will use a model with at least a vague claim to resembling our mental computer. I'll show you how it solves a problem from everyday life ‑ kinship relations ‑ that is complex enough that we can be impressed when a machine solves it.

 

The model we'll use is called a production system. It eliminates the feature of commercial computers that is most starkly unbiological: the ordered list of programming steps that the computer follows single‑mindedly, one after another. A production system contains a memory and a set of reflexes, sometimes called "demons" because they are simple, self‑contained entities that sit around waiting to spring into action. The memory is like a bulletin board on which notices are posted. Each demon is a knee‑jerk reflex that waits for a particular notice on the board and responds by posting a notice of its own. The demons collectively constitute a program. As they are triggered by notices on the memory board and post notices of their own, in turn triggering other demons, and so on, the information in memory changes and eventually contains the correct output for a given input. Some demons are connected to sense organs and are triggered by information in the world rather than information in memory. Others are connected to appendages and respond by moving the appendages rather than by posting more messages in memory.
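Stripped to its skeleton, and with the names invented for illustration, the cycle of triggering and posting can be sketched in a few lines of Python:

def run_production_system(demons, memory):
    # Keep cycling until no demon finds its triggering notice on the board.
    fired = True
    while fired:
        fired = False
        for condition, action in demons:
            if condition(memory):
                new_notices = action(memory) - memory
                if new_notices:          # only genuinely new postings count
                    memory |= new_notices
                    fired = True
    return memory

# A trivial one-demon program: when it sees a question, it posts an answer.
demons = [
    (lambda memory: "2+2?" in memory,    # the notice this demon waits for
     lambda memory: {"2+2 = 4"}),        # the notice it posts in response
]
print(run_production_system(demons, {"2+2?"}))   # {'2+2?', '2+2 = 4'}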

 

Suppose your long‑term memory contains knowledge of the immediate families of you and everyone around you. The content of that knowledge is a set of propositions like "Alex is the father of Andrew." According to the computational theory of mind, that information is embodied in symbols: a collection of physical marks that correlate with the state of the world as it is captured in the propositions.

 

These symbols cannot be English words and sentences, notwith-

[p70] standing the popular conception that we think in our mother tongue. As I showed in The Language Instinct, sentences in a spoken language like English or Japanese are designed for vocal communication between impatient, intelligent social beings. They achieve brevity by leaving out any information that the listener can mentally fill in from the context. In contrast, the "language of thought" in which knowledge is couched can leave nothing to the imagination, because it is the imagination. Another problem with using English as the medium of knowledge is that English sentences can be ambiguous. When the serial killer Ted Bundy wins a stay of execution and the headline reads "Bundy Beats Date with Chair," we do a double‑take because our mind assigns two meanings to the string of words. If one string of words in English can correspond to two meanings in the mind, meanings in the mind cannot be strings of words in English. Finally, sentences in a spoken language are cluttered with articles, prepositions, gender suffixes, and other grammatical boilerplate. They are needed to help get information from one head to another by way of the mouth and the ear, a slow channel, but they are not needed inside a single head where information can be transmitted directly by thick bundles of neurons. So the statements in a knowledge system are not sentences in English but rather inscriptions in a richer language of thought, "mentalese."

 

In our example, the portion of mentalese that captures family relations comes in two kinds of statements. An example of the first is Alex father‑of Andrew: a name, followed by an immediate family relationship, followed by a name. An example of the second is Alex is‑male: a name followed by its sex. Do not be misled by my use of English words and syntax in the mentalese inscriptions. This is a courtesy to you, the reader, to help you keep track of what the symbols stand for. As far as the machine is concerned, they are simply different arrangements of marks. As long as we use each one consistently to stand for someone (so the symbol used for Alex is always used for Alex and never for anyone else), and arrange them according to a consistent plan (so they preserve information about who is the father of whom), they could be any marks in any arrangement at all. You can think of the marks as bar codes recognized by a scanner, or keyholes that admit only one key, or shapes that fit only one template. Of course, in a commercial computer they would be patterns of charges in silicon, and in a brain they would be firings in sets of neurons. The key point is that nothing in the machine understands them the way you or I do; parts of the machine respond to their shapes and are

[p71] triggered to do something, exactly as a gumball machine responds to the shape and weight of a coin by releasing a gumball.
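To underline the point that the marks are arbitrary, here is one invented way the same two inscriptions could be stored, first as readable strings and then as opaque numeric codes; either will do, so long as the scheme is used consistently.

# The same mentalese inscriptions under two arbitrary encodings.
# Nothing in the machine "reads" these; demons only match their shapes.
facts_as_strings = {("Alex", "father-of", "Andrew"), ("Alex", "is-male")}

# An equally good encoding: opaque numeric codes, assigned consistently.
ALEX, ANDREW, FATHER_OF, IS_MALE = 101, 102, 201, 202
facts_as_codes = {(ALEX, FATHER_OF, ANDREW), (ALEX, IS_MALE)}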

 

The example to come is an attempt to demystify computation, to get you to see how the trick is done. To hammer home my explanation of the trick ‑ that symbols both stand for some concept and mechanically cause things to happen ‑ I will step through the activity of our production system and describe everything twice: conceptually, in terms of the content of the problem and the logic that solves it, and mechanically, in terms of the brute sensing and marking motions of the system. The system is intelligent because the two correspond exactly, idea‑for‑mark, logical‑step‑for‑motion.

 

Let's call the portion of the system's memory that holds inscriptions about family relationships the Long‑Term Memory. Let's identify another part as the Short‑Term Memory, a scratchpad for the calculations. A part of the Short‑Term Memory is an area for goals; it contains a list of questions that the system will "try" to answer. The system wants to know whether Gordie is its biological uncle. To begin with, the memory looks like this:

 

Long‑Term Memory        Short‑Term Memory      Goal

 

Abel parent‑of Me                               Gordie uncle‑of Me?

Abel is‑male

Bella parent‑of Me

Bella is‑female

Claudia sibling‑of Me

Claudia is‑female

Duddie sibling‑of Me

Duddie is‑male

Edgar sibling‑of Abel

Edgar is‑male

Fanny sibling‑of Abel

Fanny is‑female

Gordie sibling‑of Bella

Gordie is‑male

 

 

 

Conceptually speaking, our goal is to find the answer to a question; the answer is affirmative if the fact it asks about is true. Mechanically speaking, the system must determine whether a string of marks in the Goal column followed by a question mark (?) has a counterpart with an identical string of marks somewhere in memory. One of the demons is designed to

[p72] answer these look-up questions by scanning for identical marks in the Goal and Long-Term Memory columns. When it detects a match, it prints a mark next to the question which indicates that it has been answered affirmatively. For convenience, let's say the mark looks like this: Yes.

 

IF: Goal = blah‑blah‑blah

     Long‑Term Memory = blah‑blah‑blah

THEN: MARK GOAL 

Yes

 

The conceptual challenge faced by the system is that it does not explicitly know who is whose uncle; that knowledge is implicit in the other things it knows. To say the same thing mechanically: there is no uncle‑of mark in the Long‑Term Memory; there are only marks like sibling‑of and parent‑of. Conceptually speaking, we need to deduce knowledge of unclehood from knowledge of parenthood and knowledge of siblinghood. Mechanically speaking, we need a demon to print an uncle‑of inscription flanked by appropriate marks found in sibling‑of and parent‑of inscriptions. Conceptually speaking, we need to find out who our parents are, identify their siblings, and then pick the males. Mechanically speaking, we need the following demon, which prints new inscriptions in the Goal area that trigger the appropriate memory searches:

 

IF: Goal = Q uncle‑of P

THEN: ADD GOAL

Find P's Parents

Find Parents' Siblings

Distinguish Uncles/Aunts

 

This demon is triggered by an uncle‑of inscription in the Goal column. The Goal column indeed has one, so the demon goes to work and adds some new marks to the column:

 

Long‑Term Memory        Short‑Term Memory      Goal

 

Abel parent‑of Me                               Gordie uncle‑of Me?

Abel is‑male                                    Find Me's Parents

Bella parent‑of Me                              Find Parents' Siblings

Bella is‑female                                 Distinguish Uncles/Aunts

Claudia sibling‑of Me

Claudia is‑female

Duddie sibling‑of Me

Duddie is‑male

Edgar sibling‑of Abel

Edgar is‑male

Fanny sibling‑of Abel

Fanny is‑female

Gordie sibling‑of Bella

Gordie is‑male

 

[p73]

 

There must also be a device ‑ some other demon, or extra machinery inside this demon ‑ that minds its Ps and Qs. That is, it replaces the P and Q labels with the actual labels for names: Me, Abel, Gordie, and so on. I'm hiding these details to keep things simple.

 

The new Goal inscriptions prod other dormant demons into action. One of them (conceptually speaking) looks up the system's parents, by (mechanically speaking) copying all the inscriptions containing the names of the parents into Short‑Term Memory (unless the inscriptions are already there, of course; this proviso prevents the demon from mindlessly making copy after copy like the Sorcerer's Apprentice):

 

IF: Goal = Find P's Parents

Long‑Term Memory = X parent‑of P

Short‑Term Memory ≠ X parent‑of P

THEN: COPY TO Short‑Term Memory

X parent‑of P

ERASE GOAL

 

Our bulletin board now looks like this:

 

Long‑Term Memory        Short‑Term Memory      Goal

 

Abel parent‑of Me       Abel parent‑of Me       Gordie uncle‑of Me?

Abel is‑male            Bella parent‑of Me      Find Parents' Siblings

Bella parent‑of Me                              Distinguish Uncles/Aunts

Bella is‑female                           

Claudia sibling‑of Me

Claudia is‑female

Duddie sibling‑of Me

Duddie is‑male

Edgar sibling‑of Abel

Edgar is‑male

Fanny sibling‑of Abel

Fanny is‑female

Gordie sibling‑of Bella

Gordie is‑male

 

[p74]

 

Now that we know the parents, we can find the parents' siblings. Mechanically speaking: now that the names of the parents are written in Short‑Term Memory, a demon can spring into action that copies inscriptions about the parents' siblings:

 

IF: Goal = Find Parents' Siblings

Short‑Term Memory = X parent‑of Y

Long‑Term Memory = Z sibling‑of X

Short‑Term Memory ≠ Z sibling‑of X

THEN: COPY TO Short‑Term Memory

Z sibling‑of X

ERASE GOAL

 

Here is its handiwork:

 

Long‑Term Memory        Short‑Term Memory       Goal

 

Abel parent‑of Me       Abel parent‑of Me       Gordie uncle‑of Me?

Abel is‑male            Bella parent‑of Me      Distinguish Uncles/Aunts

Bella parent‑of Me      Edgar sibling‑of Abel                

Bella is‑female         Fanny sibling‑of Abel                

Claudia sibling‑of Me   Gordie sibling‑of Bella

Claudia is‑female

Duddie sibling‑of Me

Duddie is‑male

Edgar sibling‑of Abel

Edgar is‑male

Fanny sibling‑of Abel

Fanny is‑female

Gordie sibling‑of Bella

Gordie is‑male

 

[p75]

 

As it stands, we are considering the aunts and uncles collectively. To separate the uncles from the aunts, we need to find the males. Mechanically speaking, the system needs to see which inscriptions have counterparts in Long‑Term Memory with is‑male marks next to them. Here is the demon that does the checking:

 

IF: Goal = Distinguish Uncles/Aunts

     Short‑Term Memory = X parent‑of Y

     Long‑Term Memory = Z sibling‑of X

     Long‑Term Memory = Z is‑male

THEN: STORE IN LONG‑TERM MEMORY

     Z uncle‑of Y

     ERASE GOAL

 

This is the demon that most directly embodies the system's knowledge of the meaning of "uncle": a male sibling of a parent. It adds the unclehood inscription to Long‑Term Memory, not Short‑Term Memory, because the inscription represents a piece of knowledge that is permanently true:

 

Long‑Term Memory        Short‑Term Memory       Goal

 

Edgar uncle‑of Me

Gordie uncle‑of Me

Abel parent‑of Me       Abel parent‑of Me       Gordie uncle‑of Me?

Abel is‑male            Bella parent‑of Me

Bella parent‑of Me      Edgar sibling‑of Abel                

Bella is‑female         Fanny sibling‑of Abel                

Claudia sibling‑of Me   Gordie sibling‑of Bella

Claudia is‑female

Duddie sibling‑of Me

Duddie is‑male

Edgar sibling‑of Abel

Edgar is‑male

Fanny sibling‑of Abel

Fanny is‑female

Gordie sibling‑of Bella

Gordie is‑male

 

Conceptually speaking, we have just deduced the fact that we inquired about. Mechanically speaking, we have just created mark‑for-

[p76] mark identical inscriptions in the Goal column and the Long‑Term Memory column. The very first demon I mentioned, which scans for such duplicates, is triggered to make the mark that indicates the problem has been solved.
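Putting the pieces together, here is a hedged end‑to‑end sketch of the same deduction in Python. It compresses the goal bookkeeping and the P‑and‑Q binding machinery into ordinary loops, so it paraphrases the production system rather than transcribing the demons literally; the facts are the ones from the tables above.

long_term_memory = {
    ("Abel", "parent-of", "Me"), ("Abel", "is-male"),
    ("Bella", "parent-of", "Me"), ("Bella", "is-female"),
    ("Claudia", "sibling-of", "Me"), ("Claudia", "is-female"),
    ("Duddie", "sibling-of", "Me"), ("Duddie", "is-male"),
    ("Edgar", "sibling-of", "Abel"), ("Edgar", "is-male"),
    ("Fanny", "sibling-of", "Abel"), ("Fanny", "is-female"),
    ("Gordie", "sibling-of", "Bella"), ("Gordie", "is-male"),
}

def answer_uncle_query(query, memory):
    q, _, p = query
    # Find P's Parents: gather X for every "X parent-of P" inscription.
    parents = {fact[0] for fact in memory
               if len(fact) == 3 and fact[1:] == ("parent-of", p)}
    # Find Parents' Siblings: gather Z for every "Z sibling-of X" inscription.
    parents_siblings = {fact[0] for fact in memory
                        if len(fact) == 3 and fact[1] == "sibling-of"
                        and fact[2] in parents}
    # Distinguish Uncles/Aunts: keep the male siblings and store the
    # new "Z uncle-of P" inscriptions in long-term memory.
    for z in parents_siblings:
        if (z, "is-male") in memory:
            memory.add((z, "uncle-of", p))
    # The look-up demon: the goal is answered Yes if it now matches a fact.
    return "Yes" if (q, "uncle-of", p) in memory else "no match"

print(answer_uncle_query(("Gordie", "uncle-of", "Me"), long_term_memory))  # Yes
print(answer_uncle_query(("Duddie", "uncle-of", "Me"), long_term_memory))  # no match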

 

What have we accomplished? We have built a system out of lifeless gumball‑machine parts that did something vaguely mindlike: it deduced the truth of a statement that it had never entertained before. From ideas about particular parents and siblings and a knowledge of the meaning of unclehood, it manufactured true ideas about particular uncles. The trick, to repeat, came from the processing of symbols: arrangements of matter that have both representational and causal properties, that is, that simultaneously carry information about something and take part in a chain of physical events. Those events make up a computation, because the machinery was crafted so that if the interpretation of the symbols that trigger the machine is a true statement, then the interpretation of the symbols created by the machine is also a true statement. The computational theory of mind is the hypothesis that intelligence is computation in this sense.

"This sense" is broad, and it shuns some of the baggage found in

[p77] other definitions of computation. For example, we need not assume that the computation is made up of a sequence of discrete steps, that the symbols must be either completely present or completely absent (as opposed to being stronger or weaker, more active or less active), that a correct answer is guaranteed in a finite amount of time, or that the truth be "absolutely true" or "absolutely false" rather than a probability or a degree of certainty. The computational theory thus embraces an alternative kind of computer with many elements that are active to a degree corresponding to the probability that some statement is true or false and in which the activity levels change smoothly to register new and roughly accurate probabilities. (As we shall see, that may be the way the brain works.) The key idea is that the answer to the question "What makes a system smart?" is not the kind of stuff it is made of or the kind of energy flowing through it, but what the parts of the machine stand for and how the patterns of changes inside it are designed to mirror truth‑preserving relationships (including probabilistic and fuzzy truths).
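As one invented illustration of such a graded computer, far simpler than any serious model of the brain, a single element can hold an activity level between 0 and 1, read as the probability that some statement is true, and nudge it smoothly as weighted evidence arrives:

import math

def updated_activity(activity, evidence_weight):
    # Accumulate evidence in log-odds, then squash back to a 0-1 activity
    # level; the element shifts gradually rather than flipping on or off.
    log_odds = math.log(activity / (1 - activity)) + evidence_weight
    return 1 / (1 + math.exp(-log_odds))

activity = 0.5                       # start undecided
for weight in (0.8, 1.1, -0.3):      # three invented pieces of evidence
    activity = updated_activity(activity, weight)
print(round(activity, 2))            # about 0.83: probably true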