The Computational Theory of Mind
Week 2 Reading for Principles of Intelligent Systems
From: Pinker, Steven. (1997). How the Mind Works. Penguin Books, Australia. pp. 64-77.
[p64]
Thinking Machines
The traditional explanation of intelligence is
that human flesh is suffused with a non‑material entity, the soul,
usually envisioned as some kind of ghost or spirit. But the theory faces an
insurmountable problem: How does the spook interact with solid matter? How does
an ethereal nothing respond to flashes, pokes, and beeps and get arms and legs
to move? Another problem is the overwhelming evidence that the mind is the
activity of the brain. The supposedly immaterial soul, we now know, can be
bisected with a knife, altered by chemicals, started or stopped by electricity,
and extinguished by a sharp blow or by insufficient oxygen. Under a microscope,
the brain has a breathtaking complexity of physical structure fully
commensurate with the richness of the mind.
Another explanation is that mind comes from
some extraordinary form of matter. Pinocchio was animated by a magical kind of
wood found by Geppetto that talked, laughed, and moved on its own. Alas, no one
has ever discovered such a wonder substance. At first one might think that the
wonder substance is brain tissue. Darwin wrote that the brain
"secretes" the mind, and recently the philosopher John Searle has
argued that the physico‑chemical properties of brain tissue somehow
produce the mind just as breast tissue produces milk and plant tissue produces
sugar. But recall that the same kinds of membranes, pores, and chemicals are
found in brain tissue throughout the animal kingdom, not to mention in brain
tumors and cultures in dishes. All of these globs of neural tissue have the
same physico‑chemical properties, but not all of
[p65]
them accomplish humanlike intelligence. Of course, something about the
tissue of the human brain is necessary for our intelligence, but the physical
properties are not sufficient, just as the physical properties of bricks are
not sufficient to explain architecture and the physical properties of oxide
particles are not sufficient to explain music. Something in the patterning
of neural tissue is crucial.
Intelligence has often been attributed to some
kind of energy flow or force field. Orbs, luminous vapors, auras, vibrations,
magnetic fields, and lines of force figure prominently in spiritualism,
pseudoscience, and science fiction kitsch. The school of Gestalt psychology
tried to explain illusions in terms of electromagnetic force fields on the
surface of the brain, but the fields were never found. Occasionally the brain
surface has been described as a continuous vibrating medium that supports
holograms or other wave interference patterns, but that idea, too, has not
panned out. The hydraulic model, with its psychic pressure building up,
bursting out, or being diverted through alternative channels, lay at the center
of Freud's theory and can be found in dozens of everyday metaphors: anger
welling up, letting off steam, exploding under the pressure, blowing one's
stack, venting one's feelings, bottling up rage. But even the hottest
emotions do not literally correspond to a buildup and discharge of energy (in
the physicist's sense) somewhere in the brain. In Chapter 6 I will try to
persuade you that the brain does not actually operate by internal
pressures but contrives them as a negotiating tactic, like a terrorist
with explosives strapped to his body.
A problem with all these ideas is that even if
we did discover some gel or vortex or vibration or orb that spoke and
plotted mischief like Geppetto's log, or that, more generally, made rational
decisions and pursued a goal in the face of obstacles, we would still
be faced with the mystery of how it accomplished those feats.
No, intelligence does not come from a special
kind of spirit or matter or energy but from a different commodity, information.
Information is a correlation between two things that is produced by a lawful
process (as opposed to coming about by sheer chance). We say that the rings in a stump
carry information about the age of the tree because their number correlates
with the tree's age (the older the tree, the more rings it has), and the
correlation is not a coincidence but is caused by the way trees grow.
Correlation is a mathematical and logical concept; it is not defined in terms
of the stuff that the correlated entities are made of.
Information itself is nothing special; it is
found wherever causes leave
[p66]
effects. What is special is information processing. We can regard a piece of matter that carries
information about some state of affairs as a symbol; it can "stand for"
that state of affairs. But as a piece of matter, it can do other things as well
‑ physical things, whatever that kind of matter in that kind of state can
do according to the laws of physics and chemistry. Tree rings carry information
about age, but they also reflect light and absorb staining material. Footprints
carry information about animal motions, but they also trap water and cause
eddies in the wind.
Now here is an idea. Suppose one were to build
a machine with parts that are affected by the physical properties of some
symbol. Some lever or electric eye or tripwire or magnet is set in motion by
the pigment absorbed by a tree ring, or the water trapped by a footprint, or
the light reflected by a chalk mark, or the magnetic charge in a bit of oxide.
And suppose that the machine then causes something to happen in some other pile
of matter. It burns new marks onto a piece of wood, or stamps impressions into
nearby dirt, or charges some other bit of oxide. Nothing special has happened
so far; all I have described is a chain of physical events accomplished by a
pointless contraption.
Here is the special step. Imagine that we now
try to interpret the newly arranged piece of matter using the scheme according
to which the original piece carried information. Say we count the newly burned
wood rings and interpret them as the age of some tree at some time, even though
they were not caused by the growth of any tree. And let's say that the machine
was carefully designed so that the interpretation of its new markings made
sense ‑ that is, so that they carried information about something in the
world. For example, imagine a machine that scans the rings in a stump, burns
one mark on a nearby plank for each ring, moves over to a smaller stump from a
tree that was cut down at the same time, scans its rings, and sands off one
mark in the plank for each ring. When we count the marks on the plank, we have
the age of the first tree at the time that the second one was planted. We would
have a kind of rational machine, a machine that produces true
conclusions from true premises - not because of any special kind of matter or
energy, or because of any part that was itself intelligent or rational. All we
have is a carefully contrived chain of ordinary physical events, whose first
link was a configuration of matter that carries information. Our rational
machine owes its rationality to two properties glued together in the entity we
call a symbol: a symbol carries information, and it causes things to happen.
(Tree rings correlate with the age of the tree, and they can absorb the light
beam of a scanner.)
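To make the ring-and-plank machine concrete, here is a minimal sketch in Python. It is not from Pinker's text; the function names and the list-of-marks representation are illustrative assumptions. The machine burns one mark onto the plank per ring of the older stump, sands one mark off per ring of the younger stump, and the marks that remain can be interpreted as the first tree's age when the second was planted.

# A toy version of the ring-counting machine described above. Marks on the
# plank are a Python list; names are illustrative, not Pinker's.

def rings(stump_age):
    """A stump's rings: one mark per year of growth."""
    return ["ring"] * stump_age

def plank_machine(old_stump, young_stump):
    """Burn one mark per ring of the old stump, then sand one off per ring
    of the young stump (assuming both trees were cut down at the same time).
    The remaining marks give the old tree's age when the young one was planted."""
    plank = []
    for _ in old_stump:          # scan the big stump, burn marks
        plank.append("mark")
    for _ in young_stump:        # scan the small stump, sand marks off
        plank.pop()
    return len(plank)

print(plank_machine(rings(80), rings(35)))   # -> 45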
[p67]
When the caused things themselves carry
information, we call the whole system an information processor, or a computer.
Now, this whole scheme might seem like an
unrealizable hope. What guarantee is there that any collection of thingamabobs
can be arranged to fall or swing or shine in just the right pattern so that
when their effects are interpreted, the interpretation will make sense? (More
precisely, so that it will make sense according to some prior law or
relationship we find interesting; any heap of stuff can be given a contrived
interpretation after the fact.) How confident can we be that some machine will
make marks that actually correspond to some meaningful state of the world, like
the age of a tree when another tree was planted, or the average age of the
tree's offspring, or anything else, as opposed to being a meaningless
pattern corresponding to nothing at all?
The guarantee comes from the work of the
mathematician Alan Turing. He designed a hypothetical machine whose input
symbols and output symbols could correspond, depending on the details of the
machine, to any one of a vast number of sensible interpretations. The machine
consists of a tape divided into squares, a read‑write head that can print
or read a symbol on a square and move the tape in either direction, a pointer
that can point to one of a fixed number of tickmarks on the machine, and a set of
mechanical reflexes. Each reflex is triggered by the symbol being read and the
current position of the pointer, and it prints a symbol on the tape, moves the
tape, and/or shifts the pointer. The machine is allowed as much tape as it
needs. This design is called a Turing machine.
What can this simple machine do? It can take in
symbols standing for a number or a set of numbers, and print out symbols
standing for new numbers that are the corresponding value for any mathematical
function that can be solved by a step‑by‑step sequence of
operations (addition, multiplication, exponentiation, factoring, and so on ‑
I am being imprecise to convey the importance of Turing's discovery without the
technicalities). It can apply the rules of any useful logical system to derive
true statements from other true statements. It can apply the rules of any
grammar to derive well‑formed sentences. The equivalence among Turing
machines, calculable mathematical functions, logics, and grammars led the
logician Alonzo Church to conjecture that any well‑defined recipe
or set of steps that is guaranteed to produce the solution to some problem in a
finite amount of time (that is, any algorithm) can be implemented on a Turing
machine.
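Pinker deliberately keeps the machine abstract. As an illustration only, here is a tiny Turing-machine simulator in Python; the pointer and its tickmarks correspond to the "state" in the code. The transition table, which adds two numbers written in unary notation, is my own example, not one from the book.

# A minimal Turing-machine simulator. The "program" is a table of reflexes:
# (state, symbol read) -> (symbol to write, direction to move, next state).
# The table below does unary addition, e.g. "111+11" -> "11111".

BLANK = " "

def run(program, tape, state="A", halt="HALT"):
    tape = list(tape)
    head = 0
    while state != halt:
        # Give the machine as much tape as it needs, in both directions.
        if head < 0:
            tape.insert(0, BLANK)
            head = 0
        if head >= len(tape):
            tape.append(BLANK)
        write, move, state = program[(state, tape[head])]
        tape[head] = write
        head += {"R": 1, "L": -1, "S": 0}[move]
    return "".join(tape).strip()

unary_add = {
    ("A", "1"): ("1", "R", "A"),      # skip over the first number
    ("A", "+"): ("1", "R", "B"),      # turn the separator into a 1
    ("B", "1"): ("1", "R", "B"),      # skip over the second number
    ("B", BLANK): (BLANK, "L", "C"),  # reached the end; back up one square
    ("C", "1"): (BLANK, "S", "HALT"), # erase the extra 1 and stop
}

print(run(unary_add, "111+11"))   # -> "11111"

The point is the shape of the program: nothing but a table of mechanical reflexes, each keyed on what is under the head and where the pointer sits, yet the marks it leaves can be interpreted as a correct sum.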
What does this mean? It means that to the extent
that the world
[p68]
obeys mathematical equations that can be solved step by step, a machine can be
built that simulates the world and makes predictions about it. To the extent
that rational thought corresponds to the rules of logic, a machine can be built
that carries out rational thought. To the extent that a language can be
captured by a set of grammatical rules, a machine can be built that produces
grammatical sentences. To the extent that thought consists of applying any set
of well‑specified rules, a machine can be built that, in some sense,
thinks.
Turing showed that rational machines ‑
machines that use the physical properties of symbols to crank out new symbols
that make some kind of sense ‑ are buildable, indeed, easily buildable.
The computer scientist Joseph Weizenbaum once showed how to build one out of a
die, some rocks, and a roll of toilet paper. In fact, one doesn't even need a
huge warehouse of these machines, one to do sums, another to do square roots, a
third to print English sentences, and so on. One kind of Turing machine is
called a universal Turing machine. It can take in a description of any other
Turing machine printed on its tape and thereafter mimic that machine exactly. A
single machine can be programmed to do anything that any set of rules can do.
Does this mean that the human brain is a Turing machine? Certainly not. There are no Turing machines in use anywhere, let alone in our heads. They are useless in practice: too clumsy, too hard to program, too big, and too slow. But it does not matter. Turing merely wanted to prove that some arrangement of gadgets could function as an intelligent symbol‑processor. Not long after his discovery, more practical symbol-processors were designed, some of which became IBM and Univac mainframes and, later, Macintoshes and PCs. But all of them were equivalent to Turing's universal machine. If we ignore size and speed, and give them as much memory storage as they need, we can program them to produce the same outputs in response to the same inputs.
Still other kinds of symbol‑processors
have been proposed as models of the human mind. These models are often
simulated on commercial computers, but that is just a convenience. The
commercial computer is first programmed to emulate the hypothetical mental
computer (creating what computer scientists call a virtual machine), in much
the same way that a Macintosh can be programmed to emulate a PC. Only the
virtual mental computer is taken seriously, not the silicon chips that emulate
it. Then a program that is meant to model some sort of thinking (solving a
problem, understanding a sentence) is run on the virtual mental
[p69]
computer. A new way of understanding human intelligence has been born.
-
Let me show you how one of these models works. In an age when real computers are so sophisticated that they are almost as incomprehensible to laypeople as minds are, it is enlightening to see an example of computation in slow motion. Only then can one appreciate how simple devices can be wired together to make a symbol‑processor that shows real intelligence. A lurching Turing machine is a poor advertisement for the theory that the mind is a computer, so I will use a model with at least a vague claim to resembling our mental computer. I'll show you how it solves a problem from everyday life ‑ kinship relations ‑ that is complex enough that we can be impressed when a machine solves it.
The model we'll use is called a production
system. It eliminates the feature of commercial computers that is most starkly
unbiological: the ordered list of programming steps that the computer follows
single‑mindedly, one after another. A production system contains a memory
and a set of reflexes, sometimes called "demons" because they are
simple, self‑contained entities that sit around waiting to spring into
action. The memory is like a bulletin board on which notices are posted. Each
demon is a knee‑jerk reflex that waits for a particular notice on the
board and responds by posting a notice of its own. The demons collectively
constitute a program. As they are triggered by notices on the memory board and
post notices of their own, in turn triggering other demons, and so on, the
information in memory changes and eventually contains the correct output for a
given input. Some demons are connected to sense organs and are triggered by
information in the world rather than information in memory. Others are
connected to appendages and respond by moving the appendages rather than by
posting more messages in memory.
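Before turning to the kinship example, here is a bare-bones production-system loop in Python. It is my own sketch, and the notices and demons in it are placeholders, not Pinker's. Memory is a bulletin board of notices; each demon waits for one notice and posts another; the loop keeps cycling until no demon has anything left to do.

# A bare-bones production system: a bulletin-board memory plus "demons",
# each waiting for one kind of notice and posting another in response.

memory = {"alarm rang"}

demons = [
    # (notice that triggers the demon, notice it posts in response)
    ("alarm rang", "get out of bed"),
    ("get out of bed", "make coffee"),
]

changed = True
while changed:                       # keep cycling until no demon fires
    changed = False
    for trigger, response in demons:
        if trigger in memory and response not in memory:
            memory.add(response)     # the demon posts its own notice
            changed = True

print(memory)   # {'alarm rang', 'get out of bed', 'make coffee'}

Notice that no step-by-step program orders the demons around; the chain of postings emerges from which notices happen to be on the board.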
Suppose your long‑term memory contains
knowledge of the immediate families of you and everyone around you. The content
of that knowledge is a set of propositions like "Alex is the father of
Andrew." According to the computational theory of mind, that
information is embodied in symbols: a collection of physical marks that
correlate with the state of the world as it is captured in the propositions.
These symbols cannot be English words and
sentences, notwithstanding
[p70]
the popular conception that we
think in our mother tongue. As I showed in The Language Instinct,
sentences in a spoken language like English or Japanese are designed for vocal
communication between impatient, intelligent social beings. They achieve
brevity by leaving out any information that the listener can mentally fill in
from the context. In contrast, the "language of thought" in which
knowledge is couched can leave nothing to the imagination, because it is the
imagination. Another problem with using English as the medium of knowledge is
that English sentences can be ambiguous. When the serial killer Ted Bundy wins
a stay of execution and the headline reads "Bundy Beats Date with
Chair," we do a double‑take because our mind assigns two meanings to
the string of words. If one string of words in English can correspond to two
meanings in the mind, meanings in the mind cannot be strings of words in
English. Finally, sentences in a spoken language are cluttered with articles,
prepositions, gender suffixes, and other grammatical boilerplate. They are
needed to help get information from one head to another by way of the mouth and
the ear, a slow channel, but they are not needed inside a single head where
information can be transmitted directly by thick bundles of neurons. So the
statements in a knowledge system are not sentences in English but rather
inscriptions in a richer language of thought, "mentalese."
In our example, the portion of mentalese that
captures family relations comes in two kinds of statements. An example of the
first is Alex father‑of Andrew: a name, followed by an immediate family
relationship, followed by a name. An example of the second is Alex is‑male: a
name followed by its sex. Do not be misled by my use of English words and
syntax in the mentalese inscriptions. This is a courtesy to you, the reader, to
help you keep track of what the symbols stand for. As far as the machine is
concerned, they are simply different arrangements of marks. As long as we use
each one consistently to stand for someone (so the symbol used for Alex is
always used for Alex and never for anyone else), and arrange them according to
a consistent plan (so they preserve information about who is the father of
whom), they could be any marks in any arrangement at all. You can think of the
marks as bar codes recognized by a scanner, or keyholes that admit only one
key, or shapes that fit only one template. Of course, in a commercial computer
they would be patterns of charges in silicon, and in a brain they would be
firings in sets of neurons. The key point is that nothing in the machine
understands them the way you or I do; parts of the machine respond to their
shapes and are
[p71]
triggered to do something, exactly as a gumball machine responds to the shape
and weight of a coin by releasing a gumball.
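To underline the point that the marks themselves are arbitrary, here is a small sketch (mine, not Pinker's) in which the same two inscriptions are written once with readable tokens and once with invented codes. Nothing in the machine "reads" either version; all that matters is that each symbol is used consistently.

# The same two mentalese inscriptions, with readable tokens and with
# arbitrary codes. The codes are invented for illustration.

readable = [("Alex", "father-of", "Andrew"), ("Alex", "is-male")]

code_for = {"Alex": "#101", "Andrew": "#102",
            "father-of": "@F", "is-male": "@M"}

opaque = [tuple(code_for[token] for token in fact) for fact in readable]
print(opaque)   # [('#101', '@F', '#102'), ('#101', '@M')]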
The example to come is an attempt to demystify
computation, to get you to see how the trick is done. To hammer home my
explanation of the trick ‑ that symbols both stand for some concept and
mechanically cause things to happen ‑ I will step through the activity of our
production system and describe everything twice: conceptually, in terms of the
content of the problem and the logic that solves it, and mechanically, in terms
of the brute sensing and marking motions of the system. The system is
intelligent because the two correspond exactly, idea‑for‑mark,
logical‑step‑for‑motion.
Let's call the portion of the system's memory that holds inscriptions about family relationships the Long‑Term Memory. Let's identify another part as the Short‑Term Memory, a scratchpad for the calculations. A part of the Short‑Term Memory is an area for goals; it contains a list of questions that the system will "try" to answer. The system wants to know whether Gordie is its biological uncle. To begin with, the memory looks like this:
Long‑Term Memory           Short‑Term Memory         Goal
Abel parent‑of Me                                    Gordie uncle‑of Me?
Abel is‑male
Bella parent‑of Me
Bella is‑female
Claudia sibling‑of Me
Claudia is‑female
Duddie sibling‑of Me
Duddie is‑male
Edgar sibling‑of Abel
Edgar is‑male
Fanny sibling‑of Abel
Fanny is‑female
Gordie sibling‑of Bella
Gordie is‑male
Conceptually speaking, our goal is to find the
answer to a question; the answer is affirmative if the fact it asks about is
true. Mechanically speaking, the system must determine whether a string of
marks in the Goal column followed by a question mark (?) has a counterpart with
an identical string of marks somewhere in memory. One of the demons is designed
to
[p72]
answer these look-up questions by scanning for identical marks in the Goal and
Long-Term Memory columns. When it detects a match, it prints a mark next to the
question which indicates that it has been answered
affirmatively. For convenience, let's say the
mark looks like this: Yes.
IF: Goal = blah‑blah‑blah
Long‑Term Memory = blah‑blah‑blah
THEN: MARK GOAL
Yes
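A rough Python rendering of this look-up demon might look like the following. The representation of goals and memory as lists of tuples is my own convenience, as is the function name; the sample memory is pre-loaded with the fact so that the demon has something to find.

# The look-up demon: a goal whose last mark is "?" gets marked "Yes" when an
# identical inscription sits in Long-Term Memory.

def lookup_demon(goals, long_term_memory):
    for goal in list(goals):
        if goal[-1] == "?" and goal[:-1] in long_term_memory:
            goals[goals.index(goal)] = goal[:-1] + ("Yes",)   # mark it answered

goals = [("Gordie", "uncle-of", "Me", "?")]
ltm = [("Gordie", "uncle-of", "Me"), ("Abel", "parent-of", "Me")]
lookup_demon(goals, ltm)
print(goals)   # [('Gordie', 'uncle-of', 'Me', 'Yes')]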
The conceptual challenge faced by the system is
that it does not explicitly know who is whose uncle; that knowledge is implicit
in the other things it knows. To say the same thing mechanically: there is
no uncle‑of mark in the Long‑Term Memory; there are only marks like sibling‑of and
parent‑of. Conceptually speaking, we need to deduce knowledge of unclehood from
knowledge of parenthood and knowledge of siblinghood. Mechanically speaking, we
need a demon to print an uncle‑of inscription flanked by appropriate marks found in sibling‑of and
parent‑of inscriptions. Conceptually speaking, we need to find out who our
parents are, identify their siblings, and then pick the males. Mechanically
speaking, we need the following demon, which prints new inscriptions in the
Goal area that trigger the appropriate memory searches:
IF: Goal = Q uncle‑of P
THEN: ADD GOAL
Find P's Parents
Find Parents' Siblings
Distinguish Uncles/Aunts
This demon is triggered by an uncle‑of
inscription in the Goal column. The Goal column indeed has one, so the demon
goes to work and adds some new marks to the column:
Long‑Term Memory           Short‑Term Memory         Goal
Abel parent‑of Me                                    Gordie uncle‑of Me?
Abel is‑male                                         Find Me's Parents
Bella parent‑of Me                                   Find Parents' Siblings
Bella is‑female                                      Distinguish Uncles/Aunts
Claudia sibling‑of Me
Claudia is‑female
Duddie sibling‑of Me
Duddie is‑male
Edgar sibling‑of Abel
Edgar is‑male
Fanny sibling‑of Abel
Fanny is‑female
Gordie sibling‑of Bella
Gordie is‑male
[p73]
There must also be a device ‑ some other
demon, or extra machinery inside this demon ‑ that minds its Ps and Qs. That is, it replaces the P label with a list of the
actual labels for names: Me, Abel, Gordie, and so on. I'm hiding these details to keep things simple.
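One way such machinery might be sketched in Python is with crude pattern matching that binds the variables to actual names before the demon posts its subgoals. The helper names and the tuple representation below are my own assumptions, not Pinker's.

# Minding the P's and Q's: single uppercase letters in a pattern are
# variables; matching binds them to the marks they face.

def match(pattern, inscription):
    """Return a dict of variable bindings, or None if the pattern fails."""
    if len(pattern) != len(inscription):
        return None
    bindings = {}
    for p, mark in zip(pattern, inscription):
        if len(p) == 1 and p.isupper():
            bindings[p] = mark
        elif p != mark:
            return None
    return bindings

def uncle_goal_demon(goals):
    for goal in list(goals):
        b = match(("Q", "uncle-of", "P", "?"), goal)
        if b and ("Find", b["P"] + "'s", "Parents") not in goals:
            goals.extend([("Find", b["P"] + "'s", "Parents"),
                          ("Find", "Parents'", "Siblings"),
                          ("Distinguish", "Uncles/Aunts")])

goals = [("Gordie", "uncle-of", "Me", "?")]
uncle_goal_demon(goals)
print(goals)   # the original goal plus the three new subgoals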
The new Goal inscriptions prod other dormant
demons into action. One of them (conceptually speaking) looks up the system's
parents, by (mechanically speaking) copying all the inscriptions containing the
names of the parents into Short‑Term Memory (unless the inscriptions are
already there, of course; this proviso prevents the demon from mindlessly
making copy after copy like the Sorcerer's Apprentice):
IF: Goal = Find P's Parents
Long‑Term Memory = X parent‑of P
Short‑Term Memory ≠ X parent‑of P
THEN: COPY TO Short‑Term Memory
X parent‑of P
ERASE GOAL
Our bulletin board now looks like this:
Long‑Term Memory           Short‑Term Memory         Goal
Abel parent‑of Me          Abel parent‑of Me         Gordie uncle‑of Me?
Abel is‑male               Bella parent‑of Me        Find Parents' Siblings
Bella parent‑of Me                                   Distinguish Uncles/Aunts
Bella is‑female
Claudia sibling‑of Me
Claudia is‑female
Duddie sibling‑of Me
Duddie is‑male
Edgar sibling‑of Abel
Edgar is‑male
Fanny sibling‑of Abel
Fanny is‑female
Gordie sibling‑of Bella
Gordie is‑male
[p74]
Now that we know the parents, we can find the
parents' siblings. Mechanically speaking: now that the names of the parents are
written in Short‑Term Memory, a demon can spring into action that copies
inscriptions about the parents' siblings:
IF: Goal = Find Parents' Siblings
Short‑Term Memory = X parent‑of Y
Long‑Term Memory = Z sibling‑of X
Short‑Term Memory ≠ Z sibling‑of X
THEN: COPY TO SHORT‑TERM MEMORY
Z sibling‑of X
ERASE GOAL
Here is its handiwork:
Long‑Term Memory           Short‑Term Memory         Goal
Abel parent‑of Me          Abel parent‑of Me         Gordie uncle‑of Me?
Abel is‑male               Bella parent‑of Me        Distinguish Uncles/Aunts
Bella parent‑of Me         Edgar sibling‑of Abel
Bella is‑female            Fanny sibling‑of Abel
Claudia sibling‑of Me      Gordie sibling‑of Bella
Claudia is‑female
Duddie sibling‑of Me
Duddie is‑male
Edgar sibling‑of Abel
Edgar is‑male
Fanny sibling‑of Abel
Fanny is‑female
Gordie sibling‑of Bella
Gordie is‑male
[p75]
As it stands, we are considering the aunts and
uncles collectively. To separate the uncles from the aunts, we need to find the
males. Mechanically speaking, the system needs to see which inscriptions have
counterparts in Long‑Term Memory with is‑male marks next to them.
Here is the demon that does the checking:
IF: Goal = Distinguish Uncles/Aunts
Short‑Term Memory = X parent‑of Y
Long‑Term Memory = Z sibling‑of X
Long‑Term Memory = Z is‑male
THEN: STORE IN LONG‑TERM MEMORY
Z uncle‑of Y
ERASE GOAL
This is the demon that most directly embodies
the system's knowledge of the meaning of "uncle": a male sibling of a
parent. It adds the unclehood inscription to Long‑Term Memory, not Short‑Term
Memory, because the inscription represents a piece of knowledge that is
permanently true:
Long‑Term Memory           Short‑Term Memory         Goal
Edgar uncle‑of Me
Gordie uncle‑of Me
Abel parent‑of Me          Abel parent‑of Me         Gordie uncle‑of Me?
Abel is‑male               Bella parent‑of Me
Bella parent‑of Me         Edgar sibling‑of Abel
Bella is‑female            Fanny sibling‑of Abel
Claudia sibling‑of Me      Gordie sibling‑of Bella
Claudia is‑female
Duddie sibling‑of Me
Duddie is‑male
Edgar sibling‑of Abel
Edgar is‑male
Fanny sibling‑of Abel
Fanny is‑female
Gordie sibling‑of Bella
Gordie is‑male
Conceptually speaking, we have just deduced the
fact that we inquired about. Mechanically speaking, we have just created
[p76]
mark‑for‑mark identical inscriptions in the Goal column and the Long‑Term Memory
column. The very first demon I mentioned, which scans for such duplicates, is
triggered to make the mark that indicates the problem has been solved.
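For readers who want to see the whole walkthrough run, here is one possible end-to-end sketch in Python. The data structures, function names, and control flow are my own simplifications; the five demons mirror the IF/THEN rules above rather than reproducing them literally.

# An end-to-end sketch of the kinship production system.

LTM = [("Abel", "parent-of", "Me"),      ("Abel", "is-male"),
       ("Bella", "parent-of", "Me"),     ("Bella", "is-female"),
       ("Claudia", "sibling-of", "Me"),  ("Claudia", "is-female"),
       ("Duddie", "sibling-of", "Me"),   ("Duddie", "is-male"),
       ("Edgar", "sibling-of", "Abel"),  ("Edgar", "is-male"),
       ("Fanny", "sibling-of", "Abel"),  ("Fanny", "is-female"),
       ("Gordie", "sibling-of", "Bella"), ("Gordie", "is-male")]

STM = []                                     # the scratchpad
GOALS = [("Gordie", "uncle-of", "Me", "?")]  # the question to answer

def expand_uncle_goal():
    """Q uncle-of P?  ->  post the three subgoals, with P bound to a name."""
    for goal in list(GOALS):
        if len(goal) == 4 and goal[1] == "uncle-of" and goal[3] == "?":
            p = goal[2]
            if ("find-parents", p) not in GOALS:
                GOALS.extend([("find-parents", p), ("find-siblings",),
                              ("distinguish-uncles",)])

def find_parents():
    """Copy 'X parent-of P' inscriptions into Short-Term Memory."""
    for goal in [g for g in GOALS if g[0] == "find-parents"]:
        for fact in LTM:
            if fact[1:] == ("parent-of", goal[1]) and fact not in STM:
                STM.append(fact)
        GOALS.remove(goal)

def find_siblings():
    """Copy 'Z sibling-of X' inscriptions for every parent X already in STM."""
    if ("find-siblings",) in GOALS:
        parents = [f[0] for f in STM if f[1] == "parent-of"]
        for fact in LTM:
            if len(fact) == 3 and fact[1] == "sibling-of" \
                    and fact[2] in parents and fact not in STM:
                STM.append(fact)
        GOALS.remove(("find-siblings",))

def distinguish_uncles():
    """A male sibling of a parent is an uncle: store 'Z uncle-of Y' in LTM."""
    if ("distinguish-uncles",) in GOALS:
        child_of = {f[0]: f[2] for f in STM if f[1] == "parent-of"}
        for z, rel, x in [f for f in STM if f[1] == "sibling-of"]:
            new = (z, "uncle-of", child_of[x])
            if (z, "is-male") in LTM and new not in LTM:
                LTM.append(new)
        GOALS.remove(("distinguish-uncles",))

def lookup():
    """Mark a '?' goal 'Yes' when an identical inscription exists in LTM."""
    for goal in [g for g in GOALS if len(g) == 4 and g[3] == "?"]:
        if goal[:3] in LTM:
            GOALS[GOALS.index(goal)] = goal[:3] + ("Yes",)

# Keep cycling through the demons until nothing on the board changes.
while True:
    before = (list(GOALS), list(STM), list(LTM))
    for demon in (expand_uncle_goal, find_parents, find_siblings,
                  distinguish_uncles, lookup):
        demon()
    if (list(GOALS), list(STM), list(LTM)) == before:
        break

print(GOALS)                                # [('Gordie', 'uncle-of', 'Me', 'Yes')]
print([f for f in LTM if "uncle-of" in f])  # Edgar and Gordie, uncles of Me

Running the sketch ends with the goal marked Yes and with the two new uncle‑of inscriptions in Long‑Term Memory, matching the final bulletin board above.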
What have we accomplished? We have built a
system out of lifeless gumball‑machine parts that did something vaguely
mindlike: it deduced the truth of a statement that it had never entertained
before. From ideas about particular parents and siblings and a knowledge of the
meaning of unclehood, it manufactured true ideas about particular uncles. The
trick, to repeat, came from the processing of symbols: arrangements of matter
that have both representational and causal properties, that is, that
simultaneously carry information about something and take part in a chain of
physical events. Those events make up a computation, because the machinery was
crafted so that if the interpretation of the symbols that trigger the machine
is a true statement, then the interpretation of the symbols created by the
machine is also a true statement. The computational theory of mind is the
hypothesis that intelligence is computation in this sense.
"This sense" is broad, and it shuns
some of the baggage found in
[p77]
other definitions of computation. For example, we need not assume that the computation is made up of a sequence of
discrete steps, that the symbols must be either completely present or completely
absent (as opposed to being stronger or weaker, more active or less active), that a
correct answer is guaranteed in a finite amount of time, or that the truth be
"absolutely true" or "absolutely false" rather than a
probability or a degree of certainty. The computational theory thus embraces an
alternative kind of computer with many elements that are active to a degree
corresponding to the probability that some statement is true or false
and in which the activity levels change smoothly to register new and roughly
accurate probabilities. (As we shall see, that may be the way the brain works.)
The key idea is that the answer to the question "What makes a system smart?" is not the kind of stuff it is made
of or the kind of energy flowing through it, but what the parts of the machine
stand for and how the patterns of changes inside it are designed to mirror
truth‑preserving relationships (including probabilistic and fuzzy
truths).