# 6.2 Commonsense Knowledge and Reasoning
Robertson Davies: You like the mind to be a neat machine equipped to work efficiently, if narrowly, and with no extra bits or useless parts. I like the mind to be a dustbin of scraps of brilliant fabric, odd gems, worthless but fascinating curiosities, tinsel, quaint bits of carving, and a reasonable amount of healthy dirt. Shake the machine and it goes out of order; shake the dustbin and it adjusts itself beautifully to its new position. [4]
Albert Einstein: A little knowledge is a dangerous thing. So is a lot.
I once encountered a fellow professor who was returning from teaching a class, and I asked him how the lecture went. He replied that it had not gone well because “I couldn’t remember which concepts were hard.” This suggests that, over time, such experts convert some of their high-level skills into lower-level, script-like processes that leave so few traces in memory that those experts cannot explain how they actually do those things. This has led many thinkers to classify knowledge into two kinds:
Knowing What. These are the kinds of ‘declarative’ or ‘explicit’ knowledge that we can express in gestures or words.
Knowing How. These are the kinds of ‘procedural’ or ‘tacit’ skills (like walking or imagining) that we find very hard to describe.
However, this popular distinction doesn’t describe the functions of those types of knowledge. Instead, we might, for example, classify knowledge in terms of the kinds of thinking that we can apply to it:
Positive Expertise. Knowing the situations in which to apply a particular fragment of knowledge.
Negative Expertise. Knowing which actions not to take, because they might make a situation worse.[5]
Debugging Skills. Knowing other ways to proceed when our usual methods fail.
Adaptive Skills. Knowing how to adapt old knowledge to new situations.
The first large-scale attempt to catalog commonsense knowledge was the “CYC” project of Douglas Lenat, which started in 1984, and is described at www.cyc.com. Many ideas in this section were inspired by the results of that project.
Douglas Lenat: “In modern America, this encompasses recent history and current affairs, everyday physics, ‘household’ chemistry, famous books and movies and songs and ads, famous people, nutrition, addition, weather, etc. … [It also includes] many "rules of thumb" largely derived from shared experiences—such as dating, driving, dining, daydreaming, etc., —and human cognitive economics (misremembering, misunderstanding, etc.,), and shared modes of reasoning both high (induction, intuition, inspiration, incubation) and low (deductive reasoning, dialectic argument, superficial analogy, pigeon-holing, etc.).”
Here Lenat describes some kinds of knowledge that a simple statement like this might engage: [6]
“Fred told the waiter he wanted some chips.”
The word “he” means Fred—and not the waiter. This event took place in a restaurant. Fred was a customer dining there. Fred and the waiter were a few feet apart. The waiter was at work there, waiting on Fred at that time.
Fred wants potato chips, not wood chips—but he does not want some particular set of chips.
Both Fred and the waiter are live human beings. Fred conveyed his request by speaking words to the waiter. Both of them speak the same language. Both were old enough to talk, and the waiter was old enough to work.
Fred is hungry. He wants and expects that in a few minutes the waiter will bring him a typical portion of chips—which Fred will start eating soon after they arrive.
We can also assume that Fred assumes that the waiter also assumes all those things.
Here is another example of how much one must know to give meaning to a commonplace statement:
“Joe’s daughter was sick so he called the doctor.”
We can assume that Joe cares about his daughter, is upset because she is sick, and wants her to be healthy. Presumably he believes she is sick because of observing some symptoms.
People have different abilities. Joe himself cannot help his daughter. People ask others for help to do things they can’t do themselves. So Joe called the doctor to help heal his daughter.
Joe’s daughter, in some sense, belongs to Joe. People care more about their own daughters than about other people’s daughters. If so advised, Joe will take the daughter to the doctor. When at the doctor’s, she will still belong to Joe.
Medical services can be expensive, but Joe is likely to forgo other spending to get the doctor to help the daughter.
These are all things that ‘everyone knows’ and uses to understand everyday stories. But along with that widely shared, common knowledge, every person also has personal knowledge; we each know our own private histories, characteristics of our acquaintances, special perceptual and motor skills, and other kinds of expertise.
Still, none of our knowledge would have any use unless we also had effective ways to apply that knowledge to solving problems. This means that we also need large bodies of skills for doing what we call commonsense thinking. We’ll come back to that in chapter §7.
# How much does a typical person know?
Everyone knows a good deal about many objects, topics, words, and ideas—and one might suppose that a typical person knows an enormous amount. However, the following argument suggests that the total extent of a person’s commonsense knowledge might not be so vast. Of course, it is hard to measure this, but we can start by observing that every person knows thousands of words, each of which must be linked in our minds to as many as a thousand other such items. Also, a typical person knows hundreds of uses and properties of thousands of different common objects. Similarly, in the social realm, one may know thousands of things about tens of people, hundreds of things about hundreds of people, and tens of useful items about as many as a thousand people.
This suggests that in each important realm, one might know perhaps a million things. But while it is easy to think of a dozen such realms, it is hard to think of a hundred of them. This suggests that a machine that does humanlike reasoning might need only a few dozen million items of knowledge. [7]
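To make the arithmetic explicit, here is a minimal back-of-the-envelope sketch in Python. Every count in it is a rough assumption drawn from the figures mentioned above, not a measurement:

```python
# Back-of-the-envelope version of the estimate above. Every count here is a
# rough assumption taken from the figures in the text, not a measurement.

language_items = 5_000 * 1_000        # thousands of words, each linked to
                                      # as many as a thousand other items
object_items = 1_000 * 300            # hundreds of uses and properties of
                                      # thousands of common objects
social_items = 10 * 1_000 + 100 * 100 + 1_000 * 10
                                      # thousands of things about tens of people,
                                      # hundreds about hundreds, tens about ~1000

realms = 12                           # "a dozen such realms"
items_per_realm = 1_000_000           # "perhaps a million things" per realm

total = realms * items_per_realm
print(f"language ~{language_items:,}  objects ~{object_items:,}  social ~{social_items:,}")
print(f"rough total: {total:,} items")   # a few dozen million items
```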
Citizen: Perhaps so, but I have heard of phenomenal feats of memory. What about persons with photographic memories, who can recollect all the words of a book after only a single reading of it? Could it be that we all remember, to some extent, everything that happens to us?
We all have heard such anecdotes, but whenever we try to investigate one, we usually fail to uncover the source, or find that someone was fooled by a stage-magic trick. Many a person has memorized an entire book of substantial size (most often a religious text)—but no one has ever been shown to have memorized a hundred such books. Here is what one psychologist said about a person who appeared to him to possess a prodigious memory:
Alexander R. Luria: “For almost thirty years the author had an opportunity systematically to observe a man whose remarkable memory … for all practical purposes was inexhaustible” (p. 3). “… It was of no consequence to him whether the series I gave him contained meaningful words or nonsense syllables, numbers or sounds; whether they were presented orally or in writing. All that he required was that there be a three-to-four-second pause between each element in the series. … And he could manage, also, to repeat the performance fifteen years later, from memory.” [8]
This may seem remarkable, but it might not be truly exceptional, because in 1986 Thomas Landauer concluded that, during any extended interval, none of his subjects could learn at a rate of more than about 2 bits per second, whether the realm was visual, verbal, musical, or whatever. So, if Luria’s subject required four seconds per word, he was well within Landauer’s estimate.[9] And even if that individual were to continue this over the course of a typical lifetime, this rate of memorization would produce no more than 4000 million bits—a database that would easily fit on the surface of a Compact Disc.
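The arithmetic behind that figure is simple enough to spell out. In this small sketch, the 70-year lifetime and the 700 MB disc capacity are assumptions added for illustration:

```python
# Landauer's arithmetic, spelled out. The 70-year lifetime and the 700 MB
# disc capacity are assumptions added here for illustration.

bits_per_second = 2                       # Landauer's estimated learning rate
lifetime_years = 70
seconds_per_year = 365 * 24 * 3600

lifetime_bits = bits_per_second * lifetime_years * seconds_per_year
lifetime_megabytes = lifetime_bits / 8 / 1_000_000

print(f"{lifetime_bits:,} bits ≈ {lifetime_megabytes:,.0f} MB")
print("fits on one 700 MB Compact Disc:", lifetime_megabytes < 700)
# -> about 4,400 million bits, roughly 550 MB
```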
Student: I’m uncomfortable with this argument. I agree that it might apply to our higher-level kinds of knowledge. But our sensory and motor skills might be based on much larger amounts of information.
We don’t have a good way to measure such things, and making such estimates raises hard questions about how those fragments of knowledge are stored and connected. Still, we have no solid evidence that any person has ever surpassed the limits that Landauer’s research suggests. [10]
Chapter §7 will speculate about how we organize knowledge so that, whenever one of our processes fails, we can usually find an alternative. But here we’ll change the subject to ask how we could endow a machine with the kinds of knowledge that people have.
# Could we build a Baby-Machine?
- Alan Turing: “We cannot expect to find a good child machine at the first attempt. One must experiment with teaching one such machine and see how well it learns. One can then try another and see if it is better or worse [but] survival of the fittest is a slow method for measuring advantages. The experimenter, by the exercise of intelligence, should be able to speed it up [because] if he can trace a cause for some weakness he can probably think of the kind of mutation which will improve it.” [11]
To equip a machine with something like the knowledge we find in a typical person, we would want it to know about books and strings; about floors, ceilings, windows, and walls; about eating, sleeping, and going to work. And it wouldn’t be very useful to us unless it knew about typical human ideals and goals.
Programmer: Then, why not build a ‘baby-machine’ that learns what it needs from experience? Equip a robot with sensors and motors, and program it so that it can learn by interacting with the real world—the way that a human infant does. It could start with simple If-Then schemes, and then later invent more elaborate ones.
This is an old and popular dream: to build a machine that starts by learning in simple ways and then develops more powerful methods—until it becomes intelligent. In fact, several actual projects have had this goal, and each such system made progress at first but eventually stopped extending itself.[12] I suspect that this usually happened because those programs failed to develop good new ways to represent knowledge.
Inventing good new ways to represent knowledge is a major goal in computer science. However, even when such representations are discovered, they are rarely adopted quickly or widely—because one must also develop good skills for working with them efficiently. And since such skills take time to grow, you will have to tolerate periods in which your performance becomes not better, but worse. [13]
The Investment Principle: It is hard to appreciate the virtues of a new technique because, until you become proficient with it, it will not produce results as good as you’ll get from the methods that you are familiar with.
No one has yet made a baby-machine that developed effective new kinds of representations. Chapter §10 will argue that human brains are born equipped with machinery that eventually provides them with several different ways to represent various types of knowledge.
Here is another problem with “baby-machines.” It is easy to program computers to learn fairly simple new If-Then rules; however, if a system does this too recklessly, it is likely to deteriorate from accumulating too much irrelevant information. Chapter §8 will argue that unless learning is done selectively—by making appropriate “Credit Assignments”—a machine will fail to learn the right things from most of its experiences.
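To make the contrast concrete, here is a toy sketch in Python. The episodes and the crude “credit” test are invented for illustration and are not any specific mechanism proposed in this book; the point is only that a learner which records a rule for every experience accumulates mostly noise, while one that keeps only rules followed by success stays small and useful:

```python
# Toy illustration: indiscriminate rule-learning vs. crude credit assignment.
# The situations, actions, and success test below are all made up.

import random

def run_episode():
    """A made-up episode: returns (situation, action, succeeded?)."""
    situation = random.choice(["hungry", "tired", "bored", "cold"])
    action = random.choice(["eat", "sleep", "read", "dress warmly"])
    good = {("hungry", "eat"), ("tired", "sleep"),
            ("bored", "read"), ("cold", "dress warmly")}
    return situation, action, (situation, action) in good

reckless_rules = []        # records an If-Then rule for every experience
selective_rules = set()    # keeps only rules that received credit for a success

for _ in range(1000):
    situation, action, succeeded = run_episode()
    reckless_rules.append((situation, action))    # no selection at all
    if succeeded:                                 # crude credit assignment
        selective_rules.add((situation, action))

print("reckless learner:", len(reckless_rules), "rules, mostly irrelevant")
print("selective learner:", sorted(selective_rules))
```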
Entrepreneur: Instead of trying to build a system that learns by itself, why not make one that searches the Web to extract knowledge from those millions of pages of content-rich text?
That certainly is a tempting idea, for the World Wide Web must contain more knowledge than any one person could comprehend. However, it does not explicitly include the knowledge that one would need to understand what all those texts mean. Consider the kind of story we find in a typical young child’s reading book:
“Mary was invited to Jack’s party. She wondered if he would like a kite. She went and shook her piggy bank. It made no sound.” [14]
A typical reader would assume that Jack is having a birthday party, that Mary is concerned because she needs to bring Jack a suitable present, that a good birthday present should be something that its recipient likes; that Jack might like to receive a kite; that Mary wants money to pay for that kite; and that the bank would have rattled if it contained coins. But because these are all things that ‘everyone knows’ we scarcely ever write them down, so such knowledge stays hidden ‘between the lines.’[15]
Neurologist: Why not try to copy the brain, using what brain scientists have learned about the functions of its various parts?
We learn more about such details every week—but we still do not yet know enough to simulate even a spider or a snake.
Programmer: What about alternatives such as building very large Neural Networks or big machines that accumulate huge libraries of statistical data?
Such systems can learn to do useful things, but I would expect them to never develop much cleverness, because they use numerical ways to represent all the knowledge they get. So, until we equip them with higher reflective levels, they won’t be able to represent the concepts they’d need for understanding what those numbers might mean.
Evolutionist: If we don’t know how to design better baby-machines, perhaps we can make them evolve by themselves. We could first write a program that writes other programs and then makes various kinds of mutations of them—and then make those programs compete for survival in suitably lifelike environments.
It took hundreds of millions of years for us to evolve from the earliest vertebrate fish. Eventually a few of their descendants developed some higher-level systems like those we described in chapter §5; in fact, most vertebrates never developed them. Generally, it is hard for complex systems to improve themselves, because most specializations that lead to near-term gains are likely to make later changes much harder. We’ll discuss this more in §§Duplication and Diversity.
In contrast, human brains start out equipped with systems that are destined to develop into useful ways to represent knowledge. We’ll need to know more about such things before we are ready to construct efficient self-improving machines.
Architect: In this section you’ve been very negative. You’ve said that each of those methods has merit, and yet you found reasons to reject them all. But surely one could combine the virtues of all those ideas, in some way in which each offsets the others’ deficiencies.
Indeed, we should find ways to use them all, and we’ll propose ways to do this in subsequent chapters. I would not dismiss all prospects of building a baby-machine, but only schemes for doing this by “starting from scratch”—because it seems clear that a human baby begins life equipped with intricate ways to learn, not only to master the simplest facts, but also to construct new ways to think. If you don’t agree with this, try teaching your kitten to read and write, do calculus, or dress itself.
More generally, it seems to me that all of the previous learning schemes—statistical, genetic, and logical—have ‘tapered off’ because they were not equipped with ways to overcome problems like these:
The Optimization Paradox: The better a system already works, the more likely each change will make it worse. See §§Duplication.
The Investment Principle: The better a certain process works, the more we will tend to rely on it, and the less likely we will be inclined to develop new alternatives.
The Parallel Processing Paradox: The more that the parts of a system interact, the more likely each change will have serious side effects.
In other words, as a system gets better it may find that it is increasingly harder to find more ways to improve itself. Evolution is often described as selecting good changes—but it actually does far more work at rejecting changes with bad effects. This is one reason why so many species evolve to occupy narrow, specialized niches that are bounded by all sorts of hazards and traps. Humans have come to escape from this by evolving features that most animals lack—such as ways to tell their descendants about the experiences of their ancestors. See §§Evolution.
In any case, for a machine to keep developing, it must have ways to protect itself against changes with too many side effects. One notable way to accomplish this is to split the whole system into parts that can evolve separately. This could be why most living things evolved as assemblies of separate ‘organs’—that is, of parts with fewer external connections. Then changes inside each of those organs will have fewer bad external effects. In particular this could be why the resources inside our brains tended to become organ-ized into more-or-less separate centers and levels—like those suggested in §5-6.
Reactive systems operate on descriptions of real, external situations.
Deliberation operates on descriptions of future reactions.
Reflective systems operate on descriptions of deliberations.
Self-Reflection operates on descriptions of reflections.
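As a rough illustration of that layering (my own sketch, not a design from this book), one could think of each level as a structure that wraps a description produced by the level below it; the example data at the bottom reuses the piggy-bank story from above:

```python
# A minimal sketch of "descriptions of descriptions": each level wraps the
# level below. This is an illustration added by the editor, not a design
# taken from the text.

from dataclasses import dataclass

@dataclass
class Situation:          # a description of a real, external situation
    text: str

@dataclass
class Reaction:           # what a reactive system would do about it
    situation: Situation
    action: str

@dataclass
class Deliberation:       # a description of possible future reactions
    options: list
    chosen: Reaction

@dataclass
class Reflection:         # a description of a deliberation
    deliberation: Deliberation
    comment: str

@dataclass
class SelfReflection:     # a description of a reflection
    reflection: Reflection
    comment: str

s = Situation("the piggy bank made no sound")
options = [Reaction(s, "ask a parent for money"), Reaction(s, "choose a cheaper gift")]
d = Deliberation(options, chosen=options[1])
r = Reflection(d, "I gave up on the kite too quickly")
print(SelfReflection(r, "I tend to settle for the easy option"))
```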
Why emphasize descriptions here? That’s because we could never learn enough low-level If-Then rules, and the only alternative is to use abstractions—as was argued in 1959 in an essay called Programs with Common Sense.[16]
- John McCarthy: “If one wants a machine to discover an abstraction, it seems most likely that the machine must be able to represent this abstraction in some relatively simple way.”
We need to make our descriptions abstract because no two situations are ever the same, so as we saw in §5-2, our descriptions must not be too concrete—or they would not apply to new situations. However, as we noted in §5-3, no representation should be too abstract, or it will suppress too many details. [17]