# 6.7 Knowledge Needs Multiple Representations
What distinguishes people from animals? Perhaps our greatest distinction is that no other animal can ask such a question! We’re unique in being able to treat our own ideas as though they were things. In other words, we ‘conceptualize.’
However, to think about ideas or things, we need representations of them in our minds. Everyone who has written a program knows that you can’t get a computer to do what you want by simply ‘pouring knowledge in.’ You must represent each process or fact in the form of some sort of structure.

For knowledge is not composed of ‘things’ that can each exist apart from the rest—no more than a word can have a meaning without being part of some larger-scale language; fragments of knowledge can only make sense when they have appropriate kinds of interconnections. It does not much matter how these are embodied; you can make the same computer with wires and switches, or even with pulleys, blocks, and strings; all that matters is how each part changes its state in response to what some other parts do. And the same kinds of relationships can also be represented in terms of parts that have no behavior at all—such as arrangements of symbols in diagrams, or the sentences of written texts—so long as there is some way for these to affect how some other systems will behave.
So when programmers set out to develop a program, they usually start by selecting a way to represent the knowledge it will need. But each representation works well only in certain realms, and none works well in every domain. Yet we frequently hear debates like this one about which is the best way to represent knowledge:
Mathematician: It is always best to express things with Logic.
Connectionist: No, Logic is far too inflexible to represent commonsense knowledge. Instead, you ought to use Neural Networks.
Linguist: No, because Neural Nets are even more rigid. They represent things in numerical ways that are hard to convert to useful abstractions. Instead, why not simply use everyday language—with its unrivaled expressiveness?
Conceptualist: No, language is much too ambiguous. You should use Semantic Networks instead—where ideas get connected by definite concepts!
Statistician: Those linkages are too definite, and don’t express the uncertainties we face, so you need to use probabilities.
Mathematician: All such informal schemes are so unconstrained that they can be self-contradictory. Only Logic can insure us against those circular inconsistencies.
This shows that it makes no sense to seek a single best way to represent knowledge—because each particular form of expression also brings its own particular limitations. For example, logic-based systems are very precise, but they make it hard to reason with analogies. Similarly, statistical systems are useful for making predictions, but do not serve well to represent the reasons why those predictions are sometimes correct.
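To make those tradeoffs concrete, here is a minimal sketch in Python of how one commonsense fact, that birds can usually fly, might be cast into three of the forms those speakers advocate. (Every name and encoding below is invented for illustration; none of them is any standard scheme.)

```python
# One fact, "birds can usually fly," in three different representations.

# 1. Logic: precise, but every exception must be legislated explicitly.
#    (Standing for: Bird(x) & ~Penguin(x) -> CanFly(x).)
def logic_says_flies(is_bird, is_penguin):
    """A logical rule yields a definite yes or no, nothing in between."""
    return is_bird and not is_penguin

# 2. Semantic network: ideas connected by definite conceptual links.
semantic_net = {
    ("bird", "capable-of", "flying"),
    ("penguin", "is-a", "bird"),
    ("penguin", "not-capable-of", "flying"),
}

# 3. Statistics: expresses how uncertain we are, but never why.
p_flies_given_bird = 0.95

print(logic_says_flies(is_bird=True, is_penguin=True))  # False
print(("penguin", "is-a", "bird") in semantic_net)      # True
print(p_flies_given_bird)                               # how often, not why
```

The rule answers crisply but must name each exception; the network supports chains of conceptual links but says nothing about how likely they are; the probability measures our uncertainty but hides its reasons.

It was recognized even in ancient times that we must represent things in multiple ways: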
Aristotle: "Thus the essence of a house is assigned in such a formula as ‘a shelter against destruction by wind, rain, and heat'; the physicist would describe it as 'stones, bricks, and timbers'; but there is a third possible description which would say that it was that form in that material with that purpose or end. Which, then, among these is entitled to be regarded as the genuine physicist? The one who confines himself to the material, or the one who restricts himself to the formulable essence alone? Is it not rather the one who combines both in a single formula?"[38]
However, sometimes there are advantages to not combining those ways to describe things.
Richard Feynman: "...psychologically we must keep all the theories in our heads, and every theoretical physicist who is any good knows six or seven different theoretical representations for exactly the same physics. He knows that they are all equivalent, and that nobody is ever going to be able to decide which one is right at that level, but he keeps them in his head, hoping that they will give him different ideas for guessing."[39]
Much of our human resourcefulness comes from being able to choose among diverse ways to represent the same situation. This has value because each such point of view may provide a way to get around some deficiencies of the other ones. However, to exploit this fact, one needs to develop good ways to decide when to use each kind of representation; we’ll come back to this in §10-X. {Causal Diversity.} Of course, to change representations efficiently, one must also be able to switch quickly without losing the work that’s already been done—and that is why this chapter emphasized the use of panalogies to link analogous aspects of multiple ways to represent and to think about things.
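One might picture such a panalogy as a small table that keeps the analogous aspects of several descriptions aligned, so that a change of viewpoint costs a lookup rather than a recomputation. Here is a minimal sketch (the class and all its names are invented for illustration, not a specification from this book):

```python
# A rough sketch of a 'panalogy': one entity described in several
# realms at once, with analogous aspects kept linked so that switching
# representations loses none of the work already done.

class Panalogy:
    def __init__(self, entity):
        self.entity = entity
        self.aspects = {}  # aspect -> {realm: description}

    def link(self, aspect, realm, description):
        self.aspects.setdefault(aspect, {})[realm] = description

    def switch(self, aspect, realm):
        """Re-describe one aspect in another realm, keeping the rest intact."""
        return self.aspects.get(aspect, {}).get(realm)

house = Panalogy("house")
house.link("what it is", "physical", "stones, bricks, and timbers")
house.link("what it is", "functional", "a shelter against wind, rain, and heat")
house.link("why it fails", "physical", "the timbers have rotted")
house.link("why it fails", "functional", "it no longer keeps the rain out")

# Stuck in the physical view? Switch realms without recomputing anything:
print(house.switch("what it is", "functional"))
```

Aristotle’s house appears here on purpose: the two rows are his physical and functional descriptions, kept linked yet each still usable on its own.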
NOTES
[1] In Push Singh’s PhD thesis [ref], two robots actually consider such questions. See also the 2004 BT paper.
[2] The idea of a panalogy first appeared in Frames, and more details about this were proposed in chapter 25 of SoM. A seeming alternative might be to have almost-separate sub-brains for each realm—but that would lead to similar questions at some higher cognitive level.
[3] I got some of these ideas about ‘trans’ from the early theories of Roger C. Schank, described in Conceptual Information Processing, Amsterdam: North-Holland, 1975.
[4] Robertson Davies, Tempest-Tost, Penguin, 1992, ISBN: 0140167927.
[5] As suggested in §3-12, we often learn more from failure than from success—because success means you already possessed that skill, whereas failure instructs you to learn something new.
[6] See Douglas Lenat, The Dimensions of Context Space, at http://www.ai.mit.edu/people/phw/6xxx/lenat2.pdf
[7] This discussion is adapted from my introduction to Semantic Information Processing, MIT Press, 1969.
[8] From Alexander R. Luria, The Mind of a Mnemonist, Cambridge: Harvard University Press, 1968.
[9] Landauer, Thomas K. (1986), “How much do people remember? Some estimates of the quantity of learned information in long-term memory,” Cognitive Science, 10, 477-493. See also Ralph Merkle’s discussion of this at http://www.merkle.com/humanMemory.html. Furthermore, according to Ronald Rosenfeld, the information in typical text is about 6 bits per word; see Rosenfeld, Ronald, “A maximum entropy approach to adaptive statistical language modeling,” Computer Speech and Language, 10, 1996, also at http://www.cs.cmu.edu/afs/cs/user/roni/WWW/me-csl-revised.ps. In these studies, ‘bit’ is meant in the technical sense of C. E. Shannon, as in http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html.
[10] My impression is that this also applies to the results reported by R. N. Haber in Behavioral and Brain Sciences, 2, 583-629, 1979.
[11] A. M. Turing, “Computing Machinery and Intelligence,” at www.cs.swarthmore.edu/~dylan/Turing.html
[12] See several essays about self-organizing learning systems: Gary Drescher, Made-Up Minds, MIT Press, 1991, ISBN: 0262041200; Lenat’s 1983 “AM” system at http://web.media.mit.edu/~haase/thesis/node52.html; Kenneth Haase’s thesis at http://web.media.mit.edu/~haase/thesis/; Pivar, M. and Finkelstein, M. (1964), in The Programming Language LISP, MIT Press, 1966; Solomonoff, R. J., “A formal theory of inductive inference,” Information and Control, 7 (1964), pp. 1-22; Solomonoff, R. J., “An Inductive Inference Machine,” IRE Convention Record, Section on Information Theory, Part 2, pp. 56-62, 1957; also see his essay at http://world.std.com/~rjs/barc97.html. In recent years this line of work has grown into the field called ‘Genetic Programming.’
[13] Technically, if a system has already been optimized, then any change is likely to make it worse until one finds a higher peak, some distance away in the “fitness space.”
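A few lines of illustrative code show the trap; this tiny hill-climber (invented for this note, not from any cited source) stops at the first peak it finds, since every small change from there makes things worse, although a higher peak lies some distance away:

```python
# A fitness landscape with two peaks: a local one near x=2 (height 5)
# and a global one near x=8 (height 9).
def fitness(x):
    return max(5 - (x - 2) ** 2, 9 - (x - 8) ** 2)

def hill_climb(x, step=0.5):
    """Accept only small changes that improve fitness."""
    while True:
        best = max((x - step, x, x + step), key=fitness)
        if best == x:
            return x  # no small change helps: a local optimum
        x = best

peak = hill_climb(0.0)
print(peak)                              # stops near 2.0
print(fitness(peak), "<", fitness(8.0))  # 5.0 < 9; the higher peak is missed
```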
[14] See §2.6 of Frames, §27.1 of SoM, and Charniak, E. C., Toward a Model of Children's Story Comprehension. ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-266.pdf
[15] There has been some recent progress toward extracting such kinds of knowledge from large numbers of users of the Web. See Push Singh’s ‘Open Mind Common Sense’ project at http://commonsense.media.mit.edu/.
[16] John McCarthy, “Programs with Common Sense,” in Proc. Symposium on Mechanization of Thought Processes, 1959. Reprinted in Semantic Information Processing, p. 404.
[17] People sometimes use ‘abstract’ to mean ‘complex’ or ‘highly intellectual’—but here I mean almost the opposite: a more abstract description ignores more details—which makes it more useful because it depends less on the features of particular instances.
[18] See Elizabeth Johnston’s notes on “Infantile Amnesia” at http://pages.slc.edu/~ebj/IM_97/Lecture6/L6.html
[19] In each cycle of operation, the program finds some differences between the current state and the desired one. Then it uses a separate method to guess which of those differences is most significant, and makes a new subgoal to reduce that difference. If this results in a smaller difference, the process goes on; otherwise it works on some other difference. For more details of how this worked, see Newell, A., J. C. Shaw, and H. A. Simon, “Report on a general problem solving program,” in Proceedings of the International Conference on Information Processing. UNESCO, Paris, pp. 256-64. A more accessible description is in Newell, A., and Simon, H. A., “GPS, a program that simulates human thought,” Computers and Thought, E. A. Feigenbaum and J. Feldman (Eds.), McGraw-Hill, New York, 1963.
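The loop is easy to caricature in code. This sketch (a toy domain with invented names; the real GPS was far more elaborate, and this version omits the fallback to other differences when one resists reduction) shows the shape of the difference-reduction cycle:

```python
# A bare-bones caricature of GPS-style means-ends analysis: find the
# differences between the current state and the goal, pick the one
# judged most significant, and apply an operator meant to reduce it.

def solve(state, goal, operators, significance, max_steps=100):
    for _ in range(max_steps):
        differences = goal - state            # what is still wrong?
        if not differences:
            return state                      # every difference removed
        target = max(differences, key=significance)
        reduce_op = operators.get(target)     # method for this difference
        if reduce_op is None:
            return None                       # no way to reduce it
        state = reduce_op(state)              # subgoal: reduce that difference
    return None

# Toy domain: a state is the set of conditions achieved so far.
operators = {
    "at-airport": lambda s: s | {"at-airport"},
    "has-ticket": lambda s: s | {"has-ticket"},
}
significance = {"at-airport": 2, "has-ticket": 1}.get

print(solve(set(), {"at-airport", "has-ticket"}, operators, significance))
```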
[20] See Allen Newell and Herbert Simon, Human Problem Solving, Prentice Hall, 1972, ASIN: 0134454030. Also see the problem-solving architecture called “SOAR.” [Ref.]
[21] See A. Newell, J. C. Shaw, and H. A. Simon, “A variety of intelligent learning in a general problem solver,” in Self-Organizing Systems, M. C. Yovits and S. Cameron, Eds., Pergamon Press, New York, 1960.
[22] In Nicomachean Ethics (Book III. 3, 1112b). This appears to be a description of what today we call ‘top-down search.’
[23] This was written before ‘security’ began to be imposed on trains.
[24] See Peter Kaiser’s www.yorku.ca/eye/disapear.htm. [Also see §§Change-Blindness.] However, some signals do not ‘fade away,’ because we also have additional sensors that evolved to keep responding to certain particularly harmful conditions. [See §§Alarms.]
[25] Roger Schank has conjectured that this may be one of our principal ways to learn and remember—in Tell Me a Story, Charles Scribner’s Sons, New York, 1990.
[26] There are more details about this in my essay at http://web.media.mit.edu/~minsky/papers/MusicMindMeaning.html
[27] In Sentics, New York: Doubleday, 1978, the pianist-physiologist Manfred Clynes describes certain temporal patterns, each of which might serve as a ‘command’ to induce a certain emotional state.
[28] G. Spencer-Brown, Laws of Form, Crown Publishers, 1972, ISBN: 0517527766.
[29] One could ask the same questions about gossip, sports, and games. See How people spend their time, http://www2.stats.govt.nz/domino/external/pasfull/pasfull.nsf/0/4c2567ef00247c6acc256ef6000bbb61/%24FILE/around-the-clock.pdf
[30] In his 1970 PhD thesis, Patrick H. Winston called this a “similarity network.” See [AIM xxx].
[31] http://www.gutenberg.net/etext94/arabn11.txt
[32] Benjamin Franklin, letter to Joseph Priestley, 19 Sept. 1772.
[33] See http://cogsci.uwaterloo.ca/Articles/Pages/how-to-decide.html
[34] §30.6 of SoM discusses why the idea of free will seems so powerful. There are many more ideas about this in Daniel Dennett’s 1984 book, Elbow Room: The Varieties of Free Will Worth Wanting, ISBN 0262540428.
[35] See more details in Lenat’s essay at www.ai.mit.edu/people/phw/6xxx/lenat2.pdf
[36] See the book Computers and Thought for some of the accomplishments of that period.
[37] Evans, Thomas G. (1963) A Heuristic Program to Solve Geometric-Analogy Problems, abridged version in Minsky (ed) Semantic Information Processing, MIT Press 1968, pp. 271-353.
[38] Aristotle, On the Soul, Book I, Part 1.
[39] Richard P. Feynman, The Character of Physical Law, MIT Press, Cambridge, MA, 1965, ISBN 0262560038, p. 168.