Robot Pedagogue






Aaron Gemmill, Angie Keefer, Rob Seward
in collaboration with Anna Craycroft

Subject of Learning / Object of Study
Workspace: Anna Craycroft
Blanton Museum of Art
March 5 – June 20, 2010

Interview with Hunter Johnson


Angie Keefer: You studied philosophy and computer science before deciding to pursue a PhD in math. Recently, I asked you why you chose to focus on mathematics, and I was very surprised by your answer, which was basically that you wanted to understand human behavior better. What did you mean by that?


Hunter Johnson: I'm trying to avoid the trivial sounding answer that math is humanistic because it's done by humans. But this is different from saying that "gardening is humanistic" for the same reason, because gardening seems human, whereas math sometimes doesn't. There are lots of instances of fallibility in mathematics. But there seems to be an infallible quality also, particularly in arguments that can be reduced to what most people would think of as "pure logic," or the manipulation of concrete symbols according to some rules. And this is a mystery, because we shouldn't be able to reason perfectly; all of our other faculties are fraught with problems that were overlooked by evolution. Stereograms and other visual illusions exploit shortcuts and oversights in our neural hardware to produce strange results. But it is not widely acknowledged that there are similar "illusions of reason" where we begin with true premises and through valid deduction arrive at a false conclusion. In fact, most people will say that this is impossible. If one mathematician proves a statement A and his colleague proves the opposite, they don't shrug and suppose that they see things differently. But at an intuitive level, this seems to be what should happen all the time. It is very hard to separate out what elements cause this unlikely infallibility. Some are cultural (the denial of contradictions) and some are possibly metaphysical (the consistency of reality). And this is complicated by the fact that mathematics operates partly in the realm of formal reasoning, which is only a model of "true" reasoning, or whatever messy way the human brain actually combines known truths to arrive at new truths. The idea that "logical reasoning preserves truth" is really a theoretical statement in this light. And what's worse, it's a poorly defined one, because the "truth" which appears is not the easy, uncomplicated truth of mathematicians, but the more problematic and strange truth of philosophers. Modern logic could in the future come to be viewed as something as misguided as the theory of the Earth-centered universe, or the now inadequate logic of Aristotle. Nonetheless, the operating assumption of working mathematicians, at least in as far as being mathematicians, is that logic and reasoning are more or less coextensive. So, in some ways unraveling exactly what a correct argument is and what it can produce is a way of advancing mathematics, and vice versa. And this just amounts to a question of what humans will always believe and what they won't.


What are you working on now?


I'm basically trying to show that two notions of strong structure in a collection of sets are really the same. In computer science this is called the "sample compression conjecture." A couple of years ago, my former advisor and I translated this into a statement about "model theory," which is the area I work in.


A really coarse way to phrase the conjecture is to use the example of the game 20 Questions. In normal 20 Questions, even small children can get to the bottom of things pretty quickly. But mathematicians are more sinister than small children, and there are types of things you can think of for which 20 Question-like approaches will never work. For example, think of some weird blob or puddle shape. Now let's play 20 Questions and let me try to guess what you're thinking of. It's hopeless, if I'm expected to get the picture, because there isn't enough structure around. Since you've chosen an arbitrary puddle shape, the yes/no answers don't force anything to happen.


The conjecture is essentially that if there is enough structure around to force 20 Questions to work, then for anything like the object you're thinking of, there are, say, five crucial questions that really determine everything. Of course, it doesn't really have to be five; it just has to be some number that works for all objects of that kind.


There are a few practical consequences, mostly having to do with the amount of memory space certain learning machines would require to do certain things. For example, if everything HAL has learned about his environment can be captured in 13 questions, then HAL can represent his environment in his databanks as the set of yes/no answers to these questions (rather than remember the answers to the thousands of questions he possibly asked in the past).


In mathematical logic, a positive answer would give a wonderful symmetry between the two notions respectively referred to as "stability" and "dependence." A set of sentences is stable if the different ways to interpret those formal sentences can be "counted off" or classified in a certain sense. The idea of dependence is that any two things that can be defined by the same syntactical form have to truly be similar. That is, you can't just change a parameter and go from Santa Claus to the Devil. Changing parameters can only go from Santa, to Rudolph, to a worker elf, etc.


Your description of 20 Questions reminds me of the way Walter Murch, the film editor, has described the process of working on a film. He likens it to a game of Inverse 20 Questions, a variation on Exquisite Corpse. In the standard version of 20 Questions, a designated guesser leaves the room while the remaining players select an object together. The returning guesser’s objective is to identify the chosen object through the course of asking up to twenty questions of the other players. In a game of Inverse 20 Questions, the players do not concur while the guesser is out of the room. Instead, as the guesser asks each question, all the players continually modify their assumption about what the object might be. Everyone is guessing. Ideally, the entire group of players arrives at an object together (in Murch’s case, a film), without having initially agreed upon a winning answer. There is the risk, though, that the game will simply fall apart.


That's funny that you mention Inverse 20 Questions. The same game (with a different name) appears in Daniel Dennett's book Consciousness Explained. He adapts it to describe a model of perception that is apparently well-known to psychologists.


The theory says that when I see something, a lamp for instance, only a small amount of data goes to my brain, consisting of some criss-crosses of lines, and maybe some color splotches. Then, sensual details are filled out by a process something like this (occuring at a "circuitry" level):


Does it Glow?
Yes
Is it hot?
No
Is it big?
Yes
Is it a lamp?
Yes
etc.


The important thing is that when we classify a perception–when it moves from being a pre-perceptual blob to something actually in consciousness–we rely on a restricted set of pre-existing hypotheses about what could be perceived. Then a series of "observations" are used to select from among the hypotheses.


But how big should the set of hypotheses be? If it is too small, then my perception of the "noumena" will be coarse and oversimplified. Imagine, for instance, a "seeing robot" which only classifies things as squares or circles. On the other hand, if my set of hypotheses is too big, then I will have trouble selecting from among the large number of templates that fit my data.


Here is the classical working example: Kepler discovered that all planets travel in ellipses with the sun at one focus. He might have said rather that all planets travel in circles with the sun in the center (which is a special case). But this is too restrictive, because it doesn't fit the data without bringing in unacceptable inaccuracies. The phenomenon is actually more nuanced. Suppose, on the other hand, he had said, "all planets travel in continuous curves which describe loops around the sun." This model does fit the data, but it goes too far; it is consistent with any data whatsoever. So, there is a kind of Goldilocks problem that involves choosing between capacity and simplicity. In some ways this is Occam's razor, rephrased.


Have I understood correctly that the questions that would determine the structure of a "thing" would be the same (with the same answers) for all dependent things? But dependent things could have differences? So the Q&A would amount to a process of classification for like things, but not necessarily an exhaustive identification of any one thing?


One way to understand a "dependent" collection of hypotheses (you might say "sets" instead of "hypotheses"; all of this has a mathematical interpretation), is that the second problem I mentioned never occurs. A dependent class of hypotheses is, in a word, "refutable" given enough data, and in fact any large amount of data will do. For example, the hypothesis that all orbits are circles is refutable, and the set of all circles in the plane (as a formal class of hypotheses) is a dependent set. (BTW dependent sets are characterized by CS people as having finite VC dimension; this is just a synonym for dependence.)


So, restricting to a dependent (aka "finite VC dimension") class of hypotheses is a wise idea even if the theory behind positing the hypothesis is wrong. This is because your data will eventually tell you that you are wrong (about your large scale hypothesis class).


Additionally, if the hypothesis class is dependent and the theory is "right," then any particular hypothesis consistent with a set of observations is highly accurate:


If we (accurately) observe the location of Venus once in spring, twice in winter and two times again the next year and plot these locations in absolute space, there will be essentially only one ellipse which goes through all five points. Selecting the particular hypothesis (ellipse) consistent with the data, we get highly reliable information about what happens at all the "unobserved" points. This makes the whole business clear: the game is to take a small data set, "generalize" to an accurate hypothesis, and then use this as a guide; it almost allows us to "see in the dark" in the cases for which we as yet have no experience. (BTW this is what underlies the PAC=Probably Approximately Correct approach to computational learning.)


The situation of Kepler trying to analyze planetary motion (consciously) and my brain trying to categorize an unprocessed sense-impression (unconsciously) are two very similar scenarios. Both rely on a restrictive (but not too restrictive) hypothesis class consisting of the admissible ways the data could be categorized. Moreover, based on a series of observations, both Kepler and I eventually arrive at a definite hypothesis from the class: either a definite orbit in Kepler's case, or a definite lamp in mine.


Kepler had to invent his hypothesis class himself, but my brain seems to have hypothesis classes that come for free, either through training or by hard-wiring. The fact that the brain can generate these classes so easily is rather amazing, and probably has more than a little to do with the ability of the human mind to learn, at least in as far as Dennett's model is accurate.


Speaking of learning, how do you go about teaching math?


Often, what I teach isn't mathematically interesting, but it is pedagogically interesting to try to get a group of students who are behind grade level up to speed on something they are probably predisposed to loathe. I have a lot of students who work, and this takes away from the time they can put into reading. I've had to adjust my teaching style quite a bit, as a result. Math usually comes as formalism, and as a series of theorems and proofs, and as an imaginary machine that the student slowly learns to operate. For some students, that's just not realistic. I have to go directly to concrete concepts, or the class is a disaster. Formalism takes too much time and bores the students to tears. So, I have to directly talk about the concepts in ordinary language, and play fast and loose with the details. I've gone from not giving proofs to not even giving theorems, when not enough students understand ideas like "n is an arbitrary number." On the other hand, I feel like I'm giving a lot of people the first positive experience with math they've ever had. Many of them become curious.


In the beginning of his book about infinity, Everything and More, David Foster Wallace describes the process of teaching math in a similar way, referring to the difficulty that people sometimes have with abstraction. He also mentions Bremerman's Limit, to describe the physical limitation of any data processing system. What are the implications of your work for understanding the way that humans process information, and the basic, physical limitations of human cognition? Or for how cognition might be approximated, computationally?


What I have been talking about are ideas that come from computational learning theory. The logical perspective on all this material came, amazingly, from thinking about a completely different problem in infinitary combinatorics.


The idea of dependent hypothesis classes existing in the human mind solves something sometimes called the Kripkenstein problem for learning a rule. For example, I teach Sally how to do addition by showing her finitely many examples. But then she knows how to add any of the infinitely many sums she may be given! (Or does she?) But it may be that Sally simply lacks the ability (thankfully) to imagine wild possibilities for the unstated cases, because the collection of concepts (hypotheses) her brain naturally entertains are highly restrictive (dependent). This makes it easy for her to "snap onto" what I'm talking about. It would not be surprising if our entire concept structure were optimized for this kind of "snapping on" to facilitate communication.


It is fun to speculate, that if we do learn and communicate by using these highly structured application-specific concept bundles, that Plato in some way noticed them, and that this real psychological mechanism is related to his theory of the forms.


Hunter Johnson is a mathematician and Assistant Professor of Mathematics & Computer Science at the City University of New York, John Jay College. This interview was conducted via email.