Search This Blog

Tech Book Face Off: Gödel, Escher, Bach Vs. The Annotated Turing

Quite a while ago, I made a list of the best Steve Yegge posts, and one of those posts was his Ten Challenges post on ten recommended books that required thought and dedication to get everything out of them. One of those books was Gödel, Escher, Bach (henceforth referred to as GEB) by Douglas Hofstadter. It wouldn't be the last time Yegge talked about GEB, and it seems that this book had a big influence on his thinking. He certainly gave it glowing praise, and that convinced me to give it a go myself. It is not a book to be read lightly, so I finally made some time to read it and give it my full attention. I was not disappointed.

I tried to pair GEB with a book that attempted to tackle similar topics. You may think that would mean some other book that addresses the nature of intelligence, but I went for a different angle. A significant amount of GEB deals with Gödel's Incompleteness Theorem, and that theorem is closely related to the Church-Turing thesis on computability. I'm not sure where I heard about The Annotated Turing by Charles Petzold, but I thought it would be a good match for GEB. It turns out that Alan Turing's paper also has much to do with intelligence, both human and artificial, and both books stirred up all kinds of thoughts about the limits and extent of what intelligence is. With that, let's dig in to two incredibly ambitious books.

Gödel, Escher, Bach front coverVS.The Annotated Turing front cover

Gödel, Escher, Bach: An Eternal Golden Braid

Weighing in at over 700 pages of text, it's appropriate that GEB has the word 'braid' in the title, since Hofstadter weaves a great multitude of things together in this book to explore the implications of Gödel's Incompleteness Theorem and ultimately, the nature of intelligence. It's a heavy book, dealing in weighty topics and asking profound questions. Like the work of the geniuses featured in the title, this book is as much a work of art as it is a technical treatise, and the reader can find multiple levels of meaning in its prose among the insightful revelations and, yes, annoying imperfections.

I cannot hope to do such a book justice in a relatively short review (relative to the length of the book, anyway), but I will briefly describe the general format of the book and summarize the main points that Hofstadter covered. GEB is laid out as an interwoven series of chapters and dialogs. The chapters contain the main content of the book, and the dialogs attempt to expose some of the meaning of that content through discussions between the fictional characters of Achilles and Mr. Tortoise. Sometimes other characters join in, but the dialogs are primarily a discourse between the deceivingly simple Achilles and the deviously wise Tortoise.

The dialogs reveal the meaning of different topics in the chapters in various ways, including through their structure, through the ideas that the characters discuss, and through what happens to the characters during a scene. I sometimes found these dialogs to be entertaining, but in general, they were rather dull and didn't add much to my understanding of the topics Hofstadter was covering. More often than not, I grew bored with them, and I seriously considered skipping them, if it was not for the constant references to them in the rest of the text.

If the dialogs left something to be desired, the chapters were exactly the opposite. I found Hofstadter's explorations of formal systems and intelligence to be exceptionally thought-provoking, and I would get lost for hours in his musings on tangled hierarchies and self-reference.

The first half of GEB covers in detail the history of first-order logic and Gödel's Incompleteness Theorem. Hofstadter develops a number of example mathematical systems to show how proofs are constructed in mathematical logic, how to derive the arithmetical operations from a set of axioms, and how to build up to Gödel's astonishing result that any system based on the natural numbers is inherently incomplete (meaning that there exist true statements in a system that cannot be proven within that system).

At this point it may sound like this book is geared towards mathematicians, but it is decidedly not. Hofstadter does an amazing job translating mathematical jargon into a vocabulary that's much more accessible to the average person, or at least the average technical person. He introduces axioms and theorems using strings of letters and symbols in a way that makes it more of a game or mental puzzle than an exercise in mathematical notation. He does eventually get into first-order logic notation, but not until nearly halfway through the book.

I took a course in college on Gödel's Incompleteness Theorem, and I have to say, GEB would have been a nice introduction to that course. I did fine, but if I had read this book first, I would have had a much better understanding of what I was doing and the implications of what I was studying. Even so, it was probably the most enjoyable mathematics course I took in college. Gödel's theorem is simply fascinating in both its eloquence and its consequences.

The rest of the book is all about what Gödel's result means for mathematical systems, for the idea of provability, and surprisingly, for the concept of intelligence. Hofstadter pokes and prods at intelligence in as many ways as he can think of. He uses M. C. Escher's drawings and J. S. Bach's compositions as examples of systems with multiple layers of patterns and multiple levels of rules that exist within intelligence. He uses the dialogs between Achilles and Mr. Tortoise to try to come at the gnarly issues surrounding the idea of intelligence in as many ways as he can. He talks about programming languages, DNA and RNA processes, and Zen masters all in an attempt to crack this impossible nut of what makes us (arguably) rational, thinking beings.

It's a wild ride, and it constantly kindled my own thoughts on the nature of intelligence. Here are some of the topics that drove me to distraction, starting with the idea that if intelligence can be expressed as a set of rules, it forms an incomprehensible hierarchy:
The flexibility of intelligence comes from the enormous number of different rules, and levels of rules. The reason that so many rules on so many different levels must exist is that in life, a creature is faced with millions of situations of completely different types. In some situations, there are stereotyped responses which require "just plain" rules. Some situations are mixtures of stereotyped situations—thus they require rules for deciding which of the 'just plain" rules to apply. Some situations cannot be classified—thus there must exist rules for inventing new rules… and on and on. Without doubt, Strange Loops involving rules that change themselves, directly or indirectly, are at the core of intelligence.
What immediately comes to mind is the saying: "There is an exception to every rule, including this one." Think about that for too long, and your mind starts to bend. An intelligent system needs to deal with exceptions as a normal matter of course. What is interesting with the rule-based systems that we've created so far is that, when it comes to machines and computers, they are still obviously rule-based.

What does that mean? A computer will do exactly what it is programmed to do, no matter how repetitive the task is. Hofstadter posits that realizing when you're doing doing something repetitive and anticipating future steps, basically being observant, is a defining characteristic of intelligence. I think he's on to something there, although our repetitive tasks might still exist, but in a more complicated form. Life itself might be a repetitive task—living, reproducing, and dying—and we don't really question it most of the time, yet we do sometimes question the meaning of it. Regardless, we have the ability to question and to self-reflect. Is it possible that that ability comes from layer upon layer of rules in a recursive system? Hofstadter speculates as much:
This kind of thought carried a little further suggests that suitably complicated recursive systems might be strong enough to break out of any predetermined patterns. And isn't this one of the defining properties of intelligence? Instead of just considering programs composed of procedures which can recursively call themselves, why not get really sophisticated, and invent programs which can modify themselves—programs which can act on programs, extending them, improving them, generalizing them, fixing them, and so on? This kind of "tangled recursion" probably lies at the heart of intelligence.
Could a program show this emergent behavior in a way that starts to exhibit signs of intelligence if we figure out how to build stable tangled recursion into the system? I don't know, maybe. It seems compelling. Intelligence is much more than that, though. It also seems that intelligence relies on the physical world that it is a part of for its behavior.

Hofstadter brings up DNA as an example of something that encodes the characteristics of something else, like an encoded message, but the DNA itself doesn't constitute that something else on its own. The DNA is part of a larger system in the physical world of a living cell, and because of the properties of physics and chemistry, it is able to replicate and produce a living being. Is DNA a form of intelligence? Possibly. That leads to the question of what exactly is the nature of intelligence? Are there universal characteristics, e.g. pattern recognition, problem solving, analytical thinking, self-awareness, decision making, etc. (it's interesting to try to come up with a full list), or are there entirely different varieties of intelligence?

While DNA can be thought of as an encoded message, language is another way to encode messages, one that we depend on as intelligent beings to transmit knowledge. Is there such a thing as an uncoded message—something that's so basic that it's meaning is readily apparent to any form of intelligence that experiences it? Hofstadter brings up the interesting point that no such message exists, there are only codes that have become so familiar that they have stopped appearing as codes. Every message is encoded in some way, and must be decoded to gather its meaning. Does that mean using language is a symptom of intelligence? Maybe, although computers use languages to communicate as well (we call the languages protocols). However, computers use languages because they are programmed to, not by their own volition. We seem to use language for our own purposes, while computers do it by design, again coming back to the idea of self-awareness.

The way we write programs and use computers probably precludes the emergence of intelligence. Today, artificial intelligence is commonly categorized into three types of AI:
  • ANI - Artificial Narrow Intelligence is what we have today with programs excelling in a narrow task, like playing chess or running simulations. 
  • AGI - Artificial General Intelligence basically shows the capabilities of a human for navigating the world, making decisions, and solving a wide variety of problems. 
  • ASI - Artificial Super Intelligence is orders of magnitude smarter than humans and could easily solve problems that we've been working on forever without success (like curing cancer or developing AGI).
The way we design programs doesn't lead to AGI in any obvious way because we want most programs to work in a reliable, deterministic way to solve specific problems that we're working on. Additionally, (and Hofstadter doesn't go into this issue) we may be overlooking the role of other human characteristics in our perception of intelligence. We know that emotions play a large part in our ability to learn and retain knowledge, but we certainly wouldn't want our spreadsheet application or self-driving car to have emotions.

Even for less productive or less safety-critical applications, like playing chess, emotions probably wouldn't be a great characteristic for a program to have. Imagine wanting to play chess with a computer but not being able to because the computer didn't feel like it. Looking at chess programs a little closer, it's not even clear how to go from a program that can play chess better than any human to one that knows and understands that it's actually playing chess, or better yet, a program that can learn to play chess through lessons, study, and experience instead of programming. That kind of capability would be much closer to AGI, but it's an entirely different problem than programming a computer to play chess well.

Hofstadter also talks about how intelligence is related to the ability to build chunked models, reducing a complex system down to simpler components that can be more easily understood. The extreme example would be how would you simulate the universe? You certainly can't simulate every individual particle over the lifetime of the universe. You would need a computer larger than the entire universe to do it! You need to simplify in the right ways so that you can get results that correlate reasonably well with reality. An intelligent being will find ways to simplify the simulation while retaining the important parameters. An ANI program will mindlessly churn through the simulation.

Discussions of chunked models and simulation lead into how the human brain imagines things. When we think about a particular object, we somehow pull up an image in our brain of that object. Let's use a couch as an example. Different people may start with a different image for a couch, but as we add properties to it—like say it's a leather couch, it's tan leather, and it has large, squishy cushions—our mental image of that couch starts to converge. We can apply properties to an object to change our mental image of it, but then we can take that object and use it to describe instances of other objects. For instance, if I were to say I had a recliner that matched that couch, you would easily transfer all of the relevant properties to the recliner without me needing to describe the recliner in detail. We think about lots of things this way, where we find generalities in specifics and specifics in generalities. Hofstadter thinks that ability is characteristic of intelligence.

Beyond property based objects, there's also the question of how the brain represents such an enormous collection of objects, situations, and ideas in memory. Things seem to overlap so that there's no specific area of the brain that remembers 'couch' or 'tan' or 'leather.' Hofstadter uses the analogy of ripples on a pond, but I prefer to think about a TV or a computer monitor. A monitor with 24-bit color and 1080p resolution is able to display 224 different colors per pixel, and with 1920x1080 = 2,073,600 pixels, that gives 224*2,073,600 = 249,766,400 images. That's a big number, but it still pales in comparison to the human brain. The human brain has 100 billion neurons, each in either a fired or unfired state, so that means the brain can be in 2100,000,000,000 different states.

Of course a significant number of those states are garbage, but that would be true of the monitor as well. To get an idea of how much more information the human brain can hold, let's assume we have a 22" monitor with a given pixel density. If we kept the pixel density the same, how big would that monitor have to be to be able to represent the same number of states as the human brain? Well, it would need 2,009 times the number of pixels. Since the number of pixels increases with the square of the size of the monitor, that means the monitor needs to be 44.8 times bigger, meaning we would need a 986" monitor to have as many states as a human brain. That's a monitor with a 25 meter diagonal. Oh, and the brain's neurons can also have anywhere from 100 trillion to 1 quadrillion connections between neurons, so yeah, the brain can represent a lot of stuff with a lot of it overlapping.

Since we're talking about astronomically large numbers, we should talk about Gödel numbers. In the course of his proof, Gödel developed a way to encode mathematical proofs into extremely large integers so that all proofs could be enumerated. Not all numbers are Gödel numbers, but all proofs can be given a Gödel number. That was the proposition anyway, before Gödel constructed a proof by contradiction that showed there exist proofs that cannot be assigned Gödel numbers, therefore any system of natural numbers is incomplete.

What this result means in a more general sense is that there exist statements that are true or false that cannot be proven as such. Essentially, there exist things about the universe that are unknowable. What I wonder is, is this outcome pervasive in that there exist big, important statements that are unprovable, or are all of these unprovable statements of a trivial variety? Self-referential statements like "This statement is false" meet the criteria of Gödel's proof (and he constructed a mathematical expression along those lines to prove his Incompleteness Theorem), but are they meaningful in any way beyond showing that we can construct insufferable, circular statements? It might actually be impossible to know for statements that are not obviously unprovable, because we can't guarantee that these statements have a proof. They might be unprovable! We would go on trying to prove them without ever being able to prove them true or false. See, Gödel's theorem is nasty. It's a fascinating, frustrating problem.

Coming back to the issue of AI, Hofstadter grapples again and again with the question of what constitutes intelligence, and what would be the criteria for true AGI. He even brings up Tesler's Theorem that says, "AI is whatever hasn't been done yet." Part of the problem seems to be that we were defining very specific tasks that we thought were hard enough that they required AI to complete, like playing chess or proving theorems. Once we succeeded in programming a computer to do those things, we found that it was still the only thing the computer could do. It didn't seem intelligent because it was only good at one specific thing, while humans are good at a wide range of things.

Throughout the book I got a sense of what Hofstadter considered to be intelligence through the issues he discussed and the questions he asked. Humans have needed to solve a huge variety of problems to survive and progress to the point where we are today, and we've been exposed to a vast array of situations and physical stuff through an incredible number of experiences, both as individuals and as a species. Our intelligence may come from a necessary amount of knowledge that we've gathered to understand the world around us using the rich set of inputs that is our senses. Any kind of intelligence would not only need the capacity and capability of the human brain, but would also need those rich experiences to develop higher order thought processes. Of course, the list of things needed for intelligence is probably much longer than this. An AI must deal with the ambiguity of the incredibly complex system that is the real world, so the number of factors that make up intelligence isn't going to be rigidly fixed.

I've gone on now for way longer than normal, and I still haven't scratched the surface of the ideas present in GEB. (I didn't even get through half of my notes.) Hofstadter covers an amazing amount of ground. He does spend a lot of time discussing things tangentially related to his main thesis instead of directly about consciousness, intelligence, and free will, but sometimes it is worth taking a long time to say something. Parts of the book were fascinating and engrossing; other parts were dull and boring. Overall, it was excellent.

The Annotated Turing

Like GEB, The Annotated Turing by Charles Petzold is a different kind of book from anything I've ever read before. In it, he walks through one of the most influential papers in computer history, Alan Turing's On Computable Numbers, with an Application to the Entscheidungsproblem. Petzold goes to great effort to make an otherwise extremely complicated paper much easier to understand, and he takes a number of detours through the Turing's contemporary history and his personal story to put the paper in clear context. Petzold reproduces the paper in full, without corrections, and intersperses his own detailed explanations and commentary every few lines or paragraphs of the paper. He brings the reader up to speed on the necessary background knowledge to understand the paper, provides examples to clarify confusing areas, and points out mistakes that would otherwise lead the reader astray. He does an incredible job with it all.

The paper itself is about a couple of ideas. First and foremost, it is about the computability of numbers, and when we talk about the computability of numbers, we are generally talking about real numbers. Some real numbers are computable, like π and e, but it turns out that most real numbers are not. The overwhelming majority of real numbers are strings of completely random digits. Turing shows this by developing a machine with simple operations to compute real numbers, and then constructing a number that it cannot compute. This tactic should be familiar from Gödel's Incompleteness Theorem, and not surprisingly, Turing's and Gödel's results are closely related. It's quite interesting how closely related they are in time as well, with Gödel's paper (1931) coming only five years before Turing's (1936), and Alonzo Church barely beating Turing to the punch with a paper that proved the same thing in a different way.

The novelty of Turing's paper, and indeed the reason it is so well known today, is his invention of the Turing Machine and what that meant for the limits of computers. This was the second main idea in the paper, that computers could be made to perform general computations, meaning that they could be programmable. Plenty of people at the time (and even 20 years later) did not believe that such a machine could be made. Like with so many examples of progress, it took someone young and new to the field to overturn the prevalent outdated thinking, and not so much by convincing as by surpassing those that didn't believe.

So what is a Turing Machine? Quite briefly, it is a machine that can print and read symbols on squares on an infinitely long piece of paper tape. It can move left or right, read the symbol from the square that it's on, and print a symbol to the square that it's on. The state of the machine is fully determined by what's on the tape, the location of the machine on the tape, and what instruction the machine is on in its configuration. Each instruction tells the machine what to do with the tape and which instruction to go to next. With this setup, it's even possible to have instructions on the tape to start with, and a Universal Turing Machine can read and execute the instructions. The workings of the machine can get complicated quickly, but Petzold does an excellent job making the whole thing understandable. (One interesting side note is that Turing used the symbols '::' and ';' in his instruction format, and I wonder if that is the reason why those symbols come up so often in modern day programming languages.)

The main purpose of the Turing Machine was to compute and print out the digits of a number between 0 and 1, with the digits in order and continuing forever. Why forever? Well, because a real number's digits go on forever. We can never know the exact number of π, for example, but we can construct a Turing Machine to compute it. Petzold even speculates that the Turing Machine's Description Number (an integer that exactly specifies the Turing Machine, similar to a Gödel number for a proof) is a more exact representation of a number:
Which is a better representation of π? The first 32 digits followed by an ellipsis? Or the Description Number of the Turing Machine that can generate as many digits as our patience will allow? In a sense, the Description Number is a more fundamental numerical representation of π because it describes the algorithm of calculating the number.
The Description Number would be an extremely large number because of the complexity of the machine required to produce a number such as π. If we consider that the compiled machine code of a modern programming language that sits in a computer's memory could be considered a version of a Description Number, we can represent π with a finite integer that is likely much smaller than Turing's because of the efficiency of modern programming languages. However, if we also had to represent the machine, the integer may be much longer because of the complexity of modern computers. It's interesting to think that an entire computer could be represented by a single gigantic number. In fact, we essentially do represent computers with large integers through all of the digitized design files use to create them.

One of the fascinating things about computing π is that since it never ends, it quite possibly contains every sequence of digits you could think of. Petzold relates a story about Professor Luitzen Egbertus Jan Brouwer where Brouwer claims that a sequence doesn't exist within π unless it has been computed. This is a constructivist perspective on mathematics versus the Platonist perspective that the sequence probably exists even if we can't know it. Brouwer put forth the sequence 0123456789 as one that doesn't exist in π until it can be computed, and we actually found it in the mid 90s, starting at the 17,387,594,880th digit of π. Finding it doesn't prove anything, though. We can just come up with a more complicated sequence. Petzold proposes a million successive 1s. Make the sequence we're searching for long enough, and it would take more resources and time than we have before the heat death of the universe. Does that mean that sequence doesn't exist in π? Does it matter? I don't know.

All of the discussion of what is possible with computation of course leads to contemplating AI. A big question is whether certain architectures of computers prevent the emergence of AGI. Does intelligence require a certain speed and parallelism for accessing long-term memory? After all, the human brain is massively parallel, and we can recall high-resolution memories quite fast compared to how long it takes a computer to load large files from disk. Turing also speculated about this:
I believe that the provision of proper storage is the key to the problem of the digital computer, and certainly if they are to be persuaded to show any sort of genuine intelligence much larger capacities than are yet available must be provided. In my opinion this problem of making a large memory available at reasonably short notice is much more important than that of doing operations such as multiplication at high speed.
Looks like he anticipated a large problem in AGI—storage. However, in some ways we are already dependent on AI, at least in the form of ANI. Turing formed the basis of computer design by proving that any computable number can be represented in first order logic. Interestingly enough, computer processors are constructed almost entirely of AND, OR, and NOT gates, i.e. propositional logic—the basis of first-order logic. For decades we have designed computers with so many logic gates that it would be nearly impossible for us to do it unaided. We need computers to design new computers. We use ANI to compile logic in HDLs (Hardware Description Languages), to place and route logic on processors, and to simulate the logic and the electrical characteristics of the designs. If and when AGI comes about, computers and ANI will have played a large role in making that happen.

Leaving AI for now, let's turn back to the connection between Gödel's Incompleteness Theorem and Turing's undecidability theorem. Turing proves in the later sections of his paper that no decision procedure exists that can determine whether a number is computable or not. Petzold describes the distinction between the two theorems better than anyone I've seen:
Gödel's theorem and Turing's theorem approach decidability from opposite directions. Gödel's theorem shows the existence of propositions that can be neither proved nor disproved; these propositions are said to be undecidable. 
The "general method" that Turing refers to is a decision procedure—an algorithm that analyzes any arbitrary formula and determines whether it is provable or not provable. Turing will prove that no general decision procedure can exist.
Even with the existence of undecidable propositions, a decision procedure could still conceivably exist. When analyzing Gödel's undecidable proposition, it would identify both the proposition and its negation as unprovable.
Not only do undecidable propositions exist (Gödel), but we cannot even tell in a general process whether a proposition is decidable or not (Turing)! Still another related problem is the halting problem. It doesn't come up in Turing's original paper, but computability and decidability are essentially equivalent problems to the halting problem. If we could construct a program to decide whether or not an arbitrary program halted, then we could apply that program to a program that calculated arbitrary numbers. Then the first program could figure out if the second program would halt, thereby determining whether or not the number could be calculated and violating Turing's computability theorem.

The proofs of these kinds of things—the halting problem, the incompleteness theorem, the computability of numbers—always seems so slippery to me. I can understand them when I read them and they seems to make perfect sense, but the details are difficult to hold in my mind for very long. They keep slipping out of my head. It's a strange thing. Maybe if I studied them more, they would stick better, but the recursiveness of the proofs and the circularity of the critical statements seems to be elusive in some fundamental way. I wonder if it's partly because the human brain is also a Turing Machine, and therefore is limited in a similar way. There is no getting around the limits established by Turing's proof, as Petzold elaborates:
The Turing Machine not only established the basic requirements for effective calculability but also identified limits: No computer or programming language known today is more powerful than the Turing Machine; no computer or programming language can solve the halting problem; no computer or programming language can determine the ultimate destiny of another computer program. You can't get around these limitations with a "better" programming language or a different kind of machine. At best you can only do jobs faster. You can rig up thousands of processors to perform in parallel, and you can strive to create quantum computers that perform massively parallel computations, but you simply can't bring infinity any closer to this hopelessly finite world in which we live.
It's a sobering thought, but we can still achieve quite a lot before hitting the barriers of infinity. Knowing that we don't need to chase after a general decision process or proof algorithm is also freeing in a way. We can focus our attention on more immediate problems instead of getting distracted by the impossible.

I thoroughly enjoyed Petzold's guided tour of Turing's paper. He made it totally accessible, and I got to contemplate some intriguing issues. I probably never would have read Turing's paper otherwise, so I'm glad Petzold spent the time and care that he did on this book. I highly recommend it.

Incompleteness and Indecision

I've already said way more than I expected to on these books, so I'll try to keep this wrap-up short. Both GEB and The Annotated Turing explore big, big ideas that are fundamental to our concept of intelligence and the limits of what's possible. GEB takes a long, meandering route through an expansive field of difficult questions. It will definitely challenge you to think hard about the nature of intelligence and free will and probably expand your perspective. The Annotated Turing takes a more direct route through the question of computability, but makes plenty of stops along the way to explore the wide array of related concepts. Both books are excellent and related enough that they go quite well together. They are definitely worth the time to read them.