1A: Overview and Introduction to Lisp

Topics covered: Overview and Introduction to Lisp

Instructors: Hal Abelson and Gerald Jay Sussman

Subtitles for this course are provided through the generous assistance of Henry Baker, Hoofar Pourzand, Heather Wood, Aleksejs Truhans, Steven Edwards, George Menhorn, and Mahendra Kumar.

[MUSIC PLAYING]

PROFESSOR: I'd like to welcome you to this course on computer science. Actually, that's a terrible way to start. Computer science is a terrible name for this business. First of all, it's not a science. It might be engineering or it might be art, but we'll actually see that computer so-called science actually has a lot in common with magic, and we'll see that in this course.

So it's not a science. It's also not really very much about computers. And it's not about computers in the same sense that physics is not really about particle accelerators, and biology is not really about microscopes and petri dishes. And it's not about computers in the same sense that geometry is not really about using surveying instruments.

In fact, there's a lot of commonality between computer science and geometry. Geometry, first of all, is another subject with a lousy name. The name comes from Gaia, meaning the Earth, and metron, meaning to measure. Geometry originally meant measuring the Earth or surveying.

And the reason for that was that, thousands of years ago, the Egyptian priesthood developed the rudiments of geometry in order to figure out how to restore the boundaries of fields that were destroyed in the annual flooding of the Nile. And to the Egyptians who did that, geometry really was the use of surveying instruments.

Now, the reason that we think computer science is about computers is pretty much the same reason that the Egyptians thought geometry was about surveying instruments. And that is, when some field is just getting started and you don't really understand it very well, it's very easy to confuse the essence of what you're doing with the tools that you use. And indeed, on some absolute scale of things, we probably know less about the essence of computer science than the ancient Egyptians really knew about geometry.

Well, what do I mean by the essence of computer science? What do I mean by the essence of geometry? See, it's certainly true that these Egyptians went off and used surveying instruments, but when we look back on them after a couple of thousand years, we say, gee, what they were doing, the important stuff they were doing, was to begin to formalize notions about space and time, to start a way of talking about mathematical truths formally. That led to the axiomatic method. That led to sort of all of modern mathematics, figuring out a way to talk precisely about so-called declarative knowledge, what is true.

Well, similarly, I think in the future people will look back and say, yes, those primitives in the 20th century were fiddling around with these gadgets called computers, but really what they were doing is starting to learn how to formalize intuitions about process, how to do things, starting to develop a way to talk precisely about how-to knowledge, as opposed to geometry that talks about what is true.

Let me give you an example of that. Let's take a look. Here is a piece of mathematics that says what a square root is. The square root of X is the number Y, such that Y squared is equal to X and Y is greater than or equal to zero. Now, that's a fine piece of mathematics, but just telling you what a square root is doesn't really say anything about how you might go out and find one.

So let's contrast that with a piece of imperative knowledge, how you might go out and find a square root. This, in fact, also comes from Egypt, not ancient, ancient Egypt. This is an algorithm due to Heron of Alexandria, called how to find a square root by successive averaging. And what it says is that, in order to find a square root, you make a guess, you improve that guess-- and the way you improve the guess is to average the guess and X over the guess, and we'll talk a little bit later about why that's a reasonable thing-- and you keep improving the guess until it's good enough.

That's a method. That's how to do something as opposed to declarative knowledge that says what you're looking for. That's a process. Well, what's a process in general? It's kind of hard to say. You can think of it as like a magical spirit that sort of lives in the computer and does something. And the thing that directs a process is a pattern of rules called a procedure.

So procedures are the spells, if you like, that control these magical spirits that are the processes. I guess you know everyone needs a magical language, and sorcerers, real sorcerers, use ancient Akkadian or Sumerian or Babylonian or whatever. We're going to conjure our spirits in a magical language called Lisp, which is a language designed for talking about, for casting the spells that are procedures to direct the processes.

Now, it's very easy to learn Lisp. In fact, in a few minutes, I'm going to teach you, essentially, all of Lisp. I'm going to teach you, essentially, all of the rules. And you shouldn't find that particularly surprising. That's sort of like saying it's very easy to learn the rules of chess. And indeed, in a few minutes, you can tell somebody the rules of chess. But of course, that's very different from saying you understand the implications of those rules and how to use those rules to become a masterful chess player.

Well, Lisp is the same way. We're going to state the rules in a few minutes, and it'll be very easy to see. But what's really hard is going to be the implications of those rules, how you exploit those rules to be a master programmer. And the implications of those rules are going to take us the, well, the whole rest of the subject and, of course, way beyond.

OK, so in computer science, we're in the business of formalizing this sort of how-to imperative knowledge, how to do stuff. And the real issues of computer science are, of course, not telling people how to do square roots. Because if that was all it was, it wouldn't be a big deal. The real problems come when we try to build very, very large systems, computer programs that are thousands of pages long, so long that nobody can really hold them in their heads all at once.

And the only reason that that's possible is because there are techniques for controlling the complexity of these large systems. And these techniques for controlling complexity are what this course is really about. And in some sense, that's really what computer science is about.

Now, that may seem like a very strange thing to say. Because after all, a lot of people besides computer scientists deal with controlling complexity. A large airliner is an extremely complex system, and the aeronautical engineers who design that are dealing with immense complexity. But there's a difference between that kind of complexity and what we deal with in computer science.

And that is that computer science, in some sense, isn't real. You see, when an engineer is designing a physical system, that's made out of real parts. The engineers who worry about that have to address problems of tolerance and approximation and noise in the system. So for example, as an electrical engineer, I can go off and easily build a one-stage amplifier or a two-stage amplifier, and I can imagine cascading a lot of them to build a million-stage amplifier. But it's ridiculous to build such a thing, because long before the millionth stage, the thermal noise in those components way at the beginning is going to get amplified and make the whole thing meaningless.

Computer science deals with idealized components. We know as much as we want about these little program and data pieces that we're fitting together. We don't have to worry about tolerance. And that means that, in building a large program, there's not all that much difference between what I can build and what I can imagine, because the parts are these abstract entities that I can know as much as I want about.

I know about them as precisely as I'd like. So as opposed to other kinds of engineering, where the constraints on what you can build are the constraints of physical systems, the constraints of physics and noise and approximation, the constraints imposed in building large software systems are the limitations of our own minds.

So in that sense, computer science is like an abstract form of engineering. It's the kind of engineering where you ignore the constraints that are imposed by reality. Well, what are some of these techniques? They're not special to computer science. The first technique, which is used in all of engineering, is a kind of abstraction called black-box abstraction. Take something and build a box around it.

Let's see, for example, if we looked at that square root method, I might want to take that and build a box. That sort of says, to find the square root of X. And that might be a whole complicated set of rules. And that might end up being a kind of thing where I can put in, say, 36 and say, what's the square root of 36? And out comes six.

And the important thing is that I'd like to design that so that if George comes along and would like to compute, say, the square root of A plus the square root of B, he can take this thing and use it as a module without having to look inside and build something that looks like this, like an A and a B and a square root box and another square root box and then something that adds that would put out the answer.

And you can see, just from the fact that I want to do that, that from George's point of view, the internals of what's in here should not be important. So for instance, it shouldn't matter that, when I wrote this, I said I want to find the square root of X. I could have said the square root of Y, or the square root of A, or anything at all. That's the fundamental notion of putting something in a box, using black-box abstraction to suppress detail.

And the reason for that is you want to go off and build bigger boxes. Now, there's another reason for doing black-box abstraction other than you want to suppress detail for building bigger boxes. Sometimes you want to say that your way of doing something, your how-to method, is an instance of a more general thing, and you'd like your language to be able to express that generality.

Let me show you another example sticking with square roots. Let's go back and take another look at that slide with the square root algorithm on it. Remember what that says. That says, in order to do something, I make a guess, and I improve that guess, and I sort of keep improving that guess. So there's the general strategy of, I'm looking for something, and the way I find it is that I keep improving it. Now, that's a particular case of another kind of strategy for finding a fixed point of something.

So you have a fixed point of a function. A fixed point of a function F is a value Y, such that F of Y equals Y. And the way I might find one is to start with a guess. And then, if I want something that doesn't change when I keep applying F, I'll keep applying F over and over until the result doesn't change very much. So there's a general strategy.

And then, for example, to compute the square root of X, I can try and find a fixed point of the function which takes Y to the average of Y and X/Y. And the idea there is that if I really had Y equal to the square root of X, then Y and X/Y would be the same value. They'd both be the square root of X, because X over the square root of X is the square root of X.

And so, if Y were equal to the square root of X, then the average wouldn't change. So the square root of X is a fixed point of that particular function. Now, what I'd like to have, I'd like to express the general strategy for finding fixed points. So what I might imagine doing is to be able to use my language to define a box that says "fixed point," just like I could make a box that says "square root." And I'd like to be able to express this in my language.

So I'd like to express not only the imperative how-to knowledge of a particular thing like square root, but I'd like to be able to express the imperative knowledge of how to do a general thing like how to find fixed point. And in fact, let's go back and look at that slide again.

See, not only is this a piece of imperative knowledge, how to find a fixed point, but over here on the bottom, there's another piece of imperative knowledge which says, one way to compute square root is to apply this general fixed point method. So I'd like to also be able to express that imperative knowledge. What would that look like?

That would say, this fixed point box is such that if I input to it the function that takes Y to the average of Y and X/Y, then what should come out of that fixed point box is a method for finding square roots. So in these boxes we're building, we're not only building boxes that you input numbers to and get numbers out of, we're going to be building boxes that, in effect, compute methods like finding square roots.

And they might take as their inputs functions, like Y goes to the average of Y and X/Y. The reason we want to do that-- this will end up being a procedure, as we'll see, whose value is another procedure-- is because procedures are going to be our ways of talking about imperative knowledge. And the way to make that very powerful is to be able to talk about other kinds of knowledge.
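(Jumping ahead to notation we'll meet in a few minutes, here is a minimal sketch of that general strategy in Lisp. The names fixed-point and close-enough?, and the 0.0001 tolerance, are illustrative assumptions, not anything fixed by the lecture.)

    (define tolerance 0.0001)

    ;; keep applying f until the result stops changing very much
    (define (fixed-point f guess)
      (define (close-enough? a b)
        (< (abs (- a b)) tolerance))
      (let ((next (f guess)))
        (if (close-enough? guess next)
            next
            (fixed-point f next))))

    ;; square root as the fixed point of y -> average of y and x/y
    (define (sqrt x)
      (fixed-point (lambda (y) (/ (+ y (/ x y)) 2))
                   1.0))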

So here is a procedure that, in effect, talks about another procedure, a general strategy that itself talks about general strategies. Well, our first topic in this course-- there'll be three major topics-- will be black-box abstraction. Let's look at that in a little bit more detail. What we're going to do is we will start out talking about how Lisp is built up out of primitive objects. What does the language supply us with? And we'll see that there are primitive procedures and primitive data.

Then we're going to see, how do you take those primitives and combine them to make more complicated things, means of combination? And what we'll see is that there are ways of putting things together, putting primitive procedures together to make more complicated procedures. And we'll see how to put primitive data together to make compound data.

Then we'll say, well, having made those compound things, how do you abstract them? How do you put those black boxes around them so you can use them as components in more complex things? And we'll see that's done by defining procedures and a technique for dealing with compound data called data abstraction.

And then, what's maybe the most important thing, is going from just the rules to how does an expert work? How do you express common patterns of doing things, like saying, well, there's a general method of fixed point and square root is a particular case of that? And we're going to use-- I've already hinted at it-- something called higher-order procedures, namely procedures whose inputs and outputs are themselves procedures. And then we'll also see something very interesting. We'll see, as we go further and further on and become more abstract, there'll be very-- well, the line between what we consider to be data and what we consider to be procedures is going to blur at an incredible rate.

Well, that's our first subject, black-box abstraction. Let's look at the second topic. I can introduce it like this. See, suppose I want to express the idea-- remember, we're talking about ideas-- suppose I want to express the idea that I can take something and multiply it by the sum of two other things. So for example, I might say, if I had one and three and multiply that by two, I get eight. But I'm talking about the general idea of what's called linear combination, that you can add two things and multiply them by something else.

It's very easy when I think about it for numbers, but suppose I also want to use that same idea to think about, I could add two vectors, a1 and a2, and then scale them by some factor x and get another vector. Or I might say, I want to think about a1 and a2 as being polynomials, and I might want to add those two polynomials and then multiply them by two to get a more complicated one.

Or a1 and a2 might be electrical signals, and I might want to think about summing those two electrical signals and then putting the whole thing through an amplifier, multiplying it by some factor of two or something. The idea is I want to think about the general notion of that.

Now, if our language is going to be a good language for expressing those kinds of general ideas, if I really, really can do that, I'd like to be able to say I'm going to multiply by x the sum of a1 and a2, and I'd like that to express the general idea of all different kinds of things that a1 and a2 could be. Now, if you think about that, there's a problem, because after all, the actual primitive operations that go on in the machine are obviously going to be different if I'm adding two numbers than if I'm adding two polynomials, or if I'm adding the representation of two electrical signals or waveforms.

Somewhere, there has to be the knowledge of the kinds of various things that you can add and the ways of adding them. Now, to construct such a system, the question is, where do I put that knowledge? How do I think about the different kinds of choices I have? And if tomorrow George comes up with a new kind of object that might be added and multiplied, how do I add George's new object to the system without screwing up everything that was already there?

Well, that's going to be the second big topic, the way of controlling that kind of complexity. And the way you do that is by establishing conventional interfaces, agreed upon ways of plugging things together. Just like in electrical engineering, people have standard impedances for connectors, and then you know if you build something with one of those standard impedances, you can plug it together with something else.

So that's going to be our second large topic, conventional interfaces. What we're going to see is, first, we're going to talk about the problem of generic operations, which is the one I alluded to, things like "plus" that have to work with all different kinds of data. So we talk about generic operations. Then we're going to talk about really large-scale structures. How do you put together very large programs that model the kinds of complex systems in the real world that you'd like to model?

And what we're going to see is that there are two very important metaphors for putting together such systems. One is called object-oriented programming, where you sort of think of your system as a kind of society full of little things that interact by sending information between them. And then the second one is operations on aggregates, called streams, where you think of a large system put together kind of like a signal processing engineer puts together a large electrical system.

That's going to be our second topic. Now, the third thing we're going to come to, the third basic technique for controlling complexity, is making new languages. Because sometimes, when you're sort of overwhelmed by the complexity of a design, the way that you control that complexity is to pick a new design language. And the purpose of the new design language will be to highlight different aspects of the system. It will suppress some kinds of details and emphasize other kinds of details.

This is going to be the most magical part of the course. We're going to start out by actually looking at the technology for building new computer languages. The first thing we're going to do is actually build in Lisp. We're going to express in Lisp the process of interpreting Lisp itself. And that's going to be a very sort of self-circular thing. There's a little mystical symbol that has to do with that. The process of interpreting Lisp is sort of a giant wheel of two processes, apply and eval, which sort of constantly reduce expressions to each other.

Then we're going to see all sorts of other magical things. Here's another magical symbol. This is sort of the Y operator, which is, in some sense, the expression of infinity inside our procedural language. We'll take a look at that. In any case, this section of the course is called Metalinguistic Abstraction, abstracting by talking about how you construct new languages.

As I said, we're going to start out by looking at the process of interpretation. We're going to look at this apply-eval loop, and build Lisp. Then, just to show you that this is very general, we're going to use exactly the same technology to build a very different kind of language, a so-called logic programming language, where you don't really talk about procedures at all that have inputs and outputs. What you do is talk about relations between things.

And then finally, we're going to talk about how you implement these things very concretely on the very simplest kind of machines. We'll see something like this. This is a picture of a chip, which is the Lisp interpreter that we will be talking about then in hardware. Well, there's an outline of the course, three big topics. Black-box abstraction, conventional interfaces, metalinguistic abstraction. Now, let's take a break now and then we'll get started.

[MUSIC PLAYING]

Let's actually start learning Lisp now. Actually, we'll start out by learning something much more important, maybe the very most important thing in this course, which is not Lisp, in particular, of course, but rather a general framework for thinking about languages that I already alluded to. When somebody tells you they're going to show you a language, what you should say is, what I'd like you to tell me is, what are the primitive elements? What does the language come with?

Then, what are the ways you put those together? What are the means of combination? What are the things that allow you to take these primitive elements and build bigger things out of them? What are the ways of putting things together?

And then, what are the means of abstraction? How do we take those complicated things and draw those boxes around them? How do we name them so that we can now use them as if they were primitive elements in making still more complex things? And so on, and so on, and so on. So when someone says to you, gee, I have a great new computer language, you don't say, how many characters does it take to invert a matrix? It's irrelevant.

What you say is, if the language did not come with matrices built in or with something else built in, how could I then build that thing? What are the means of combination which would allow me to do that? And then, what are the means of abstraction which allow me then to use those as elements in making more complicated things yet?

Well, we're going to see that Lisp has some primitive data and some primitive procedures. In fact, let's really start. And here's a piece of primitive data in Lisp, number three. Actually, if I'm being very pedantic, that's not the number three. That's some symbol that represents Plato's concept of the number three. And here's another. Here's some more primitive data in Lisp, 17.4. Or actually, some representation of 17.4.

And here's another one, five. Here's another primitive object that's built into Lisp, addition. Actually, to use the same kind of pedantic language-- this is a name for the primitive method of adding things. Just like this is a name for Plato's number three, this is a name for Plato's concept of how you add things. So those are some primitive elements. I can put them together. I can say, gee, what's the sum of three and 17.4 and five?

And the way I do that is to say, let's apply the sum operator to these three numbers. And I should get, what? Eight, 25.4. So I should be able to ask Lisp what the value of this is, and it will return 25.4. Let's introduce some names. This thing that I typed is called a combination. And a combination consists, in general, of applying an operator-- so this is an operator-- to some operands. These are the operands.

And of course, I can make more complex things. The reason I can get complexity out of this is because the operands themselves, in general, can be combinations. So for instance, I could say, what is the sum of three and the product of five and six and eight and two? And I should get-- let's see-- 30, 40, 43. So Lisp should tell me that that's 43.
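Written as Lisp combinations, those two expressions are:

    (+ 3 17.4 5)         ; 25.4
    (+ 3 (* 5 6) 8 2)    ; 43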

Forming combinations is the basic means of combination that we'll be looking at. And then, well, you see some syntax here. Lisp uses what's called prefix notation, which means that the operator is written to the left of the operands. It's just a convention. And notice, it's fully parenthesized. And the parentheses make it completely unambiguous. So by looking at this, I can see that there's the operator, and there are one, two, three, four operands.

And I can see that the second operand here is itself some combination that has one operator and two operands. Parentheses in Lisp are a little bit, or are very, unlike parentheses in conventional mathematics. In mathematics, we sort of use them to mean grouping, and it sort of doesn't hurt if sometimes you leave out parentheses if people understand that that's a group. And in general, it doesn't hurt if you put in extra parentheses, because that maybe makes the grouping more distinct.

Lisp is not like that. In Lisp, you cannot leave out parentheses, and you cannot put in extra parentheses, because putting in parentheses always means, exactly and precisely, this is a combination which has meaning, applying operators to operands. And if I left this out, if I left those parentheses out, it would mean something else.

In fact, the way to think about this, is really what I'm doing when I write something like this is writing a tree. So this combination is a tree that has a plus and then a three and then a something else and an eight and a two. And then this something else here is itself a little subtree that has a star and a five and a six.

And the way to think of that is, really, what's going on are we're writing these trees, and parentheses are just a way to write this two-dimensional structure as a linear character string. Because at least when Lisp first started and people had teletypes or punch cards or whatever, this was more convenient. Maybe if Lisp started today, the syntax of Lisp would look like that.

Well, let's look at what that actually looks like on the computer. Here I have a Lisp interaction set up. There's an editor. And on the top, I'm going to type some values and ask Lisp what they are. So for instance, I can say to Lisp, what's the value of that symbol? That's three. And I ask Lisp to evaluate it. And there you see Lisp has returned on the bottom, and said, oh yeah, that's three.

Or I can say, what's the sum of three and four and eight? What's that combination? And ask Lisp to evaluate it. That's 15. Or I can type in something more complicated. I can say, what's the sum of the product of three and the sum of seven and 19.5? And you'll notice here that Lisp has something built in that helps me keep track of all these parentheses. Watch as I type the next close parenthesis, which is going to close the combination starting with the star. The opening one will flash.

Here, I'll rub those out and do it again. Type close, and you see that closes the plus. Close again, that closes the star. Now I'm back to the sum, and maybe I'm going to add that all to four. That closes the plus. Now I have a complete combination, and I can ask Lisp for the value of that.

That kind of paren balancing is something that's built into a lot of Lisp systems to help you keep track, because it is kind of hard just by hand doing all these parentheses. There's another kind of convention for keeping track of parentheses. Let me write another complicated combination. Let's take the sum of the product of three and five and add that to something.

And now what I'm going to do is I'm going to indent so that the operands are written vertically. We'll take the sum of that and the product of 47 and-- let's say the product of 47 with the difference of 20 and 6.8. That means subtract 6.8 from 20. And then you see the parentheses close. Close the minus. Close the star.

And now let's get another operator. You see the Lisp editor here is indenting to the right position automatically to help me keep track. I'll do that again. I'll close that last parentheses again. You see it balances the plus. Now I can say, what's the value of that?
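Indented that way, the combination looks roughly like this (the transcript doesn't specify the remaining operand, so it's left as an ellipsis):

    (+ (* 3 5)
       (* 47
          (- 20 6.8))
       ...)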

So those two things, indenting to the right level, which is called pretty printing, and flashing parentheses, are two things that a lot of Lisp systems have built in to help you keep track. And you should learn how to use them. Well, those are the primitives. There's a means of combination. Now let's go up to the means of abstraction.

I'd like to be able to take the idea that I do some combination like this, and abstract it and give it a simple name, so I can use that as an element. And I do that in Lisp with "define." So I can say, for example, define A to be the product of five and five. And now I could say, for example, to Lisp, what is the product of A and A? And this should be 25, and this should be 625.

And then, crucial thing, I can now use A-- here I've used it in a combination-- but I could use that in other more complicated things that I name in turn. So I could say, define B to be the sum of, we'll say, A and the product of five and A. And then close the plus.

Let's take a look at that on the computer and see how that looks. So I'll just type what I wrote on the board. I could say, define A to be the product of five and five. And I'll tell that to Lisp. And notice what Lisp responded there with was an A in the bottom. In general, when you type in a definition in Lisp, it responds with the symbol being defined.

Now I could say to Lisp, what is the product of A and A? And it says that's 625. I can define B to be the sum of A and the product of five and A. Close a paren closes the star. Close the plus. Close the "define." Lisp says, OK, B, there on the bottom. And now I can say to Lisp, what's the value of B?

And I can say something more complicated, like what's the sum of A and the quotient of B and five? That slash is divide, another primitive operator. I've divided B by five, added it to A. Lisp says, OK, that's 55.
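Here's that whole interaction, typed to the interpreter:

    (define a (* 5 5))
    a                    ; 25
    (define b (+ a (* 5 a)))
    b                    ; 150
    (+ a (/ b 5))        ; 55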

So there's what it looks like. There's the basic means of defining something. It's the simplest kind of naming, but it's not really very powerful. See, what I'd really like to name-- remember, we're talking about general methods-- I'd like to name, oh, the general idea that, for example, I could multiply five by five, or six by six, or 1,001 by 1,001, 1,001.7 by 1,001.7. I'd like to be able to name the general idea of multiplying something by itself.

Well, you know what that is. That's called squaring. And the way I can do that in Lisp is I can say, define to square something x, multiply x by itself. And then having done that, I could say to Lisp, for example, what's the square of 10? And Lisp will say 100.

So now let's actually look at that a little more closely. Right, there's the definition of square. To square something, multiply it by itself. You see this x here. That x is kind of a pronoun, which is the something that I'm going to square. And what I do with it is I multiply x, I multiply it by itself.

OK. So there's the notation for defining a procedure. Actually, this is a little bit confusing, because this is sort of how I might use square. And I say square of x or square of 10, but it's not making it very clear that I'm actually naming something. So let me write this definition in another way that makes it a little bit more clear that I'm naming something. I'll say, "define" square to be lambda of x times xx.

Here, I'm naming something square, just like over here, I'm naming something A. The thing that I'm naming square-- here, the thing I named A was the value of this combination. Here, the thing that I'm naming square is this thing that begins with lambda, and lambda is Lisp's way of saying make a procedure.

Let's look at that more closely on the slide. The way I read that definition is to say, I define square to be make a procedure-- that's what the lambda is-- make a procedure with an argument named x. And what it does is return the results of multiplying x by itself. Now, in general, we're going to be using this top form of defining, just because it's a little bit more convenient. But don't lose sight of the fact that it's really this.

In fact, as far as the Lisp interpreter's concerned, there's no difference between typing this to it and typing this to it. And there's a word for that: syntactic sugar. What syntactic sugar means is having somewhat more convenient surface forms for typing something.

So this is just really syntactic sugar for this underlying Greek thing with the lambda. And the reason you should remember that is don't forget that, when I write something like this, I'm really naming something. I'm naming something square, and the something that I'm naming square is a procedure that's getting constructed.
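Side by side, the sugared form and what it really means:

    (define (square x) (* x x))      ; syntactic sugar for the line below

    (define square
      (lambda (x) (* x x)))          ; name "square" for a procedure made by lambda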

Well, let's look at that on the computer, too. So I'll come and I'll say, define square of x to be times xx. Now I'll tell Lisp that. It says "square." See, I've named something "square." Now, having done that, I can ask Lisp for, what's the square of 1,001? Or in general, I could say, what's the square of the sum of five and seven? The square of 12's 144.

Or I can use square itself as an element in some combination. I can say, what's the sum of the square of three and the square of four? Nine and 16 is 25. Or I can use square as an element in some much more complicated thing. I can say, what's the square of the square of the square of 1,001?

And there's the square of the square of the square of 1,001. Or I can say to Lisp, what is square itself? What's the value of that? And Lisp returns some conventional way of telling me that that's a procedure. It says, "compound procedure square." Remember, the value of square is this procedure, and the thing with the stars and the brackets are just Lisp's conventional way of describing that.

Let's look at two more examples of defining. Here are two more procedures. I can define the average of x and y to be the sum of x and y divided by two. Or, having average and square, I can use them to talk about the mean square of something, which is the average of the square of x and the square of y.

So for example, having done that, I could say, what's the mean square of two and three? And I should get the average of four and nine, which is 6.5. The key thing here is that, having defined square, I can use it as if it were primitive. So if we look here on the slide, if I look at mean square, the person defining mean square doesn't have to know, at this point, whether square was something built into the language or whether it was a procedure that was defined.
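Written out, with the result for that example:

    (define (average x y)
      (/ (+ x y) 2))

    (define (mean-square x y)
      (average (square x) (square y)))

    (mean-square 2 3)    ; the average of 4 and 9, which is 6.5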

And that's a key thing in Lisp, that you do not make arbitrary distinctions between things that happen to be primitive in the language and things that happen to be defined. A person using them shouldn't even have to know. So the things you construct get used with all the power and flexibility as if they were primitives. In fact, you can drive that home by looking on the computer one more time.

We talked about plus. And in fact, if I come here on the computer screen and say, what is the value of plus? Notice what Lisp types out. On the bottom there, it typed out, "compound procedure plus." Because, in this system, it turns out that the addition operator is itself a compound procedure. And if I hadn't just typed that in, you'd never know that, and it wouldn't make any difference anyway. We don't care. It's below the level of the abstraction that we're dealing with.

So the key thing is you cannot tell, should not be able to tell, in general, the difference between things that are built in and things that are compound. Why is that? Because the things that are compound have an abstraction wrapper wrapped around them. We've seen almost all the elements of Lisp now. There's only one more we have to look at, and that is how to make a case analysis.

Let me show you what I mean. We might want to think about the mathematical definition of the absolute value function. I might say the absolute value of x is the function which is negative x for x less than zero, zero for x equal to zero, and x for x greater than zero. And Lisp has a way of making case analyses.

Let me define for you absolute value. Say define the absolute value of x is conditional. This means case analysis, COND. If x is less than zero, the answer is negate x. What I've written here is a clause. This whole thing is a conditional clause, and it has two parts. This part here is a predicate or a condition.

That's a condition. And the condition is expressed by something called a predicate, and a predicate in Lisp is some sort of thing that returns either true or false. And you see Lisp has a primitive procedure, less-than, that tests whether something is true or false.

And the other part of a clause is an action or a thing to do, in the case where that's true. And here, what I'm doing is negating x. The negation operator, the minus sign in Lisp, is a little bit funny. If there are two arguments, it subtracts the second one from the first, and we saw that. And if there's one argument, it negates it. So this corresponds to that.

And then there's another COND clause. It says, in the case where x is equal to zero, the answer is zero. And in the case where x is greater than zero, the answer is x. Close that clause. Close the COND. Close the definition. And there's the definition of absolute value. And you see it's the case analysis that looks very much like the case analysis you use in mathematics.
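Here's that definition written out:

    (define (abs x)
      (cond ((< x 0) (- x))
            ((= x 0) 0)
            ((> x 0) x)))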

There's a somewhat different way of writing a restricted case analysis. Often, you have a case analysis where you only have one case, where you test something, and then depending on whether it's true or false, you do something. And here's another definition of absolute value which looks almost the same, which says, if x is less than zero, the result is negate x. Otherwise, the answer is x. And we'll be using "if" a lot.
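And the one-case version with "if":

    (define (abs x)
      (if (< x 0)
          (- x)
          x))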

But again, the thing to remember is that this form of absolute value that you're looking at here, and then this one over here that I wrote on the board, are essentially the same. And "if" and COND are-- well, whichever way you like it. You can think of COND as syntactic sugar for "if," or you can think of "if" as syntactic sugar for COND, and it doesn't make any difference. The person implementing a Lisp system will pick one and implement the other in terms of that. And it doesn't matter which one you pick.

Why don't we break now, and then take some questions. How come sometimes when I write define, I put an open paren here and say, define open paren something or other, and sometimes when I write this, I don't put an open paren? The answer is, this particular form of "define," where you say define some expression, is this very special thing for defining procedures. But again, what it really means is I'm defining this symbol, square, to be that.

So the way you should think about it is what "define" does is you write "define," and the second thing you write is the symbol here-- no open paren-- the symbol you're defining and what you're defining it to be. That's like here and like here. That's sort of the basic way you use "define." And then, there's this special syntactic trick which allows you to define procedures that look like this. So the difference is, it's whether or not you're defining a procedure.

[MUSIC PLAYING]

Well, believe it or not, you actually now know enough Lisp to write essentially any numerical procedure that you'd write in a language like FORTRAN or Basic or whatever, or, essentially, any other language. And you're probably saying, that's not believable, because you know that these languages have things like "for statements," and "do until while" or something.

But we don't really need any of that. In fact, we're not going to use any of that in this course. Let me show you. Again, looking back at square root, let's go back to this square root algorithm of Heron of Alexandria. Remember what that said. It said, to find an approximation to the square root of X, you make a guess, you improve that guess by averaging the guess and X over the guess. You keep improving that until the guess is good enough. I already alluded to the idea. The idea is that, if the initial guess that you took was actually equal to the square root of X, then G here would be equal to X/G.

So if you hit the square root, averaging them wouldn't change it. If the G that you picked was larger than the square root of X, then X/G will be smaller than the square root of X, so that when you average G and X/G, you get something in between. And if you pick a G that's too small, then X/G will be too large-- larger than the square root of X-- and again the average lies in between.

So averaging always gives you something in between. And then, it's not quite trivial, but it's possible to show that, in fact, if G misses the square root of X by a little bit, the average of G and X/G will actually keep getting closer to the square root of X. So if you keep doing this enough, you'll eventually get as close as you want.

And then there's another fact, that you can always start out this process by using 1 as an initial guess. And it'll always converge to the square root of X. So that's this method of successive averaging due to Heron of Alexandria. Let's write it in Lisp.

Well, the central idea is, what does it mean to try a guess for the square root of X? Let's write that. So we'll say, define to try a guess for the square root of X, what do we do? We'll say, if the guess is good enough to be a guess for the square root of X, then, as an answer, we'll take the guess. Otherwise, we will try the improved guess. We'll improve that guess for the square root of X, and we'll try that as a guess for the square root of X. Close the "try." Close the "if." Close the "define." So that's how we try a guess.

And then, the next part of the process said, in order to compute square roots, we'll say, define to compute the square root of X, we will try one as a guess for the square root of X. Well, we have to define a couple more things. We have to say, how is a guess good enough? And how do we improve a guess? So let's look at that.

The algorithm to improve a guess for the square root of X, we average-- that was the algorithm-- we average the guess with the quotient of dividing X by the guess. That's how we improve a guess. And to tell whether a guess is good enough, well, we have to decide something. This is supposed to be a guess for the square root of X, so one possible thing you can do is say, when you take that guess and square it, do you get something very close to X? So one way to say that is to say, I square the guess, subtract X from that, and see if the absolute value of that whole thing is less than some small number, which depends on my purposes.
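Collected together, the whole program looks like this, using the average, square, and absolute-value procedures defined earlier. (The .001 tolerance is one possible choice for "some small number, which depends on my purposes.")

    (define (sqrt x) (try 1 x))

    (define (try guess x)
      (if (good-enough? guess x)
          guess
          (try (improve guess x) x)))

    (define (improve guess x)
      (average guess (/ x guess)))

    (define (good-enough? guess x)
      (< (abs (- (square guess) x)) .001))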

So there's a complete procedure for how to compute the square root of X. Let's look at the structure of that a little bit. I have the whole thing. I have the notion of how to compute a square root. That's some kind of module. That's some kind of black box. It's defined in terms of how to try a guess for the square root of X.

"Try" is defined in terms of, well, telling whether something is good enough and telling how to improve something. So good enough. "Try" is defined in terms of "good enough" and "improve." And let's see what else I fill in. Well, I'll go down this tree. "Good enough" was defined in terms of absolute value, and square. And improve was defined in terms of something called averaging and then some other primitive operator.

Square root's defined in terms of "try." "Try" is defined in terms of "good enough" and "improve," but also "try" itself. So "try" is also defined in terms of how to try itself. Well, that may give you some problems. Your high school geometry teacher probably told you that it's naughty to try and define things in terms of themselves, because it doesn't make sense. But that's false.

Sometimes it makes perfect sense to define things in terms of themselves. And this is the case. And we can look at that. We could write down what this means, and say, suppose I asked Lisp what the square root of two is. What's the square root of two mean? Well, that means I try one as a guess for the square root of two.

Now I look. I say, gee, is one a good enough guess for the square root of two? And that depends on the test that "good enough" does. And in this case, "good enough" will say, no, one is not a good enough guess for the square root of two. So that will reduce to saying, I have to try an improved-- improve one as a guess for the square root of two, and try that as a guess for the square root of two. Improving one as a guess for the square root of two means I average one and two divided by one. So this is going to be average. This piece here will be the average of one and the quotient of two by one. That's this piece here.

And this is 1.5. So this square root of two reduces to trying one for the square root of two, which reduces to trying 1.5 as a guess for the square root of two. So that makes sense. Let's look at the rest of the process. If I try 1.5, that reduces. 1.5 turns out to be not good enough as a guess for the square root of two. So that reduces to trying the average of 1.5 and two divided by 1.5 as a guess for the square root of two.

Two divided by 1.5 is 1.333, so that average turns out to be about 1.4167. So this whole thing reduces to trying 1.4167 as a guess for the square root of two. And then so on. It keeps going until the process finally stops with something that "good enough" thinks is good enough, which, in this case, is 1.4142 something or other.
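As a sketch, with the guesses rounded:

    (sqrt 2)
    (try 1 2)
    (try 1.5 2)       ; average of 1 and 2/1
    (try 1.4167 2)    ; average of 1.5 and 2/1.5
    (try 1.4142 2)    ; average of 1.4167 and 2/1.4167
    1.4142...         ; good enough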

So the process makes perfect sense. This, by the way, is called a recursive definition. And the ability to make recursive definitions is a source of incredible power. And as you can already see I've hinted at, it's the thing that effectively allows you to do these infinite computations that go on until something is true, without having any other constructs other than the ability to call a procedure.

Well, let's see, there's one more thing. Let me show you a variant of this definition of square root here on the slide. Here's sort of the same thing. What I've done here is packaged the definitions of "improve" and "good enough" and "try" inside "square root." So, in effect, what I've done is I've built a square root box. So I've built a box that's the square root procedure that someone can use. They might put in 36 and get out six. And then, packaged inside this box are the definitions of "try" and "good enough" and "improve."

So they're hidden inside this box. And the reason for doing that is that, if someone's using this square root, if George is using this square root, George probably doesn't care very much that, when I implemented square root, I had things inside there called "try" and "good enough" and "improve." And in fact, Harry might have a cube root procedure that has "try" and "good enough" and "improve." And in order to not get the whole system confused, it'd be good for Harry to package his internal procedures inside his cube root procedure.

Well, this is called block structure, this particular way of packaging internals inside of a definition. And let's go back and look at the slide again. The way to read this kind of procedure is to say, to define "square root," well, inside that definition, I'll have the definition of an "improve" and the definition of "good enough" and the definition of "try." And then, subject to those definitions, the way I do square root is to try one.

And notice here, I don't have to say one as a guess for the square root of X, because since it's all inside the square root, it sort of has this X known.
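Here's a sketch of that block-structured version, with "try," "good enough," and "improve" packaged inside, so X is known throughout and the helpers don't need it as an argument:

    (define (sqrt x)
      (define (good-enough? guess)
        (< (abs (- (square guess) x)) .001))
      (define (improve guess)
        (average guess (/ x guess)))
      (define (try guess)
        (if (good-enough? guess)
            guess
            (try (improve guess))))
      (try 1))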

Let me summarize. We started out with the idea that what we're going to be doing is expressing imperative knowledge. And in fact, here's a slide that summarizes the way we looked at Lisp. We started out by looking at some primitive elements: addition and multiplication, some predicates for testing whether something is less than something else or whether two things are equal.

And in fact, we saw really sneakily that in the system we're actually using, these aren't actually primitives, but it doesn't matter. What matters is we're going to use them as if they're primitives. We're not going to look inside. We also have some primitive data, the numbers. We saw some means of composition, means of combination, the basic one being composing functions and building combinations with operators and operands.

And there were some other things, like COND and "if" and "define." But the main thing about "define," in particular, was that it was the means of abstraction. It was the way that we name things. You can also see from this slide not only where we've been, but holes we have to fill in. At some point, we'll have to talk about how you combine primitive data to get compound data, and how you abstract data so you can use large globs of data as if they were primitive. So that's where we're going.

But before we do that, for the next couple of lectures we're going to be talking about, first of all, how it is that you make a link between these procedures we write and the processes that happen in the machine. And then, how it is that you start using the power of Lisp to talk not only about these individual little computations, but about general conventional methods of doing things.

OK, are there any questions?

AUDIENCE: Yes. If we defined A using parentheses instead of as we did, what would be the difference?

PROFESSOR: If I wrote this, if I wrote that, what I would be doing is defining a procedure named A. In this case, a procedure of no arguments, which, when I ran it, would give me back five times five.

AUDIENCE: Right. I mean, you come up with the same thing, except for you really got a different--

PROFESSOR: Right. And the difference would be, in the old one-- Let me be a little bit clearer here. Let's call this A, like here. And pretend here, just for contrast, I wrote, define D to be the product of five and five. And the difference between those, let's think about interactions with the Lisp interpreter. I could type in A and Lisp would return 25. I could type in D, if I just typed in D, Lisp would return compound procedure D, because that's what it is. It's a procedure.

I could run D. I could say, what's the value of running D? Here is a combination with no operands. I see there are no operands. I didn't put any after D. And it would say, oh, that's 25. Or I could say, just for completeness, if I typed in, what's the value of running A? I get an error. The error would be the same one as over there. The error would say, sorry: 25, which is the value of A, is not an operator that I can apply to something.
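In code, the contrast from this question looks like this (the printed form of a procedure varies by system):

    (define a (* 5 5))     ; defines A to be the number 25
    (define (d) (* 5 5))   ; defines D to be a procedure of no arguments

    a       ; 25
    d       ; a compound procedure
    (d)     ; 25 -- running D, a combination with no operands
    (a)     ; error: 25 is not an operator that can be applied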