Topics covered: Higher-order Procedures
Instructors: Hal Abelson and Gerald Jay Sussman
Subtitles for this course are provided through the generous assistance of Henry Baker, Hoofar Pourzand, Heather Wood, Aleksejs Truhans, Steven Edwards, George Menhorn, and Mahendra Kumar.
2A: Higher-order Procedures
PROFESSOR: Well, yesterday was easy. You learned all of the rules of programming and lived. Almost all of them. And so at this point, you're now certified programmers-- it says. However, I suppose what we did is we, aah, sort of got you a little bit into an easy state. Here, you still believe it's possible that this might be programming in BASIC or Pascal with just a funny syntax. Today, that illusion-- well, you can no longer support that belief. What we're going to do today is going to completely smash that.
So let's start out by writing a few programs on the blackboard that have a lot in common with each other. What we're going to do is try to make them abstractions that are not ones that are easy to make in most languages. Let's start with some very simple ones that you can make in most languages.
Supposing I want to write the mathematical expression which adds up a bunch of integers. So if I wanted to write down and say the sum from i equals a to b of i. Now, you know that that's an easy thing to compute-- there's a closed form for it-- but I'm not interested in that. I'm going to write a program that adds up those integers.
Well, that's rather easy to do to say I want to define the sum of the integers from a to b to be-- well, it's the following two possibilities. If a is greater than b, well, then there's nothing to be done and the answer is zero. This is how you're going to have to think recursively. You're going to say if I have an easy case that I know the answer to, just write it down. Otherwise, I'm going to try to reduce this problem to a simpler problem. And maybe in this case, I'm going to make a subproblem of the simpler problem and then do something to the result. So the easiest way to do this is say that I'm going to add the index, which in this case is a, to the result of adding up the integers from a plus 1 to b.
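On the blackboard, that comes out roughly as follows-- a sketch, writing the procedure name as sum-int and the increment as (+ a 1):

(define (sum-int a b)
  (if (> a b)
      0
      (+ a
         (sum-int (+ a 1) b))))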
Now, at this point, you should have no trouble looking at such a definition. Indeed, coming up with such a thing might be a little hard in synthesis, but being able to read it at this point should be easy. And what it says to you is, well, here is the subproblem I'm going to solve. I'm going to try to add up the integers, one fewer integer than I added up for the whole problem. I'm adding up the one fewer one, and that subproblem, once I've solved it, I'm going to add a to that, and that will be the answer to this problem. And in the simplest case, I don't have to do any work.
Now, I'm also going to write down another simple one just like this, which is the mathematical expression, the sum of the square from i equal a to b. And again, it's a very simple program. And indeed, it starts the same way. If a is greater than b, then the answer is zero. And, of course, we're beginning to see that there's something wrong with me writing this down again. It's the same program. It's the sum of the square of a and the sum of the squares from the increment of a up to b.
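Sketched the same way-- square is written out here so the fragment stands on its own:

(define (square x) (* x x))

(define (sum-sq a b)
  (if (> a b)
      0
      (+ (square a)
         (sum-sq (+ a 1) b))))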
Now, if you look at these things, these programs are almost identical. There's not much to distinguish them. They have the same first clause of the conditional and the same predicate and the same consequence, and the alternatives are very similar, too. They only differ by the fact that where here I have a, here, I have the square of a. The only other difference, though this one's sort of inessential, is that the name of this procedure is sum-int, whereas the name of that procedure is sum-square. So the things that vary between these two are very small.
Now, wherever you see yourself writing the same thing down more than once, there's something wrong, and you shouldn't be doing it. And the reason is not because it's a waste of time to write something down more than once. It's because there's some idea here, a very simple idea, which has to do with the sigma notation-- this much-- not depending upon what it is I'm adding up. And I would like to be able to-- always, whenever trying to make complicated systems and understand them, it's crucial to divide the things up into as many pieces as I can, each of which I understand separately. I would like to understand the way of adding things up independently of what it is I'm adding up so I can do that having debugged it once and understood it once and having been able to share that among many different uses of it.
Here, we have another example. This is Leibniz's formula for finding pi over 8. It's a funny, ugly mess. What is it? It's something like 1 over 1 times 3 plus 1 over 5 times 7 plus 1 over 9 times 11 plus-- and for some reason, things like this tend to have interesting values like pi over 8. But what do we see here? It's the same program or almost the same program. It's a sum. So we're seeing the sigma notation, although over here, we're dealing with incrementing by 4, so it's a slightly different problem, which means that over here, I have to change a by 4, as you see right over here. It's not by 1.
The other thing, of course, is the thing that was represented by square in the previous sum of squares, or by a when adding up the integers. Well, here, I have a different thing I'm adding up, a different term, which is 1 over a times a plus 2. But the rest of this program is identical. Well, any time we have a bunch of things like this that are identical, we're going to have to come up with some sort of abstraction to cover them.
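Roughly, the blackboard version of this one is-- a sketch, with the index stepped by 4 and each term being 1 over a times a plus 2:

(define (pi-sum a b)
  (if (> a b)
      0
      (+ (/ 1 (* a (+ a 2)))
         (pi-sum (+ a 4) b))))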
If you think about this, what you've learned so far is the rules of some language, some primitive, some means of combination, almost all of them, the means of abstraction, almost all of them. But what you haven't learned is common patterns of usage.
Now, most of the time, you learn idioms when learning a language-- common patterns that mean things, that are useful to know in a flash. And you build up a great number of them. If you're a FORTRAN programmer, of course, everybody knows how to-- what do you do, for example, to get an integer which is the biggest integer in something? It's a classic thing. Every FORTRAN programmer knows how to do that. And if you don't know that, you're in real hot water because it takes a long time to think it out. However, one of the things you can do in this language that we're showing you is not only do you know something like that, but you give the knowledge of that a name. And so that's what we're going to be going after right now.
OK, well, let's see what these things have in common. Right over here we have what appears to be a general pattern, a general pattern which covers all of the cases we've seen so far. There is a sum procedure, which is being defined. It has two arguments, which are a lower bound and an upper bound. The lower bound is tested to be greater than the upper bound, and if it is greater, then the result is zero. Otherwise, we're going to do something to the lower bound, which is the index of the summation, and add that result to the result of calling the procedure recursively on our lower bound incremented by some next operation with the same upper bound as I had before.
So this is a general pattern, and what I'd like to do is be able to give this general pattern a name. Well, that's sort of easy, because one of the things I'm going to do right now is-- there's nothing very special about numbers. Numbers are just one kind of data. It seems to me perfectly reasonable to give all sorts of names to all kinds of data, for example, procedures. And now many languages allow you to have procedural arguments, and right now, we're going to talk about procedural arguments. They're very easy to deal with. And shortly, we'll do some remarkable things that are not like procedural arguments.
So here, we'll define our sigma notation. This is called sum, and it takes a term, an A, a next, and a B as arguments. So it takes four arguments, and there was nothing particularly special about me writing this in lowercase. I hope that it doesn't confuse you, so I'll write it in uppercase right now. The machine doesn't care.
But these two arguments are different. These are not numbers. These are going to be procedures for computing something given a number. Term will be a procedure which, when given an index, will produce the value of the term for that index. Next will be a procedure which, when given an index, will produce the next index. This will be for counting. And it's very simple. It's exactly what you see. If A is greater than B, then the result is 0. Otherwise, it's the sum of term applied to A and the result of summing term from the next index up to B.
Let me write it this way. Now, I'd like you to see something, first of all. I was writing here, and I ran out of space. What I did is I started indenting according to the pretty-printing rule, which says that I align all of the arguments of the procedure so I can see which ones go together. And this is just something I do automatically, and I want you to learn how to do that, too, so your programs can be read and understood.
However, what do we have here? We have four arguments: the procedure, the lower index-- lower bound index-- the way to get the next index, and the upper bound. What's passed along on the recursive call is indeed the same procedure because I'm going to need it again, the next index, which is using the next procedure to compute it, the procedure for computing next, which I also have to have separately, and that's different. The procedure for computing next is different from the next index, which is the result of using next on the last index. And I also have to pass along the upper bound. So this captures both of these and the other nice program that we are playing with.
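Written out, the general pattern captured as a procedure is roughly:

(define (sum term a next b)
  (if (> a b)
      0
      (+ (term a)
         (sum term (next a) next b))))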
So using this, we can write down the original program as instances of sum very simply. A and B. Well, I'm going to need an identity procedure here because, ahh, the sum of the integers requires me to in this case compute a term for every integer, but the term procedure doesn't want to do anything to that integer. So the identity procedure on A is A or X or whatever, and I want to say the sum using identity as the term procedure and using A as the initial index and the incrementer being the way to get the next index and B being the high bound, the upper bound. This procedure does exactly the same as the sum of the integers over here, computes the same answer.
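As a sketch-- assuming 1+ is the built-in incrementer being referred to; in a Scheme without it, write (lambda (x) (+ x 1)):

(define (identity a) a)

(define (sum-int a b)
  (sum identity a 1+ b))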
Now, one thing you should see, of course, is that there's nothing very special over here about what I used as the formal parameter. I could have, for example, written this X. It doesn't matter. I just wanted you to see that this name does not conflict with this one at all. It's an internal name.
For the second procedure here, the sum of the squares, it's even a little bit easier. And what do we have to do? Nothing more than add up the squares. This is the procedure that each index will be given to-- yes, each index will have this done to it to get the term. That's the thing that maps against term over here. Then I have A as the lower bound, the incrementer as the method of getting the next index, and B as the upper bound.
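And the corresponding sketch, using square and the incrementer as before:

(define (sum-sq a b)
  (sum square a 1+ b))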
And finally, just for the thing that we did about pi sums, pi sums are sort of-- well, it's even easier to think about them this way because I don't have to think. What I'm doing is separating the thing I'm adding up from the method of doing the addition. And so we have here, for example, pi-sum of A and B is the sum of things. I'm going to write the term procedure here explicitly without giving it a name. This is done anonymously. I don't necessarily have to give a name to something if I just want to use it once.
And, of course, I can write sort of an expression that produces a procedure. I'm going to write the Greek lambda letter here instead of L-A-M-B-D-A in general to avoid taking up a lot of space on blackboards. But unfortunately, we don't have lambda keys on our keyboards. Maybe we can convince our friends in the computer industry that this is important. Lambda of i is the quotient of 1 and the product of i and the sum of i and 2, starting at a with the way of incrementing being that procedure of an index i, which adds 4 to i, and b being the upper bound. So you can see that this notation, the invention of the procedure that takes a procedural argument, allows us to compress a lot of these procedures into one thing. This procedure, sum, covers a whole bunch of ideas.
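With L-A-M-B-D-A spelled out on a keyboard, the sketch is roughly:

(define (pi-sum a b)
  (sum (lambda (i) (/ 1 (* i (+ i 2))))   ; the term: 1 over i times (i + 2)
       a
       (lambda (i) (+ i 4))               ; the way of getting the next index
       b))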
Now, just why is this important? I tried to say before that it helps us divide a problem into two pieces, and indeed, it does, for example, if someone came up with a different way of implementing this, which, of course, one might. Here, for example, is an iterative implementation of sum. An iterative implementation for some reason might be better than the recursive implementation. But the important thing is that it's different.
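The iterative version shown on the slide is along these lines-- a sketch; the internal names are illustrative:

(define (sum term a next b)
  (define (iter j ans)
    (if (> j b)
        ans
        (iter (next j)
              (+ (term j) ans))))
  (iter a 0))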
Now, supposing I had written my program this way that you see on the blackboard on the left. That's correct, the left. Well, then if I want to change the method of addition, then I'd have to change each of these. Whereas if I write them like this that you see here, then the method by which I did the addition is encapsulated in the procedure sum. That decomposition allows me to independently change one part of the program and improve it, perhaps, without changing the other part that was written for some of the other cases. Thank you. Are there any questions? Yes, sir.
AUDIENCE: Would you go over next A and next again on--
PROFESSOR: Yes. It's the same problem. I'm sure you're going to-- you're going to have to work on this. This is hard the first time you've ever seen something like this.
What I have here is a-- procedures can be named by variables. Procedures are not special. Actually, sum-square is a variable, which has gotten a value, which is a procedure. This is define sum-square to be lambda of A and B something. So the procedure can be named. Therefore, they can be passed from one to another, one procedure to another, as arguments. Well, what we're doing here is we're passing the procedure term as an argument to sum, and we just pass it around again in the next recursive call.
Here, we're passing the procedure next as an argument also. However, here we're using the procedure next. That's what the parentheses mean. We're applying next to A to get the next value of A. If you look at what next is mapped against, remember that the way you think about this is that you substitute the arguments for the formal parameters in the body. If you're ever confused, think of the thing that way.
Well, over here, with the sum of the integers, I substitute identity for term and 1+, the incrementer, for next in the body. Well, the identity procedure on A is what I get here. Identity is being passed along, and here, I have the incrementer 1+ being applied to A, and 1+ is being passed along. Does that clarify the situation?
AUDIENCE: We could also define explicitly those two functions, then pass them.
PROFESSOR: Sure. What we can do is we could have given names to them, just like I did here. In fact, I gave you various ways so you could see it, a variety. Here, I define the thing which I passed the name of. I referenced it by its name. But the thing is, in fact, that procedure of one argument X, which is X. And the identity procedure is just lambda of X X. And that's what you're seeing here. Here, I happened to just write its canonical name there for you to see. Is it OK if we take our five-minute break?
As I said, computers are to make people happy, not people to make computers happy. And for the most part, the reason why we introduce all this abstraction stuff is to make it so that programs can be more easily written and more easily read. Let's try to understand what's the most complicated program we've seen so far using a little bit of this abstraction stuff.
If you look at the slide, this is Heron of Alexandria's method of computing square roots that we saw yesterday. And let's see. Well, in any case, this program is a little complicated. And at the current state of your thinking, you just can't look at that and say, oh, this obviously means something very clear. It's not obvious from looking at the program what it's computing. There's some loop here inside try, and a loop does something about trying the improvement of y. There's something called improve, which does some averaging and quotienting and things like that. But what's the real idea? Can we make it clear what the idea is? Well, I think we can. I think we can use the abstractions that we have learned about so far to clarify what's going on.
Now, what we have mathematically is a procedure for improving a guess for square roots. And if y is a guess for a square root, then what we want to get we'll call a function f. This is the means of improvement. I want to get y plus x/y over 2, so the average of y and x divided by y, as the improved value for the square root of x. And one thing you can notice about this function f is that f of the square root of x is in fact the square root of x. In other words, if I take the square root of x and substitute it for y here, I get the square root of x plus x divided by the square root of x, all over 2. That's 2 times the square root of x divided by 2, which is the square root of x.
So, in fact, what we're really looking for is we're looking for a fixed point, a fixed point of the function f. A fixed point is a place which has the property that if you put it into the function, you get the same value out. Now, I suppose if I were giving some nice, boring lecture, and you happened to have in front of you an HP-35 desk calculator like I used to have when I went to boring lectures. And if you think it was really boring, you put it into radians mode, and you hit cosine, and you hit cosine, and you hit cosine. And eventually, you end up with 0.734 or something like that. 0.743, I don't remember what exactly, and it gets closer and closer to that. Some functions have the property that you can find their fixed point by iterating the function, and that's essentially what's happening in the square root program by Heron's method.
So let's see if we can write that down, that idea. Now, I'm not going to say how I compute fixed points yet. There might be more than one way. But the first thing to do is I'm going to say what I just said. I'm going to say it specifically, the square root. The square root of x is the fixed point of that procedure which takes an argument y and averages x divided by y with y. And we're going to start up with the initial guess for the fixed point of 1. It doesn't matter much where it starts-- that's a theorem having to do with square roots.
So what you're seeing here is I'm just trying to write this out by wishful thinking. I don't know how I'm going to make fixed point happen. We'll worry about that later. But if somehow I had a way of finding the fixed point of the function computed by this procedure, then I would have-- that would be the square root that I'm looking for.
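As a sketch-- fixed-point is still wishful thinking at this point, and average is assumed to be the obvious two-argument averaging procedure:

(define (average u v) (/ (+ u v) 2))

(define (sqrt x)
  (fixed-point
   (lambda (y) (average (/ x y) y))
   1))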
OK, well, now let's see how we're going to write-- how we're going to come up with fixed points. Well, it's very simple, actually. I'm going to write an abbreviated version here just so we understand it. I'm going to find the fixed point of a function f-- actually, the fixed point of the function computed by the procedure whose name will be f in this procedure. How's that? A long sentence-- starting with a particular starting value.
Well, I'm going to have a little loop inside here, which is going to push the button on the calculator repeatedly, hoping that it will eventually converge. And we will say here internal loops are written by defining internal procedures. Well, one thing I'm going to have to do is I'm going to have to say whether I'm done. And the way I'm going to decide when I'm done is when the old value and the new value are close enough so I can't distinguish them anymore. That's the standard thing you do on the calculator unless you look at more precision, and eventually, you run out of precision.
So I take the old value and the new value, and I'm going to say here: if I can't distinguish them, if they're close enough-- and we'll have to worry about what that means soon-- if the old value and the new value are close enough to each other, then let's pick the new value as the answer. Otherwise, I'm going to iterate around again with the next value of old being the current value of new and the next value of new being the result of calling f on new. And so this is my iteration loop that pushes the button on the calculator. I basically think of it as having two registers on the calculator: old and new. And in each step, new becomes the old, and new gets F of new. So this is the thing where I'm getting the next value.
And now, I'm going to start this thing up by giving two values. I wrote this down on the blackboard to be slow so you can see it. This is the first time you've seen something quite this complicated, I think. However, we might want to see the whole thing over here in this transparency or slide or whatever. What we have is all of the details that are required to make this thing work. I have a way of getting a tolerance for a close enough procedure, which we see here. The close enough procedure tests whether u and v are close enough by seeing if the absolute value of the difference of u and v is less than the given tolerance, OK? And here is the iteration loop that I just wrote on the blackboard and the initialization for it, which is right there. It's very simple.
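Putting the blackboard loop together with the details on the slide, fixed-point comes out roughly like this-- the particular tolerance value here is only illustrative:

(define (fixed-point f start)
  (define tolerance 0.00001)               ; illustrative tolerance
  (define (close-enuf? u v)
    (< (abs (- u v)) tolerance))
  (define (iter old new)
    (if (close-enuf? old new)
        new
        (iter new (f new))))
  (iter start (f start)))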
But let's see. I haven't told you enough. It's actually easier than this. There is more structure to this problem than I've already told you. Like why should this work? Why should it converge? There's a hairy theorem in mathematics tied up in what I've written here. Why is it that I should assume that by iterating the average of the quotient of x and y with y, I should get the right answer? It isn't so obvious.
Surely there are other things, other procedures, which compute functions whose fixed points would also be the square root. For example, the obvious one will be a new function g, which maps y to x/y. That's even simpler. The fixed point of g is surely the square root also, and it's a simpler procedure.
Why am I not using it? Well, I suppose you know. Supposing x is 2 and I start out with 1, and if I divide 1 into 2, I get 2. And then if I divide 2 into 2, I get 1. If I divide 1 into 2, I get 2, and 2 into 2, I get 1, and I never get any closer to the square root. It just oscillates. So what we have is a signal processing system, an electrical circuit which is oscillating, and I want to damp out these oscillations. Well, I can do that.
See, what I'm really doing here when I'm taking my average, the average is averaging the last two values of something which oscillates, getting something in between. That's the classic way of damping out oscillations in a signal processing system. So why don't we write down the strategy that I just said in a clearer way? Well, that's easy enough.
I'm going to define the square root of x to be a fixed point of the procedure resulting from average damping. So I have a procedure resulting from average damp of the procedure, that procedure of y, which divides x by y, starting out at 1. Ah, but average damp is a special procedure that's going to take a procedure as its argument and return a procedure as its value. It's a generalization that says: given a procedure, produce the procedure which averages the value before and the value after running the original procedure. You can use it for anything if you want to damp out oscillations. So let's write that down. It's very easy.
And stylistically here, I'm going to use lambda notation because it's much easier to think, when you're dealing with procedures that manipulate procedures, to understand that the procedures are the objects I'm dealing with, so I'm going to use lambda notation here. Not always. I don't always use it, but very specifically here to expand on that idea, to elucidate it.
Well, average damp is a procedure, which takes a procedure as its argument, which we will call f. And what does it produce? It produces as its value-- the body of this procedure is a thing which produces a procedure; the constructor of the procedure is right here-- a procedure of one argument x, which averages f of x with x.
This is a very special thing. I think for the first time you're seeing a procedure which produces a procedure as its value. This procedure takes the procedure f and does something to it to produce a new procedure of one argument x, which averages f-- this f-- applied to x and x itself. Using it in the context here, I apply average damping to the procedure which just divides x by y. It's a division. And I'm finding the fixed point of that, and that's a clearer way of writing down what I wrote down over here, wherever it was, because it tells why I am writing this down.
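In lambda notation, the sketch is-- using the average and fixed-point procedures from the earlier sketches:

(define average-damp
  (lambda (f)
    (lambda (x) (average (f x) x))))

(define (sqrt x)
  (fixed-point
   (average-damp (lambda (y) (/ x y)))
   1))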
I suppose this to some extent really clarifies what Heron of Alexandria was up to. I suppose I'll stop now. Are there any questions?
AUDIENCE: So when you define average damp, don't you need to have a variable on f?
PROFESSOR: Ah, the question was, and here we're having-- again, you've got to learn about the syntax. The question was when defining average damp, don't you have to have a variable defined with f? What you are asking about is the formal parameter of f?
AUDIENCE: Yeah.
PROFESSOR: OK. The formal parameter of f is here. The formal parameter of f--
AUDIENCE: The formal parameter of average damp.
PROFESSOR: F is being used to apply it to an argument, right? It's indeed true that f must have a formal parameter. Let's find out what f's formal parameter is.
AUDIENCE: The formal parameter of average damp.
PROFESSOR: Oh, f is the formal parameter of average damp. I'm sorry. You're just confusing a syntactic thing. I could have written this the other way. Actually, I didn't understand your question. Of course, I could have written it this other way. Those are identical notations. This is a different way of writing this. You're going to have to get used to lambda notation because I'm going to use it.
What it says here, I'm defining the name average-damp to name the procedure of one argument f. That's the formal parameter of the procedure average-damp. What define does is it says give this name a value. Here is the value for it. That there happens to be a funny syntax to make that easier in some cases is purely convenience. But the reason why I wrote it this way here is to emphasize that I'm dealing with a procedure that takes a procedure as its argument and produces a procedure as its value.
AUDIENCE: I don't understand why you use lambda twice. Can you just use one lambda and take two arguments f and x?
PROFESSOR: No.
AUDIENCE: You can't?
PROFESSOR: No, that would be a different thing. If I were to write the procedure lambda of f and x, the average of f of x and x, that would not be something which would be allowed to take a procedure as an argument and produce a procedure as its value. That would be a thing that takes a procedure and a number as its arguments and produces a new number. But what I'm producing here is a procedure to fit in the procedure slot over here, which is going to be used over here. So the number has to come from here. This is the thing that's going to eventually end up in the x. And if you're confused, you should do some substitution and see for yourself. Yes?
AUDIENCE: Will you please show the definition for average damp without using lambda notation in both cases.
PROFESSOR: I can't make a very simple one like that. Let me do it for you, though. I can get rid of this lambda easily. I don't want to be-- actually, I'm lying to you. I don't want to do what you want because I think it's more confusing than you think. I'm not going to write what you want.
So we'll have to make up a name: I define FOO of x to be the average of F of x and x, and return as a value FOO. This is equivalent, but I've had to make an arbitrary name up. This is equivalent to this without any lambdas. Lambda is very convenient for naming anonymous procedures. It's the anonymous name of something. Now, if you really want to know a cute way of doing this, we'll talk about it later. We're going to have to define the anonymous procedure. Any other questions? And so we go for our break again.
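That version, with foo standing in as the made-up name, is roughly:

(define (average-damp f)
  (define (foo x)
    (average (f x) x))
  foo)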
So now we've seen how to use higher-order procedures, as they're called. Those are procedures that take procedural arguments and produce procedural values to help us clarify and abstract some otherwise complicated processes. I suppose what I'd like to do now is have a bit of fun with that and sort of a little practice as well. So let's play with this square root thing even more. Let's elaborate it and understand what's going on and make use of this kind of programming style.
One thing that you might know is that there is a general method called Newton's method the purpose of which is to find the roots-- that's the zeroes-- of functions. So, for example, to find a y such that f of y equals 0, we start with some guess. This is Newton's method. And the guess we start with we'll call y0, and then we will iterate the following expression.
y n plus 1-- this is a difference equation-- is yn minus f of yn over the derivative with respect to y of f evaluated at y equal yn. Very strange notation. I must say ugh. The derivative of f with respect to y is a function. I'm having a little bit of unhappiness with that, but that's all right. It turns out in the programming language world, the notation is much clearer.
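Written out, the difference equation just described is y(n+1) = y(n) - f(y(n)) / f'(y(n)), where f' is the derivative of f with respect to y.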
Now, what is this? People call it Newton's method. It's a method for finding the roots of the function f. And it, of course, sometimes converges, and when it does, it does so very fast. And sometimes, it doesn't converge, and, oh well, we have to do something else. But let's talk about square root by Newton's method.
Well, that's rather interesting. Let's do exactly the same thing we did last time: a bit of wishful thinking. We will apply Newton's method, assuming we knew how to do it. You don't know how to do it yet. Well, let's go. What do I have here? The square root of x. It's Newton's method applied to a procedure which will represent that function of y, which computes that function of y. Well, that procedure is that procedure of y, which is the difference between x and the square of y. Indeed, if I had a value of y for which this was zero, then y would be the square root of x. See that? OK, I'm going to start this out searching at 1. Again, it's completely arbitrary-- it's a property of square roots that I can do that.
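As a sketch-- newton is the wishful-thinking part for now, and square is as before:

(define (sqrt x)
  (newton (lambda (y) (- x (square y)))
          1))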
Now, how am I going to compute Newton's method? Well, this is the method. I have it right here. In fact, what I'm doing is looking for a fixed point of some procedure. This procedure involves some complicated expressions in terms of other complicated things. Well, I'm trying to find the fixed point of this. I want to find the values of y, which if I put y in here, I get the same value out here up to some degree of accuracy. Well, I already have a fixed point process around to do that. And so, let's just define Newton's method over here.
It's a procedure which takes a procedure that computes a function, and a guess, an initial guess. Now, I'm going to have to do something here. I'm going to need the derivative of the function. I'm going to need a procedure which computes the derivative of the function computed by the given procedure f. I'm trying to be very careful about what I'm saying. I don't want to mix up the words procedure and function. Function is a mathematical word. It says I'm mapping from values to other values, a set of ordered pairs. But sometimes, I'll accidentally mix those up. Procedures compute functions.
So I'm going to define DF to be, by wishful thinking again-- I don't know how I'm going to do it; let's worry about that later-- the deriv of F. So if F is a procedure, which happens to be this one over here for a square root, then DF will be the derivative of it, which is also the derivative of the function computed by that procedure. DF will be a procedure that computes the derivative of the function computed by the procedure F. And then given that, I will just go looking for a fixed point.
What is the fixed point I'm looking for? It's the one for that procedure of one argument x, which I compute by subtracting from x-- x is the yn here-- the quotient of f of x and df of x, starting out with the original guess. That's all very simple.
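Sketched out-- deriv is still the wishful-thinking piece here, and fixed-point is from the earlier sketch:

(define (newton f guess)
  (define df (deriv f))                    ; a procedure computing the derivative of f
  (fixed-point
   (lambda (x) (- x (/ (f x) (df x))))
   guess))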
Now, I have one part left that I haven't written, and I want you to see the process by which I write these things, because this is really true. I start out with some mathematical idea, perhaps. By wishful thinking, I assume that by some magic I can do something that I have a name for. I'm not going to worry about how I do it yet. Then I go walking down here and say, well, by some magic, I'm somehow going to figure out how to do that, but I'm going to write my program anyway. Wishful thinking, essential to good engineering, and certainly essential to good computer science.
So anyway, how many of you wished that your computer ran faster? Well, the derivative isn't so bad either. Sort of like average damping. The derivative is a procedure that takes a procedure that computes a function as its argument, and it produces a procedure that computes a function, which needs one argument x. Well, you all know this definition. It's f of x plus delta x minus f of x over delta x, right? For some small delta x. So that's the quotient of the difference of f of the sum of x and dx minus f of x, divided by dx. I think the thing lines up correctly when I balance the parentheses.
Now, I want you to look at this. Just look. I suppose I haven't told you what dx is. Somewhere in the world I'm going to have to write down something like that. I'm not interested. This is a procedure which takes a procedure and produces a procedure that computes an approximation of the derivative of the function computed by the given procedure, by the standard methods that you all know and love.
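And deriv itself, as a sketch-- the lecture leaves the value of dx unspecified, so the one here is only an illustrative small number:

(define dx 0.000001)                       ; illustrative; dx is never pinned down in the lecture

(define deriv
  (lambda (f)
    (lambda (x)
      (/ (- (f (+ x dx)) (f x))
         dx))))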
Now, it may not be the case that doing this operation is such a good way of approximating a derivative. Numerical analysts here should jump on me and say don't do that. Computing derivatives numerically produces noisy answers, which is true. However, this again is for the sake of understanding. Look what we've got. We started out with what is apparently a mathematically complex thing, and in a few blackboards full, we managed to decompose the problem of computing square roots by the way you were taught in your college calculus class-- Newton's method-- so that it can be understood. It's clear.
Let's look at the structure of what it is we've got. Let's look at this slide. This is a diagram of the machine described by the program on the blackboard. There's a machine described here. And what have I got? Over here is the Newton's method function f that we have on the left-most blackboard. It's the thing that takes an argument called y and puts out the difference between x and the square of y, where x is some sort of free variable that comes in from the outside by some magic. So the square root routine picks up an x and builds this procedure, which has the x rolled up in it by substitution.
Now, this procedure in the cloud is fed in as the f into the Newton's method which is here, this box. The f is fanned out. Part of it goes into something else, and the other part of it goes through a derivative process into something else to produce a procedure, which computes the function which is the iteration function of Newton's method when we use the fixed point method. So this procedure, which contains it by substitution-- remember, Newton's method over here, Newton's method builds this procedure, and Newton's method has in it defined f and df, so those are captured over here: f and df. Starting with this procedure, I can now feed this to the fixed point process, with an initial guess coming in from the outside from square root, to produce the square root of x. So what we've built is a very powerful engine, which allows us to make nice things like this.
Now, I want to end this with basically an idea of Chris Strachey, one of the grandfathers of computer science. He's a logician who lived in the-- I suppose about 10 years ago or 15 years ago, he died. I don't remember exactly when. He's one of the inventors of something called denotational semantics. He was a great advocate of making procedures or functions first-class citizens in a programming language.
So here are the rights and privileges of first-class citizens in a programming language. It allows you to make any abstraction you like if you have functions as first-class citizens. The first-class citizens must be able to be named by variables. And you're seeing me doing that all the time. Here's a nice variable which names a procedure which computes something. They have to be able to be passed as arguments to procedures. We've certainly seen that. We have to be able to return them as values from procedures. And I suppose we've seen that. We haven't yet seen anything about data structures. We will soon, but it's also the case that in order to have a first-class citizen in a programming language, the object has to be allowed to be part of a data structure. We're going to see that soon.
So I just want to close with this and say having things like procedures as first-class data structures, first-class data, allows one to make powerful abstractions, which encode general methods like Newton's method in a very clear way. Are there any questions? Yes.
AUDIENCE: Could you put derivative instead of df directly in the fixed point?
PROFESSOR: Oh, sure. Yes, I could have put deriv of f right here, no question. Any time you see something defined, you can put the thing it's defined to be there, because you get the same result. In fact, what that would look like, it's interesting.
AUDIENCE: Lambda.
PROFESSOR: Huh?
AUDIENCE: You could put the lambda expression in there.
PROFESSOR: I could also put derivative of f here. It would look interesting because of the open paren, open paren, deriv of f, closed paren on an x. Now, that would have the bad property of computing the derivative many times, because every time I would run this procedure, I would compute the derivative again. However, the two open parens here both would be meaningful. I want you to understand syntactically that that's a sensible thing. Because if I was to rewrite this program-- and I should do it right here just so you see, because that's a good question-- I define Newton's method of F and guess to be fixed point of that procedure of one argument x, which subtracts from x the quotient of F applied to x and the deriv of F applied to x. And this is guess.
This is a perfectly legitimate program, because what I have here-- remember the evaluation rule. The evaluation rule is evaluate all of the parts of the combination: the operator and the operands. This is the operator of this combination. Evaluating this operator will, of course, produce the derivative of F.
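That rewritten version, sketched-- note, as he says, it recomputes the derivative every time the inner procedure runs:

(define (newton f guess)
  (fixed-point
   (lambda (x)
     (- x (/ (f x)
             ((deriv f) x))))              ; the two open parens: (deriv f) yields a procedure, applied to x
   guess))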
AUDIENCE: To get it one step further, you could put the lambda expression there, too.
PROFESSOR: Oh, of course. Any time I take something which is defined, I can put the thing it's defined to be in the place where the defined thing is. I can't remember which is definiens and which is definiendum. When I'm trying to figure out how to do a lecture about this in a freshman class, I use such words and tell everybody it's fun to tell their friends. OK, I think that's it.