Topics covered: Streams, Part 1
Instructors: Hal Abelson and Gerald Jay Sussman
Subtitles for this course are provided through the generous assistance of Henry Baker, Hoofar Pourzand, Heather Wood, Aleksejs Truhans, Steven Edwards, George Menhorn, and Mahendra Kumar.
6A: Streams, Part 1
PROFESSOR: Well, last time Gerry really let the cat out of the bag. He introduced the idea of assignment. Assignment and state. And as we started to see, the implications of introducing assignment and state into the language are absolutely frightening. First of all, the substitution model of evaluation breaks down. And we have to use this much more complicated environment model and this very mechanistic thing with diagrams, even to say what statements in the programming language mean.
And that's not a mere technical point. See, it's not that we had this particular substitution model and, well, it doesn't quite work, so we have to do something else. It's that nothing like the substitution model can work. Because suddenly, a variable is not just something that stands for a value. A variable now has to somehow specify a place that holds a value. And the value that's in that place can change.
Or for instance, an expression like f of x might have a side effect in it. So if we say f of x and it has some value, and then later we say f of x again, we might get a different value depending on the order. So suddenly, we have to think not only about values but about time.
And then things like pairs are no longer just their CARs and their CDRs. A pair now is not quite its CAR and its CDR. It's rather its identity. So a pair has identity. It's an object. And two pairs that have the same CAR and CDR might be the same or different, because suddenly we have to worry about sharing.
So all of these things enter as soon as we introduce assignment. See, this is a really far cry from where we started with substitution. It's a technically harder way of looking at things because we have to think more mechanistically about our programming language. We can't just think about it as mathematics. It's philosophically harder, because suddenly there are all these funny issues about what does it mean that something changes or that two things are the same. And also, it's programming harder, because as Gerry showed last time, there are all these bugs having to do with bad sequencing and aliasing that just don't exist in a language where we don't worry about objects.
Well, how'd we get into this mess? Remember what we did, the reason we got into this is because we were looking to build modular systems. We wanted to build systems that fall apart into chunks that seem natural. So for instance, we want to take a random number generator and package up the state of that random number generator inside of it so that we can separate the idea of picking random numbers from the general Monte Carlo strategy of estimating something and separate that from the particular way that you work with random numbers in that formula developed by Cesaro for pi.
And similarly, when we go off and construct some models of things, if we go off and model a system that we see in the real world, we'd like our program to break into natural pieces, pieces that mirror the parts of the system that we see in the real world. So for example, if we look at a digital circuit, we say, gee, there's a circuit and it has a piece and it has another piece. And these different pieces sort of have identity. They have state. And the state sits on these wires. And we think of this piece as an object that's different from that as an object. And when we watch the system change, we think about a signal coming in here and changing a state that might be here and going here and interacting with a state that might be stored there, and so on and so on.
So what we'd like is we'd like to build in the computer systems that fall into pieces that mirror our view of reality, of the way that the actual systems we're modeling seem to fall into pieces. Well, maybe the reason that building systems like this seems to introduce such technical complications has nothing to do with computers.
See, maybe the real reason that we pay such a price to write programs that mirror our view of reality is that we have the wrong view of reality. See, maybe time is just an illusion, and nothing ever changes. See, for example, if I take this chalk, and we say, gee, this is an object and it has a state. At each moment it has a position and a velocity. And if we do something, that state can change.
But if you studied any relativity, for instance, you know that you don't think of the path of that chalk as something that goes on instant by instant. It's more insightful to think of that whole chalk's existence as a path in space-time that's all splayed out. There aren't individual positions and velocities. There's just its unchanging existence in space-time.
Similarly, if we look at this electrical system, if we imagine this electrical system is implementing some sort of signal processing system, the signal processing engineer who put that thing together doesn't think of it as, well, at each instant there's a voltage coming in. And that translates into something. And that affects the state over here, which changes the state over here. Nobody putting together a signal processing system thinks about it like that.
Instead, you say there's this signal that's splayed out over time. And if this is acting as a filter, this whole thing transforms this whole thing for some sort of other output. You don't think of it as what's happening instant by instant as the state of these things. And somehow you think of this box as a whole thing, not as little pieces sending messages of state to each other at particular instants.
Well, today we're going to look at another way to decompose systems that's more like the signal processing engineer's view of the world than it is like thinking about objects that communicate sending messages. That's called stream processing. And we're going to start by showing how we can make our programs more uniform and see a lot more commonality if we throw out of these programs what you might say is an inordinate concern with worrying about time.
Let me start by comparing two procedures. The first one does this. We imagine that there's a tree. Say there's a tree of integers. It's a binary tree. So it looks like this. And there's integers in each of the nodes. And what we would like to compute is for each odd number sitting here, we'd like to find the square and then sum up all those squares.
Well, that should be a familiar kind of thing. There's a recursive strategy for doing it. We look at each leaf, and either it's going to contribute the square of the number if it's odd or 0 if it's even. And then recursively, we can say at each tree, the sum of all of them is the sum coming from the right branch and the left branch, and recursively down through the nodes. And that's a familiar way of thinking about programming.
Let's actually look at that on the slide. We say to sum the odd squares in a tree, well, there's a test. Either it's a leaf node, and we're going to check to see if it's an integer, and then either it's odd, in which case we take the square, or else it's 0. And then the sum of the whole thing is the sum coming from the left branch and the right branch.
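The slide is not reproduced in this transcript. A minimal sketch of the procedure described, assuming a tree representation the lecture does not specify (a leaf is an integer, an interior node is a two-element list), with `leaf-node?`, `left-branch`, `right-branch`, and `square` defined here only for illustration:

```scheme
;; Assumed representation (not from the lecture):
;; a leaf is an integer; an interior node is a two-element list (left right).
(define (leaf-node? tree) (not (pair? tree)))
(define (left-branch tree) (car tree))
(define (right-branch tree) (cadr tree))
(define (square x) (* x x))

;; A leaf contributes its square if odd, otherwise 0;
;; an interior node contributes the sum from its two branches.
(define (sum-odd-squares tree)
  (if (leaf-node? tree)
      (if (odd? tree) (square tree) 0)
      (+ (sum-odd-squares (left-branch tree))
         (sum-odd-squares (right-branch tree)))))

(sum-odd-squares '((1 2) (3 (4 5))))  ; 1 + 9 + 25 = 35
```

Notice that the enumeration, the oddness test, the squaring, and the summing are all interleaved in the one recursive body.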
OK, well, let me contrast that with a second problem. Suppose I give you an integer n, and then some function to compute of each integer in 1 through n. And then I want to collect together in a list all those function values that satisfy some property. That's a general kind of thing. Let's say to be specific, let's imagine that for each integer, k, we're going to compute the k-th Fibonacci number. And then we'll see which of those are odd and assemble those into a list.
So here's a procedure that does that. Find the odd Fibonacci numbers among the first n. And here is a standard loop the way we've been writing it. This is a recursion. It's a loop on k, and says if k is bigger than n, it's the empty list. Otherwise we compute the k-th Fibonacci number, call that f. If it's odd, we CONS it on to the list starting with the next one. And otherwise, we just take the next one. And this is the standard way we've been writing iterative loops. And we start off calling that loop with 1.
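As a sketch of the loop just described (the slide itself is not in the transcript), assuming a simple doubly recursive `fib` with fib(1) = fib(2) = 1, which is a choice made here for illustration:

```scheme
;; Assumed helper: the k-th Fibonacci number, with (fib 1) = (fib 2) = 1.
(define (fib n)
  (if (< n 2)
      n
      (+ (fib (- n 1)) (fib (- n 2)))))

;; Loop on k from 1 to n, consing the odd Fibonacci numbers onto the
;; list built from the remaining values of k.
(define (odd-fibs n)
  (define (next k)
    (if (> k n)
        '()
        (let ((f (fib k)))
          (if (odd? f)
              (cons f (next (+ k 1)))
              (next (+ k 1))))))
  (next 1))

(odd-fibs 6)  ; fib of 1..6 is 1 1 2 3 5 8, so the result is (1 1 3 5)
```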
OK, so there are two procedures. Those procedures look very different. They have very different structures. Yet from a certain point of view, those procedures are really doing very much the same thing. So if I was talking like a signal processing engineer, what I might say is that the first procedure enumerates the leaves of a tree. And then we can think of a signal coming out of that, which is all the leaves.
We'll filter them to see which ones are odd, put them through some kind of filter. We'll then put them through a kind of transducer. And for each one of those things, we'll take the square. And then we'll accumulate all of those. We'll accumulate them by sticking them together with addition starting from 0. That's the first program.
The second program, I can describe in a very, very similar way. I'll say, we'll enumerate the numbers on this interval, for the interval 1 through n. We'll, for each one, compute the Fibonacci number, put them through a transducer. We'll then take the result of that, and we'll filter it for oddness. And then we'll take those and put them into an accumulator. This time we'll build up a list, so we'll accumulate with CONS starting from the empty list.
So this way of looking at the program makes the two seem very, very similar. The problem is that that commonality is completely obscured when we look at the procedures we wrote. Let's go back and look at sum-odd-squares again, and say things like, where's the enumerator? Where's the enumerator in this program? Well, it's not in one place. It's a little bit in this leaf-node test, which is going to stop. It's a little bit in the recursive structure of the thing itself.
Where's the accumulator? The accumulator isn't in one place either. It's partly in this 0 and partly in this plus. It's not there as a thing that we can look at. Similarly, if we look at odd Fibs, that's also, in some sense, an enumerator and an accumulator, but it looks very different. Because partly, the enumerator is here in this greater than sign in the test. And partly it's in this whole recursive structure in the loop, and the way that we call it. And then similarly, that's also mixed up in there with the accumulator, which is partly over there and partly over there.
So these very, very natural pieces, these very natural boxes here don't appear in our programs. Because they're kind of mixed up. The programs don't chop things up in the right way. Going back to this fundamental principle of computer science that in order to control something, you need the name of it, we don't really have control over thinking about things this way because we don't have our hands in them explicitly. We don't have a good language for talking about them.
Well, let's invent an appropriate language in which we can build these pieces. The key to the language is these guys: what are these things I called signals? What are these things that are flying on the arrows between the boxes? Well, those things are going to be data structures called streams. That's going to be the key to inventing this language.
What's a stream? Well, a stream is, like anything else, a data abstraction. So I should tell you what its selectors and constructors are. For a stream, we're going to have one constructor that's called CONS-stream. CONS-stream is going to put two things together to form a thing called a stream. And then to extract things from the stream, we're going to have a selector called the head of the stream.
So if I have a stream, I can take its head or I can take its tail. And remember, I have to tell you George's contract here to tell you what the axioms are that relate these. And it's going to be for any x and y, if I form the CONS-stream and take the head, the head of CONS-stream of x and y is going to be x and the tail of CONS-stream of x and y is going to be y. So those are the constructor, two selectors for streams, and an axiom.
There's something fishy here. So you might notice that these are exactly the axioms for CONS, CAR, and CDR. If instead of writing CONS-stream I wrote CONS and I said head was the CAR and tail was the CDR, those are exactly the axioms for pairs. And in fact, there's another thing here. We're going to have a thing called the-empty-stream, which is like the-empty-list.
So why am I introducing this terminology? Why don't I just keep talking about pairs and lists? Well, we'll see. For now, if you like, why don't you just pretend that streams really are just a terminology for lists. And we'll see in a little while why we want to keep this extra abstraction layer and not just call them lists.
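Taking the suggestion literally, here is a minimal sketch in which streams are nothing but lists. These definitions are stand-ins under that assumption, not the eventual implementation. In a Scheme that already provides `cons-stream` as a special form (MIT Scheme does), they may need different names.

```scheme
;; Assumption: for now a stream is just a list, as the lecture suggests.
(define the-empty-stream '())
(define (empty-stream? s) (null? s))
(define (cons-stream x y) (cons x y))
(define (head s) (car s))
(define (tail s) (cdr s))

;; The axioms: for any x and y,
(head (cons-stream 1 2))  ; is x, here 1
(tail (cons-stream 1 2))  ; is y, here 2
```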
OK, now that we have streams, we can start constructing the pieces of the language to operate on streams. And there are a whole bunch of very useful things that we could start making. For instance, we'll make our map box to take a stream, s, and a procedure, and to generate a new stream which has as its elements the procedure applied to all the successive elements of s. In fact, we've seen this before. This is the procedure map that we did with lists. And you see it's exactly map, except we're testing for empty-stream.
Oh, I forgot to mention that. Empty-stream is like the null test. So if it's empty, we generate the empty stream. Otherwise, we form a new stream whose first element is the procedure applied to the head of the stream, and whose rest is gotten by mapping along with the procedure down the tail of the stream. So that looks exactly like the map procedure we looked at before.
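A sketch of that map box, using list-backed stand-ins for the stream primitives. The name `map-stream` and the argument order (procedure first, then stream) are assumptions.

```scheme
;; Assumption: for now a stream is just a list, as the lecture suggests.
(define the-empty-stream '())
(define (empty-stream? s) (null? s))
(define (cons-stream x y) (cons x y))
(define (head s) (car s))
(define (tail s) (cdr s))

;; Apply proc to each successive element of s, producing a new stream.
(define (map-stream proc s)
  (if (empty-stream? s)
      the-empty-stream
      (cons-stream (proc (head s))
                   (map-stream proc (tail s)))))

(map-stream (lambda (x) (* x x))
            (cons-stream 1 (cons-stream 2 (cons-stream 3 the-empty-stream))))
;; a stream whose elements are 1, 4, 9
```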
Here's another useful thing. Filter, this is our filter box. We're going to have a predicate and a stream. We're going to make a new stream that consists of all the elements of the original one that satisfy the predicate. That's case analysis. When there's nothing in the stream, we return the empty stream. We test the predicate on the head of the stream. And if it's true, we add the head of the stream onto the result of filtering the tail of the stream. And otherwise, if that predicate was false, we just filter the tail of the stream. Right, so there's filter.
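The case analysis can be sketched as follows, again with list-backed stand-ins for streams. It is called `filter-stream` here, an assumed name chosen to avoid clashing with the `filter` many Scheme systems already define.

```scheme
;; Assumption: for now a stream is just a list, as the lecture suggests.
(define the-empty-stream '())
(define (empty-stream? s) (null? s))
(define (cons-stream x y) (cons x y))
(define (head s) (car s))
(define (tail s) (cdr s))

;; Keep only the elements of s that satisfy pred.
(define (filter-stream pred s)
  (cond ((empty-stream? s) the-empty-stream)
        ((pred (head s))
         (cons-stream (head s)
                      (filter-stream pred (tail s))))
        (else (filter-stream pred (tail s)))))

(filter-stream odd?
               (cons-stream 1 (cons-stream 2 (cons-stream 3 the-empty-stream))))
;; a stream whose elements are 1, 3
```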
Let me run through a couple more rather quickly. They're all in the book and you can look at them. Let me just flash through. Here's accumulate. Accumulate takes a way of combining things and an initial value and a stream and sticks them all together. If the stream's empty, it's just the initial value. Otherwise, we combine the head of the stream with the result of accumulating the tail of the stream starting from the initial value. So that's what I'd use to add up everything in the stream. I'd accumulate with plus.
How would I enumerate the leaves of a tree? Well, if the tree is just a leaf itself, I make something which only has that node in it. Otherwise, I append together the stuff of enumerating the left branch and the right branch. And then append here is like the ordinary append on lists. You can look at that. That's analogous to the ordinary procedure for appending two lists. How would I enumerate an interval? This will take two integers, low and high, and generate a stream of the integers going from low to high. And we can make a whole bunch of pieces.
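These remaining pieces might be sketched like this. The tree representation (integer leaves, two-element lists for interior nodes) and the exact names are assumptions, and streams are still plain lists underneath.

```scheme
;; Assumption: for now a stream is just a list, as the lecture suggests.
(define the-empty-stream '())
(define (empty-stream? s) (null? s))
(define (cons-stream x y) (cons x y))
(define (head s) (car s))
(define (tail s) (cdr s))

;; Assumed tree representation: integer leaves, (left right) interior nodes.
(define (leaf-node? tree) (not (pair? tree)))
(define (left-branch tree) (car tree))
(define (right-branch tree) (cadr tree))

;; Combine the elements of s with combiner, starting from init-val.
(define (accumulate combiner init-val s)
  (if (empty-stream? s)
      init-val
      (combiner (head s)
                (accumulate combiner init-val (tail s)))))

;; Analogous to the ordinary append on lists.
(define (append-streams s1 s2)
  (if (empty-stream? s1)
      s2
      (cons-stream (head s1)
                   (append-streams (tail s1) s2))))

;; A stream of the leaves of the tree, left to right.
(define (enumerate-tree tree)
  (if (leaf-node? tree)
      (cons-stream tree the-empty-stream)
      (append-streams (enumerate-tree (left-branch tree))
                      (enumerate-tree (right-branch tree)))))

;; A stream of the integers from low to high, inclusive.
(define (enumerate-interval low high)
  (if (> low high)
      the-empty-stream
      (cons-stream low (enumerate-interval (+ low 1) high))))

(accumulate + 0 (enumerate-interval 1 4))   ; 1 + 2 + 3 + 4 = 10
(enumerate-tree '((1 2) (3 (4 5))))         ; elements 1, 2, 3, 4, 5
```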
So that's a little language of talking about streams. Once we have streams, we can build things for manipulating them. Again, we're making a language. And now we can start expressing things in this language. Here's our original procedure for summing the odd squares in a tree.
And you'll notice it looks exactly now like the block diagram, like the signal processing block diagram. So to sum the odd squares in a tree, we enumerate the leaves of the tree. We filter that for oddness. We map that for squareness. And we accumulate the result of that using addition, starting from 0. So we can see the pieces that we wanted.
Similarly, the Fibonacci one, how do we get the odd Fibs? Well, we enumerate the interval from 1 to n, we map along that, computing the Fibonacci of each one. We filter the result of those for oddness. And we accumulate all of that stuff using CONS starting from the empty-list.
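Put together, the two block diagrams might read as follows. This is a self-contained sketch: the list-backed stream primitives, the tree representation, `fib`, and the name `filter-stream` are all assumptions made for illustration.

```scheme
;; Assumption: for now a stream is just a list, as the lecture suggests.
(define the-empty-stream '())
(define (empty-stream? s) (null? s))
(define (cons-stream x y) (cons x y))
(define (head s) (car s))
(define (tail s) (cdr s))

;; Assumed helpers: tree representation, square, and fib.
(define (leaf-node? tree) (not (pair? tree)))
(define (left-branch tree) (car tree))
(define (right-branch tree) (cadr tree))
(define (square x) (* x x))
(define (fib n) (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2)))))

(define (map-stream proc s)
  (if (empty-stream? s)
      the-empty-stream
      (cons-stream (proc (head s)) (map-stream proc (tail s)))))

(define (filter-stream pred s)
  (cond ((empty-stream? s) the-empty-stream)
        ((pred (head s))
         (cons-stream (head s) (filter-stream pred (tail s))))
        (else (filter-stream pred (tail s)))))

(define (accumulate combiner init-val s)
  (if (empty-stream? s)
      init-val
      (combiner (head s) (accumulate combiner init-val (tail s)))))

(define (append-streams s1 s2)
  (if (empty-stream? s1)
      s2
      (cons-stream (head s1) (append-streams (tail s1) s2))))

(define (enumerate-tree tree)
  (if (leaf-node? tree)
      (cons-stream tree the-empty-stream)
      (append-streams (enumerate-tree (left-branch tree))
                      (enumerate-tree (right-branch tree)))))

(define (enumerate-interval low high)
  (if (> low high)
      the-empty-stream
      (cons-stream low (enumerate-interval (+ low 1) high))))

;; enumerate leaves -> filter odd? -> map square -> accumulate + from 0
(define (sum-odd-squares tree)
  (accumulate +
              0
              (map-stream square
                          (filter-stream odd? (enumerate-tree tree)))))

;; enumerate 1..n -> map fib -> filter odd? -> accumulate cons from '()
(define (odd-fibs n)
  (accumulate cons
              '()
              (filter-stream odd?
                             (map-stream fib (enumerate-interval 1 n)))))

(sum-odd-squares '((1 2) (3 (4 5))))  ; 35
(odd-fibs 6)                          ; (1 1 3 5)
```

Each box in the diagram is now a named piece, so swapping `enumerate-interval` for `enumerate-tree`, or `fib` for `square`, is a one-line change.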
OK, what's the advantage of this? Well, for one thing, we now have pieces that we can start mixing and matching. So for instance, if I wanted to change this, if I wanted to compute the squares of the integers and then filter them, all I need to do is pick up a standard piece like this in that square and put it in. Or if we wanted to do this whole Fibonacci computation on the leaves of a tree rather than a sequence, all I need to do is replace this enumerator with that one.
See, the advantage of this stream processing is that we're establishing-- this is one of the big themes of the course-- we're establishing conventional interfaces that allow us to glue things together. Things like map and filter are a standard set of components that we can start using for pasting together programs in all sorts of ways. It allows us to see the commonality of programs.
I just ought to mention, I've only showed you two procedures. But let me emphasize that this way of putting things together with maps, filters, and accumulators is very, very general. It's the generate and test paradigm for programs. And as an example of that, Richard Waters, who was at MIT when he was a graduate student, as part of his thesis research went and analyzed a large chunk of the IBM scientific subroutine library, and discovered that about 60% of the programs in it could be expressed exactly in terms using no more than what we've put here-- map, filter, and accumulate. All right, let's take a break. Questions?
AUDIENCE: It seems like the essence of this whole thing is just that you have a very uniform, simple data structure to work with, the stream.
PROFESSOR: Right. The essence is that you, again, it's this sense of conventional interfaces. So you can start putting a lot of things together. And the stream is as you say, the uniform data structure that supports that. This is very much like APL, by the way. APL is very much the same idea, except in APL, instead of this stream, you have arrays and vectors. And a lot of the power of APL is exactly the same reason of the power of this. OK, thank you. Let's take a break.
All right. We've been looking at ways of organizing computations using streams. What I want to do now is just show you two somewhat more complicated examples of that. Let's start by thinking about the following kind of utility procedure that will come in useful. Suppose I've got a stream. And the elements of this stream are themselves streams. So the first thing might be 1, 2, 3.
So I've got a stream. And each element of the stream is itself a stream. And what I'd like to do is build a stream that collects together all of the elements, pulls all of the elements out of these sub-streams and strings them all together in one thing. So just to show you the use of this language, how easy it is, call that flatten. And I can define to flatten this stream of streams. Well, what is that? That's just an accumulation. I want to accumulate using append, by successively appending. So I accumulate using append streams, starting with the-empty-stream down that stream of streams.
OK, so there's an example of how you can start using these higher order things to do some interesting operations. In fact, there's another useful thing that I want to do. I want to define a procedure called flat-map, flat map of some function and a stream. And what this is going to do is s will be a stream of elements. f is going to be a function that for each element in the stream produces another stream.
And what I want to do is take all of the elements and all of those streams and combine them together. So that's just going to be the flatten of map f down s. Each time I apply f to an element of s, I get a stream. If I map it all the way down, I get a stream of streams, and I'll flatten that.
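Both definitions are short enough to sketch directly. As before, the list-backed stream primitives and the helper names are assumptions for illustration.

```scheme
;; Assumption: for now a stream is just a list, as the lecture suggests.
(define the-empty-stream '())
(define (empty-stream? s) (null? s))
(define (cons-stream x y) (cons x y))
(define (head s) (car s))
(define (tail s) (cdr s))

(define (map-stream proc s)
  (if (empty-stream? s)
      the-empty-stream
      (cons-stream (proc (head s)) (map-stream proc (tail s)))))

(define (accumulate combiner init-val s)
  (if (empty-stream? s)
      init-val
      (combiner (head s) (accumulate combiner init-val (tail s)))))

(define (append-streams s1 s2)
  (if (empty-stream? s1)
      s2
      (cons-stream (head s1) (append-streams (tail s1) s2))))

(define (enumerate-interval low high)
  (if (> low high)
      the-empty-stream
      (cons-stream low (enumerate-interval (+ low 1) high))))

;; Accumulate a stream of streams with append-streams,
;; starting from the empty stream.
(define (flatten stream-of-streams)
  (accumulate append-streams the-empty-stream stream-of-streams))

;; f maps each element of s to a stream; flatmap strings the results together.
(define (flatmap f s)
  (flatten (map-stream f s)))

(flatmap (lambda (i) (enumerate-interval 1 i))
         (enumerate-interval 1 3))
;; the streams (1), (1 2), (1 2 3) strung together: 1, 1, 2, 1, 2, 3
```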
Well, I want to use that to show you a new way to do a familiar kind of problem. The problem's going to be like a lot of problems you've seen, although maybe not this particular one. I'm going to give you an integer, n. And the problem is going to be find all pairs of integers i and j, with j between 0 and i, j less than i, and i up to n, such that i plus j is prime.
So for example, if n equals 6, let's make a little table here, i and j and i plus j. So for, say, i equals 2 and j equals 1, I'd get 3. And for i equals 3, I could have j equals 2, and that would be 5. And 4 and 1 would be 5 and so on, up until i goes to 6. And what I'd like to return is to produce a stream of all the triples like this, let's say i, j, and i plus j. So for each n, I want to generate this stream.
OK, well, that's easy. Let's build it up. We start like this. We're going to say for each i, we're going to generate a stream. For each i in the interval 1 through n, we're going to generate a stream. What's that stream going to be? We're going to start by generating all the pairs. So for each i, we're going to generate, for each j in the interval 1 to i minus 1, we'll generate the pair, or the list with two elements i and j.
So we map along the interval, generating the pairs. And for each i, that generates a stream of pairs. And we flatmap it. Now we have all the pairs i and j, such that j is less than i. So that builds that.
Now we've got to test them. Well, we take that thing we just built, the flatmap, and we filter it to see whether the i-- see, we had an i and a j. i was the first thing in the list, j was the second thing in the list. So we have a predicate which says in that list of two elements is the sum of the CAR and the CADR prime. And we filter that collection of pairs we just built. So those are the pairs we want.
Now we go ahead and we take the result of that filter and we map along it, generating the list i and j and i plus j. And that's our procedure prime-sum-pairs. And then just to flash it up, here's the whole procedure. A map, a filter, a flatmap. There's the whole thing, even though this isn't particularly readable. It's just expanding that flatmap.
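The whole procedure might look like the following sketch. The slide is not in the transcript, so the layout is a reconstruction from the description. The trial-division `prime?`, the name `filter-stream`, and the list-backed stream primitives are assumptions.

```scheme
;; Assumption: for now a stream is just a list, as the lecture suggests.
(define the-empty-stream '())
(define (empty-stream? s) (null? s))
(define (cons-stream x y) (cons x y))
(define (head s) (car s))
(define (tail s) (cdr s))

(define (map-stream proc s)
  (if (empty-stream? s)
      the-empty-stream
      (cons-stream (proc (head s)) (map-stream proc (tail s)))))

(define (filter-stream pred s)
  (cond ((empty-stream? s) the-empty-stream)
        ((pred (head s))
         (cons-stream (head s) (filter-stream pred (tail s))))
        (else (filter-stream pred (tail s)))))

(define (accumulate combiner init-val s)
  (if (empty-stream? s)
      init-val
      (combiner (head s) (accumulate combiner init-val (tail s)))))

(define (append-streams s1 s2)
  (if (empty-stream? s1)
      s2
      (cons-stream (head s1) (append-streams (tail s1) s2))))

(define (flatmap f s)
  (accumulate append-streams the-empty-stream (map-stream f s)))

(define (enumerate-interval low high)
  (if (> low high)
      the-empty-stream
      (cons-stream low (enumerate-interval (+ low 1) high))))

;; Assumed helper: primality by trial division.
(define (prime? n)
  (define (try d)
    (cond ((> (* d d) n) #t)
          ((= 0 (remainder n d)) #f)
          (else (try (+ d 1)))))
  (and (> n 1) (try 2)))

(define (prime-sum-pairs n)
  (map-stream
   ;; turn each surviving pair (i j) into the triple (i j i+j)
   (lambda (p) (list (car p) (cadr p) (+ (car p) (cadr p))))
   (filter-stream
    ;; keep the pairs whose sum is prime
    (lambda (p) (prime? (+ (car p) (cadr p))))
    ;; all pairs (i j) with 1 <= j < i <= n
    (flatmap
     (lambda (i)
       (map-stream (lambda (j) (list i j))
                   (enumerate-interval 1 (- i 1))))
     (enumerate-interval 1 n)))))

(prime-sum-pairs 4)
;; triples (2 1 3), (3 2 5), (4 1 5), (4 3 7)
```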
So there's an example which illustrates the general point that nested loops in this procedure start looking like compositions of flatmaps of flatmaps of flatmaps of maps and things. So not only can we enumerate individual things, but by using flatmaps, we can do what would correspond to nested loops in most other languages.
Of course, it's pretty awful to keep writing these flatmaps of flatmaps of flatmaps. Prime-sum-pairs you saw looked fairly complicated, even though the individual pieces were easy. So what you can do, if you like, is introduce some syntactic sugar that's called collect. And collect is just an abbreviation for that nest of flatmaps and filters arranged in that particular way. Here's prime-sum-pairs again, written using collect. It says to find all those pairs, I'm going to collect together a result, which is the list i, j, and i plus j, that's going to be generated as i runs through the interval from 1 to n and as j runs through the interval from 1 to i minus 1, such that i plus j is prime.
So I'm not going to say what collect does in general. You can look at that by looking at it in the book. But pretty much, you can see that the pieces of this are the pieces of that original procedure I wrote. And this collect is just some syntactic sugar for automatically generating that nest of flatmaps and flatmaps.
OK, well, let me do one more example that shows you the same kind of thing. Here's a very famous problem that's used to illustrate a lot of so-called backtracking computer algorithms. This is the eight queens problem. This is a chess board. And the eight queens problem says, find a way to put down eight queens on a chess board so that no two are attacking each other.
And here's a particular solution to the eight queens problem. So I have to make sure to put down queens so that no two are in the same row or the same column or sit along the same diagonal. Now, there's sort of a standard way of doing that. Well, the first thing we need to do is below the surface, at George's level. We have to find some way to represent a board, and represent positions. And we'll not worry about that.
But let's assume that there's a predicate called safe. And what safe is going to do is going to say given that I have a bunch of queens down on the chess board, is it OK to put a queen in this particular spot? So safe is going to take a row and a column. That's going to be a place where I'm going to try and put down the next queen, and the rest of positions.
And what safe will say is given that I already have queens down in these positions, is it safe to put another queen down in that row and that column? And let's not worry about that. That's George's problem. And it's not hard to write. You just have to check whether this thing contains any things on that row or that column or in that diagonal.
Now, how would you organize the program given that? And there's sort of a traditional way to organize it called backtracking. And it says, well, let's think about all the ways of putting the first queen down in the first column. There are eight ways. Well, let's say try the first column. Try column 1, row 1. These branches are going to represent the possibilities at each level.
So I'll try and put a queen down in the first column. And now given that it's in the first column, I'll try and put the next queen down in the first column. I'll try and put the first queen, the one in the first column, down in the first row. I'm sorry. And then given that, we'll put the next queen down in the first row. And that's no good.
So I'll back up to here. And I'll say, oh, can I put the first queen down in the second row? Well, that's no good. Oh, can I put it down in the third row? Well, that's good. Well, now can I put the next queen down in the first column? Well, I can't visualize this chess board anymore, but I think that's right. And I try the next one.
And at each place, I go as far down this tree as I can. And I back up. If I get down to here and find no possibilities below there, I back all the way up to here, and now start again generating this sub-tree. And I sort of walk around. And finally, if I ever manage to get all the way down, I've found a solution.
So that's a typical sort of paradigm that's used a lot in AI programming. It's called backtracking search. And it's really unnecessary. You saw me get confused when I was visualizing this thing. And you see the complication. This is a complicated thing to say.
Why is it complicated? It's because somehow this program is too inordinately concerned with time. It's too much-- I try this one, and I try this one, and I go back to the last possibility. And that's a complicated thing. If I stop worrying about time so much, then there's a much simpler way to describe this. It says, let's imagine that I have in my hands the tree down to k minus 1 levels.
See, suppose I had in my hands all possible ways to put down queens in the first k columns. Suppose I just had that. Let's not worry about how we get it. Well, then, how do I extend that? How do I find all possible ways to put down queens in the next column? It's really easy. For each of these positions I have, I think about putting down a queen in each row to make the next thing. And then for each one I put down, I filter those by the ones that are safe.
So instead of thinking about this tree as generated step by step, suppose I had it all there. And to extend it from level k minus 1 to level k, I just need to extend each thing in all possible ways and only keep the ones that are safe. And that will give me the tree to level k. And that's a recursive strategy for solving the eight queens problem.
All right, well, let's look at it. To solve the eight queens problem on a board of some specified size, we write a sub-procedure called fill-columns. Fill-columns is going to put down queens up through column k. And here's the pattern of the recursion. I'm going to call fill-columns with the size eventually.
So fill-columns says how to put down queens safely in the first k columns of this chess board with a size number of rows in it. If k is equal to 0, well, then I don't have to put anything down. So my solution is just an empty chess board. Otherwise, I'm going to do some stuff. And I'm going to use collect.
And here's the collect. I find all ways to put down queens in the first k minus 1 columns. And this was just what I said before. Imagine I have this tree down to k minus 1 levels. And then I find all ways of trying a row, that's just each of the possible rows. There are size rows, so that's enumerate interval.
And now what I do is I collect together the new row I'm going to try and column k with the rest of the queens. I adjoin a position. This is George's problem. Adjoin-position is like safe. It's a thing that takes a row and a column and the rest of the positions and makes a new position collection.
So I adjoin a position of a new row and a new column to the rest of the queens, where the rest of the queens runs through all possible ways of solving the problem in k minus 1 columns. And the new row runs through all possible rows such that it was safe to put one there. And that's the whole program. There's the whole procedure.
Not only that, that doesn't just solve the eight queens problem, it gives you all solutions to the eight queens problem. When you're done, you have a stream. And the elements of that stream are all possible ways of solving that problem. Why is that simpler? Well, we threw away the whole idea that this is some process that happens in time with state. And we just said it's a whole collection of stuff. And that's why it's simpler.
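Here is a sketch of that procedure with the collect expanded by hand into flatmap, filter, and map. The board representation (a list of row-and-column pairs), `safe?`, `adjoin-position`, `empty-board`, and `count-stream` are George-level details invented here for illustration, and streams are still plain lists.

```scheme
;; Assumption: for now a stream is just a list, as the lecture suggests.
(define the-empty-stream '())
(define (empty-stream? s) (null? s))
(define (cons-stream x y) (cons x y))
(define (head s) (car s))
(define (tail s) (cdr s))

(define (map-stream proc s)
  (if (empty-stream? s)
      the-empty-stream
      (cons-stream (proc (head s)) (map-stream proc (tail s)))))

(define (filter-stream pred s)
  (cond ((empty-stream? s) the-empty-stream)
        ((pred (head s))
         (cons-stream (head s) (filter-stream pred (tail s))))
        (else (filter-stream pred (tail s)))))

(define (accumulate combiner init-val s)
  (if (empty-stream? s)
      init-val
      (combiner (head s) (accumulate combiner init-val (tail s)))))

(define (append-streams s1 s2)
  (if (empty-stream? s1)
      s2
      (cons-stream (head s1) (append-streams (tail s1) s2))))

(define (flatmap f s)
  (accumulate append-streams the-empty-stream (map-stream f s)))

(define (enumerate-interval low high)
  (if (> low high)
      the-empty-stream
      (cons-stream low (enumerate-interval (+ low 1) high))))

;; Assumed board representation: a list of (row column) pairs.
(define empty-board '())
(define (adjoin-position row col positions)
  (cons (list row col) positions))
(define (attacks? row col queen)
  (let ((r (car queen)) (c (cadr queen)))
    (or (= r row)
        (= c col)
        (= (abs (- r row)) (abs (- c col))))))
(define (safe? row col positions)
  (cond ((null? positions) #t)
        ((attacks? row col (car positions)) #f)
        (else (safe? row col (cdr positions)))))

(define (queens size)
  ;; all safe ways to put down queens in the first k columns
  (define (fill-cols k)
    (if (= k 0)
        (cons-stream empty-board the-empty-stream)
        (flatmap
         (lambda (rest-queens)
           (map-stream
            (lambda (try-row) (adjoin-position try-row k rest-queens))
            (filter-stream
             (lambda (try-row) (safe? try-row k rest-queens))
             (enumerate-interval 1 size))))
         (fill-cols (- k 1)))))
  (fill-cols size))

;; Count the elements of a stream using accumulate.
(define (count-stream s)
  (accumulate (lambda (x n) (+ n 1)) 0 s))

(count-stream (queens 8))  ; 92 solutions
```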
We've changed our view. Remember, that's where we started today. We've changed our view of what it is we're trying to model. We stop modeling things that evolve in time and have steps and have state. And instead, we're trying to model this global thing like the whole flight of the chalk, rather than its state at each instant. Any questions?
AUDIENCE: It looks to me like backtracking would be searching for the first solution it can find, whereas this recursive search would be looking for all solutions. And it seems that if you have a large enough area to search, that the second is going to become impossible.
PROFESSOR: OK, the answer to that question is the whole rest of this lecture. It's exactly the right question. And without trying to anticipate the lecture too much, you should start being suspicious at this point, and exactly those kinds of suspicions. It's wonderful, but isn't it so terribly inefficient? That's where we're going. So I won't answer now, but I'll answer later. OK, let's take a break.
Well, by now you should be starting to get suspicious. See, I've showed you this simple, elegant way of putting programs together, very unlike these other traditional programs that sum the odd squares or compute the odd Fibonacci numbers. Very unlike these programs that mix up the enumerator and the filter and the accumulator. And by mixing it up, we don't have all of these wonderful conceptual advantages of these streams pieces, these wonderful mix and match components for putting together lots and lots of programs.
On the other hand, most of the programs you've seen look like these ugly ones. Why's that? Can it possibly be that computer scientists are so obtuse that they don't notice that if you merely did this thing, then you can get this great programming elegance? There's got to be a catch. And it's actually pretty easy to see what the catch is.
Let's think about the following problem. Suppose I tell you to find the second prime between 10,000 and 1 million, or if your computer's larger, say between 10,000 and 100 billion, or something. And you say, oh, that's easy. I can do that with a stream. All I do is I enumerate the interval from 10,000 to 1 million. So I get all those integers from 10,000 to 1 million. I filter them for prime-ness, so test all of them and see if they're prime. And I take the second element. That's the head of the tail.
Well, that's clearly pretty ridiculous. We'd not even have room in the machine to store the integers in the first place, much less to test them. And then I only want the second one. See, the power of this traditional programming style is exactly its weakness, that we're mixing up the enumerating and the testing and the accumulating. So we don't do it all. So the very thing that makes it conceptually ugly is the very thing that makes it efficient. It's this mixing up.
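A minimal Python sketch of the wasteful approach just described, where streams are treated as ordinary lists. The lecture works in Scheme; the names `is_prime`, `counted_is_prime`, and `second_prime_eager` are assumptions for illustration, and the upper bound is cut to 20,000 (not the lecture's 1 million) so the example runs quickly.

```python
def is_prime(n):
    """Trial division; adequate for small illustrative inputs."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

tested = []  # every integer handed to the primality test

def counted_is_prime(n):
    tested.append(n)
    return is_prime(n)

def enumerate_interval(low, high):
    # Builds the whole interval in memory before anything else happens.
    return list(range(low, high + 1))

def second_prime_eager(low, high):
    candidates = enumerate_interval(low, high)
    # Every candidate is tested, even though only two primes are wanted.
    primes = [n for n in candidates if counted_is_prime(n)]
    return primes[1]

result = second_prime_eager(10_000, 20_000)
```

Here `len(tested)` equals the size of the whole interval: all of the enumeration and all of the testing finish before the second element is picked out.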
So it seems that all I've done this morning so far is just confuse you. I showed you this wonderful way that programming might work, except that it doesn't. Well, here's where the wonderful thing happens. It turns out in this game that we really can have our cake and eat it too. And what I mean by that is that we really can write stream programs exactly like the ones I wrote and arrange things so that when the machine actually runs, it's as efficient as running this traditional programming style that mixes up the generation and the test.
Well, that sounds pretty magic. The key to this is that streams are not lists. We'll see this carefully in a second, but for now, let's take a look at that slide again. The image you should have here of this signal processing system is that what's going to happen is there's this box that has the integers sitting in it. And there's this filter that's connected to it and it's tugging on them. And then there's someone who's tugging on this stuff saying what comes out of the filter.
And the image you should have is that someone says, well, what's the first prime, and tugs on this filter. And the filter tugs on the integers. And you look only at that much, and then say, oh, I really wanted the second one. What's the second prime? And that no computation gets done except when you tug on these things.
Let me try that again. This is a little device. This is a little stream machine invented by Eric Grimson who's been teaching this course at MIT. And the image is here's a stream of stuff, like a whole bunch of the integers. And here's some processing elements. And if, say, it's filter of filter of map, or something.
And if I really tried to implement that with streams as lists, what I'd say is, well, I've got this list of things, and now I do the first filter. So do all this processing. And I take this and I process and I process and I process and I process. And now I've got this new stream. Now I take that result in my hand someplace. And I put that through the second one. And I process the whole thing. And there's this new stream. And then I take the result and I put it all the way through this one the same way.
That's what would happen to these stream programs if streams were just lists. But in fact, streams aren't lists, they're streams. And the image you should have is something a little bit more like this. I've got these gadgets connected up by this data that's flowing out of them. And here's my original source of the streams. It might be starting to generate the integers.
And now, what happens if I want a result? I tug on the end here. And this element says, gee, I need some more data. So this one comes here and tugs on that one. And it says, gee, I need some more data. And this one tugs on this thing, which might be a filter, and says, gee, I need some more data. And only as much of this thing at the end here gets generated as I tugged. And only as much of this stuff goes through the processing units as I'm pulling on the end I need. That's the image you should have of the difference between implementing what we're actually going to do and if streams were lists.
Well, how do we make this thing? I hope you have the image. The trick is how to make it. We want to arrange for a stream to be a data structure that computes itself incrementally, an on-demand data structure. And the basic idea is, again, one of the very basic ideas that we're seeing throughout the whole course. And that is that there's not a firm distinction between programs and data.
So what a stream is going to be is simultaneously this data structure that you think of, like the stream of the leaves of this tree. But at the same time, it's going to be a very clever procedure that has the method of computing in it. Well, let me try this. It's going to turn out that we don't need any more mechanism. We already have everything we need simply from the fact that we know how to handle procedures as first-class objects.
Well, let's go back to the key. The key is, remember, we had these operations. CONS-stream and head and tail. When I started, I said you can think about this as CONS and think about this as CAR and think about that as CDR, but it's not. Now, let's look at what they really are.
Well, CONS-stream of x and y is going to be an abbreviation for the following thing. CONS form a pair, ordinary CONS, of x to a thing called delay of y. And before I explain that, let me go and write the rest. The head of a stream is going to be just the CAR. And the tail of a stream is going to be a thing called force the CDR of the stream.
Now let me explain this. Delay is going to be a special magic thing. What delay does is take an expression and produce a promise to compute that expression when you ask for it. It doesn't do any computation here. It just gives you a rain check. It produces a promise. And CONS-stream says I'm going to put together in a pair x and a promise to compute y.
Now, if I want the head, that's just the CAR that I put in the pair. And the key is that the tail is going to be-- force calls in that promise. Tail says, well, take that promise and now call in that promise. And then we compute that thing. That's how this is going to work. That's what CONS-stream, head, and tail really are.
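The three operations just described can be sketched in Python roughly as follows. This is an illustration, not the lecture's Scheme: Python has no special forms, so the caller must write the zero-argument `lambda` that delay would otherwise supply, and the names `cons_stream`, `head`, `tail`, `rest_of_stream`, and `log` are assumptions.

```python
# Sketch only: Python has no special forms, so the caller writes the
# lambda that the lecture's delay would supply automatically.

def cons_stream(x, y_thunk):
    """Pair x with a promise (a zero-argument function) to compute y."""
    return (x, y_thunk)

def head(stream):
    """The head is just the first element of the pair (the CAR)."""
    return stream[0]

def tail(stream):
    """The tail forces the promise stored in the pair (the CDR)."""
    return stream[1]()

log = []  # records when the tail is actually computed

def rest_of_stream():
    log.append("tail computed")
    return cons_stream(2, lambda: None)

s = cons_stream(1, rest_of_stream)
log_after_cons = list(log)      # nothing has been computed yet
first = head(s)
log_after_head = list(log)      # taking the head computes nothing either
second = head(tail(s))          # only now does rest_of_stream run
```

Building the pair and taking its head leave `log` empty; the entry appears only when `tail` calls in the promise.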
Now, let's see how this works. And we'll go through this fairly carefully. We're going to see how this works in this example of computing the second prime between 10,000 and a million. OK, so we start off and we have this expression. The second prime-- the head of the tail of the result of filtering for primality the integers between 10,000 and 1 million.
Now, what is that? What that is, that interval between 10,000 and 1 million, well, if you trace through enumerate interval, that builds a CONS-stream. And the CONS-stream is the CONS of 10,000 to a promise to compute the integers between 10,001 and 1 million.
So that's what this expression is. Here I'm using the substitution model. And we can use the substitution model because we don't have side effects and state. So I have CONS of 10,000 to a promise to compute the rest of the integers. So only one integer, so far, got enumerated.
Well, I'm going to filter that thing for primality. Again, you go back and look at the filter code. What the filter will first do is test the head. So in this case, the filter will test 10,000 and say, oh, 10,000's not prime. Therefore, what I have to do recursively is filter the tail. And what's the tail of it, well, that's the tail of this pair with a promise in it.
Tail now comes in and says, well, I'm going to force that. I'm going to force that promise, which means now I'm going to compute the integers between 10,001 and 1 million. OK, so this filter now is looking at that. That enumerate itself, well, now we're back in the original enumerate situation. The enumerate is the CONS of the first thing, 10,001, onto a promise to compute the rest.
So now the primality filter is going to go look at 10,001. It's going to decide if it likes that or not. It turns out 10,001 isn't prime. So it'll force it again and again and again. And finally, I think the first prime it hits is 10,007. And at that point, it'll stop. And that will be the first prime, and then eventually, it'll need the second prime. So at that point, it will go again.
So you see what happens is that no more gets generated than you actually need. That enumerator is not going to generate any more integers than the filter asks it for as it's pulling in things to check for primality. And the filter is not going to generate any more stuff than you ask it for, which is the head of the tail. You see, what's happened is we've put that mixing of generation and test into what actually happens in the computer, even though that's not apparently what's happening from looking at our programs.
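The walk-through above can be reproduced with a short Python sketch. It assumes streams are pairs of a value and a zero-argument function, uses `None` as a stand-in for the empty stream, and adds a list `enumerated` purely to record which integers the enumerator actually produces; none of these names come from the lecture.

```python
THE_EMPTY_STREAM = None  # assumed stand-in for the empty stream

def cons_stream(x, thunk):
    return (x, thunk)

def head(s):
    return s[0]

def tail(s):
    return s[1]()  # call in the promise

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

enumerated = []  # integers the enumerator has actually generated

def enumerate_interval(low, high):
    if low > high:
        return THE_EMPTY_STREAM
    enumerated.append(low)
    # One integer now, plus a promise to compute the rest.
    return cons_stream(low, lambda: enumerate_interval(low + 1, high))

def stream_filter(pred, s):
    if s is THE_EMPTY_STREAM:
        return THE_EMPTY_STREAM
    if pred(head(s)):
        return cons_stream(head(s), lambda: stream_filter(pred, tail(s)))
    return stream_filter(pred, tail(s))

second_prime = head(tail(stream_filter(is_prime,
                                       enumerate_interval(10_000, 1_000_000))))
```

Although the program reads as "enumerate everything, filter everything, take the second", `enumerated` ends up holding only the integers from 10,000 through 10,009.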
OK, well, that seemed easy. All of this mechanism got put into this magic delay. So you're saying, gee, that must be where the magic is. But see there's no magic there either. You know what delay is. Delay on some expression is just an abbreviation for-- well, what's a promise to compute an expression? Lambda of nil, procedure of no arguments, which is that expression. That's what a procedure is. It says I'm going to compute an expression.
What's force? How do I take up a promise? Well, force of some procedure, a promise, is just run it. Done. So there's no magic there at all.
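In Python terms, a hedged sketch of the same idea: a promise is a zero-argument function, and forcing it is calling it. Because Python evaluates arguments eagerly, `delay` here must be handed a `lambda` explicitly, which is an assumption of this sketch; the `events` list and `expensive` function are illustrative only.

```python
def delay(thunk):
    """A promise is nothing more than the zero-argument procedure itself."""
    return thunk

def force(promise):
    """Taking up a promise means running it."""
    return promise()

events = []  # records the actual order in which things happen

def expensive():
    events.append("computing")
    return 6 * 7

promise = delay(lambda: expensive())   # no computation happens here
events.append("promise made")
value = force(promise)                 # the expression is evaluated now
events.append("promise forced")
```

The `events` list shows "promise made" before "computing": the expression appears in the program before the note is written, but runs only when forced.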
Well, what have we done? We said the old style, traditional style of programming is more efficient. And the stream thing is more perspicuous. And we managed to make the stream procedures run like the other procedures by using delay. And the thing that delay did for us was to de-couple the apparent order of events in our programs from the actual order of events that happened in the machine. That's really what delay is doing.
That's exactly the whole point. We've given up the idea that our procedures, as they run, or as we look at them, mirror some clear notion of time. And by giving that up, we give delay the freedom to arrange the order of events in the computation the way it likes. That's the whole idea. We de-couple the apparent order of events in our programs from the actual order of events in the computer.
OK, well there's one more detail. It's just a technical detail, but it's actually an important one. As you run through these recursive programs unwinding, you'll see a lot of things that look like tail of the tail of the tail. That's the kind of thing that would happen as I go CONSing down a stream all the way. And if each time I'm doing that, each time to compute a tail, I evaluate a procedure which then has to go re-compute its tail, and re-compute its tail and recompute its tail each time, you can see that's very inefficient compared to just having a list where the elements are all there, and I don't have to re-compute each tail every time I get the next tail.
So there's one little hack to slightly change what delay is, and make it a thing which is-- I'll write it this way. The actual implementation, delay is an abbreviation for this thing, memo-proc of a procedure. Memo-proc is a special thing that transforms a procedure. What it does is it takes a procedure of no arguments and it transforms it into a procedure that'll only have to do its computation once.
And what I mean by that is, you give it a procedure. The result of memo-proc will be a new procedure, which the first time you call it, will run the original procedure, remember what result it got, and then from ever on after, when you call it, it just won't have to do the computation. It will have cached that result someplace.
And here's an implementation of memo-proc. Once you have the idea, it's easy to implement. Memo-proc is this little thing that has two little flags in there. It says, have I already been run? And initially it says, no, I haven't already been run. And what was the result I got the last time I was run?
So memo-proc takes a procedure called proc, and it returns a new procedure of no arguments. Proc is supposed to be a procedure of no arguments. And it says, oh, if I'm not already run, then I'm going to do a sequence of things. I'm going to compute proc, I'm going to save that. I'm going to stash that in the variable result. I'm going to make a note to myself that I've already been run, and then I'll return the result.
So that's if you compute it if it's not already run. If you call it and it's already been run, it just returns the result. So that's a little clever hack called memoization. And in this case, it short circuits having to re-compute the tail of the tail of the tail of the tail of the tail. So there isn't even that kind of inefficiency. And in fact, the streams will run with pretty much the same efficiency as the other programs precisely.
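A Python sketch of memo-proc as described above, with the two pieces of remembered state (`already_run` and `result`) held in a closure. The helper `slow` and the `calls` list are assumptions added to show how often the original procedure runs.

```python
def memo_proc(proc):
    """Wrap a zero-argument procedure so its body runs at most once."""
    already_run = False
    result = None

    def memoized():
        nonlocal already_run, result
        if not already_run:
            result = proc()       # compute and stash the result
            already_run = True    # note that the procedure has been run
        return result

    return memoized

calls = []  # one entry per real execution of slow

def slow():
    calls.append("ran")
    return "answer"

p = memo_proc(slow)
calls_before = len(calls)   # wrapping alone does not run slow
first = p()
second = p()                # served from the stashed result
```

Calling `memo_proc` only returns the new procedure; `slow` runs on the first call to `p` and never again.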
And remember, again, the whole idea of this is that we've used the fact that there's no really good dividing line between procedures and data. We've written data structures that, in fact, are sort of like procedures. And what that's allowed us to do is take an example of a common control structure, in this place iteration. And we've built a data structure which, since itself is a procedure, kind of has this iteration control structure in it. And that's really what streams are. OK, questions?
AUDIENCE: Your description of tail-tail-tail, if I understand it correctly, force is actually execution of a procedure, if it's done without this memo-proc thing. And you implied that memo-proc gets around that problem. Doesn't it only get around it if tail-tail-tail is always executing exactly the same--
PROFESSOR: Oh, that's-- sure.
AUDIENCE: I guess I missed that point.
PROFESSOR: Oh, sure. I mean the point is-- yeah. I mean I have to do a computation to get the answer. But the point is, once I've found the tail of the stream, to get the tail of the tail, I shouldn't have had to re-compute the first tail. See, and if I didn't use memo-proc, that re-computation would have been done.
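The re-computation in question can be counted with a small Python sketch. It walks to the third element of the same stream twice, once with plain promises and once with memoized ones; `count_enumerations` and `integers_from` are hypothetical names for this illustration.

```python
def memo_proc(proc):
    already_run = False
    result = None

    def memoized():
        nonlocal already_run, result
        if not already_run:
            result = proc()
            already_run = True
        return result

    return memoized

def count_enumerations(delay):
    """Walk to the third element twice; report how many integers were built."""
    generated = []

    def integers_from(n):
        generated.append(n)
        return (n, delay(lambda: integers_from(n + 1)))

    def tail(s):
        return s[1]()

    s = integers_from(1)
    for _ in range(2):
        third = tail(tail(s))[0]
    return len(generated)

plain = count_enumerations(lambda thunk: thunk)  # delay without memo-proc
memoized = count_enumerations(memo_proc)         # delay with memo-proc
```

With plain promises the second walk regenerates 2 and 3, so five integers are built in total; with memoized promises each tail is computed once, so only three are.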
AUDIENCE: I understand now.
AUDIENCE: In one of your examples, you mentioned that we were able to use the substitution model because there are no side effects. What if we had a single processing unit-- if we had a side effect, if we had a state? Could we still practically build the stream model?
PROFESSOR: Maybe. That's a hard question. I'm going to talk a little bit later about the places where substitution and side effects don't really mix very well. But in general, I think the answer is unless you're very careful, any amount of side effect is going to mess up everything.
AUDIENCE: Sorry, I didn't quite understand the memo-proc operation. When do you execute the lambda? In other words, when memo-proc is executed, just this lambda expression is being generated. But it's not clear to me when it's executed.
PROFESSOR: Right. What memo-proc does-- remember, the thing that's going into memo-proc, the thing proc, is a procedure of no arguments. And someday, you're going to call it. Memo-proc translates that procedure into another procedure of no arguments, which someday you're going to call. That's that lambda.
So here, where I initially built as my tail of the stream, say, this procedure of no arguments, which someday I'll call. Instead, I'm going to have the tail of the stream be memo-proc of it, which someday I'll call. So that lambda of nil, that gets called when you call the memo-proc, when you call the result of that memo-proc, which would be ordinarily when you would have called the original thing that you set it.
AUDIENCE: OK, the reason I ask is I had a feeling that when you call memo-proc, you just return this lambda.
PROFESSOR: That's right. When you call memo-proc, you return the lambda. You never evaluate the expression at all, until the first time that you would have evaluated it.
AUDIENCE: Do I understand it right that you actually have to build the list up, but the elements of the list don't get evaluated? The expressions don't get evaluated? But at each stage, you actually are building a list.
PROFESSOR: That's-- I really should have said this. That's a really good point. No, it's not quite right. Because what happens is this. Let me draw this as pairs. Suppose I'm going to make a big stream, like enumerate interval, 1 through 1 billion. What that is, is a pair with a 1 and a promise. That's exactly what it is. Nothing got built up.
When I go and force this, and say, what happens? Well, this thing is now also recursively a CONS. So that this promise now is the next thing, which is a 2 and a promise to do more. And so on and so on and so on.
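A short Python sketch of the picture being drawn here, assuming a stream is a pair of a value and a zero-argument function and using `None` for the empty stream. The `built` list is an assumption added to record which pairs actually get constructed.

```python
THE_EMPTY_STREAM = None  # assumed stand-in for the empty stream

built = []  # heads of the pairs that have actually been constructed

def enumerate_interval(low, high):
    if low > high:
        return THE_EMPTY_STREAM
    built.append(low)
    # A pair holding one number and a promise for everything after it.
    return (low, lambda: enumerate_interval(low + 1, high))

s = enumerate_interval(1, 1_000_000_000)
built_after_construction = list(built)   # only the first pair exists

s2 = s[1]()                              # force the promise once
built_after_one_force = list(built)      # now exactly two pairs exist
```

Asking for a billion integers constructs a single pair; each force adds exactly one more.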
So nothing gets built up until you walk down the stream. Because what's sitting here is not the list, but a promise to generate the list. And by promise, technically I mean procedure. So it doesn't get built up. Yeah, I should have said that before this point. OK. Thank you. Let's take a break.