Description: This lecture covers open addressing, which is another approach to dealing with collisions (hashing with chaining was covered in Lecture 8). Cryptographic hashing is also introduced.
Instructor: Srini Devadas
Lecture 10: Open Addressing...
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.
PROFESSOR: One more exciting lecture on hashing. And a couple of reminders. I don't want to start out saying unpopular things, but we do have a quiz coming up next week on Tuesday. There will not be a lecture next Tuesday, but there will be a quiz, 7:30 to 9:30 Tuesday evening. I will send an announcement. There are going to be a couple of rooms. Some of you will be in this room. Some of you will have to go to a different room, since this room really can't hold 180 students taking a quiz. All right?
So hashing. I'm pretty excited about this lecture, because, as I was talking with Victor just before this, if there's one thing you want to remember about hashing when you want to go implement a hash table, it's open addressing. It's the simplest way that you can possibly implement a hash table. You can implement a hash table using an array. We've obviously talked about linked lists and chaining to implement hash tables in previous lectures, but we're going to actually get rid of pointers and linked lists, and implement a hash table using a single array data structure, and that's the notion of open addressing.
Now in order to get open addressing to work, there's no free lunch, right? So you have a simple implementation. It turns out that in order to make open addressing efficient, you have to be a little more careful than if you're using the hash tables with chaining. And we're going to have to make an assumption about uniform hashing. I'll say a little bit more about that. But it's a different assumption from simple uniform hashing that Eric talked about. And we'll state this uniform hashing assumption.
And we'll look at what the performance of open addressing is under this assumption. And this assumption is going to give us a sense of what good hash functions are for open addressing applications, or for open addressing hash tables.
And finally we'll talk about cryptographic hashing. This is not really 6.006 material, but it's kind of cool material. It has a lot of applications in computer security and cryptography. And so we'll describe the notion of a cryptographic hash, and we'll talk about a couple of really simple and pervasive applications, like password storage and file corruption detectors, that you can implement using cryptographic hash functions, which are quite different from the regular hash functions that we're using in hash tables, be it chaining hash tables or open addressing hash tables. All right?
So let's get started and talk about open addressing. This is another approach to dealing with collisions. If you didn't have collisions, obviously an array would work, right? If you could somehow guarantee that there were no collisions. When you have collisions, you have to worry about the chaining and ensuring that you can still find the keys even though you had two keys that collided into the same slot.
And we don't want to use chaining. The simplest data structure that we can possibly use is an array. Back when I was a grad student, I went through and got a PhD writing programs in C, never using any data structure other than arrays, because I didn't like pointers. And so open addressing is a way that you can implement hash tables doing exactly this.
And in particular, what we're going to do is assume an array structure with items, and we're going to assume at most one item per slot. So m has to be greater than or equal to n, right? This is important because we don't have linked lists. We can't arbitrarily increase the storage of a slot using a chain, and have n, which is the number of elements, be greater than m, which you could in the linked-list table with chaining.
But here you only have these array locations, these indices that you can put items into. So it's pretty much guaranteed that if you want a working open addressing hash table, then m, which is the number of slots in the table, should be greater than or equal to the number of elements, all right? That's important.
Now how does this work? Well, we're going to have this notion of probing. And the notion of probing is that we're going to try to see if we can insert something into this hash table, and if we fail, we're actually going to recompute a slightly different hash for the key that we're trying to insert, the key-value pair that we're trying to insert. All right? So this is an iterative process, and we're going to continually probe until we find an empty slot into which we can insert this key-value pair that the key should index into.
So you do have different hashes that are going to be computed, based on this probing notion, for a given key. All right? And so what we need now is a hash function that's different from the standard hash functions that we've talked about so far, one that specifies the order of slots to probe, which is basically the order of slots to try, for a given key. And this is going to be true for insert, search, and delete, which are the three basic operations.
And they're a little bit different, all right? Just like they were different for the chaining hash table, they're different here, but they're kind of more different here. And you'll see what I mean when we go through this. And this is not just for one slot. It's going to specify an order of slots.
And so our hash function h is going to take the universe of keys and also take what we're going to call the trial count. So if you're lucky-- well, you get lucky in your first trial. And if you're not, you hope to get lucky in your second trial, and so on and so forth.
But the hash function is going to take two arguments. It's going to take the key as an argument, and it's going to take a trial count, which is an integer between 0 and m minus 1, all right? And it's going to produce, just like the chaining hash function, a number between 0 and m minus 1, right? Where m is the number of slots in the table. All right. So that's the story.
In order to ensure that you are using the hash table corresponding to open addressing properly, what you want is -- and this is an important property -- that for a key k that you're given, and this could be an arbitrary key, mind you, the sequence of slots that gets computed, h(k, 0), h(k, 1), and so on and so forth up to h(k, m minus 1), is a permutation of 0, 1, and so on up to m minus 1. And the reason for this hopefully is clear. It's because you want to be able to use the entirety of your hash table. You don't want particular slots to go unused.
And when you get to the point where the number of elements n is pretty close to m, maybe there's just one slot left, OK? And you want to fill up this last slot with this key k that you want to put in there. What you want to be able to say is that, for this arbitrary key k, the one slot that's free -- and it could be the first slot, it could be the 17th slot, whatever -- is a slot that the sequence of probes is eventually going to allow you to insert into. All right?
And we'll generalize this notion into the uniform hashing assumption in a few minutes, but hopefully this makes sense from the standpoint of really load balancing the table and ensuring that all slots in the table are sort of equal opportunity slots -- that you're going to be able to put keys in them, as long as you probe long enough to get there.
Now of course the fact that you're using one particular slot for one particular key depends on the order of keys that you're inserting into this table. Again, you'll see that as we go through an example, all right?
So that's the setup. That's the open addressing notion. As you can see, we're just going to go through a sequence of probes, and our hash function is going to tell us what the sequence is, and so we don't need any pointers or anything like that.
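To make that concrete, here is a minimal sketch in Python, not from the lecture, of the probe interface and the permutation property. The names probe and is_permutation, and the toy hash inside probe, are hypothetical; a generic hash of the (key, trial) pair like this one will generally fail the permutation check, which is exactly why the probing strategies later in the lecture are built the way they are.

M = 8  # number of slots in the table

def probe(k, i):
    # Trial i for key k: hash the (key, trial) pair into one of M slots.
    # Note: this toy version does NOT guarantee the permutation property.
    return hash((k, i)) % M

def is_permutation(k):
    # The probe sequence probe(k, 0), ..., probe(k, M-1) should visit
    # every slot exactly once, i.e., be a permutation of 0, ..., M-1.
    return sorted(probe(k, i) for i in range(M)) == list(range(M))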
So let's take a look at how this might work in practice. So maybe the easiest thing to do is to run through an example, and then I'll show you some pseudocode. But let's say that I have a table here, and I'm going to concentrate on the insert operation. And I'm going to start inserting things into this table. And right here I have seven slots up there.
So let's say that I want to insert 586 into the table, and I compute h of 586 comma 1, and that gives me 1. OK? This is the first insert. So I'm going to go ahead and stick 586 in here, all right? And then I insert, for argument's sake, 133. I insert 204 out here, 481 out here, and so on. And because the hash table is mostly empty, my very first trial is successful each time, all right?
So h of 481-- I'm not going to write this all out, but h 481 1 happens to be 6 and so on. All right? Now I get to the point where I want to insert 496. And when I try to insert 496, I have h 496 1. It happens to be 4. OK?
So the first thing that happens is I go in here, and I say oops. This slot is occupied, because this-- I'm going to have a special flag associated with an empty slot, and we can say it's none. And if it's not none, then it's occupied. And 204 is not equal to none. So I look at this, and I say the first probe actually failed. OK? And so h 496 1 equals 4 fails, so I need to go do h 496 2.
And h 496 2 may also fail. You might be in a situation where h 496 2 gives you the slot containing 586. So this was h 496 1, and h 496 2 might give you the slot with 586 in it. And finally it may be that h 496 3, which is your third attempt, equals 3. So you go in, and you say great, I can insert 496. And let me write that in bold here. Out there. All right? So pretty straightforward.
In this case, you've gone through three trials in order to find an empty slot. And so the big question here, other than taking care of search and delete, is how long is this process going to take? All right? I'll talk about that in a few minutes, but let me explain, now that you've seen insert, how search would work, right? Or maybe I'll get one of you guys to explain to me: once you have insert, how would search work? Someone? Someone from the back? No one? You guys are always answering questions. Yeah, all the way in the back.
AUDIENCE: Would you just do the same kind of probing [INAUDIBLE] where you find it or you don't find it?
PROFESSOR: Right. So you do exactly that. It's very similar to insert. You have a situation where none would indicate an empty slot. And you can think of this as being a flag. And in the case of insert, what you did was, insert k v would say keep probing. I'm not going to write the pseudocode for it. Keep probing until an empty slot is found. And then when it's found, insert the item.
And as long as you have the permutation property that we have up there, and given that m is greater than or equal to n, you're guaranteed that insert is going to find a slot. OK? That's the good news. Now it might take a while, and so we'll have to talk about performance a bit later, but it'll work. OK?
Now search is a little bit different. You're searching for a key k, and you essentially say you're going to keep probing. And you say as long as the slots encountered are occupied by keys not equal to k. So every time you probe, you go in there and you say I got a key. I found a hash for it. I go to this particular slot. I look inside of it, and I check to see whether the key that's stored inside of it is the same as the key I'm searching for. If not, I go to the next trial. If it is, then I return it. Right? So that's pretty much it.
And we keep probing until you either encounter k or find an empty slot. And this is the key notion -- no pun intended -- which is that when you find an empty slot, it means that you have failed to discover this key. You fail to-- yeah, question back there?
AUDIENCE: What happens if you were to delete a key though?
PROFESSOR: I'll make you answer that question for a cushion. So we'll get to delete in a minute. But I want to make sure you're all on board with insert and search. OK? So these are actually fairly straightforward in comparison to delete. It's not like delete is much more complicated, but there is a subtlety there.
And so that's kind of neat, right? I mean this actually works. So if you had a situation where you were just accumulating keys, and you're looking for the number of distinct elements in the stream of data that was coming in, and that was pretty much it with respect to your program, you'd never have to delete keys, and this would be all that you'd have to implement. Right?
But let's talk about delete. Every once in a while we'll want to delete a key. Yeah, you had a question?
AUDIENCE: I have a question about search. Why do you stop searching once you find an empty slot?
PROFESSOR: Because you're searching. So what that means is that you're looking to see if this key is already in the table. And if the key were already in the table, you'd want to return the value associated with that key. If you find an empty slot, since you're using the same deterministic sequence of probes that you would have used when the key was inserted, you know the key can't be in the table. That make sense? Good. All right. So far so good? That's what works for insert and search.
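Here is a minimal sketch of insert and search along those lines, assuming a probe method whose sequence over i = 0, ..., m-1 is a permutation of the slots. The class name is made up, the placeholder probe happens to be the linear probing discussed later, and delete is deliberately left out until the fix below.

class OpenAddressingTable:
    def __init__(self, m):
        self.m = m
        self.slots = [None] * m  # None flags an empty slot

    def probe(self, k, i):
        # Placeholder probe function; see the probing strategies later on.
        return (hash(k) + i) % self.m

    def insert(self, k, v):
        # Keep probing until an empty slot is found, then insert the item.
        for i in range(self.m):
            j = self.probe(k, i)
            if self.slots[j] is None or self.slots[j][0] == k:
                self.slots[j] = (k, v)
                return
        raise RuntimeError("table is full")  # can only happen once n reaches m

    def search(self, k):
        # Keep probing while slots are occupied by keys not equal to k.
        for i in range(self.m):
            j = self.probe(k, i)
            if self.slots[j] is None:
                return None  # empty slot: k was never inserted
            if self.slots[j][0] == k:
                return self.slots[j][1]  # found k: return its value
        return None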
Let's talk delete. So back there. How does delete work?
AUDIENCE: Well [INAUDIBLE] if you search until you find the none and assume that the key you're searching for was not put in there. But let's say you had one that was in that slot before and it got put back in, but then you delete the one that was in the slot before.
PROFESSOR: Great, great. You haven't told me how to fix it yet, but do you have the guts for this? No. OK, I think this veers to the right. I always wanted to do this to somebody in the back. All right. Whoa. All right, good catch. All right. OK. So you pointed out the problem, and I'm going to ask somebody else for a solution. All right?
But here's the problem. Here's the problem, and we can look at it from a standpoint of that example right there. Let's say for argument's sake that I'm searching-- now I've done all of the inserts that I have up there, OK? So I've inserted 496. All right? Then I delete 586 from the table, OK? I delete 586 from the table. So let's just say that what I end up doing-- I have 586, 133, 496, and then I have 204, and then a 481. And this is 0, 1, 2, et cetera.
So I'm deleting 586, and let's say I replace it with none. OK? Let's just say I replace it with none. Now what happens is that when I search for 496, according to this search algorithm what am I going to get?
AUDIENCE: None.
PROFESSOR: Well the first slot I'm going to look at is 1, and according to this search algorithm, I find an empty slot, right? And when I find an empty slot, I'm going to say I failed in the search. If you encounter k, you succeed and return the key value pair, right? Success means you return the value. And if you encounter an empty slot, it means that you've decided that this key is not in the table. And you say couldn't find it, right? That make sense?
So this is obviously wrong, right? Because I just inserted 496 into the table. So this would fail incorrectly. Failing to find the key is OK if the key isn't there. But you don't want to fail incorrectly. Right? Everyone buy that? Good.
All right. So how do I fix it? Someone else? How do I fix this? Someone who doesn't have a cushion. All right, you.
AUDIENCE: [INAUDIBLE] you can mark that spot by a, and when search comes across a, you just [INAUDIBLE].
PROFESSOR: Right, great answer. We're now going to have to do a couple of different things for insert and search, OK? It's going to be subtly different, but the first thing we're going to do is we're going to have this flag, and I'll just call it the delete me flag. OK? And we're going to say that when I delete something, I replace the deleted item not with the none flag, but with a different flag that we'll call delete me, which is different from none. And that's going to be important, because now that you have a different flag, and you replace 586 with delete me, you can now do different things in insert versus search, right?
So in particular, what you would do is you'd have to modify this slightly, because the notion of an empty slot means that you're looking for none, right? Well, actually, in some sense the pseudocode doesn't really change, because you keep probing until you either encounter k or find an empty slot, and even if you encounter a delete me, you keep going. All right? That's the important thing. So I guess it does change, because I assumed you had only two cases here, but what you really have now are three cases.
The three cases are when you're doing the search is that you encounter the key, which is the easy case. You return it. You return the value. Or you can encounter a delete me flag, in which case you keep going. OK? And if you encounter an empty slot, which corresponds to none, at that point you know you failed and the key doesn't exist in the table. All right?
So let me just write that out. Insert treats delete me the same as none. But search keeps going and treats it differently. And that's pretty much it.
So what would happen in our example? Well, going through exactly the same example, we started from here, and then we decided to delete 586. So we replace 586 not with none, but with delete me. And the next time around, when you search for 496, what would happen is that you would go look at the slot that contained 586, and you'd see that there's a delete me flag in there. And so you go to the next trial.
And then in the next trial -- I'm sorry, I had 204 first as the first trial, and then in the second trial I had 586. And I would continue beyond the second trial and get to the third trial, and in fact return 496 in my third trial, which is exactly what I want.
The interesting thing here is that you can reuse storage. I mean, the whole point of deleting is that you can take the storage and insert other keys in there once you've freed up the storage. And you can do that by making insert treat delete me the same as none. So the next time you want to insert, if you happen to probe to the slot that used to hold 586, you can overwrite it. The delete me flag goes away, and some other key -- call it 999 or something -- would get in there. And you're all set with that. OK? Any questions? This all makes sense?
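Extending the sketch above with that flag gives something like the following. Again a sketch: the sentinel name DELETED is made up, and a production insert would first make sure k isn't already present further along the probe sequence before reusing a DELETED slot.

DELETED = object()  # the "delete me" flag: a unique sentinel, distinct from None

class OpenAddressingTableWithDelete(OpenAddressingTable):
    def insert(self, k, v):
        # Insert treats DELETED the same as None: the slot is reusable.
        for i in range(self.m):
            j = self.probe(k, i)
            s = self.slots[j]
            if s is None or s is DELETED or s[0] == k:
                self.slots[j] = (k, v)
                return
        raise RuntimeError("table is full")

    def search(self, k):
        # Search treats DELETED differently: it keeps probing past it.
        # Only a true None means the key is not in the table.
        for i in range(self.m):
            j = self.probe(k, i)
            s = self.slots[j]
            if s is None:
                return None
            if s is not DELETED and s[0] == k:
                return s[1]
        return None

    def delete(self, k):
        # Mark the slot DELETED rather than None, so other keys whose
        # probe sequences pass through this slot remain reachable.
        for i in range(self.m):
            j = self.probe(k, i)
            s = self.slots[j]
            if s is None:
                return  # k was never in the table
            if s is not DELETED and s[0] == k:
                self.slots[j] = DELETED
                return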
So you could imagine that coding this up with an array structure is fairly straightforward. What remains to be discussed is how well this works, right? You have this extra requirement on the hash function, corresponding to an extra argument as an input to it, which is this trial count. And you'd like to have this nice property corresponding to a permutation. Can we actually design hash functions like this? We'll take a look at a bad hash function, and then at a better one.
So let's talk about probing strategies, which is essentially the same as taking a hash function and changing it so it is actually applicable to open addressing. So the notion of linear probing is that you do h(k, i) = (h'(k) + i) mod m, where h'(k) is an ordinary hash function that you've chosen. OK? So that looks pretty straightforward.
What happens here? Does this satisfy the permutation argument? Before I forget -- does it satisfy the permutation property, that I want h(k, 0), h(k, 1), up to h(k, m minus 1) to be a permutation? That make sense? Yep, yep. Because I just keep adding one, and the mod is precisely this kind of round-robin cycle, so it's going to satisfy the permutation property. That's good.
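As code, linear probing is a one-liner. This is a sketch, with h_prime standing in for whatever ordinary hash function you've chosen.

def linear_probe(h_prime, k, i, m):
    # Trial i for key k: start at h'(k) and walk forward one slot per
    # trial, wrapping around mod m, so trials 0..m-1 hit every slot once.
    return (h_prime(k) + i) % m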
What's wrong with this? What's wrong with this? Someone?
AUDIENCE: The fact that [INAUDIBLE] keys, which they're all filled, then if you hit anywhere in here [INAUDIBLE] list of consecutive keys.
PROFESSOR: Right. That's excellent. Excellent, excellent answer. So this notion of clustering is basically what's wrong with this probing strategy. And in fact, I'm not going to do this particular analysis, but I'll give you a sense of why the statement I'm going to make is true.
But the notion of clustering is that you start getting consecutive groups of occupied slots, OK? Which keep growing. And so these clusters get longer and longer. And if you have a big cluster, it's more likely to grow bigger, right? Which is bad. This is exactly the wrong thing for load balancing, right? And clustering is the reverse of load balancing, right? If you have a bunch of clumps and you have a bunch of empty space in your table, that's bad. Right?
The problem with linear probing is that once you start getting a cluster -- given, let's say, the randomness in the hash function, h'(k) is a pretty good hash function and can randomly go anywhere -- well, if you have 100 slots and you have a cluster of size 4, there's a 4/100 chance, which is obviously four times greater than 1/100, even I can do that, of landing in those four slots. And if you land in those four slots, you're going to keep going down to the bottom, and you're going to make that a cluster of size five, right?
So that's the problem with linear probing, and you can essentially argue, by making some probabilistic assumptions, that if you use linear probing, you lose your average constant-time lookup in your hash table for most load factors. So what's happening out here, pictorially, is that you have a table, and let's say you have a cluster.
And this is your cluster. So if h(k, i) -- it doesn't really matter what i is -- maps into this cluster, say to slot 42, then linear probing says the next thing you're going to try is 43, then 44, until you get maybe to this slot here, which is 57, for argument's sake. Right?
So you're going to keep going, and you're going to try 15 times in this relatively dumb fashion to go down to get to the open slot, which is 57. And oh, by the way, at the end of this you just increased your cluster length by one. All right? So it doesn't really work.
And in fact, under reasonable probabilistic assumptions about what your hash functions are, you can say that when you have alpha, which is essentially your load factor n over m, less than 0.99, you see clusters of size theta log n, OK? Right.
So this is a probabilistic argument, and you're assuming that you have a hash function that's a pretty good hash function. So h prime k can be this perfect hash function, all right? So there's a problem here beyond the choice of h prime k, which is this hash function that worked really well for chaining. All right? And the problem here is the linear probing aspect of it.
So what does that mean? If you have clusters of theta log n, then your search and your insert are not going to be constant time anymore. Right? Which is bad in a probabilistic sense. OK?
So how do we fix that? Well, one strategy that works reasonably well is called double hashing. And it literally means what it says: you have to run a couple of hashes. And so the notion of double hashing is that you have h(k, i) = (h1(k) + i * h2(k)) mod m, where h1 and h2 are just ordinary hash functions. OK?
Now the first thing that we need to do is figure out how we can guarantee a permutation, right? Because we still have that requirement, and it was OK for the linear probing part, but you still have this requirement that you need a permutation. And so those of you who are into number theory, can you tell me what property, what neat property of h2 and m can we ask for to guarantee a permutation? Do you have a question? You already do. Do you have a question?
AUDIENCE: [INAUDIBLE].
PROFESSOR: [INAUDIBLE] relatively prime. OK, good. So I figured some of you knew the answer, but I've seen you before. Right. Exactly right. Relatively prime. Just hand it to Victor. So if h2(k) and m are relatively prime, that implies a permutation. It's similar to what we had before. You're multiplying this by i, i keeps increasing, and you're going to roll around. All right? I mean, you could do a proof of it, but I'm not going to bother.
The important thing here is that you can now do something as simple as m equals 2 raised to r, and h2(k) for all k is odd, and now you're in great shape. You can have your array size be 2 raised to something, which is what you really want. And you just use that h2(k). You could even take a regular hash function and truncate it to make sure it's odd. You can do a bunch of things. There are hash functions that produce odd values, and you can use those. All right?
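A sketch of double hashing with that trick, where h1 and h2 stand in for any ordinary hash functions; setting the low bit of h2's output forces it to be odd, hence relatively prime to a power-of-two m, which is what guarantees the permutation property.

def make_odd(h2):
    # Force a hash function to produce odd values by setting the low bit,
    # so gcd(h2(k), m) = 1 whenever m = 2**r.
    return lambda k: h2(k) | 1

def double_hash_probe(h1, h2, k, i, m):
    # Trial i for key k; assumes m is a power of two and h2 is odd-valued.
    return (h1(k) + i * h2(k)) % m

# Hypothetical usage: 2**10 slots, Python's built-in hash for both hashes.
m = 2 ** 10
slot = double_hash_probe(hash, make_odd(hash), "some key", 0, m)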
And so double hashing works fairly well in practice. It's a good way of getting open addressing to work. And in order to prove that open addressing actually works to the level at which chaining works, we have to make an assumption corresponding to uniform hashing. And I'm not going to actually do a proof, but it'll be in the notes.
But I do want to talk about the theorem and the result that the theorem implies, assuming you have the uniform hashing assumption. And let me first say that this is not the same as simple uniform hashing, which talks about the independence of keys in terms of their mapping to slots. The uniform hashing assumption says that each key is equally likely to have any one of the m factorial permutations -- so we're talking about random permutations here -- as its probe sequence. All right?
This is very hard to get in practice. You can get pretty close using double hashing. But nobody's discovered a perfect, deterministic hash function that satisfies this property. At least not that I know of.
So what does this imply? Assuming that you have this -- and double hashing approximately gives you this property -- to a large extent what this means is that if alpha is n over m, you can show that the expected cost of operations such as search, insert, delete, et cetera -- and in particular we'll talk about insert -- is less than or equal to 1 divided by 1 minus alpha. OK?
So obviously this grows as alpha tends to 1. As alpha tends to 1, the load factor in the table gets large, and the number of expected probes that you need when you do an insert grows. And if alpha is 0.99, you're going to, on average, require 100 probes. It's a constant number, but it's a pretty bad constant. Right?
So you really want alpha to be fairly small. And in practice it turns out that you have to resize your open addressing table when alpha gets beyond about 0.5 or 0.6 or so, because by then you're really in trouble. Remember, this is an average case we're talking about. All of this is using a probabilistic assumption.
But as you get to high alphas, suddenly, by the time you get to 0.7, open addressing doesn't work well in relation to an equivalent chaining table with the same overall number of slots, OK? So open addressing is easy to implement, and it uses less memory because you don't need pointers. But you'd better be careful that your alpha stays around 0.5 and no more.
So all that means is you can still use it. You just have to resize your table. You have slightly different strategies for resizing your table when you use open addressing as opposed to chaining hash tables. All right? So that's a summary of open addressing. I want to spend some time on cryptographic hashes in the time that I have left. I guess I have a few minutes left. But any questions about open addressing? Yep?
AUDIENCE: On this delete part, what's going to happen if, say, you fill the table up and then delete everything, and then you start searching. Isn't that going to be bad because it's going to search through everything?
PROFESSOR: So that's right. The bad thing about open addressing is that delete isn't instantaneous, right? In the sense that if you deleted something from the linked list in your chaining table, the chain got smaller, and that helps you, because your table now has a lower load. But there's a delay associated with load when you have the delete me flag. OK?
So in some sense the alpha that you want to think about, you should be careful as to how you define alpha. And that's one of the reasons why when you get alpha being 0.5, 0.6 you get into trouble, because if you have all these delete me flags, they're still hurting you.
AUDIENCE: And when you resize do those delete me flags get deleted?
PROFESSOR: When you completely resize and you redo the whole thing, then you can clean up the delete me's and turn them into nones because you're rehashing it. All right. So yeah, back there. Question?
AUDIENCE: Yes, can you explain how you got the equation that the cost of operation insert is less than or equal to 1 over [INAUDIBLE].
PROFESSOR: That's a longish proof, but let me explain to you how it comes out. Basically, the intuition behind the proof is that we're going to look at some probability p that a probe succeeds. And initially you're going to say something like -- I'll just write this out here -- p is m minus n divided by m. So what is that? Right now I have n elements in the table, and I have m slots, OK?
So the probability that my very first trial is going to succeed is going to be m minus n divided by m, because these are the number of empty slots. And assuming my permutation argument, I could go into one of them. And so that's what I have here. And if you look at what this is, this is 1 minus alpha, OK?
And so then you run off, and you remember 6.041 or the high school probability course that you took, and you say, generally speaking, you're going to be no worse than p for every trial. And so if you assume the worst and say every trial has a probability of success of p, the expected number of trials is 1/p, OK? And that's how you get the 1 over 1 minus alpha. So you'll see that written in gory detail in the notes. All right? OK.
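Written out, that back-of-the-envelope argument is just the following (a sketch of the intuition, not the full proof in the notes):

p = \frac{m-n}{m} = 1 - \frac{n}{m} = 1 - \alpha
\qquad\Longrightarrow\qquad
\mathbb{E}[\text{number of probes}] \le \frac{1}{p} = \frac{1}{1-\alpha}.

For example, \alpha = 0.99 gives 1/(1 - 0.99) = 100 expected probes, matching the number quoted above.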
I expected to have a little more time to talk about cryptographic hashes, but cryptographic hashes are not going to be on the quiz. This is purely material FYI -- for your interest only. And again, I have some notes on it, but I want to give you a sense of the other kinds of hashes that exist in the world, I guess -- hashes that are used for many different applications.
So maybe the best way of motivating this is through an example. So let's talk about an example that is near and dear to every security person's heart, and probably to people who aren't interested in security as well, which is password storage. So think about how, let's say, Unix systems work when you type in your password. You're typing in your password [INAUDIBLE], and this is true for other systems as well, but you have a password. And my password is a permutation of my first daughter's first name.
[LAUGHTER]
Yeah, but I haven't given it away, right? Haven't given it away. And so this password is something that I'm typing in every day, right? Now, there's some check that needs to happen to ensure that I'm typing in the right password. So what is a dumb way of doing things? What's a dumb way of building systems?
AUDIENCE: Storing [INAUDIBLE].
PROFESSOR: This is kind of a freebie.
AUDIENCE: [INAUDIBLE].
PROFESSOR: In situ hashing. That's better. So you'd store it -- I asked for the dumb way, so that's a perfectly valid answer. You could clearly store this in plain text in some file, and you could call it slash etc slash password. And you could make it readable by the world, right? And that'd be great, and people do that, right?
But what you would rather do is make sure that even the sysadmin doesn't know my password or your password, right? So how do you do that? Well, you do that using a cryptographic hash that has this interesting property that it is one-way, OK? And what that means is that given h of x -- OK, this is the value of the hash -- it is very hard to find an x that hashes to this value.
So if h of x equals, let's call it q, then you're only given q, the value of h of x. And so what do you do now? Well, it's beautiful. Assuming you have this one-way hash, this cryptographic hash, in your etc slash password file you have something like a login name, [INAUDIBLE], which happens to be the hash of my daughter's first name, or something.
But this is what's stored in there and the same thing for a bunch of different users, right? So when I log in and I type in the actual password, what does the system do? What does the system do? It hashes it. It takes x prime, which is the typed in password, which may or may not be equal to my password, because somebody else might be trying to break in, or I just mistyped, or forgot my daughter's first name, which would be bad.
And it will just check to see -- it doesn't need x, because it stored h of x in the system, so it doesn't need x. It just compares against what I typed in: it computes the hash again, and then it would let me in assuming that these things matched, and would not let me in if they didn't. So now we can talk about -- and I don't have time for this, but you can certainly read up on it on Wikipedia, and there's a bunch in the notes -- what properties this hash function should have, namely one-wayness and collision resistance, in order to solve these problems and other problems. I'm happy to stick around and answer questions.
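As a minimal sketch of that flow, assuming SHA-256 as the one-way hash: the login name and password below are made up, and real systems also add a per-user salt and use a deliberately slow hash (bcrypt, scrypt, argon2) rather than a bare SHA-256.

import hashlib

def h(password):
    # One-way cryptographic hash: easy to compute, hard to invert.
    return hashlib.sha256(password.encode()).hexdigest()

# What the password file stores: login name -> h(x), never x itself.
stored = {"srini": h("hypothetical-password")}

def check_login(user, typed):
    # Recompute the hash of the typed password x' and compare it to the
    # stored h(x); the system never needs to know x itself.
    return stored.get(user) == h(typed)

print(check_login("srini", "hypothetical-password"))  # True
print(check_login("srini", "wrong-guess"))            # False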