Hash tables

A few words about hash tables

Let us first look at a simple, very contrived, example. Suppose you have a telephone catalogue containing one thousand, seven-digit telephone numbers. You would like to be able to use a telephone number as an index in an array. The array may perhaps contain the name of the person corresponding to the number. When talking about hash tables one would often refer to the index as a key, and the array as a table. The name would be called a value. It should also be said that the reverse of this example may be more common. Given the name of a person you would like to find the number. But in this assignment it is more natural to think of the key as being the number.

Since the telephone numbers contain seven digits, it would be wasteful to allocate an array starting with index 0 and having a maximum index of 9999999. In Fortran (where you can have any starting index) you could make an array having indices ranging from 1000000 through 9999999. This would still be wasteful, since we only have a thousand numbers. There would be many unused elements in the array.

So what one needs is a hash function that transforms the telephone number into a suitable index into a much smaller array. Storing numbers and names this way is an example of a hast table.

A simple, but not very good, hash function, in the example, would be to use the last three digits in the number as a key. This would make for a small array, but there would probably be many collisions, two numbers give the same index. It would be nicer to have perfect hashing (i.e. no collisions), but that is not always easy to accomplish, or it may take too much computing time.

In the lab one can accomplished perfect hashing with a reasonable size of the array (there will be unused elements). So, construct a hash function based on d(c_j, c_k) and the fact that distinct distances are at least 0.01 apart. What corresponds the value in the hash table? You may also assume that

1 <= d(c_j, c_k) <= 50