CSCI 2720 Data Structures
Spring 2004
Programming Project 5 - due Thursday, April 29, by midnight.

See the General Project Instructions on the course web site for directions which apply to all projects, including this one.

The project has two parts. The first is to implement a hash table class using templates and external chaining implemented by your OList< > class from Project #1, after adding a suitable iterator to the list class. Your hash table class will be compared to your BST< > class from Project #2 by timing tests in the second part of the project.

Part I. Templated Hash Table Class

Specify and implement a hash table class Xchain<Dtype, Ctype> based on external chaining, as described in Lewis and Denenberg pp. 266-269. Here Dtype is to be the data type for the keys and Ctype contains static functions lt and hprep.

As in Project #4, lt defines the ordering for keys in a chain as follows: given inputs x and y of type Dtype, the function Ctype::lt(x,y) returns an integer value < 0 if x should come before y in the ordering, > 0 if y should come before x in the ordering, and 0 if x and y are equal. The default (when the second class Ctype is omitted) should be that < is used for comparisons, so that x comes before y in the ordering if x < y. This will only work if operator< is defined for Dtype.

Given an input x of type Dtype, the function Ctype::hprep(x) should return an unsigned int; this will be used for computing the hash value of x. The default (when the second class Ctype is omitted) should be a static cast of x to an unsigned int, which will only work if Dtype is some integer type.

*** For chaining use class OList<Dtype, Ctype>, obtained from your OList< > class
*** from Project #1, modified by supplying the comparison function as
*** Ctype::lt( , ) as in Project #2 (instead of through the constructor)
*** and extended by the addition of public functions for iterating
*** through a list. The hash table (a private member of your class) will be
*** an array of objects of type OList<Dtype, Ctype>.
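
As an illustration of the Ctype convention described above, here is a minimal sketch of a default second template parameter. The names DefaultCtype and XchainSketch are assumptions made for this example only; the handout does not prescribe how the default is supplied, and the real class would of course have Find( ), Insert( ), Remove( ) and a table of chains.

```cpp
// Hypothetical default for the second template parameter.
// The name DefaultCtype is an assumption, not part of the handout.
template <class Dtype>
class DefaultCtype {
public:
    // Three-way comparison built from operator< only:
    // < 0 if x comes first, > 0 if y comes first, 0 if equal.
    static int lt(const Dtype& x, const Dtype& y) {
        if (x < y) return -1;
        if (y < x) return 1;
        return 0;
    }

    // Only meaningful when Dtype is some integer type.
    static unsigned int hprep(const Dtype& x) {
        return static_cast<unsigned int>(x);
    }
};

// Skeleton showing how the default attaches to a class template.
// XchainSketch is a stand-in; the real members are omitted.
template <class Dtype, class Ctype = DefaultCtype<Dtype> >
class XchainSketch {
public:
    static int compare(const Dtype& x, const Dtype& y) {
        return Ctype::lt(x, y);
    }
    static unsigned int prep(const Dtype& x) {
        return Ctype::hprep(x);
    }
};
```

With this arrangement XchainSketch<int> picks up the default, while a client with a different key type would name its own class as the second parameter.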
*** Use only public member functions of the OList class to access or modify
*** the chains. In order to rehash as required below, you will need to supply
*** the class OList with functions for iteration through the list.

These should be public member functions with the declarations

    void Set_Iter( );
    void Advance_Iter( );
    int Valid_Iter( ) const;
    const Dtype& Key_Iter( ) const;

and should operate so that the code

    mylist.Set_Iter( );
    while ( mylist.Valid_Iter( ) ) {
        cout << mylist.Key_Iter( ) << ' ';
        mylist.Advance_Iter( );
    }

will send the key values at the nodes to cout, separated by spaces, in head to tail order. This can be accomplished by adding a curr_node pointer as a private member; the iterator functions can arrange for this to point to the current node in the iteration, or to 0 if there is none.

Supply public member functions Find( ), Insert( ), and Remove( ) for the hash table class; these must each take a const Dtype& input and return an int. Each should return 1 if the operation was successful, and 0 if not (i.e., return 0 if you try to insert a key already present in the dictionary, or find or delete a key which is not present).

Also supply a public constructor and a public destructor. The constructor should take a float parameter max_LF, which determines the maximum load factor and should take 1.0 as the default value. The initial table size should be 21. When a table's load factor exceeds max_LF, the table should be rehashed to a table of size 2*max{1.0,max_LF}*old_size, rounded up to an integer, so that the table size will at least double and so that the load factor after rehashing will be at most about 0.5.

For the hash function which maps 30 bit integers down to table indices use the universal class of hash functions described in Lewis and Denenberg pp. 290-291. Use the prime number N = 2147483659 (the next prime number after 2^31), which should be declared as a const unsigned long long in the header file.
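
The arithmetic in the last two paragraphs can be sketched as follows. This assumes the universal class has the common form h(x) = ((a*x + b) mod N) mod table_size; check that against Lewis and Denenberg pp. 290-291 before relying on it. The function names universal_hash and rehash_size are made up for this example.

```cpp
#include <cmath>

// Prime from the handout: the next prime after 2^31.
const unsigned long long N = 2147483659ULL;

// Assumed form of the universal class:
//     h(x) = ((a*x + b) mod N) mod table_size
// a and b are the random 30 bit parameters chosen at creation or rehash.
inline unsigned int universal_hash(unsigned int x, unsigned int a,
                                   unsigned int b, unsigned int table_size) {
    // a < 2^30 and x < 2^32, so a*x + b < 2^63 and cannot overflow
    // an unsigned long long.
    unsigned long long t =
        (static_cast<unsigned long long>(a) * x + b) % N;
    return static_cast<unsigned int>(t % table_size);
}

// Table size to use once the load factor has exceeded max_LF:
// 2 * max{1.0, max_LF} * old_size, rounded up to an integer.
inline unsigned int rehash_size(unsigned int old_size, float max_LF) {
    double factor = 2.0 * (max_LF > 1.0f ? max_LF : 1.0);
    return static_cast<unsigned int>(std::ceil(factor * old_size));
}
```

For example, starting from the initial size 21, rehash_size gives 42 when max_LF is 1.0 and 336 when max_LF is 8.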
Whenever a table is created or rehashed use rand( ) to select random parameters a and b to determine the particular hash function in the universal class. These should be random 30 bit integers, produced by rand( ) as described in Project #2, Part II. Be careful to call srand(seed) only once, in the constructor; use the last 4 digits of your Student ID number for the seed.

Your specification should be in a file called xchain.h and your implementation code should be in xchain.cc. Your #include directives should be arranged so that any client code with #include "xchain.cc" will be able to use the class Xchain< >.

Before going on to Part II you should test your Xchain< > class using Find( ), Insert( ), and Remove( ). Make sure it works for int data with Ctype omitted (so that the default will be used) and for C style strings. For the strings define a class Cstring to act as the second template parameter, using strcmp for the static member Cstring::lt. For the static member Cstring::hprep use hashing by division as described in Lewis and Denenberg pp. 284-285. For the prime number n use 2147483659 as above, i.e., N. For the base (radix) use 255. The numerical value of a char from the string, say ch, should be the result of casting to unsigned int and then subtracting 1, so that the range is 0 .. 254 (since the null character can't appear inside a C style string). Your definition of the class Cstring should be in xchain.cc.

Part II. Timing Comparisons

The aim is to create dictionaries of type Xchain< > and BST< > (unchanged from Project 2 except to fix it if necessary) of some large size L, with maximum load factors 1, 8, and 64 for the hash tables, and to measure the average time for the sequence of insertions, for successful and unsuccessful Find( )s on those dictionaries, and for the same sequence of deletions which empties each dictionary.
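
One possible way to obtain an average time per operation is sketched below. It assumes std::clock from <ctime> as the timer, which the handout does not specify; the names average_seconds and CountCalls are hypothetical, and CountCalls merely stands in for a real dictionary operation such as an Insert( ) or Find( ) call.

```cpp
#include <ctime>

// Runs op(i) for i = 0 .. count-1 and returns the average CPU time per
// call, in seconds. std::clock is an assumed choice of timer; its
// resolution is coarse, so count should be large.
template <class Op>
double average_seconds(Op& op, unsigned long count) {
    std::clock_t start = std::clock();
    for (unsigned long i = 0; i < count; ++i) {
        op(i);
    }
    std::clock_t stop = std::clock();
    double total = static_cast<double>(stop - start) / CLOCKS_PER_SEC;
    return count > 0 ? total / count : 0.0;
}

// Hypothetical stand-in for a dictionary operation: it only counts how
// often it was called. A real run would call Insert( ), Find( ), or
// Remove( ) on the dictionary here.
struct CountCalls {
    unsigned long calls;
    CountCalls() : calls(0) {}
    void operator()(unsigned long) { ++calls; }
};
```

Timing a whole sequence and dividing by its length, as here, avoids trying to time individual operations that are far shorter than the clock resolution.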
A convenient way to generate these sequences of operations (as in Project #1), which effectively randomizes order in the Xchain hash table, is based on a relatively prime pair of integers range and factor. For i = 1...(range - 1) the numbers j = (i*factor) % range will run through the same set of values 1...(range - 1) in a different order. Just choose factor to be somewhere near (0.62)*range and to share no divisor with range other than 1. For example, use 1024 for range and 611 for factor.

To use this scheme, start with an empty dictionary and enter 2*j into it, as the j values are generated. Then performing Find(x) for x = 2, 4, ..., 2*(range - 1) will provide range - 1 successful Find( )s, whereas doing it for x = 1, 3, ..., 2*range - 1 will provide range unsuccessful Find( )s. Finally, deleting 2*j for each j, with the values generated in the same order as for insertion, should result in an empty dictionary again.

When you perform this experiment for a particular maximum load factor, choose a value for range which is as large as practical (in terms of time and space usage on the workstation) and for which the load factor (with all range - 1 elements entered) is close to the maximum.

The code for your timing runs should be in timing5.cc, and the times you obtain should be tabulated in proj5_results along with a discussion of their significance. Answer the questions "How do Xchain and BST compare to each other?" and "What effect does the maximum load factor in the hash tables have on the total insertion time, the average time for successful Find( )s, the average time for unsuccessful Find( )s, and the total deletion time?".

If you find it more convenient, your timing code can be divided among several files, named timing5A.cc, timing5B.cc, etc.
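
The range and factor scheme described above can be checked with a small sketch like the following. std::set is used here only as a stand-in dictionary so that the example is self-contained; in the project the same loops would drive Xchain< > and BST< >. The names FindCounts and run_sequence are made up for this example.

```cpp
#include <set>

// Tallies collected by run_sequence below.
struct FindCounts {
    unsigned long inserted;    // distinct keys entered
    unsigned long found_even;  // successful lookups
    unsigned long found_odd;   // lookups expected to fail; stays 0
    unsigned long left_over;   // keys remaining after the deletions
};

// Stand-in experiment: std::set replaces the project's dictionaries.
inline FindCounts run_sequence(unsigned long long range,
                               unsigned long long factor) {
    std::set<unsigned long long> dict;
    FindCounts c = {0, 0, 0, 0};

    // Enter 2*j for j = (i*factor) % range, i = 1 .. range-1.
    for (unsigned long long i = 1; i < range; ++i) {
        unsigned long long j = (i * factor) % range;
        if (dict.insert(2 * j).second) ++c.inserted;
    }
    // Even keys 2, 4, ..., 2*(range-1): all should be present.
    for (unsigned long long x = 2; x <= 2 * (range - 1); x += 2) {
        if (dict.count(x)) ++c.found_even;
    }
    // Odd keys 1, 3, ..., 2*range-1: none should be present.
    for (unsigned long long x = 1; x <= 2 * range - 1; x += 2) {
        if (dict.count(x)) ++c.found_odd;
    }
    // Delete in the same order as insertion.
    for (unsigned long long i = 1; i < range; ++i) {
        dict.erase(2 * ((i * factor) % range));
    }
    c.left_over = static_cast<unsigned long>(dict.size());
    return c;
}
```

With range 1024 and factor 611 this enters 1023 distinct keys, finds all 1023 even keys and none of the odd ones, and ends with an empty set. If the pair is not relatively prime the j values repeat, so fewer distinct keys are entered.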