CSCI 2720	             Data Structures			Spring 2004


       Programming Project 5 - due Thursday, April 29, by midnight.

See the General Project Instructions on the course web site for directions
which apply to all projects, including this one.


The project has two parts.
	The first is to implement a hash table class using templates and
	  external chaining implemented by your OList< >  class from 
          Project #1, after adding a suitable iterator to the list class.
	Your hash table class will be compared to your BST< > class from
	  Project #2 by timing tests in the second part of the project.

Part I.  Templated Hash Table Class

Specify and implement a hash table class Xchain<Dtype, Ctype> based on external
  chaining, as described in Lewis and Denenberg pp. 266-269.  Here
  Dtype is to be the data type for the keys and Ctype contains static functions
  lt and hprep.  As in Project #4, lt defines the ordering for keys in a chain
  as follows: given inputs x and y of type Dtype, the function Ctype::lt(x,y)
  returns an integer value < 0 if x should come before y in the ordering, > 0
  if y should come before x in the ordering, and 0 if x and y are equal.  The
  default (when the second class Ctype is omitted) should be that < is used for
  comparisons so that x comes before y in the ordering if x < y, which will only
  work if operator< is defined for Dtype.  Given an input x of type Dtype, the
  function Ctype::hprep(x) should return an unsigned int; this will be used for
  computing the hash value of x.  The default (when the second class Ctype is
  omitted) should be a static cast of x to an unsigned int, which will only work
  if Dtype is some integer type.
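
The default behaviour described above could be packaged roughly as follows.
  This is only a sketch: the name DefaultCtype is hypothetical (the assignment
  does not name the default class), and the Xchain declaration is shown only
  to illustrate how a default second template parameter can be supplied.

```cpp
// Sketch only: DefaultCtype is a hypothetical name, not part of the assignment.
template <class Dtype>
struct DefaultCtype {
    // Negative if x precedes y, positive if y precedes x, 0 if they are equal.
    // Relies on operator< being defined for Dtype.
    static int lt(const Dtype& x, const Dtype& y) {
        if (x < y) return -1;
        if (y < x) return 1;
        return 0;
    }
    // Meaningful only when Dtype is some integer type.
    static unsigned int hprep(const Dtype& x) {
        return static_cast<unsigned int>(x);
    }
};

// When Ctype is omitted, the defaults above are used.
template <class Dtype, class Ctype = DefaultCtype<Dtype> >
class Xchain;  // declaration only; the class body is not shown in this sketch
```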

*** For chaining use class OList<Dtype, Ctype>, obtained from your OList class
  *** from Project #1, modified by supplying the comparison function as 
  *** Ctype::lt( , ) as in Project #2 (instead of through the constructor)
  *** and extended by the addition of public functions for iterating
  *** through a list.  The hash table (a private member of your class) will be
  *** an array of objects of type OList<Dtype, Ctype>.  Use only public member
  *** functions of the OList<Dtype, Ctype> class to access or modify the chains.

*** In order to rehash as required below, you will need to supply the class
  *** OList<Dtype, Ctype> with functions for iteration through the list.
  These should be public member functions with the declarations
	void Set_Iter( );
	void Advance_Iter( );
	int Valid_Iter( ) const;
	const Dtype& Key_Iter( ) const;
  and should operate so that the code 
	mylist.Set_Iter( );
	while ( mylist.Valid_Iter( ) ) {
	  cout << mylist.Key_Iter( ) << ' ';
	  mylist.Advance_Iter( );
	}
  will send the key values at the nodes to cout, separated by spaces, in
  head to tail order.  This can be accomplished by adding a curr_node 
  pointer as a private member; the iterator functions can arrange for
  this to point to the current node in the iteration, or to 0 if there
  is none.  
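
One way the curr_node idea might look is sketched below.  MiniList is a
  hypothetical stand-in for OList (it orders keys with < instead of
  Ctype::lt and has only an Insert), and KeysInOrder is a hypothetical helper
  that collects the output of the loop above into a string instead of
  sending it to cout.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for OList: keeps keys in ascending order using <.
template <class Dtype>
class MiniList {
public:
    MiniList() : head(0), curr_node(0) {}
    ~MiniList() {
        while (head) { Node* t = head; head = head->next; delete t; }
    }
    void Insert(const Dtype& x) {
        Node** p = &head;
        while (*p && (*p)->key < x) p = &(*p)->next;
        *p = new Node(x, *p);
    }
    // Iteration functions with the declarations required by the assignment.
    void Set_Iter() { curr_node = head; }
    void Advance_Iter() { if (curr_node) curr_node = curr_node->next; }
    int Valid_Iter() const { return curr_node != 0; }
    const Dtype& Key_Iter() const { return curr_node->key; }  // needs Valid_Iter()
private:
    struct Node {
        Dtype key;
        Node* next;
        Node(const Dtype& k, Node* n) : key(k), next(n) {}
    };
    Node* head;
    Node* curr_node;                       // current node, or 0 if there is none
    MiniList(const MiniList&);             // copying disabled in this sketch
    MiniList& operator=(const MiniList&);
};

// Runs the iteration loop and returns the keys separated by spaces.
template <class Dtype>
std::string KeysInOrder(MiniList<Dtype>& lst) {
    std::ostringstream out;
    lst.Set_Iter();
    while (lst.Valid_Iter()) {
        out << lst.Key_Iter() << ' ';
        lst.Advance_Iter();
    }
    return out.str();
}
```

  Because the iterator state lives inside the list object, only one
  iteration per list can be in progress at a time; that is enough for
  rehashing, where each chain is walked once.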
  
Supply public member functions Find( ), Insert( ), and Remove( ) for the
  hash table class; these must each take a const Dtype& input and return
  an int.  Each should return 1 if the operation was successful, and 0 if
  not (i.e., return 0 if you try to insert a key already present in the
  dictionary, or find or delete a key which is not present).  Also supply
  a public constructor and a public destructor.  

The constructor should take a float parameter which determines the maximum
  load factor, max_LF, which should take 1.0 as the default value. 
  The initial table size should be 21, and when a table's load factor
  exceeds max_LF then it should be rehashed to a table of size
  2*max{1.0,max_LF}*old_size, rounded up to an integer, so that the
  table size will at least double and so that the load factor after
  rehashing will be at most 0.5.
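
The size rule above can be written as a small helper.  This is a sketch
  only; the name NextTableSize is hypothetical and is not required by the
  assignment.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: new size = ceil(2 * max{1.0, max_LF} * old_size).
inline unsigned int NextTableSize(unsigned int old_size, float max_LF) {
    assert(old_size > 0 && max_LF > 0.0f);
    double factor = 2.0 * (max_LF > 1.0f ? max_LF : 1.0);
    return static_cast<unsigned int>(std::ceil(factor * old_size));
}
```

  For example, starting from the initial size 21, max_LF = 1.0 gives a new
  size of 42 and max_LF = 8.0 gives 336.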

For the hash function which maps 30 bit integers down to table indices use
  the universal class of hash functions described in Lewis and Denenberg
  pp. 290-291.  Use the prime number N = 2147483659 (the next prime number
  after 2^31), which should be declared as a const unsigned long long
  in the header file.  Whenever a table is created or rehashed use rand( )
  to select random parameters a and b to determine the particular hash
  function in the universal class.  These should be random 30 bit integers,
  produced by rand( ) as described in Project #2, Part II.  Be careful to
  call srand(seed) only once, in the constructor; use the last 4 digits of
  your Student ID number for the seed.
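
A rough sketch of the arithmetic follows.  Two things here are assumptions
  and should be checked against the textbook and Project #2: the hash
  function is taken to have the common form ((a*x + b) mod N) mod m, where m
  is the table size, and the 30 bit random values are built from two calls
  to rand( ), using 15 bits of each.  The names Rand30 and UHash are
  hypothetical.  The sketch deliberately does not call srand( ), since that
  is to happen only once, in the constructor.

```cpp
#include <cassert>
#include <cstdlib>

const unsigned long long N = 2147483659ULL;  // the prime given in the assignment

// Assumption: rand() supplies at least 15 random bits (RAND_MAX >= 32767),
// so two calls are combined into one 30 bit value.  Project #2 may
// prescribe a different construction.
inline unsigned long long Rand30() {
    unsigned long long hi = static_cast<unsigned long long>(std::rand()) & 0x7FFF;
    unsigned long long lo = static_cast<unsigned long long>(std::rand()) & 0x7FFF;
    return (hi << 15) | lo;
}

// Assumed form of the universal class: h(x) = ((a*x + b) mod N) mod m.
// With a < 2^30 and x < 2^32 the product fits in 64 bits without overflow.
inline unsigned int UHash(unsigned long long a, unsigned long long b,
                          unsigned int x, unsigned int m) {
    assert(m > 0);
    return static_cast<unsigned int>(((a * x + b) % N) % m);
}
```

  The 64 bit type matters here: a*x can be as large as about 2^62, which
  would overflow a 32 bit unsigned int before the reduction mod N.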
 
Your specification should be in a file called xchain.h and your
  implementation code should be in xchain.cc.  Your #include directives
  should be arranged so that any client code with #include "xchain.cc"
  will be able to use the class Xchain< >.

Before going on to part II you should test your Xchain< > class
  using  Find( ), Insert( ), and Remove( ).  Make sure it works
  for int data with Ctype omitted (so that the default will be used)
  and for C style strings.  For the strings define a
  class Cstring to act as the second template parameter, using strcmp
  for the static member Cstring::lt.  For the static member Cstring::hprep
  use hashing by division as described in Lewis and Denenberg pp. 284-285.
  For the prime number n use 2147483659 as above, i.e., N.  For the base
  (radix) use 255.  The numerical value of a char from the string, say ch,
  should be the result of casting to unsigned int and then subtracting 1, so
  that the range is 0 .. 254 (since the null character can't appear inside a C
  style string). Your definition of the class Cstring should be in xchain.cc.
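
A possible shape for Cstring is sketched below.  It assumes that hashing
  by division means reading the string as a radix 255 number, most
  significant digit first, and reducing it mod N; check this against the
  textbook.  It also assumes Dtype is const char*, and it converts each char
  through unsigned char first so that the digit values stay in 0 .. 254 on
  machines where char is signed.

```cpp
#include <cstring>

// Sketch of the Cstring class, assuming Dtype is const char*.
struct Cstring {
    // strcmp already returns < 0, 0, or > 0 in the sense required for lt.
    static int lt(const char* const& x, const char* const& y) {
        return std::strcmp(x, y);
    }
    // Horner's rule in radix 255, reduced mod N after every digit so the
    // intermediate value always fits in 64 bits.
    static unsigned int hprep(const char* const& s) {
        const unsigned long long kN = 2147483659ULL;  // local copy of N
        unsigned long long h = 0;
        for (const char* p = s; *p != '\0'; ++p) {
            // Assumption: go through unsigned char, then subtract 1 (0 .. 254).
            unsigned int digit = static_cast<unsigned char>(*p) - 1u;
            h = (h * 255ULL + digit) % kN;
        }
        return static_cast<unsigned int>(h);
    }
};
```

  For instance, with ASCII codes 'a' = 97 and 'b' = 98, the string "ab" has
  digits 96 and 97, so its value is 96*255 + 97 = 24577.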


Part II.  Timing Comparisons

The aim is to create dictionaries of type Xchain<int> (with maximum
  load factors 1, 8, and 64) and BST<int> (unchanged from Project 2
  except to fix it if necessary) of some large size L, and to measure
  the average time for the sequence of insertions, for successful and
  unsuccessful Find( )s, and for the sequence of deletions, performed
  in the same order as the insertions, which empties each dictionary.

A convenient way to accomplish this (as in Project #1) which effectively
  randomizes order in the Xchain<int> hash table is based on a relatively 
  prime pair of integers range and factor.  For i = 1...(range - 1)
  the numbers j = (i*factor) % range will run through the same
  set of values 1...(range - 1) in a different order.  Just choose
  factor to be somewhere near (0.62)*range and to share no
  divisor with range other than 1.  For example, 1024 for
  range and 611 for factor.
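
The claim that the j values form a rearrangement of 1...(range - 1) can be
  checked directly.  The helpers below (Gcd, Scrambled, CoversAll) are
  hypothetical names used only for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Greatest common divisor by Euclid's algorithm.
inline unsigned int Gcd(unsigned int a, unsigned int b) {
    while (b != 0) { unsigned int t = a % b; a = b; b = t; }
    return a;
}

// Returns j = (i*factor) % range for i = 1 .. range-1, in generation order.
// The product is formed in 64 bits so it cannot overflow for large range.
inline std::vector<unsigned int> Scrambled(unsigned int range, unsigned int factor) {
    std::vector<unsigned int> js;
    for (unsigned long long i = 1; i < range; ++i)
        js.push_back(static_cast<unsigned int>((i * factor) % range));
    return js;
}

// True if js contains each of 1 .. range-1 exactly once.
inline bool CoversAll(const std::vector<unsigned int>& js, unsigned int range) {
    std::vector<bool> seen(range, false);
    for (std::size_t k = 0; k < js.size(); ++k) {
        if (js[k] == 0 || js[k] >= range || seen[js[k]]) return false;
        seen[js[k]] = true;
    }
    return js.size() == range - 1;
}
```

  With range = 1024 and factor = 611 the sequence begins 611, 198, ... and
  covers every value once.  With factor = 612, which shares the divisor 4
  with 1024, values repeat and 0 appears, so the scheme breaks down.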

To use this scheme, start with an empty dictionary and enter 2*j into
  it, as the j values are generated.  Then performing Find(x) for 
  x = 2, 4, ..., 2*(range - 1) will provide range - 1 successful
  Find( )s whereas doing it for x = 1, 3, ..., 2*range -1 will 
  provide range unsuccessful Find( )s.  Finally, deleting 2*j for 
  each j with the values generated in the same order as for insertion
  should result in an empty dictionary again.
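
The four phases of the experiment might be organized as in the sketch
  below.  Everything named here is hypothetical: SetDict is a stand-in built
  on std::set so that the sketch runs without Xchain or BST, RunResult and
  RunExperiment are illustrative names, and clock( ) is just one possible
  way to measure CPU time.  Any class with the same Find( ), Insert( ), and
  Remove( ) interface could be passed in place of SetDict.

```cpp
#include <ctime>
#include <set>

// Hypothetical stand-in with the same interface as the dictionaries.
class SetDict {
public:
    int Find(const int& x) const { return s.count(x) ? 1 : 0; }
    int Insert(const int& x) { return s.insert(x).second ? 1 : 0; }
    int Remove(const int& x) { return s.erase(x) ? 1 : 0; }
private:
    std::set<int> s;
};

struct RunResult {
    long inserted, found, missed, removed;     // operation counts, for checking
    double t_insert, t_hit, t_miss, t_remove;  // CPU seconds for each phase
};

inline double Seconds(std::clock_t a, std::clock_t b) {
    return static_cast<double>(b - a) / CLOCKS_PER_SEC;
}

template <class Dict>
RunResult RunExperiment(Dict& d, long long range, long long factor) {
    RunResult r = {0, 0, 0, 0, 0.0, 0.0, 0.0, 0.0};
    std::clock_t t0 = std::clock();
    for (long long i = 1; i < range; ++i)             // insert 2*j
        r.inserted += d.Insert(static_cast<int>(2 * ((i * factor) % range)));
    std::clock_t t1 = std::clock();
    for (long long x = 2; x <= 2 * (range - 1); x += 2)   // successful Finds
        r.found += d.Find(static_cast<int>(x));
    std::clock_t t2 = std::clock();
    for (long long x = 1; x <= 2 * range - 1; x += 2)     // unsuccessful Finds
        r.missed += 1 - d.Find(static_cast<int>(x));
    std::clock_t t3 = std::clock();
    for (long long i = 1; i < range; ++i)             // remove in insertion order
        r.removed += d.Remove(static_cast<int>(2 * ((i * factor) % range)));
    std::clock_t t4 = std::clock();
    r.t_insert = Seconds(t0, t1);
    r.t_hit    = Seconds(t1, t2);
    r.t_miss   = Seconds(t2, t3);
    r.t_remove = Seconds(t3, t4);
    return r;
}
```

  The counts give a quick sanity check: for range = 1024 there should be
  1023 insertions, 1023 successful Find( )s, 1024 unsuccessful Find( )s, and
  1023 removals.  Dividing each time by the matching count gives an average
  per operation.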

When you perform this experiment for a particular maximum load
  factor, choose a value for range which is as large as practical
  (in terms of time and space usage on the workstation) and for
  which the load factor (with all range - 1 elements entered)
  is close to the maximum.

The code for your timing runs should be in timing5.cc, and the
  times you obtain should be tabulated in proj5_results along
  with a discussion of their significance.  Answer the questions "How
  do Xchain<int> and BST<int> compare to each other?" and "What effect
  does the maximum load factor in the hash tables have on the total
  insertion time, the average time for successful Find( )s, the average
  time for unsuccessful Find( )s, and the total deletion time?".
  If you find it more convenient, your timing code can be divided among
  several files, named timing5A.cc, timing5B.cc, etc.