CSCI 2720 Data Structures, Spring 2004
Programming Project 2 - due Thursday, February 12, by midnight

See the General Project Instructions on the course web site for directions which apply to all projects, including this one.

The project has two parts, I and II.

I. Implement a templated binary search tree class using normal LC and RC pointers, and including a data gathering function implemented by a recursive post-order traversal.

II. Perform data gathering tests on random binary search trees of integers. A tree built uniformly at random will be compared to one which is biased to the left and to one which is biased to the right.

Part I. Templated Binary Search Tree class

Specify and implement a binary search tree class BST<Dtype, Ctype>, where Dtype is the data type for the tree and Ctype contains a static function lt which defines the tree's inorder ordering as follows: given inputs x and y of type Dtype, the function Ctype::lt(x,y) returns -1 if x should come before y in the ordering, 1 if y should come before x in the ordering, and 0 if x and y are equal. The default (when the second class Ctype is omitted) should be that < is used for comparisons, so that x comes before y in the ordering if x < y; this will only work if operator< is defined for Dtype. For the Tnode use normal left and right child pointers LC and RC, and a Dtype data field. See section 13.4 of Stroustrup for a discussion of this method of specifying a comparison function, and of default template parameters.

Supply the public functions Find( ), Insert( ), and Remove( ), implemented iteratively rather than recursively. Also supply three output functions: Display_In_Order( ), Display_Pre_Order( ), and Display_Post_Order( ). These should be implemented using recursive traversals of the indicated type. The final output function you need to include is Stats( ), which provides the following data concerning the tree: number of nodes; height; ordered height; internal path length.
Stats( ) must be implemented by a single recursive post-order traversal of the tree which collects all the needed data in one pass. Further details are specified below. The only other public functions you need to supply are the constructors and the destructor.

Before going on to Part II you should test your BST< , > class using Find( ), Insert( ), Remove( ), your 3 Display functions, and your Stats( ) function. Make sure it works for int data with the default comparison, and for C-style strings with the strcmp function for comparison.

REQUIRED INTERFACE AND FEATURES

Be sure to name your class BST and provide public member functions named Find( ), Insert( ), and Remove( ); these must each take a const Dtype& input and return an int. Each should return 1 if the operation was successful, and 0 if not (i.e., return 0 if you try to insert a key already present in the tree, or to find or delete a key which is not present).

Remove( ) must alternately draw upon the maximum key in the left subtree and the minimum key in the right subtree (starting with the left subtree the first time) whenever a node with two children is deleted. Deletion must be real rather than lazy, and must relink the node containing the chosen replacement key rather than copying the key into another node.

Each Display function should write the keys to cout, 10 to a line, with a single space between adjacent keys on the same line.

The public Stats( ) member function must have the following prototype:

    void Stats (int& nn, int& ht, int& ordht, int& ipl)

It returns through the 4 reference parameters the number of nodes, the height, the ordered height, and the internal path length of the tree for which it is called. See p.113 of the Lewis and Denenberg text for the ordered height, and p.184 for the internal path length.
Note that the text's external nodes belong to the extended binary tree, in which external nodes correspond to null RC or LC pointers in our representation; thus our nodes (including leaves) are precisely the "internal" nodes of Lewis and Denenberg.

The public Stats( ) member function should call a private member function which has an additional input parameter, of type Tnode*. Thus the signature for the private member function might be

    void Stats (int& nn, int& ht, int& ordht, int& ipl, Tnode* ptr)

Here ptr points to the root of the subtree to be processed, and the private Stats( ) function should be implemented recursively. The public Stats( ) function should call the private one at the root of the tree.

For analysis you can calculate the average depth of all nodes by dividing ipl by nn, taking care to cast at least one of the operands to a floating-point type so as not to perform an integer division, and then saving the result to a variable of that floating-point type.

Your specification should be in a file called bst.h and your implementation code should be in bst.cc.

The reason for the implementation restrictions is to illustrate certain points. Iterative versions of Find( ), Insert( ), and Remove( ) are typically more efficient than recursive versions, because they avoid the overhead of a function call at each level of the tree. For many applications the key and accompanying data stored in a node would take much longer to copy to a new node than the integers we are working with, and so relinking would be preferable to data copying during a Remove( ) operation.

The source code ifaceP2.cc on the course web site can be used to check the public interface for BST< , > as implemented in your bst.h and bst.cc files, as explained in the comments near the top of the file.

Part II. Random Tree Data Gathering Experiments

The idea is to use srand( ) and rand( ) to generate pseudo-random numbers for creating large random binary search trees. The program randdemo.cc on the course web site shows how to use rand( ) and srand( ).
You will find that rand( ) produces non-negative short (two-byte) integers. Since they are non-negative the sign bit is always 0, so rand( ) produces only 15 pseudorandom bits. This is not enough for one of our integers, so you'll need to create pseudorandom 30-bit numbers with two calls to rand( ):

    x = rand( ); y = rand( ); y <<= 15; z = x + y;

so that z is the 30-bit result. For srand( ), use the last four digits of your Social Security number to set the seed at the start of your tree generation process, to individualize your random number sequence.

You will run Stats( ) on a series of large trees of type BST, recording the data for each. You will do the same for left and right biased random trees, then explain as quantitatively as you can the growth rates of the height, ordered height, and average depth of a node as a function of the number of nodes in the three types of binary search tree. The trees should have sizes N, 2N, 3N, 4N, and 5N for some large N. Your choice of N will be limited by memory or time.

An unbiased random tree of size N can be generated by adding random 30-bit numbers to an empty tree of type BST until N have been added; you'll need to keep count using the return values provided by Insert( ), since some inserts may try to add a number that is already in the tree.

To create a right biased tree, you can run through an increasing sequence of N values of i in steps of 10, and to each one add a random number mod 3*N. The choices of 10 and 3 can be varied, but you want to bias the tree without making it too nearly linear. If the tree is too linear then the Stats( ) function will take too long for large N. For left bias, do the same with a decreasing sequence of N values. As for unbiased trees, you'll need to ensure that N numbers are actually added; this can be done by making sure that a number is inserted for each of the N values of i, choosing random numbers to take mod 3*N and add to i until the Insert( ) is successful.
The code for these tree shape experiments should be in treeshape.cc or, if you find it more convenient, in separate files for the different types of bias. The data you obtain should be tabulated in proj2_results, and you should then discuss their significance.

GRADING

In addition to the generally applicable grading criteria set forth in the General Project Instructions on the course web site, there are two special criteria for Project #2:

* Do use Node** in place of Node* for most pointer manipulations; this is discussed in the General Project Instructions on the course web site.
* Use iterative rather than recursive algorithms for the Find( ), Insert( ), and Remove( ) member functions of BST.