CSCI 4370/6370 Database Management

Summer 2021 Presentation Slides

Second Half of Course

Jump to   Final Exam

+ See major bullet items for general topical area and corresponding Chapters in Textbook.

+ See sub-bullet items for customized .pptx presentation slides (one for each class day).

+ Each one has (will have) a version with Voice Over (VO) on eLC.
-- Each daily presentation will be divided into multiple .mp4 files that are placed below the corresponding .pptx file on eLC.
-- The .pptx file allows studying at your own pace and skipping around in the presentation.
-- For convenient access, the .pptx files below are copies of the ones on eLC.
-- The powerpoint presentation (.pptx) files and video (.mp4) files are below Dr. Arpinar's videos on eLC.
-- On eLC, select "Content" and then "Video Lectures". Scroll to the bottom and press the "Load More" button.
-- For the first day, the .pptx file is Relational Algebra 1 and .mp4 files are Relational_Algebra_1a, Relational_Algebra_1b and Relational Algebra_1c.

+ Daily Lesson Plan: Monday, Tuesday, Wednesday and Thursday
-- Listen to all the videos for that day (the .mp4 files)
-- Schedule a zoom meeting with your group to jointly go over and discuss the longer .pptx file and related material from the textbook.
-- Attend the twice weekly Review and Q&A Sessions and ask questions.

+ Project 3 (LinHashMap.java only see below) and Final Project (see below)
-- See eLC for details.

+ Twice Weekly Review / Question and Answer Zoom Meetings - Tuesdays and Thursdays at 2:15 pm


See Updated Slides


  1. Relational Algebra (RA) - Operator Based Query Language

    Chapter 8: The Relational Algebra and Relational Calculus

    1. Fundamental Operators: Relational Algebra 1 - see eLC for VO - July 12

    2. Join Operators: Relational Algebra 2 - see eLC for VO - July 13

    3. Division Operator and Example RA Queries: Relational Algebra 3 - see eLC for VO - July 14

  2. Hashing - Usually Faster than B+trees for Point (Equality) Queries

    Chapter 16: Disc Storage, Basic File Structures, Hashing, and Modern Storage Architectures

    1. Files and Disks: Files and Disks 1 - see eLC for VO - July 15

    2. Extendable Hashing: Hashing 1 - see eLC for VO - July 19

    3. Linear Hashing: Hashing 2 - see eLC for VO - July 20

  3. Index Selection - Indexes on Attributes

    Chapter 17: Indexing Structures for Files and Physical Database Design

    1. When to Add Indexes: Index_Selection 1 - see eLC for VO - July 21

    2. Index Selection Example: Index_Selection 2 - see eLC for VO - July 21

  4. ISAM - precursor to B+trees

    Chapter 17: Indexing Structures for Files and Physical Database Design

    1. Indexed Sequential Access Method (ISAM): ISAM 1 - see eLC for VO - July 22

  5. B+trees - General Purpose Data/Index Structure for Databases

    Chapter 17: Indexing Structures for Files and Physical Database Design

    1. Structure of B+tree: B+tree 1 - see eLC for VO - July 26
      Structure of B+tree: B+tree 1a
      See updated BpTreeMap.java that corresponds to the VO slides.

    2. B+tree Inserts: B+tree 2 - see eLC for VO - July 27
      B+tree Inserts: B+tree 2a
      May study B+tree Deletes on your own - will not be on test.

    3. B+tree Performance (Select): B+tree 3a - see eLC for VO - July 28
      B+tree Performance (Join): B+tree 3b - see eLC for VO - July 28

  6. Query Processing and Optimization - SQL -> RA -> Query Plan Optimization -> Plan Execution

    Chapter 18: Strategies for Query Processing

    1. Query Processing and Optimization (tuples): Query Processing 1 - see eLC for VO - July 29
      Query Processing and Optimization (blocks): Query Processing 1b - see eLC for VO - July 29

    Outline: Query Processing and Optimization

  7. Final Project Presentations - Present Database Design and Software Architecture, Give Demo
    1. Day One Zoom Presentations: Monday, August 2

    2. Day Two Zoom Presentations: Tuesday, August 3

    3. Day Three Zoom Presentations: Wednesday, August 4


  8. F I N A L - E X A M

    - Final Exam - Thursday, August 5-6, 2021 -- on eLC

    - Exam start and end times: August 5, 2021, noon to August 6, 2021, noon

    - Open Book and Notes

    - Material to be Covered (6 Questions):

    - Must show work (steps/logic taken to produce your answer)

    - the e.g. (for example) should only be taken as an example of a possible question

    1. SQL: Given English write SQL Queries
      • Moderate: e.g., List the names of customers who live and bank (have deposit accounts) in the same city
      • Challenging: e.g., List the names of customers who only have loans where they have deposit accounts.

    2. Relational Algebra
      • Relational Algebra (RA) Operators: e.g., Explain why the intersection operator is not fundamental.
      • Relational Algebra Expressions as Queries: e.g., List the names of customer with deposit accounts at all Athens branches.

    3. Files and Disks
      • Files, e.g., What types of files are suitable for database storage? Explain.
      • Disks, e.g., Compute the block access time for a "give disk".

    4. Hash Index
      • Structure: e.g., How does the structure of "Linear Hashing" compare to "Hashing with Separate Chaining"?
      • Inserts: e.g., Insert the following keys into a Linear Hash Table.
      • Performance: e.g., What is the Best, Expected and Worst Case Performance for Linear Hashing (give Exact and Asymptotic results)

    5. B+Tree Index
      • Structure: e.g., How does the structure of a "B+Tree" compare to "ISAM"?
      • Inserts: e.g., Insert the following keys into a B+Tree Index.
      • Performance: e.g., Compare the number of block accesses (nba) of Select vs. Indexed Select (using a B+Tree) for ...

    6. Database Design - similar to Final Term Project
      • UML Class Diagram
      • BCNF Decomposition Algorithm - Create a BCNF Decomposition Tree
        1. Specification (R, F) where R is the set of all attributes and F is set of all functional dependencies
        2. Create Root Node containing all attributes R
        3. Find a Functional Dependency (FD) that violates BCNF (i.e., LHS is not a superkey)
        4. Place this FD X --> Y below the node
        5. Make a left child node that contains attributes XY
        6. Make a right child node that contains attributes of parent node - Y
        7. If either child node is not in BCNF, Repeat Step 2
        8. Form a Table for each leaf node in the Decomposition Tree
      • 3NF Synthesis Algorithm - Show Steps
        1. Specification (R, F) where R is the set of all attributes and F is set of all functional dependencies
        2. Apply the miminal cover algorithm on F to produce a minimal cover F'
        3. Merge all FD X --> Y in F' that have a common LHS X into a single FD (e.g., AB --> C, AB --> D merge into AB --> CD)
        4. For each FD, form a Table (e.g., AB --> CD leads to Table Ri(ABCD)
        5. Remove any subset tables that are fully contained in some other table
        6. Check for losslessness: requires closure (Ri) = R for some i
        7. If not lossless, add a new Table that consists of a Global Key

    Example: BCNF Decomposition Tree

    R = { A, B, C, D }
    F = { A --> BC, C --> D }
    Answer:
            [R = ABCD ]
            /   C --> D   \
    [CD]                 [ABC]
    Therefore R1 = CD, R2 = ABC

    Example: SQL and Relational Algebra

    List AB where D = 5
    SQL Answer:
    select s.A, s.B
    from R1 r, R2 s
    where r.D = 5 and r.C = s.C

    Direct Translation to RA Answer:
    proj_A, B (sel_D = 5 and R1.C = R2.C (R1 x R2)

    use proj for Project, sel for Select, x for Cartesian Product, * for Join

    Example: Query Processing and Optimization: Step 1

    Answer for tuples:
    proj_A, B                                 ouput size = formula = ? tuples
    sel_D = 5 and R1.C = R2.C     ouput size = formula = ? tuples
    R1 x R1                                   ouput size = formula = ? tuples
    total = ?

    Answer for blocks:
    proj_A, B                                 reads = formula = ?, writes = formula = ? blocks
    sel_D = 5 and R1.C = R2.C     reads = formula = ?, writes = formula = ? blocks
    R1 x R1                                   reads = formula = ?, writes = formula = ? blocks
    total = ?


  9. Project 3: Index Structure - Linear Hashing - Due Thursday, July 22nd at midnight

    1. Use indices (TreeMap (in Java), HashMap (in Java) and LinHashMap (your code)) to speed up the processing of Select and Join. Indices must be integrated and used in the Table.java.

    2. Must use the following starter code: LinHashMap.java

    3. Hash Table:
      private final List < Bucket > hTable;
      hTable = new ArrayList <> ();

      Also see DirectMap.java

    4. Linear Hashing may be implemented as an extended version of Hashing with Separate Chaining. It will split home buckets to maintain the load factor (alpha) within desired bounds. When the load factor gets too high, the home bucket at the split pointer will be split into the original bucket and a new bucket at the end of the array list. The split pointer will then be incremented by one.

    5. After the split, the load factor will be in the acceptable range and now the new record (key-value pair) is to be inserted. If there is no room in the home bucket, make an overflow bucket and link the home bucket to this overflow bucket. Place the new record in this overflow bucket.

    6. The fact that home buckets are split implies that two hash functions are required: a low resolution hash function h and a high resolution hash function h2.

    7. Do not make a copy of the record/tuple, but simply use a reference to the tuple in the Table.

    8. Compare the speed of (a) NoIndex, (b) TreeMap, (c) HashMap and (d) LinHashMap. Plot the speed/performance in terms of response time in ms vs. number of tuples. Will require the use of TupleGenerator.java, TupleGeneratorImpl.java and TestTupleGenerator.java.

    9. Compare these four for (i) Select Operations: "σ id = v (Student)" and (ii) Join Operations: "Student join id = studId Transcript".

    10. Use Java's "System.nanoTime ()" method. Collect six samples. Throw out the first and average the remaining five. During the first pass through the code the JIT is used to translate byte code to machine code, which slows down execution.


  10. Final Term Project: Database/Web Application

    1. A Database/Web Application is required: a project needs to use MySQL as a database backend. Should have four to seven tables.

    2. Real-world data is required: either collect from web, including social media, other databases or generate through human-activity. No toy-data is allowed.

    3. Large data is required: a project needs to use a significant number of tuples. Minimum number of tuples must be 1000. In some situations, computer-generated data will be permissible.

    4. A project needs to use database design in UML, normalization process and indexing capabilities of the database backend.
      1. Design 1: UML Class Diagram - only show attributes and not methods
      2. Design 2: BCNF Decomposition - show decomposition tree
      3. Design 3: 3NF Synthesis - show major steps in algorithm

    5. A Java/JVM based back-end for database access is required (e.g., JDBC, Ebeans or Hibernate).

    6. A JVM-based Web Framework (e.g., Play 2 or Spring Boot) is required.

    7. Application may be coded in any High-Profile JVM-based Programming Language.

    8. Create a Python script for TA to run your project including setting up the database.

    9. Final Term Project Two-Page Proposal: Send in as soon as possible for feedback and approval:

      Include Title, Brief Description, Preliminary UML Diagram, Database Access (JDBC, Ebeans or Hibernate), Web Framework

    10. Grading:

      1. UML Class Diagram 10 points
      2. Translation to Relational Model 10 points
      3. BCNF Decomposition Algorithm 10 points
      4. 3NF Synthesis Algorithm 10 points
      5. Compare and Pick One 10 points
      6. Show schema.sql File 10 points
      7. System Architecture Diagram and Components 10 points
      8. Demo 30 points

    Secure Passwords used for Logins (Provided +1 point, Not Provided -1 point):
    Salted Password Hashing - Doing it Right