CS 525: Advanced Database Organization
Programming Assignment #1: Implementing a simple record manager
Soft Deadline: February 20, 2004
Hard Deadline: February 27, 2004

For this assignment, your task is to implement a "personal" (single-user) record manager for a simple application to store and fetch its data. The system will support navigational functionality allowing the users to insert, delete, and update individual records as well as to browse through the records in both directions. The application is computer inventory for a certain company. The system should be implemented  in C or C++ for Windows 98 or above. The description here contains enough detail so that you can implement the system relying on the problem statement and your programming skills.

Keep the implementation as simple as possible, but within the bounds of good programming logic and style. Be warned, however --- despite many simplifying assumptions, this assignment is not trivial! In particular, you must assume that the database for the computer inventory can be quite large, e.g. tens of thousands records. You may not assume that the entire database will fit in main memory! But you need not be concerned with concurrency, recovery, distributed processing, buffering, indexing, or many other core database issues that we will examine later in the course. The entire database should be maintained in a single file divided into fixed-size blocks (pages) containing fixed-size records. Except for one page (which we will call  the "header page"), individual pages of the file should be read and written on demand, one at a time.

Grading will be done based on the correctness of your program, readability of your code, and the efficiency of your implementation. On or before the due date, you should submit through CourseInfo the source code (as many files as you have), the executable code called "assign1.exe", and a test-sample database called "assign1.db" with enough test data (20-30 records) to make it easy for us to verify that your system works correctly.

Application Issues

Each record in the database should contain the following information:

Field No     Field Name             Type         Size        Comment
1                 Computer_ID           string         10 bytes      unique for each computer
2                 Computer_Model     string         10 bytes     e.g. PC
3                 Location                   string         10 bytes     where computer is located
4                 Person_In_Charge    string         20 bytes     "owner" of the computer
5                 Phone                       string         10 bytes     phone of the "owner"

Note that all fields are just fixed-length character strings. In principle, computer IDs uniquely identify the computers. For simplicity, we will also assume that "persons in charge" have unique names, each computer belongs to a single person, and each person has only one computer. However, in your implementation, you should not worry about the enforcement of unique values --- assume that the users will take care of that.

A sample database with 6 records may look as follows:

<C10; PC; 234SB; A. N.; 222-3333>
<C77; PC; 123SB; S. B.; 333-4444>
<C32; PC; 155SB; G. C.; 444-5555>
<C91; Sparc; 105MB; M. F.; 555-6666>
<C54; Sparc; 222MB; E. W.; 999-0000>
<C13; PC; 176SB; B. D.; 777-8888>

For the purposes of this program, the order of records in the file is irrelevant; it can be determined by your implementation. We call this "physical order" of records in the database. Typically, this will be the order in which the records are inserted in the database, but  the deletions may remedy this order.

Implementation Issues

Your system should have two major modules: an application manager and a record manager. The task of the application manager is to accept a request from the user and deliver it to the record manager by invoking an appropriate function or functions of the record manager. The record manager carries out the requested task possibly returning a record. Following this, the application manager displays the result on the screen.

Implementation of the Application Manager

Through a simple menu-driven interface, the application manager should present to the users the following capabilities:

1. Open database - Open the database and initialize the necessary in-memory structures. In case the database file does not exist, create one and initialize it. (This is the first function we will use to test your program. In the process, we will use the data file "assign1.db" that you have prepared for us.)

2. Get first record - Fetch the first record in the database in the physical and display it on the screen. Make the fetched record the current one. If there are no records in the database, display an appropriate message.

3. Get next record - Fetch the next record immediately following the current one (in the physical order in which the records are stored) and display it on the screen. If the current record is not known, display an appropriate message on the screen. If the current record is the last one in the physical order, display an appropriate message and stay positioned on the last record. Otherwise, if there is a next record, it becomes the current one. 

4. Get previous record - Fetch the previous record in the database (in the physical order) and display it on the screen. If the current record is not known, display an appropriate message on the screen. If the current record is the first one in the physical order, display an appropriate message and stay positioned on the first record. Otherwise, if there is a previous record, it becomes the current one. 

5. Insert record - Insert a new record into the database. The inserted record becomes the current one. 

6. Delete record - Delete the current record from the database. Display an appropriate message if the current record is not known. Otherwise, the next record following the deleted one becomes the current record. If the current record is the last one in the physical order, position on the previous record and make it the current one. 

7. Update record - Replace the current record with a new one. Stay positioned on the current record. 

8. Show header page - Fetch the contents of the header page (see the implementation of the record manager described below) and display it on the screen in a readable format so that the user can understand the meaning of individual fields.

9. Get first page - Fetch the contents of the first data page in the database file (in the physical order in which the pages are stored in the file) and display it on the screen (including the items in the header as well as all records stored in that page). The accessed page becomes the "currently scanned page" (note, this is different from "current page"). But, the "current record" and "current page" should remain the same as before this call. If there are no records in the database, display an appropriate message.

10. Get next page - Fetch the contents of the next data page immediately following the "currently scanned page" (in the physical order in which the pages are stored in the file) and display it on the screen. If the "currently scanned page" is not known, display appropriate message on the screen. If the "currently scanned page" is the last page in the physical order, display appropriate message and stay positioned on the last page. Otherwise, if there is a next page in the physical order, it should become the "currently scanned page". The "current record" and "current page" should remain the same as before this call. 

11. Commit changes - This function makes sure that all the changes you have made so far are written to disk.

12. Exit - Terminate the application after flushing all changes to the disk and closing the database.

For uniformity of programs and to facilitate our interaction with your system, the user interface should be a simple menu in which each function is numbered and listed. We should be able to invoke an appropriate function by typing the corresponding number. For our convenience, please keep the order of the functions (as well as the corresponding numbers) as indicated above!

For simplicity, you should assume that individual fields of records are stored in the database as NULL-terminated character strings of the specified length. (Note, all fields are of fixed length; therefore, the records are also of fixed size.) This organization enables application manager to access each individual field through some simple offset calculation. Furthermore, updates can never extend a record. But fixed-sized records are not ideal. They typically incur high storage overhead and performance penalties.

Note also that your program should maintain the concepts of a current record and a current page. These can be undefined, but only when the database is empty or before the user begins processing the database. To support these concepts, we recommend that you keep in memory a "current-record buffer". The buffer should be large enough to hold the current record as well as an indication whether the current record is undefined and, if not, where the record is (i.e., in which block and where in the block). Note that the current-record buffer serves a dual purpose: to hold current record and indicate the current page. The buffer should be filled each time a new record becomes the current one. In addition to the current record buffer, you may find it useful to pre-allocate few more record buffers to carry out certain functions, e.g. insertion or update.

Your program should also maintain information about the currently scanned page. This information is used only in connection with the functions Get first page and Get next page. Note: the currently scanned page need not necessarily be the page containing the current entry!

Implementation of the Record Manager

The API (application programming interface) of the record manager should contain at least the following functions:

1. CreateStore - create and initialize the file to hold the database.
2. OpenStore - open the file containing the database.
3. FirstRec - get first record in the physical order.
4. NextRec - get next record in the physical order.
5. PriorRec - get previous record in the physical order.
6. InsertRec - insert a new record.
7. DeleteRec - delete current record.
8. UpdateRec - replace the current record with the new one.
9. CloseStore - make sure all changes are propagated to disk and close the database.

These functions are used internally by the application manager to perform a function requested by the user. Note that the user interface differs slightly from the record manager's API. You may find it convenient to "extend" the record manager's API with few more functions. The precise syntax of the record manager's API is up to you. However, the semantics of functions 3 through 8 should match the definitions of the corresponding functions of the user interface.

As indicated earlier, the entire database should be maintained in a single file. As usual in actual DBMS implementations, the records should be grouped into fixed-sized blocks (pages). In other words, the file is logically divided into fixed-sized pages containing data records. Typically, the page size is either 1K, 2K, 4K or more bytes, but for this assignment fix the block size to exactly 200 bytes. This makes it easier to test your implementation.

Transfer of data between memory and disk should be block-based rather than record-based. This means that the record manager, when accessing a record, should first read from the file the entire block (page) in which the record resides, place the page in a memory buffer, and fetch the record there. Similarly, during an insertion, deletion, or an update, the page is first modified in memory, and then the entire page is written to the file. This arrangement requires that your record manager keeps one or more "page buffers" in memory at all times.

The database file should have a header page (the first one in the file), different from regular data pages containing actual records. For this database, the header page serves as a simple data dictionary. It should contain some "anchor pointers" to data pages (keep in mind, these are not regular pointers of C/C++ programs to main-memory objects, but offsets in the file!) as well as some statistical information about the database (e.g., how many records and pages it has) used by the record manager. If necessary, you may also keep here more data, e.g. about the internal organization of records ( how many fields they have, what is the size of each field, etc.). The header page may be heavily underutilized, but it should be of the same size as data pages. we recommend that you keep the header page in memory all the time while your program is running (read the page once when you open the database, and write it back to the database file before exiting the program).

The internal organization of data pages is up to you. But, in deciding how the data pages should be internally organized, pay attention to how you delete a record so that the deallocated space can eventually be reused.  This brings us to an important problem of storage allocation and free-space management, which is a complex problem that will be examined later in the class. But, for this assignment, we recommend that you use a simple schema. For example, to delete a record, you can just overwrite it by moving  in its place the last record in the file.

Implementation of the Storage Manager

In order to access the disk, you should implement an interface. The interface should have at least two functions:
  1. Get Block - Read a block from the disk, and place into buffer. Of course, if there's already a block in the buffer, write that block back to disk.
  2. Put Block - Write the block in the buffer back to disk.
Note that you should only need two block buffers. (Know why?)

Good luck!