For this assignment, you must implement an index manager that allows fast searches in a collection of index keys maintained in a B+-tree index. The system should support simple navigational functionality that lets users insert individual keys, search for them, and navigate through the keys in one direction (in increasing order of keys). Before doing this assignment, study the B+-tree algorithms described in the lectures.
As in the previous assignment, you should assume single-user environments. You need not be concerned with concurrency, recovery, buffering, record management, or many other core database issues. Furthermore, you need not worry about deletions of B+-tree entries. Your B+-tree should be kept in a single file divided into fixed-size blocks containing fixed-size <key, pointer> pairs in the internal nodes, and (key, value) pairs in the leaves. For the purposes of this assignment, you should set the block size to 64 bytes. The fanout of these blocks should be 3.
The index keys and the pointers should be 4-byte unique integers. You should assume that all keys in leaf-level B+-tree entries are positive, unique integers (-1 is reserved to mean "no key" or "no pointer"). The pointers of the index entries in the internal nodes of the B+-tree should be 4-byte unsigned integers. The value of each leaf-level B+-tree entry should contain the positive integer specified by the user.
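As an illustration only, one 8-byte leaf entry could be serialized as sketched below. The byte order (little-endian), the signed representation, and the names pack_entry, unpack_entry, and EMPTY_SLOT are assumptions made for this sketch; the assignment fixes only the 4-byte sizes and the -1 sentinel.

```python
import struct

# Assumption: little-endian, signed 32-bit key and value.
# The assignment fixes only the 4-byte size, not the byte order.
ENTRY_FORMAT = "<ii"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 8 bytes per entry
NO_KEY = -1  # reserved to mean "no key" or "no pointer"


def pack_entry(key, value):
    """Serialize one (key, value) leaf entry into 8 bytes."""
    if key <= 0 or value <= 0:
        raise ValueError("keys and values must be positive integers")
    return struct.pack(ENTRY_FORMAT, key, value)


def unpack_entry(raw):
    """Return (key, value) from 8 raw bytes; key == NO_KEY marks an empty slot."""
    return struct.unpack(ENTRY_FORMAT, raw)


# An unused slot, filled with the reserved sentinel.
EMPTY_SLOT = struct.pack(ENTRY_FORMAT, NO_KEY, NO_KEY)
```

If the same four bytes are read as an unsigned 32-bit integer, the -1 sentinel appears as 4294967295, which is one possible way to store "no pointer" in an unsigned field.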
To test the operation of the index manager, your program should also implement a Program Shell with a simple user interface. The shell must execute individual user requests (e.g., insert a key or fetch a key) by invoking appropriate functions of the index manager's API.
On or before the due date, you should submit through Blackboard the source code (as many files as you have), the executable code called "assign3.exe", and a test-sample database called "assign3.db" with enough test data (20-30 keys) to make it easy for us to verify that your system works correctly. You should also include a README file indicating any special instructions. Please follow the submission instructions.
Program Shell
The program shell should support a simple menu-driven user interface that presents the user with the following specific functions:
1. Open B+-tree - Given the name of the B+-tree file, open the file and initialize the necessary in-memory structures. If the file with the given name does not exist, create one and initialize it. (This is the first function we will use to test your program.)
2. Insert key (key, value) - Insert a new key into the B+-tree maintaining the natural order of keys. The newly inserted entry becomes the current one. Display the number of accessed pages, including the ones that have been created or modified. If the key already exists in the database, return an error.
3. Get first key - Find the first key in the natural order and display the key and the corresponding value on the screen. Make the accessed entry the current one. If there are no entries in the B+-tree, display an appropriate message.
4. Get next key - Find the next key following the current one in the natural order and display the key and the corresponding value on the screen. If the current entry is the last one in the natural order, display an appropriate message and stay positioned on the current entry. Otherwise, if there is a next entry, make it the current one.
5. Fetch key (key) - Find the entry with the given search key. If the key is not in the B+-tree, position on the next key in the natural order and make the corresponding index entry the current one. If the next key does not exist, position on the last entry and make it the current one. Display the number of accessed pages as well as the key of the current entry and the corresponding record pointer.
6. Show header page - Fetch the contents of the header page (see the implementation of the index manager) and display it on the screen in a legible format so that the user can understand the meaning of individual fields. The current entry should stay the same as before this function.
7. Get first page - Fetch the contents of the first index page in the file (in the physical order in which the pages are stored in the file) and display it on the screen (including the items in the page header as well as all index entries/pointers stored in that page). For uniformity, for the key/pointer or key/value pairs, display "{NEWLINE}key: k1, k2, ..., k(f-1) {NEWLINE} value: v1, v2, ..., vf{NEWLINE}." It will be easier to check the contents this way. The accessed page becomes the "currently scanned page". However, the current entry should remain the same as before this call. If there are no index entries in the page, display an appropriate message.
8. Get next page - Fetch the contents of the next index page (either leaf or interior page) immediately following the "currently scanned page" (in the physical order in which the pages are stored in the file) and display it on the screen. If the "currently scanned page" is not known, display an appropriate message on the screen. If the "currently scanned page" is the last page in the physical order, display an appropriate message and stay positioned on the last page. Otherwise, if there is a next page in the physical order, it should become the "currently scanned" one. The current entry should remain the same as before this call.
9. Naive search (key) - Find a key "naively." From the root, follow the leftmost pointers down to the leftmost leaf. Then follow the sibling pointers along the leaf level until the key is found.
10. Bulk insert (filename) - Insert the keys contained in the given text file.
11. Show Split/Merge Debug - During a split/merge, tell the user which blocks are being merged/split into which block(s). For testing purposes.
12. Exit - Terminate the program after flushing all changes to disk and closing the B+-tree file.
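The naive search described in the list above can be sketched on an in-memory model of the tree. The dictionary layout ("leaf", "keys", "values", "children", "next") and the function name naive_search are assumptions made for illustration; a real implementation would read 64-byte blocks from the file instead. The sketch also assumes the header page is already in memory and therefore is not counted as an accessed page.

```python
def naive_search(pages, root, key):
    """Return (value or None, number of pages accessed)."""
    accessed = 1  # the root page
    page = pages[root]
    # Descend along the leftmost child pointers until a leaf is reached.
    while not page["leaf"]:
        page = pages[page["children"][0]]
        accessed += 1
    # Scan the leaf chain from left to right via the sibling pointers.
    while True:
        for k, v in zip(page["keys"], page["values"]):
            if k == key:
                return v, accessed
            if k > key:  # keys are sorted, so the key cannot appear later
                return None, accessed
        if page["next"] == -1:  # -1 means "no pointer"
            return None, accessed
        page = pages[page["next"]]
        accessed += 1


# Hypothetical three-leaf tree; page 3 is the root.
SAMPLE_PAGES = {
    1: {"leaf": True, "keys": [1, 2], "values": [10, 20], "next": 2},
    2: {"leaf": True, "keys": [3, 4], "values": [30, 40], "next": 4},
    3: {"leaf": False, "keys": [3, 5], "children": [1, 2, 4]},
    4: {"leaf": True, "keys": [5, 6], "values": [50, 60], "next": -1},
}
```

On this sample, searching for key 4 touches the root and two leaves, whereas a regular root-to-leaf search would touch only two pages; comparing the two counts is presumably the point of the function.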
For uniformity of programs (and to facilitate our interaction with your system), the user interface should be a simple menu in which each function is numbered and listed. We should be able to invoke an appropriate function by typing the corresponding number. For our convenience, please keep the order of the functions (as well as the corresponding numbers) as indicated above!
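A numbered menu of this kind is often driven by a small dispatch table. The sketch below shows the idea with only three placeholder entries; the names build_menu, dispatch, and EXAMPLE_HANDLERS, and the strings the placeholder handlers return, are hypothetical and not part of the assignment.

```python
def build_menu(handlers):
    """Render numbered menu lines from {number: (label, function)}."""
    return [f"{number}. {label}" for number, (label, _) in sorted(handlers.items())]


def dispatch(handlers, choice):
    """Run the handler whose number was typed; return its result or an error text."""
    try:
        number = int(choice.strip())
    except ValueError:
        return "Please enter a number."
    if number not in handlers:
        return f"Unknown option: {number}"
    _, function = handlers[number]
    return function()


# Placeholder handlers; a real shell would call the index manager's API here.
EXAMPLE_HANDLERS = {
    1: ("Open B+-tree", lambda: "opened"),
    2: ("Insert key", lambda: "inserted"),
    3: ("Get first key", lambda: "first key shown"),
}
```

A shell loop would print the lines from build_menu, read one line of input, and pass it to dispatch until the exit option is chosen.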
Your program should maintain the concepts of the current entry and currently scanned page. For the current entry, you should allocate (either in the program shell or in the index manager) a buffer to keep the current entry and an indication in which page (as well as where in the page) the current entry resides. The buffer should be filled each time a new index entry becomes the current one. The "currently scanned page" is used only in connection with the functions Get first page and Get next page. Note: this need not necessarily be the page containing the current entry.
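A minimal sketch of the current-entry buffer and of the Get next key behavior, assuming leaf pages are modeled as in-memory dictionaries and that every leaf in the chain holds at least one entry. The names Cursor, position, and next_key are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Cursor:
    """Buffer for the current entry plus its location (page and slot)."""
    page: int = -1   # -1 means "no current entry yet"
    slot: int = -1
    key: int = -1
    value: int = -1


def position(cursor, leaves, page, slot):
    """Make the entry at (page, slot) the current one and refill the buffer."""
    cursor.page, cursor.slot = page, slot
    cursor.key = leaves[page]["keys"][slot]
    cursor.value = leaves[page]["values"][slot]


def next_key(cursor, leaves):
    """Advance to the next entry; return False and stay put if there is none."""
    if cursor.page == -1:
        return False
    leaf = leaves[cursor.page]
    if cursor.slot + 1 < len(leaf["keys"]):
        position(cursor, leaves, cursor.page, cursor.slot + 1)
        return True
    if leaf["next"] != -1:
        # Assumption: the next leaf is never empty.
        position(cursor, leaves, leaf["next"], 0)
        return True
    return False


# Hypothetical two-leaf chain used to exercise the cursor.
SAMPLE_LEAVES = {
    1: {"keys": [10, 20], "values": [1, 2], "next": 2},
    2: {"keys": [30], "values": [3], "next": -1},
}
```

The "currently scanned page" would be tracked separately, for instance as a single page number, since it is independent of this cursor.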
For all read/write operations on the B+-tree, you must return the number of blocks accessed to complete it.
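One possible way to obtain this count is to route every block read and write through a thin wrapper that records which blocks were touched. The class name CountingBlockFile is hypothetical, and the sketch assumes that a block read and later written within the same operation counts once; if every access should count separately, a plain counter would replace the set.

```python
import io

BLOCK_SIZE = 64


class CountingBlockFile:
    """Wraps a seekable binary stream and records the blocks touched."""

    def __init__(self, stream):
        self.stream = stream
        self.touched = set()

    def reset(self):
        """Call at the start of each operation (insert, fetch, ...)."""
        self.touched.clear()

    def read_block(self, number):
        self.touched.add(number)
        self.stream.seek(number * BLOCK_SIZE)
        return self.stream.read(BLOCK_SIZE)

    def write_block(self, number, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("a block must be exactly 64 bytes")
        self.touched.add(number)
        self.stream.seek(number * BLOCK_SIZE)
        self.stream.write(data)

    @property
    def accessed(self):
        """Number of distinct blocks touched since the last reset."""
        return len(self.touched)
```

For experiments the wrapper can be given an io.BytesIO() object; for the real index it would be given a file opened in binary read/write mode.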
API of the Index Manager
The API (application programming interface) of the index manager should provide at least the following functions:
1. CreateBTree - create and initialize the file to hold the B+-tree.
2. OpenBTree - open the file containing the B+-tree.
3. InsertKey - insert a new entry with the given key.
4. FirstKey - return the first key in the natural order and the corresponding record pointer.
5. NextKey - return the next key in the natural order and the corresponding record pointer.
6. FetchKey - find the entry with the given search key.
7. GetHdrPage - fetch the contents of the header page.
8. FirstPage - get the contents of the first index page in the physical order.
9. NextPage - get the contents of the next index page in the physical order.
10. CloseBTree - flush all changes to the disk and close the B+-tree file.
The precise syntax of the index manager's API is up to you. However, the semantics of functions 3 through 9 should match the definitions of the corresponding functions of the user interface.
Implementation of the Index Manager
Your B+-tree should be maintained in a single file, logically divided into 64-byte blocks. The first block (block 0) in the file should contain a header page, while other blocks should contain B+-tree pages (a leaf page or an interior page). Initially, the file should have the header page and exactly one empty B+-tree page.
The header page should contain some administrative information (see below) about the B+-tree. For the purposes of this assignment, you should keep the header page in memory throughout the execution of your program (read it when the file is opened for the first time, and write it back to the file just before the program terminates).
The header page should keep at least the following items:
1. Offset of the root page of the B+-tree ("Root"). This item is used when searching the tree in order to locate the root page (the first page to be searched). It is updated each time the root is split and a new root page is added to the file.
2. Offset of the first leaf page of the B+-tree ("FirstLeaf"). This item is used by the FirstKey procedure to locate the first key in the natural order. Depending on how you implement splits of the root page, this may not be necessary (i.e., the first leaf page may always be at block offset 1).
3. Number of B+-tree pages ("NoBPages"). This item contains dual information: (a) the actual number of B+-tree pages and (b) a logical indication of where the end of the file is. The item is updated each time a new B+-tree page has been created (i.e., due to a split).
4. Number of levels in the B+-tree ("NoBLevels"). This item indicates how many levels a B+-tree has. It can be used when searching the tree to determine when you encounter a leaf page. The item is updated each time the root page is split.
5. Number of keys ("NoKeys"). This item indicates the number of keys in the B+-tree. It is used during insertion to generate "record pointers" of the leaf-level entries. The item is updated each time a new key is inserted in the B+-tree.
You are free to customize the contents of the header page to your needs. Note that block offsets are logical, i.e., relative offsets (block 0, block 1, block 2, etc.). Multiplying a relative offset by 64 gives the absolute byte offset of the corresponding block in the file.
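The header layout and the offset arithmetic can be sketched as follows. The field order, the little-endian signed encoding, and the names pack_header, unpack_header, and byte_offset are assumptions for this example; only the five items and the 64-byte block size come from the description above.

```python
import struct

BLOCK_SIZE = 64
# Assumed field order: Root, FirstLeaf, NoBPages, NoBLevels, NoKeys.
HEADER_FORMAT = "<5i"


def pack_header(root, first_leaf, no_b_pages, no_b_levels, no_keys):
    """Serialize the header fields and pad the block to 64 bytes."""
    body = struct.pack(HEADER_FORMAT, root, first_leaf, no_b_pages, no_b_levels, no_keys)
    return body.ljust(BLOCK_SIZE, b"\x00")


def unpack_header(block):
    """Return (Root, FirstLeaf, NoBPages, NoBLevels, NoKeys) from a 64-byte block."""
    return struct.unpack_from(HEADER_FORMAT, block, 0)


def byte_offset(block_number):
    """Convert a logical block offset into an absolute byte offset in the file."""
    return block_number * BLOCK_SIZE
```

For a freshly created file with one empty B+-tree page at block 1, pack_header(1, 1, 1, 1, 0) would describe a one-level tree with no keys, and byte_offset(1) gives 64 as the position of that page.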
Each B+-tree page should contain a page header and a number of 8-byte <key, pointer> index entries. Note that each page may have some unused space to make sure that pages are exactly 64 bytes long. For the purposes of this assignment, the page header should contain at least the following:
1. Offset of the next B+-tree page in order ("NextPage"). This is used only in leaf-level pages to indicate the offset of the next page, i.e., the page having the next key in the natural order following the last key in the given page. This information is updated each time a leaf page is split.
2. Number of entries in a B+-tree page ("NoEntries"). This is a count of entries currently residing in a page. This item is updated each time a new entry is inserted in the given page.
You can customize the page header to your needs.
Following the page header, each page has at most M 8-byte entries, where M is the page capacity (the maximum number of entries that can fit in a page). M is the same for all B+-tree pages, and it can be calculated by hand by truncating (64 - size of the page header)/8 to an integer.
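As a worked instance of this calculation, the sketch below assumes the minimal page header of two 4-byte fields (NextPage and NoEntries), which gives M = (64 - 8) / 8 = 7; a customized, larger header would lower M. The little-endian encoding and the names pack_page and unpack_page are assumptions for illustration.

```python
import struct

BLOCK_SIZE = 64
ENTRY_SIZE = 8
PAGE_HEADER_FORMAT = "<ii"  # assumed minimal header: NextPage, NoEntries
PAGE_HEADER_SIZE = struct.calcsize(PAGE_HEADER_FORMAT)  # 8 bytes
M = (BLOCK_SIZE - PAGE_HEADER_SIZE) // ENTRY_SIZE  # integer division truncates


def pack_page(next_page, entries):
    """Serialize a page: header, then entries, then zero padding to 64 bytes."""
    if len(entries) > M:
        raise ValueError("page overflow")
    raw = struct.pack(PAGE_HEADER_FORMAT, next_page, len(entries))
    for key, pointer in entries:
        raw += struct.pack("<ii", key, pointer)
    return raw.ljust(BLOCK_SIZE, b"\x00")


def unpack_page(block):
    """Return (next_page, [(key, pointer), ...]) from a 64-byte block."""
    next_page, count = struct.unpack_from(PAGE_HEADER_FORMAT, block, 0)
    entries = [
        struct.unpack_from("<ii", block, PAGE_HEADER_SIZE + i * ENTRY_SIZE)
        for i in range(count)
    ]
    return next_page, entries
```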
As soon as an insertion fills a page (i.e., NoEntries becomes M), you should split that page. The last slot in a page is often called a "split entry". Whenever this slot becomes occupied by an index entry, a split operation is triggered. This simplifies the split operation, because the page can still physically hold all of its entries when the split begins.
Keep in mind that there are differences between splitting a leaf-level page and an interior page in a B+-tree!
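The difference can be sketched on plain Python lists using the usual textbook convention, which may differ in detail from the lecture version. In a leaf split the separator key is copied up and also stays in the right leaf; in an interior split the separator is moved up and appears in neither half. The sketch models an interior page as n keys with n + 1 child pointers, which is an in-memory simplification rather than the on-disk format; the function names and the choice of the middle position are assumptions.

```python
def split_leaf(keys, values):
    """Split a full leaf; the separator is COPIED up and kept in the right leaf."""
    mid = len(keys) // 2
    left = (keys[:mid], values[:mid])
    right = (keys[mid:], values[mid:])
    separator = keys[mid]  # still present as the first key of the right leaf
    return left, separator, right


def split_interior(keys, children):
    """Split a full interior page; the separator is MOVED up to the parent."""
    mid = len(keys) // 2
    separator = keys[mid]  # removed from both halves
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, separator, right
```

In both cases the caller would insert the separator and a pointer to the new right page into the parent, create a new root when the old root itself was split, and for leaves also relink the NextPage pointers.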
There is lots of extra credit available for this project. Please let me know if you are trying to implement any of these, so I can prepare test cases and give you any special instructions.
Good luck!