-
Notifications
You must be signed in to change notification settings - Fork 7
/
answer_lab1.txt
120 lines (92 loc) · 5.62 KB
/
answer_lab1.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
### SimpleDB
A dumb DBMS implemented in Java
### Developer
Xi Han 504136747 [email protected]
Runhang Li 204134617 [email protected]
### Logistics
#### An overlook
Database provides a group of static objects: Catalog, Buffer pool, log
Catalog Buffer pool
| |
Tables ConcurrentHashMap<PageID,Page>
|
DbFile, TableName, TablePrimaryKey
|
File,TupleDesc,maxPageNo
|
TDitem
#### Design decisions
We start our project by implementing some basic data structures, which are
Fields, Tuples and TupleDesc. Since TupleDesc will be used to describe
the type and the name of each field inside a tuple, we start implementing
TupleDesc. Each field inside a tuple is being described by TDItem, which
is a component inside TupleDesc. We decide that TDItem should have private
variable fieldType and fieldName and we then write some public function like
constructor and toString for TDItem.
TupleDesc should contain a vector of TDItem. After figuring our private
variables in TupleDesc, it is then quite easy for us to implement those
public functions inside TupleDesc.
After implementing TupleDesc, we can now implement Tuple. Tuple should
consist of one TupleDesc which will describe this tuple, one vector of Field
that stores a group of data and a record id that identifies this tuple. Then,
we just need few lines to implement for public methods.
BufferPool serves as a cache in this DBMS. We saw in the template that this
class needs java.util.concurrent.ConcurrentHashMap. Therefore, we decide
the following private variable inside Bufferpool class: a pool of type
concurrent hash map that maps "PageID" to "Page" and an integer that
records the maximum page that this bufferpool can hold. It is quite easy
to implement constructor, getpagesize and setpagesize so I will focus on
introducing how we design getpage method. For getpage method, we try to
retrieve page from hashmap. If we cannot find corresponding page in
hashmap, hashmap will return null pointer, in which case we will try
to put <pageid, page> in map unless we already reached the maximum
number of page we can insert.
Then, we try to implement HeapPageId, RecordID, HeapPage together.
First, we decide to implement RecordID. From the constructor, we can
know that RecordID contains two variable: one PageID that identifies
the page that tuple stays in and one Tupleno that identifies the position
in which this tuple stays in the corresponding page. It is easy to implement
constructor and some trivial public methods. However, the hashcode() function
requires some consideration. As professor suggested on Piazza, we decide to
take the most significant 16 bits of the hashcode of Pageid and then
concatenate it with the least significant 16 bits of tupleno to create
hashcode for RecordID.
HeapPageId serves as a unique identifies for HeapPage object. Again, from
the constructor, we can know that HeapPageId holds two private variables:
tid which identifies the table in which the page resides in and pid which
identifies the position in which this page resides in the corresponding
table. We then write some public methods and finish hashcode() method by
concatenating the most significant 16 bits of pid with the least 16 bits
of the hashcode of tid to create hashcode.
Next, we implement HeapPage. After careful consideration, we believe HeapPage
should have private variables: a HeapPageID which identifies this HeapPage,
a TupleDesc which describes the tuples this HeapPage stores, an array of tuples,
number of slots that will be used to store tuples, and an array of bytes as a header
to indicate if a slot is being used. We can easily implement getnumtuples
and getheadersize by reading hints in skeleton code. Then, we try to implement isSlotUsed.
Think header as a table. When we divide argument "i" in isSlotUsed function by eight,
the result we get is the row number and the mod we get is the column number. In this way,
we can locate the position of the bit we want to examine. We then do some bit
manipulation to extract the bit we want and compare it with zero to see if this slot
has been used. Then, for getNumEmptySlots, we can just loop through the header to get the
number of bits that have been used. For iterator(), we return an iterator object in which
we write overridding methods. Finally, we implemented HeapFile. HeapFile should contain
a File object, a tupledesc, an integer that records the maximum number of pages, which
should be file.length()/BufferPool.getPageSize(). Inside HeapFile class, there
are some functions that require careful consideration. The first one is readPage.Readpage
will read a stream of data from file. The starting position when reading is specified
by pagesize()*pageNo and from that starting position, we read a continuous stream of data.
Specifically, we use RandomAccessFile class provided by Java SDK to do reading, as specified
in the spec. The most difficult one is DbIterator() method. We implement several
overridding methods so that this DbIterator can iterate the tuples specified by
tableid and pageid.
Finally, we implement the SeqScan. SeqScan class should have private variables:
tableid, tablesAlias, and a DbIterator. After implementing DbIterator, it is quite
easy for us to implement methods related to iterator in SeqScan class by just calling
corresponding methods on iterator variable in this class.
#### Any changes we made to the API
None
#### Any missing or incomplete elements in code
None
#### Hours spent
Four