The primary goal of this project is to implement some of the key concepts in distributed and parallel database systems, such as fragmentation, parallel sort, and range queries. The project was done as part of CSE 512 Distributed and Parallel Database Systems, taught by Mohamed Sarwat.
These concepts are built on top of the open source relational database Postgres. I used Python for programming and psycopg as the database driver for Postgres. You can find a getting started guide for psycopg here.
The project covers 3 main concepts:
- Data fragmentation across partitions (sharding).
- Query processor that accesses data from the partitioned table.
- Parallel sort and parallel join algorithms.
In centralized database systems, all the data resides on a single node, whereas in distributed and parallel database systems the data is partitioned across multiple nodes.
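To make the idea of fragmentation concrete, here is a minimal in-memory sketch of two common partitioning schemes, range and round robin. It is an illustration only: the project itself fragments Postgres tables, and the function names, the `ratings` rows, and the `rating` attribute below are all hypothetical.

```python
def range_partition(rows, key, num_partitions, lo, hi):
    """Assign each row to one of num_partitions equal-width ranges over [lo, hi]."""
    width = (hi - lo) / num_partitions
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Clamp so that the maximum value lands in the last partition.
        index = min(int((row[key] - lo) / width), num_partitions - 1)
        partitions[index].append(row)
    return partitions


def round_robin_partition(rows, num_partitions):
    """Assign rows to partitions in turn, regardless of their values."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


# Hypothetical sample data, not taken from the project.
ratings = [
    {"user_id": 1, "rating": 0.5},
    {"user_id": 2, "rating": 2.5},
    {"user_id": 3, "rating": 4.0},
    {"user_id": 4, "rating": 5.0},
]

by_range = range_partition(ratings, "rating", 2, 0.0, 5.0)
by_turn = round_robin_partition(ratings, 2)
```

Range partitioning keeps rows with similar values together, which helps range queries skip partitions, while round robin spreads rows evenly but gives no hint about where a value lives.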
This task involves building a simplified query processor that accesses data from the partitioned table. As part of this, two queries were implemented: RangeQuery() and PointQuery().
RangeQuery() takes a range of attribute values as input and returns the tuples whose attribute value falls within that range, drawn from the fragmented partitions created in the first step.
PointQuery() takes a specific attribute value as input and returns all the tuples that have that value, drawn from the fragmented partitions.
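The behaviour of the two queries can be sketched over in-memory partitions as follows. This is not the project's code: the lowercase function names, the inclusive range bounds, and the sample rows are assumptions made for illustration, and the real queries would read from the partition tables in Postgres.

```python
def range_query(partitions, key, low, high):
    """Return (partition_index, row) pairs with low <= row[key] <= high.

    Inclusive bounds are an assumption; the original text does not say.
    """
    results = []
    for index, partition in enumerate(partitions):
        for row in partition:
            if low <= row[key] <= high:
                results.append((index, row))
    return results


def point_query(partitions, key, value):
    """Return (partition_index, row) pairs where row[key] equals value."""
    return range_query(partitions, key, value, value)


# Hypothetical partitions, as a fragmentation step might have produced them.
partitions = [
    [{"user_id": 1, "rating": 0.5}, {"user_id": 2, "rating": 2.0}],
    [{"user_id": 3, "rating": 4.0}, {"user_id": 4, "rating": 5.0},
     {"user_id": 5, "rating": 4.0}],
]

in_range = range_query(partitions, "rating", 2.0, 4.0)
exact = point_query(partitions, "rating", 4.0)
```

Returning the partition index alongside each row makes it easy to see which fragment a tuple came from.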
This task involves implementing generic parallel sort and parallel join algorithms.
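One common way to structure both algorithms is to split the input into disjoint partitions, let one worker handle each partition, and then combine the results. The sketch below follows that pattern with Python threads; it is an assumption about the general approach, not the project's implementation, and every name in it is hypothetical. In standard CPython, threads mainly illustrate the structure here rather than provide a CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_by_range(values, num_workers):
    """Split values into num_workers equal-width, ordered ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_workers or 1  # avoid zero width if all values are equal
    chunks = [[] for _ in range(num_workers)]
    for v in values:
        index = min(int((v - lo) / width), num_workers - 1)
        chunks[index].append(v)
    return chunks


def parallel_sort(values, num_workers=4):
    """Range-partition the values, sort each chunk in a worker, concatenate."""
    if not values:
        return []
    chunks = split_by_range(values, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # Chunks cover increasing ranges, so concatenation yields a sorted list.
    return [v for chunk in sorted_chunks for v in chunk]


def parallel_join(left, right, key, num_workers=4):
    """Hash-partition both inputs on key, then join matching partitions."""
    left_parts = [[] for _ in range(num_workers)]
    right_parts = [[] for _ in range(num_workers)]
    for row in left:
        left_parts[hash(row[key]) % num_workers].append(row)
    for row in right:
        right_parts[hash(row[key]) % num_workers].append(row)

    def join_partition(pair):
        left_rows, right_rows = pair
        index = {}
        for row in left_rows:
            index.setdefault(row[key], []).append(row)
        return [{**l, **r} for r in right_rows for l in index.get(r[key], [])]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        joined = list(pool.map(join_partition, zip(left_parts, right_parts)))
    return [row for part in joined for row in part]


sorted_values = parallel_sort([5, 3, 9, 1, 7, 3], num_workers=3)
joined_rows = parallel_join(
    [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
    [{"id": 1, "score": 10}, {"id": 3, "score": 30}, {"id": 1, "score": 11}],
    key="id",
)
```

Rows with equal keys always hash to the same partition, so each worker can join its pair of partitions without looking at any other.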
If you like this utility or have fun working with this project, feel free to contribute. To contribute, you just need a working knowledge of Python and Postgres, and a bit of familiarity with distributed database concepts.
One initial idea would be to add a few more queries to the query processor.
If you find any issue, bug, error, or unhandled exception, feel free to report it.