distributed-database

The primary goal of the project is to implement some of the key concepts in distributed and parallel database systems, for example fragmentation, parallel sort, and range queries. This project was done as part of CSE 512 Distributed and Parallel Database Systems, taught by Mohamed Sarwat.

These concepts are built on top of the open-source relational database Postgres. I used Python for programming and psycopg as the database driver for Postgres. You can find a getting started guide for psycopg here.

The project covers 3 main concepts:

Data fragmentation across partitions (sharding).
Query processor that accesses data from the partitioned table.
Parallel sort and parallel join algorithms.

Data Fragmentation

In centralized database systems, all the data resides on a single node, whereas in distributed and parallel database systems the data is partitioned across multiple nodes.
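This README does not say which partitioning schemes the project uses, so the following is only a minimal sketch of two common ones, range partitioning and round-robin partitioning, written over in-memory tuples rather than Postgres tables. The row layout `(user_id, item_id, rating)` and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical row layout (user_id, item_id, rating); not taken from the project.
Row = Tuple[int, int, float]


def range_partition(rows: List[Row], n_parts: int, lo: float, hi: float) -> List[List[Row]]:
    """Split rows into n_parts fragments using equal-width ranges of the rating.

    Assumes lo < hi and that every rating lies in [lo, hi].
    """
    width = (hi - lo) / n_parts
    parts: List[List[Row]] = [[] for _ in range(n_parts)]
    for row in rows:
        rating = row[2]
        # Clamp so that rating == hi lands in the last fragment.
        idx = min(max(int((rating - lo) / width), 0), n_parts - 1)
        parts[idx].append(row)
    return parts


def round_robin_partition(rows: List[Row], n_parts: int) -> List[List[Row]]:
    """Assign row i to fragment i % n_parts, which keeps fragment sizes balanced."""
    parts: List[List[Row]] = [[] for _ in range(n_parts)]
    for i, row in enumerate(rows):
        parts[i % n_parts].append(row)
    return parts
```

Range partitioning lets a query on the partitioning attribute skip fragments whose range cannot match, while round-robin only balances sizes and requires scanning every fragment.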

Query Processor

It involves building a simplified query processor that accesses data from the partitioned table. As part of this, two queries were implemented: RangeQuery() and PointQuery().
RangeQuery() takes a range of attribute values as input and returns the tuples that fall within that range from the partitions created in the fragmentation step.
PointQuery() takes a specific attribute value as input and returns all tuples with that value from the fragmented partitions.
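As a rough illustration of the two queries, the sketch below scans a set of in-memory fragments instead of Postgres partition tables. The lowercase function names, the dictionary of fragments, and the inclusive range bounds are assumptions; the project's actual RangeQuery() and PointQuery() signatures are not shown in this README.

```python
from typing import Dict, List, Tuple

Row = Tuple  # any tuple; the attribute is addressed by position


def range_query(fragments: Dict[str, List[Row]], attr_index: int,
                low: float, high: float) -> List[Tuple[str, Row]]:
    """Return (fragment_name, row) pairs where low <= row[attr_index] <= high.

    Bounds are inclusive here, which is an assumption of this sketch.
    """
    matches = []
    for name, rows in fragments.items():
        for row in rows:
            if low <= row[attr_index] <= high:
                matches.append((name, row))
    return matches


def point_query(fragments: Dict[str, List[Row]], attr_index: int,
                value: float) -> List[Tuple[str, Row]]:
    """A point query is a range query whose two bounds are the same value."""
    return range_query(fragments, attr_index, value, value)
```

Returning the fragment name with each tuple makes it easy to check which partition a result came from.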

Parallel Sort & Join

This task involves implementing generic parallel sort and parallel join algorithms.
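One common way to structure both algorithms is to range-partition the rows on the sort or join key, process each partition in its own worker, and concatenate the results. The sketch below follows that pattern with in-memory tuples and Python threads standing in for database workers. It is an assumption about the general approach, not a copy of the project's code, and all names in it are hypothetical. Keys are assumed to be numeric.

```python
from concurrent.futures import ThreadPoolExecutor


def _bucket(key, lo, hi, n_workers):
    """Map a numeric key to one of n_workers equal-width ranges over [lo, hi]."""
    if hi == lo:
        return 0
    return min(int((key - lo) * n_workers / (hi - lo)), n_workers - 1)


def parallel_sort(rows, key_index, n_workers=4):
    """Range-partition on the key, sort each partition in a worker, concatenate."""
    if not rows:
        return []
    keys = [row[key_index] for row in rows]
    lo, hi = min(keys), max(keys)
    buckets = [[] for _ in range(n_workers)]
    for row in rows:
        buckets[_bucket(row[key_index], lo, hi, n_workers)].append(row)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        sorted_buckets = list(
            pool.map(lambda b: sorted(b, key=lambda r: r[key_index]), buckets)
        )
    # Buckets cover increasing key ranges, so concatenating them is globally sorted.
    return [row for bucket in sorted_buckets for row in bucket]


def parallel_join(left, right, left_key, right_key, n_workers=4):
    """Equi-join: partition both inputs on the join key, hash-join each pair."""
    if not left or not right:
        return []
    keys = [r[left_key] for r in left] + [r[right_key] for r in right]
    lo, hi = min(keys), max(keys)
    left_b = [[] for _ in range(n_workers)]
    right_b = [[] for _ in range(n_workers)]
    for row in left:
        left_b[_bucket(row[left_key], lo, hi, n_workers)].append(row)
    for row in right:
        right_b[_bucket(row[right_key], lo, hi, n_workers)].append(row)

    def join_pair(pair):
        l_rows, r_rows = pair
        index = {}
        for r in r_rows:
            index.setdefault(r[right_key], []).append(r)
        return [l + r for l in l_rows for r in index.get(l[left_key], [])]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(join_pair, zip(left_b, right_b)))
    return [row for part in parts for row in part]
```

Because matching keys always fall in the same range, each pair of partitions can be joined independently. In CPython, threads will not speed up this CPU-bound version; the sketch only shows how the work is divided.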

Contribution

If you like this utility or have fun working with this project, feel free to contribute. To contribute, you just need a working knowledge of Python and Postgres, and a bit of familiarity with distributed database concepts.
Some initial ideas would be adding a few more queries to the query processor.

Issues

If you find any issue, bug, error, or unhandled exception, feel free to report it.

