###Simple Hashing###
A memory-efficient implementation of the Simple Hashing algorithm. Takes 16-byte elements as input from a file. Writes output to a separate file.
Buckets are files where the elements are temporary stored. Memory queues are built for every bucket, so that we don't have to write down every element after every read operation. Elements are read from input file sequentially and saved in the queue buffer according to its hash value. If queue buffer is full it will be written to a physical entity of the bucket(file). After all elements are read from the input file all remaining bucket queues are going to be dumped to their files.
Because of the sorted order of buckets, we now can build simple hashing table using only one bucket at the time, so that it fits in RAM.
Install:
sudo make install
Clean:
sudo make clean
Remove:
sudo make remove
- libglib2.0-dev
- lpsi-util
- libssl-dev
Usage:
psi-simple-hashing
- -1 16-Byte seed 1 Hash seed
- -2 16-Byte seed 2 Hash seed
- -3 16-Byte seed 3 Hash seed
- -p "path to data" Path to the input raw data
- -s "path to buckets" Folder where to place buckets
- -b number of buckets
- -q queue buffer size Buffer size for bucket queue
- -i read buffer size Buffer size of chunk to be read from input file
- -t threads Number of used threads
- -d table size multiplier Size of hash table is calculated by multiplying number of elements by set value
- -f fixed table size Fixed table size value. Stronger than table size multiplier
- -z path_result Path to the output file
- -r 1 Reduction of elements 16 to 10 bytes. Bin flag is not removed, so +1 byte.