Unpackage code and setup python environment
> tar xvzf dyann.tar.gz
> cd dyann
> /apps/python/3.9.X/bin/python3 -m venv env
> source env/bin/activate
> pip install -r requirements.txt --no-cache-dir
Quick test
> python download.py data=[datacol_quick]
> python run.py data=[datacol_quick] algo=[linear,hnsw]
> python plot-pareto.py data=[datacol_quick] algo=[linear,hnsw]
> python plot-algo.py data=[datacol_quick] algo=[hnsw]
Preload all datasets and pregenerate all groundtruth (could take hours, ensure at least 30GB space)
> python download.py data=[datacol,datacol_lerp,datacol_efreq,datacol_esfreq]
> python download.py data=[featlearn,featlearn_lerp,featlearn_efreq,featlearn_esfreq]
Generate all benchmarking results (can easily take days or weeks, best run in parallel with a job scheduler)
> python run.py data=[datacol,datacol_lerp,datacol_efreq,datacol_esfreq] algo=[linear,annoy,hnsw,ivfpq,scann,kdtree]
> python run.py data=[featlearn,featlearn_lerp,featlearn_efreq,featlearn_esfreq] algo=[linear,annoy,hnsw,ivfpq,scann,kdtree]
A template file for new datasets is provided at ./dyann/data/template.py
Usage Instructions:
- Copy template.py and change the filename and class name for your new dataset
- Update ./dyann/data/proxy.py to include the names you have chosen
- Fill in each of the TODO items (refer to existing datasets for hints if needed)
- Create any number of configuration sets in ./conf/data/ with name property set to this filename and scale property providing an optional parameter sweep
A template file for new datasets is provided at ./dyann/algo/template.py
Usage Instructions:
- Copy template.py and change the filename and class name for your new ANN algorithm
- Update ./dyann/algo/proxy.py to include the names you have chosen
- Fill in each of the TODO items (refer to existing algorithms for hints if needed)
- Create both the build and search configuration files in ./conf/algo/ with name property set to this filename the lists of parameters for the build and query properties will be swept