Implement HashChain search #29

nishihatapalmer · 2022-06-15T08:29:47Z

HashChain is pretty much the fastest search for longer patterns. Implement it for byteseek.

It works well for low alphabets and high alphabets, and is tunable for different memory requirements. It's a factor search, most similar to WFR in that it uses a rolling hash construction. It creates chains of hashes in the hash table with a fingerprint that allows verification that the next hash is correct before doing another index lookup.

Need to figure out where its worst cases get triggered. Search time rises dramatically (like QF, and WFR). In practice it seems quite hard to trigger if the hash table is sufficiently empty or we aren't dealing with pathological low alphabet or repetitive data. It still does better than WFR or QF in these situations though.

Nevertheless, byteseek needs to know if it is appropriate to select this algorithm or whether another slower but without the same extreme worst case behaviour needs to be used instead. Or if the pattern can be modified (use a substring) that would avoid the pathological case? A pattern composed only of zeros might not do well with hash chain.

nishihatapalmer · 2022-06-15T08:31:48Z

NB. HashChain is not yet finished or published. I'm still working on fully understanding and optimising the hash table construction, and defining what tuning parameters are required (automate as far as possible, use should only select memory really, although knowledge of low/high alphabet or types of data to search (shannon entropy?) could influence selection of qgram length and bitshift parameters.

nishihatapalmer added the enhancement label Jun 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement HashChain search #29

Implement HashChain search #29

nishihatapalmer commented Jun 15, 2022 •

edited

Loading

nishihatapalmer commented Jun 15, 2022

Implement HashChain search #29

Implement HashChain search #29

Comments

nishihatapalmer commented Jun 15, 2022 • edited Loading

nishihatapalmer commented Jun 15, 2022

nishihatapalmer commented Jun 15, 2022 •

edited

Loading