Discussion about the redis vector DB index algorithm, changed from HNSW to FLAT #840

gavinlichn · 2024-10-31T10:26:08Z

I noticed that the vector DB index algorithm in dataprep/redis/langchain is FLAT, but I remember that we used HNSW before.

After investigating the code, I found that this was caused by the removal of index_schema in PR #347.
If index_schema is removed, Redis falls back to the default index algorithm (FLAT).

This may impact performance.

Can you help clarify the background of this change, please? @Spycsh
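The fallback described above can be modelled in a few lines. This is a minimal sketch, assuming a simplified schema layout (a `vector` key holding a list of field dicts, similar to the langchain Redis schema YAML). `effective_vector_algorithm` and `DEFAULT_ALGORITHM` are hypothetical names, not dataprep or langchain code. The sketch only shows that dropping the schema also drops the explicit HNSW choice.

```python
from typing import Any, Mapping, Optional

# Assumption: when no schema (or no algorithm) is given, the vector field
# ends up on the Redis default algorithm, FLAT, as described above.
DEFAULT_ALGORITHM = "FLAT"


def effective_vector_algorithm(index_schema: Optional[Mapping[str, Any]]) -> str:
    """Return the vector index algorithm a (simplified) schema would produce."""
    if not index_schema:
        return DEFAULT_ALGORITHM
    for field in index_schema.get("vector", []):
        algorithm = field.get("algorithm")
        if algorithm:
            return str(algorithm).upper()
    return DEFAULT_ALGORITHM


# Hypothetical "before PR #347" schema with an explicit HNSW vector field.
old_schema = {"vector": [{"name": "content_vector", "algorithm": "HNSW", "dims": 768}]}
print(effective_vector_algorithm(old_schema))  # HNSW
print(effective_vector_algorithm(None))        # FLAT
```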

Spycsh · 2024-10-31T12:35:33Z

Hi @gavinlichn, the PR is meant to remove the hard-coded length limitation and make vector DB initialization simpler. I do not think explicitly initializing Redis with a schema of dimension 768 or 1024 is a concise approach. With that PR, Redis automatically accepts embeddings of length 768 if users use BGE base, or embeddings of length 1024 if they use BGE large. Users do not need to know or change the schema length when they switch to another embedding model.
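The idea of letting the embedding model determine the dimension can be illustrated by embedding a probe string and measuring the result. This is a minimal sketch under stated assumptions. `embed` stands for a hypothetical callable wrapping the configured embedding service, and `infer_vector_dims` and `fake_bge_base` are illustrative names. The sketch does not claim that the langchain Redis integration determines the dimension in exactly this way internally.

```python
from typing import Callable, Sequence

# Hypothetical embedding callable: text in, vector out (e.g. BGE base -> 768).
EmbedFn = Callable[[str], Sequence[float]]


def infer_vector_dims(embed: EmbedFn, probe: str = "dimension probe") -> int:
    """Embed a probe string and use its length as the index dimension."""
    vector = embed(probe)
    if len(vector) == 0:
        raise ValueError("embedding model returned an empty vector")
    return len(vector)


def fake_bge_base(text: str) -> list[float]:
    # Stand-in for a 768-dimensional model; the values themselves do not matter.
    return [0.0] * 768


print(infer_vector_dims(fake_bge_base))  # 768
```

Switching to a 1024-dimensional model changes only the callable, not any schema constant.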

However, as you said, the schema also specifies a non-default index algorithm. Could you please give us some data or reasoning showing how HNSW is faster or better than the default one? Alternatively, could we pass a parameter to Redis to change the default indexing algorithm? I think that would be simpler.

gavinlichn · 2024-11-01T01:54:19Z

Since the intent of the PR is to remove the length limitation, I would prefer to keep the change clean and not touch any other logic.
How about removing only the dimension and keeping the other parameters from the previous schema?
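One way to read this suggestion is to keep the fixed vector-field settings in one place and inject only the dimension at runtime. This is a minimal sketch. The settings in `BASE_VECTOR_FIELD` (HNSW, COSINE, FLOAT32, field name) are illustrative assumptions and not a copy of the schema file removed in PR #347. The key names follow the langchain Redis schema style, which is also an assumption, and `build_index_schema` is a hypothetical helper.

```python
from typing import Any, Dict

# Illustrative fixed settings (assumptions, not the original schema file).
BASE_VECTOR_FIELD: Dict[str, Any] = {
    "name": "content_vector",
    "algorithm": "HNSW",
    "datatype": "FLOAT32",
    "distance_metric": "COSINE",
}


def build_index_schema(dims: int) -> Dict[str, Any]:
    """Return a schema dict that keeps the fixed settings and injects dims."""
    if dims <= 0:
        raise ValueError("dims must be positive")
    # Copy so the shared base settings are never mutated.
    vector_field = dict(BASE_VECTOR_FIELD, dims=dims)
    return {"vector": [vector_field]}


schema = build_index_schema(1024)  # e.g. dims taken from a BGE large embedding
print(schema["vector"][0]["algorithm"], schema["vector"][0]["dims"])  # HNSW 1024
```

This keeps the non-default algorithm explicit while removing the hard-coded 768/1024 length that motivated the PR.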

For the algorithm comparison, we can refer to the Redis documentation:

Choose the HNSW index type when you have larger datasets (> 1M documents) or when search performance and scalability are more important than perfect search accuracy
Choose the FLAT index when you have small datasets (< 1M vectors) or when perfect search accuracy is more important than search latency.

https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/vectors/#hnsw-index
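To make "perfect search accuracy" concrete: a FLAT index answers a query by brute force, comparing it against every stored vector. The result is therefore the exact top k, but the cost grows linearly with dataset size. Below is a minimal pure-Python sketch of that exact search using cosine distance. It illustrates the technique and is not Redis's implementation. HNSW, an approximate graph-based search, is not shown. `flat_knn` and the toy `docs` vectors are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def flat_knn(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Exact k-nearest-neighbour search: score the query against every vector."""
    scored = ((cosine_distance(query, vec), doc_id) for doc_id, vec in vectors.items())
    return [(doc_id, dist) for dist, doc_id in heapq.nsmallest(k, scored)]


docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print([doc_id for doc_id, _ in flat_knn([1.0, 0.1], docs, k=2)])  # ['a', 'c']
```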

Spycsh · 2024-11-01T02:55:37Z

I agree that letting users pass a customized schema is reasonable. I suggest keeping the default behavior for now (FLAT, accuracy first) and allowing advanced users to set a schema. For example, the default could be None, i.e. REDIS_SCHEMA = os.getenv("REDIS_SCHEMA", None), and users could set REDIS_SCHEMA to the path of their own Redis YAML file. What do you think? Would you like to make a PR for this?
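The proposal can be sketched as a small resolver around the `REDIS_SCHEMA` environment variable. If the variable is unset or empty, it returns None so the default index (FLAT) is used. Otherwise it returns the user's schema path after checking that the file exists. `resolve_redis_schema` is a hypothetical helper, not existing dataprep code.

```python
import os
from typing import Optional


def resolve_redis_schema() -> Optional[str]:
    """Return the schema path from REDIS_SCHEMA, or None to keep the defaults.

    None means the vector store creates its default index (FLAT, accuracy
    first); a path lets advanced users opt into e.g. an HNSW schema.
    """
    path = os.getenv("REDIS_SCHEMA", None)
    if path is None or path.strip() == "":
        return None
    if not os.path.isfile(path):
        raise FileNotFoundError(f"REDIS_SCHEMA points to a missing file: {path}")
    return path
```

The returned path would then be passed to the vector-store constructor as its index schema. langchain's Redis vector store accepts an `index_schema` argument, but the exact signature should be checked against the version in use.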

feng-intel · 2024-11-06T04:50:13Z

@gavinlichn, see Spycsh's solution above.

gavinlichn · 2024-11-06T08:37:20Z

> @gavinlichn, see Spycsh's solution above.

Including the schema via an environment variable is reasonable; the original design was similar.
I agree with this solution.

gavinlichn changed the title ~~Discussion of the redis vector DB index algorithm, changed from HNSW to FLAT~~ Discussion about the redis vector DB index algorithm, changed from HNSW to FLAT Oct 31, 2024

yinghu5 added the DEV features label Nov 6, 2024

feng-intel assigned feng-intel and Spycsh and unassigned feng-intel Nov 6, 2024
