Few questions on Milvus when running LAION 100M large dataset #367

Open

agandra30 opened this issue Aug 31, 2024 · 5 comments

@agandra30
I am using VectorDBBench to test and analyze Milvus's capabilities before it handles our load at scale.
We are using 1 server with 4 NVIDIA L40S GPUs; I have assigned 2 to the query node and 2 to the index node.

When I ran the no-filter search performance test on the LAION 100M dataset with index type DISKANN and K=100, the entire setup hung for hours in the optimize state, and there are no further logs to see what is happening in this state.

A few questions:

  1. What does the optimize state actually do in this case?
  2. Enlighten me here: I am thinking the GPUs play no role in the optimize state.
  3. How long should we expect this test to take, any rough idea?
  4. Do the query and index GPUs only play a role when indexing or querying is happening, and how do I check GPU usage? (I tried nvidia-smi and observed no usage.)

Last status

##### Data set #####
2024-08-30 13:05:46,369 | INFO: [1/1] start case: {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'LAION', 'size': 100000000, 'dim': 768, 'metric_type': <MetricType.L2: 'L2'>}}, 'db': 'Milvus-r1u1'}, drop_old=True (interface.py:164) (2145320)


##### Current state #####
2024-08-31 05:00:37,072 | INFO: (SpawnProcess-1:1) Finish loading all dataset into VectorDB, dur=48396.15328549099 (serial_runner.py:61) (2584643)
2024-08-31 05:00:38,764 | INFO: Milvus post insert before optimize (milvus.py:101) (982732)

My Helm values.yaml file looks like this:

```yaml
indexNode:
  resources:
    requests:
      nvidia.com/gpu: "2"
    limits:
      nvidia.com/gpu: "2"
queryNode:
  resources:
    requests:
      nvidia.com/gpu: "2"
    limits:
      nvidia.com/gpu: "2"
mmap:
  # Set the memory mapping property for the whole cluster
  mmapEnabled: true
  # Set the memory-mapped directory path; if mmapDirPath is unspecified,
  # memory-mapped files are stored in {localStorage.path}/mmap by default.
  mmapDirPath: /mnt/vector/clustersetup_files/

minio:
  enabled: false

externalS3:
  enabled: true
  host: "xx..xx.xxx.xx"
  port: "xx"
  accessKey: "mykey"
  secretKey: "myskey"
  useSSL: false
  bucketName: "milvusdb"
  rootPath: ""
  useIAM: false
  cloudProvider: "aws"
  iamEndpoint: ""
```

@xiaofan-luan
Collaborator

GPUs won't be used in your case unless you are using a GPU index.
And in your case, I guess GPU memory won't be enough anyway.

We are working on a mode that builds the index with GPU and searches with CPU, but it's not there yet.

@xiaofan-luan
Collaborator

Bulk-inserting and indexing 100M rows usually takes a couple of hours. Each search could take hundreds of milliseconds, so the total time largely depends on how many queries you want to run.

@alwayslove2013
Collaborator

@agandra30
> What does the optimize state actually do in this case?

For Milvus, optimize refers primarily to compaction: it manually triggers Milvus to consolidate the various fragmented segments into larger ones, which improves query performance.
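As a rough illustration of what happens during optimize, here is a minimal pymilvus sketch of triggering compaction manually and waiting for it to finish (the endpoint and the collection name `laion_100m` are assumptions, not values from this setup):

```python
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")  # assumed endpoint
collection = Collection("laion_100m")                # hypothetical collection name

collection.compact()                        # ask Milvus to merge small segments
print(collection.get_compaction_state())    # inspect the current compaction state
collection.wait_for_compaction_completed()  # block until compaction finishes
```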

> How long could we expect this test to take, any rough ideas?

This depends on the performance of your machine. Note that you have chosen DiskANN, a CPU index type that does not utilize any GPU resources, so performance relies solely on CPU capability. Specifically, if you are running Milvus in standalone mode, it depends on the number of CPUs available; if you are running Milvus in cluster mode, it depends on the number of CPUs allocated to the index node.
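For context, a DiskANN index is declared with no GPU-related build parameters at all; a minimal pymilvus sketch (endpoint, collection, and field names are assumptions):

```python
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")  # assumed endpoint
collection = Collection("laion_100m")                # hypothetical collection name

# DISKANN exposes no GPU options; the build runs on the index node's CPUs
# and stores the graph on local disk.
collection.create_index(
    field_name="vector",  # assumed vector field name
    index_params={"index_type": "DISKANN", "metric_type": "L2"},
)
```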

@agandra30
Author

agandra30 commented Sep 2, 2024

Thank you @xiaofan-luan and @alwayslove2013 for your replies; I appreciate your support. Is there any way I can use the 100M dataset for the "filtering search 1% and 99%" test case? It is restricted to the 10M dataset only; how can I use that option in the UI for the same LAION 100M dataset?

I used the same LAION dataset as a custom dataset, but it only let me run the no-filter search performance test, not the 1% and 99% filtering test that I am looking to perform. Is there any way you can guide us?


In the 1% and 99% filtering test cases, does the search happen serially or concurrently?
How can I check that the mmap configuration given above is actually being used in my deployment instead of local disk space?

> GPUs won't be used in your case unless you are using a GPU index. And in your case, I guess GPU memory won't be enough anyway.
>
> We are working on a mode that builds the index with GPU and searches with CPU, but it's not there yet.

Thanks @xiaofan-luan for the reply. When you say the memory won't be enough, do you mean that 48 GB (46068 MiB) per GPU, i.e. 48 × 4 = 192 GB total, is not sufficient to process a 100M dataset? I am assuming that is because we only use the GPUs for processing, not for storage, correct?

Is GPU_CAGRA recommended instead of HNSW?

@alwayslove2013
Collaborator

> Is there any way I can use the 100M dataset for the "filtering search 1% and 99%" test case?

@agandra30 Currently, VectorDBBench does not support filtering cases with LAION 100M.
The main reason is that the 100M dataset is quite large (~300 GB), and the cost of computing the ground truth is relatively high. Therefore, we have not prepared ground truth for the filtering cases.
