Timeout issue with Milvus #333

Open
mattbloke99 opened this issue Jun 5, 2024 · 7 comments

@mattbloke99

Getting consistent timeout issues when testing Milvus with the standard Cohere 1M vectors, 768 dimensions test,
on a c5a.8xlarge EC2 instance (32 vCPU, 64 GB RAM, 128 GB SSD).

2024-06-05 10:59:09,978 | WARNING: VectorDB optimize timeout in 900 (task_runner.py:204) (197576)
2024-06-05 10:59:10,041 | WARNING: Failed to run performance case, reason = Performance case optimize timeout (task_runner.py:146) (197576)
Traceback (most recent call last):
File "/home/ubuntu/repos/vector-dbs/env/lib/python3.12/site-packages/vectordb_bench/backend/task_runner.py", line 202, in _optimize
return future.result(timeout=self.ca.optimize_timeout)[1]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/concurrent/futures/_base.py", line 458, in result
raise TimeoutError()
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/ubuntu/repos/vector-dbs/env/lib/python3.12/site-packages/vectordb_bench/backend/task_runner.py", line 134, in _run_perf_case
build_dur = self._optimize()
^^^^^^^^^^^^^^^^
File "/home/ubuntu/repos/vector-dbs/env/lib/python3.12/site-packages/vectordb_bench/backend/task_runner.py", line 207, in _optimize
raise PerformanceTimeoutError("Performance case optimize timeout") from e
vectordb_bench.models.PerformanceTimeoutError: Performance case optimize timeout

@alwayslove2013
Collaborator

alwayslove2013 commented Jun 7, 2024

@mattbloke99 PR #332 increased the optimize time limit; you can update vdbbench to v0.0.10.
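A minimal sketch of the upgrade step, assuming vdbbench was installed from PyPI under the package name vectordb-bench:

    pip install --upgrade vectordb-bench==0.0.10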

alwayslove2013 self-assigned this Jun 7, 2024
@anrahman4

anrahman4 commented Jul 2, 2024

I am also getting this timeout issue with v0.0.11 when trying to run the Search Performance Test (1M dataset, 768 dims), and with the 10M and 100M datasets as well. I am trying to run the test using DiskANN with search_list = 100.
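For context, DiskANN with search_list = 100 corresponds roughly to the following Milvus index and search parameters (a pymilvus sketch, not the vdbbench config itself; the connection details, field name, and query vector are placeholders):

    from pymilvus import connections, Collection

    connections.connect(host="localhost", port="19530")  # assumed local Milvus standalone
    collection = Collection("VectorDBBenchCollection")   # collection name from the benchmark logs

    # Build a DiskANN index; DISKANN needs no additional build parameters here
    collection.create_index(
        field_name="vector",  # placeholder vector field name
        index_params={"index_type": "DISKANN", "metric_type": "COSINE", "params": {}},
    )
    collection.load()

    # search_list sets the DiskANN candidate-list size and must be >= limit (top-k)
    query_vector = [0.0] * 768  # placeholder 768-dim query embedding
    results = collection.search(
        data=[query_vector],
        anns_field="vector",
        param={"metric_type": "COSINE", "params": {"search_list": 100}},
        limit=100,
    )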

Bare Metal Hardware:
Server: HPE ProLiant DL360 Gen11
BIOS: 2.20
CPU: Dual Socket Intel(R) Xeon(R) Gold 6426Y
DRAM: 128 GB
Milvus Version: 2.4.4 using Docker Image Latest
VectorDB Bench Version: 0.0.11 via pip install
Python Version: 3.11

2024-07-02 14:31:06,433 | INFO: generated uuid for the tasks: 82c1b66f8ec24aad90dffc606128e746 (interface.py:66) (349602)
2024-07-02 14:31:06,565 | INFO | DB             | CaseType     Dataset               Filter | task_label (task_runner.py:338)
2024-07-02 14:31:06,565 | INFO | -----------    | ------------ -------------------- ------- | -------    (task_runner.py:338)
2024-07-02 14:31:06,565 | INFO | Milvus         | Performance  Cohere-MEDIUM-1M        None | 2024070214 (task_runner.py:338)
2024-07-02 14:31:06,565 | INFO: task submitted: id=82c1b66f8ec24aad90dffc606128e746, 2024070214, case number: 1 (interface.py:231) (349602)
2024-07-02 14:31:07,086 | INFO: [1/1] start case: {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'Cohere', 'size': 1000000, 'dim': 768, 'metric_type': <MetricType.COSINE: 'COSINE'>}}, 'db': 'Milvus'}, drop_old=True (interface.py:164) (349724)
2024-07-02 14:31:07,086 | INFO: Starting run (task_runner.py:100) (349724)
2024-07-02 14:31:07,336 | INFO: Milvus client drop_old collection: VectorDBBenchCollection (milvus.py:45) (349724)
2024-07-02 14:31:07,341 | INFO: Milvus create collection: VectorDBBenchCollection (milvus.py:55) (349724)
2024-07-02 14:31:08,347 | INFO: Read the entire file into memory: test.parquet (dataset.py:207) (349724)
2024-07-02 14:31:08,379 | INFO: Read the entire file into memory: neighbors.parquet (dataset.py:207) (349724)
2024-07-02 14:31:08,412 | INFO: Start performance case (task_runner.py:158) (349724)
2024-07-02 14:31:09,199 | INFO: (SpawnProcess-1:1) Start inserting embeddings in batch 5000 (serial_runner.py:35) (349889)
2024-07-02 14:31:09,199 | INFO: Get iterator for shuffle_train.parquet (dataset.py:225) (349889)
2024-07-02 14:31:22,801 | INFO: (SpawnProcess-1:1) Loaded 100000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:31:45,193 | INFO: (SpawnProcess-1:1) Loaded 200000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:32:09,818 | INFO: (SpawnProcess-1:1) Loaded 300000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:32:33,988 | INFO: (SpawnProcess-1:1) Loaded 400000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:32:58,992 | INFO: (SpawnProcess-1:1) Loaded 500000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:33:21,667 | INFO: (SpawnProcess-1:1) Loaded 600000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:33:45,791 | INFO: (SpawnProcess-1:1) Loaded 700000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:34:07,282 | INFO: (SpawnProcess-1:1) Loaded 800000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:34:31,609 | INFO: (SpawnProcess-1:1) Loaded 900000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:34:55,207 | INFO: (SpawnProcess-1:1) Loaded 1000000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:34:55,211 | INFO: (SpawnProcess-1:1) Finish loading all dataset into VectorDB, dur=226.0110656730103 (serial_runner.py:61) (349889)
2024-07-02 14:34:58,411 | INFO: Milvus post insert before optimize (milvus.py:101) (352315)
2024-07-02 15:04:56,111 | WARNING: VectorDB optimize timeout in 1800 (task_runner.py:249) (349724)
2024-07-02 15:04:56,185 | WARNING: Failed to run performance case, reason = Performance case optimize timeout (task_runner.py:191) (349724)
Traceback (most recent call last):
  File "/home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 247, in _optimize
    return future.result(timeout=self.ca.optimize_timeout)[1]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 458, in result
    raise TimeoutError()
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 165, in _run_perf_case
    build_dur = self._optimize()
                ^^^^^^^^^^^^^^^^
  File "/home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 252, in _optimize
    raise PerformanceTimeoutError("Performance case optimize timeout") from e
vectordb_bench.models.PerformanceTimeoutError: Performance case optimize timeout
2024-07-02 15:04:56,190 | WARNING: [1/1] case {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'Cohere', 'size': 1000000, 'dim': 768, 'metric_type': <MetricType.COSINE: 'COSINE'>}}, 'db': 'Milvus'} failed to run, reason=Performance case optimize timeout (interface.py:179) (349724)
2024-07-02 15:04:56,192 | INFO |Task summary: run_id=82c1b, task_label=2024070214 (models.py:329)
2024-07-02 15:04:56,192 | INFO |DB     | db_label case              label      | load_dur    qps        latency(p99)    recall        max_load_count | label (models.py:329)
2024-07-02 15:04:56,192 | INFO |------ | -------- ----------------- ---------- | ----------- ---------- --------------- ------------- -------------- | ----- (models.py:329)
2024-07-02 15:04:56,192 | INFO |Milvus |          Performance768D1M 2024070214 | 0.0         0.0        0.0             0.0           0              | ?     (models.py:329)
2024-07-02 15:04:56,193 | INFO: write results to disk /home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/results/Milvus/result_20240702_2024070214_milvus.json (models.py:187) (349724)
2024-07-02 15:04:56,197 | INFO: Success to finish task: label=2024070214, run_id=82c1b66f8ec24aad90dffc606128e746 (interface.py:203) (349724)

@anrahman4

I have also noticed that I do not see this behavior when switching back to v0.0.7 of vectordb-bench on exactly the same hardware/OS:

2024-07-03 09:32:25,935 | INFO: generated uuid for the tasks: 16b30ded9e904e4d97e77d59a56c89a3 (interface.py:69) (725651)
2024-07-03 09:32:26,116 | INFO | DB             | CaseType     Dataset               Filter | task_label (task_runner.py:288)
2024-07-03 09:32:26,116 | INFO | -----------    | ------------ -------------------- ------- | -------    (task_runner.py:288)
2024-07-03 09:32:26,116 | INFO | Milvus         | Performance  Cohere-MEDIUM-1M        None | 2024070309 (task_runner.py:288)
2024-07-03 09:32:26,116 | INFO: task submitted: id=16b30ded9e904e4d97e77d59a56c89a3, 2024070309, case number: 1 (interface.py:235) (725651)
2024-07-03 09:32:26,640 | INFO: [1/1] start case: {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'Cohere', 'size': 1000000, 'dim': 768, 'metric_type': <MetricType.COSINE: 'COSINE'>}}, 'db': 'Milvus'}, drop_old=True (interface.py:167) (725751)
2024-07-03 09:32:26,896 | INFO: Milvus client drop_old collection: VectorDBBenchCollection (milvus.py:45) (725751)
2024-07-03 09:32:26,901 | INFO: Milvus create collection: VectorDBBenchCollection (milvus.py:55) (725751)
2024-07-03 09:32:28,136 | INFO: Read the entire file into memory: test.parquet (dataset.py:207) (725751)
2024-07-03 09:32:28,186 | INFO: Read the entire file into memory: neighbors.parquet (dataset.py:207) (725751)
2024-07-03 09:32:29,015 | INFO: (SpawnProcess-1:1) Start inserting embeddings in batch 5000 (serial_runner.py:35) (725918)
2024-07-03 09:32:29,016 | INFO: Get iterator for shuffle_train.parquet (dataset.py:225) (725918)
2024-07-03 09:32:43,467 | INFO: (SpawnProcess-1:1) Loaded 100000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:33:08,294 | INFO: (SpawnProcess-1:1) Loaded 200000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:33:30,597 | INFO: (SpawnProcess-1:1) Loaded 300000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:33:58,060 | INFO: (SpawnProcess-1:1) Loaded 400000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:34:23,968 | INFO: (SpawnProcess-1:1) Loaded 500000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:34:49,068 | INFO: (SpawnProcess-1:1) Loaded 600000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:35:12,599 | INFO: (SpawnProcess-1:1) Loaded 700000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:35:36,397 | INFO: (SpawnProcess-1:1) Loaded 800000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:35:58,726 | INFO: (SpawnProcess-1:1) Loaded 900000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:36:21,494 | INFO: Milvus post insert before optimize (milvus.py:100) (725918)

2024-07-03 10:12:29,141 | INFO: (SpawnProcess-1:1) Loaded 1000000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 10:12:29,142 | INFO: (SpawnProcess-1:1) Finish loading all dataset into VectorDB, dur=2400.126074722997 (serial_runner.py:63) (725918)
2024-07-03 10:12:30,164 | INFO: Milvus optimizing before search (milvus.py:92) (732543)
2024-07-03 10:12:34,820 | INFO: Finish loading the entire dataset into VectorDB, insert_duration=2401.1505225989968, optimize_duration=4.540166485996451 load_duration(insert + optimize) = 2405.6907 (task_runner.py:136) (725751)
2024-07-03 10:12:35,638 | INFO: SpawnProcess-1:3 start search the entire test_data to get recall and latency (serial_runner.py:173) (732670)
2024-07-03 10:12:46,387 | INFO: SpawnProcess-1:3 search entire test_data: cost=4.3488s, queries=1000, avg_recall=0.977, avg_latency=0.0043, p99=0.0054 (serial_runner.py:208) (732670)
2024-07-03 10:12:47,106 | INFO: Start search 300s in concurrency 60, filters: None (mp_runner.py:93) (725751)
2024-07-03 10:12:57,267 | INFO: Syncing all process and start concurrency search, concurrency=60 (mp_runner.py:102) (725751)
2024-07-03 10:17:57,305 | INFO: SpawnProcess-1:36 search 300s: actual_dur=300.0066s, count=28500, qps in this process: 94.9979 (mp_runner.py:73) (732938)
2024-07-03 10:17:57,308 | INFO: SpawnProcess-1:57 search 300s: actual_dur=300.0074s, count=28074, qps in this process: 93.5777 (mp_runner.py:73) (732959)
2024-07-03 10:17:57,309 | INFO: SpawnProcess-1:10 search 300s: actual_dur=300.0155s, count=28016, qps in this process: 93.3818 (mp_runner.py:73) (732912)
2024-07-03 10:17:57,310 | INFO: SpawnProcess-1:38 search 300s: actual_dur=300.008s, count=28444, qps in this process: 94.8108 (mp_runner.py:73) (732940)
2024-07-03 10:17:57,314 | INFO: SpawnProcess-1:62 search 300s: actual_dur=300.0124s, count=28597, qps in this process: 95.3194 (mp_runner.py:73) (732964)
2024-07-03 10:17:57,314 | INFO: SpawnProcess-1:15 search 300s: actual_dur=300.0097s, count=28575, qps in this process: 95.2469 (mp_runner.py:73) (732917)
2024-07-03 10:17:57,316 | INFO: SpawnProcess-1:56 search 300s: actual_dur=300.0132s, count=27957, qps in this process: 93.1859 (mp_runner.py:73) (732958)
2024-07-03 10:17:57,317 | INFO: SpawnProcess-1:45 search 300s: actual_dur=300.0049s, count=28597, qps in this process: 95.3218 (mp_runner.py:73) (732947)
2024-07-03 10:17:57,319 | INFO: SpawnProcess-1:11 search 300s: actual_dur=300.01s, count=28065, qps in this process: 93.5469 (mp_runner.py:73) (732913)
2024-07-03 10:17:57,321 | INFO: SpawnProcess-1:9 search 300s: actual_dur=300.0064s, count=28598, qps in this process: 95.3246 (mp_runner.py:73) (732911)
2024-07-03 10:17:57,324 | INFO: SpawnProcess-1:34 search 300s: actual_dur=300.0053s, count=28407, qps in this process: 94.6883 (mp_runner.py:73) (732936)
2024-07-03 10:17:57,325 | INFO: SpawnProcess-1:31 search 300s: actual_dur=300.0053s, count=28573, qps in this process: 95.2417 (mp_runner.py:73) (732933)
2024-07-03 10:17:57,324 | INFO: SpawnProcess-1:63 search 300s: actual_dur=300.007s, count=28012, qps in this process: 93.3712 (mp_runner.py:73) (732965)
2024-07-03 10:17:57,326 | INFO: SpawnProcess-1:20 search 300s: actual_dur=300.0051s, count=28461, qps in this process: 94.8684 (mp_runner.py:73) (732922)
2024-07-03 10:17:57,327 | INFO: SpawnProcess-1:52 search 300s: actual_dur=300.0047s, count=28378, qps in this process: 94.5919 (mp_runner.py:73) (732954)
2024-07-03 10:17:57,328 | INFO: SpawnProcess-1:5 search 300s: actual_dur=300.0069s, count=28629, qps in this process: 95.4278 (mp_runner.py:73) (732905)
2024-07-03 10:17:57,328 | INFO: SpawnProcess-1:19 search 300s: actual_dur=300.01s, count=27979, qps in this process: 93.2602 (mp_runner.py:73) (732921)
2024-07-03 10:17:57,329 | INFO: SpawnProcess-1:64 search 300s: actual_dur=300.0107s, count=28614, qps in this process: 95.3766 (mp_runner.py:73) (732966)
2024-07-03 10:17:57,329 | INFO: SpawnProcess-1:44 search 300s: actual_dur=300.0088s, count=28569, qps in this process: 95.2272 (mp_runner.py:73) (732946)
2024-07-03 10:17:57,329 | INFO: SpawnProcess-1:42 search 300s: actual_dur=300.0106s, count=28414, qps in this process: 94.71 (mp_runner.py:73) (732944)
2024-07-03 10:17:57,330 | INFO: SpawnProcess-1:49 search 300s: actual_dur=300.007s, count=28649, qps in this process: 95.4944 (mp_runner.py:73) (732951)
2024-07-03 10:17:57,329 | INFO: SpawnProcess-1:6 search 300s: actual_dur=300.0131s, count=28053, qps in this process: 93.5059 (mp_runner.py:73) (732908)
2024-07-03 10:17:57,329 | INFO: SpawnProcess-1:16 search 300s: actual_dur=300.0117s, count=28277, qps in this process: 94.253 (mp_runner.py:73) (732918)
2024-07-03 10:17:57,330 | INFO: SpawnProcess-1:33 search 300s: actual_dur=300.0108s, count=28267, qps in this process: 94.2199 (mp_runner.py:73) (732935)
2024-07-03 10:17:57,330 | INFO: SpawnProcess-1:48 search 300s: actual_dur=300.0069s, count=28427, qps in this process: 94.7545 (mp_runner.py:73) (732950)
2024-07-03 10:17:57,330 | INFO: SpawnProcess-1:46 search 300s: actual_dur=300.0079s, count=27985, qps in this process: 93.2809 (mp_runner.py:73) (732948)
2024-07-03 10:17:57,331 | INFO: SpawnProcess-1:30 search 300s: actual_dur=300.009s, count=28427, qps in this process: 94.7538 (mp_runner.py:73) (732932)
2024-07-03 10:17:57,331 | INFO: SpawnProcess-1:29 search 300s: actual_dur=300.005s, count=28379, qps in this process: 94.5951 (mp_runner.py:73) (732931)
2024-07-03 10:17:57,331 | INFO: SpawnProcess-1:26 search 300s: actual_dur=300.013s, count=27931, qps in this process: 93.0993 (mp_runner.py:73) (732928)
2024-07-03 10:17:57,331 | INFO: SpawnProcess-1:22 search 300s: actual_dur=300.0065s, count=28648, qps in this process: 95.4913 (mp_runner.py:73) (732924)
2024-07-03 10:17:57,332 | INFO: SpawnProcess-1:25 search 300s: actual_dur=300.0046s, count=28342, qps in this process: 94.4719 (mp_runner.py:73) (732927)
2024-07-03 10:17:57,332 | INFO: SpawnProcess-1:18 search 300s: actual_dur=300.0112s, count=28221, qps in this process: 94.0665 (mp_runner.py:73) (732920)
2024-07-03 10:17:57,332 | INFO: SpawnProcess-1:53 search 300s: actual_dur=300.0027s, count=28641, qps in this process: 95.4691 (mp_runner.py:73) (732955)
2024-07-03 10:17:57,333 | INFO: SpawnProcess-1:43 search 300s: actual_dur=300.0052s, count=28211, qps in this process: 94.035 (mp_runner.py:73) (732945)
2024-07-03 10:17:57,333 | INFO: SpawnProcess-1:7 search 300s: actual_dur=300.0079s, count=28281, qps in this process: 94.2675 (mp_runner.py:73) (732909)
2024-07-03 10:17:57,333 | INFO: SpawnProcess-1:47 search 300s: actual_dur=300.0053s, count=28149, qps in this process: 93.8283 (mp_runner.py:73) (732949)
2024-07-03 10:17:57,335 | INFO: SpawnProcess-1:23 search 300s: actual_dur=300.0107s, count=28526, qps in this process: 95.0833 (mp_runner.py:73) (732925)
2024-07-03 10:17:57,335 | INFO: SpawnProcess-1:39 search 300s: actual_dur=300.0064s, count=28483, qps in this process: 94.9413 (mp_runner.py:73) (732941)
2024-07-03 10:17:57,336 | INFO: SpawnProcess-1:60 search 300s: actual_dur=300.0048s, count=28462, qps in this process: 94.8718 (mp_runner.py:73) (732962)
2024-07-03 10:17:57,337 | INFO: SpawnProcess-1:28 search 300s: actual_dur=300.0071s, count=28591, qps in this process: 95.3011 (mp_runner.py:73) (732930)
2024-07-03 10:17:57,336 | INFO: SpawnProcess-1:14 search 300s: actual_dur=300.0039s, count=27943, qps in this process: 93.1421 (mp_runner.py:73) (732916)
2024-07-03 10:17:57,337 | INFO: SpawnProcess-1:51 search 300s: actual_dur=300.0053s, count=28490, qps in this process: 94.965 (mp_runner.py:73) (732953)
2024-07-03 10:17:57,337 | INFO: SpawnProcess-1:61 search 300s: actual_dur=300.0059s, count=28229, qps in this process: 94.0948 (mp_runner.py:73) (732963)
2024-07-03 10:17:57,338 | INFO: SpawnProcess-1:8 search 300s: actual_dur=300.0024s, count=28498, qps in this process: 94.9926 (mp_runner.py:73) (732910)
2024-07-03 10:17:57,337 | INFO: SpawnProcess-1:50 search 300s: actual_dur=300.0103s, count=28284, qps in this process: 94.2768 (mp_runner.py:73) (732952)
2024-07-03 10:17:57,337 | INFO: SpawnProcess-1:13 search 300s: actual_dur=300.0058s, count=28375, qps in this process: 94.5815 (mp_runner.py:73) (732915)
2024-07-03 10:17:57,338 | INFO: SpawnProcess-1:27 search 300s: actual_dur=300.0016s, count=28323, qps in this process: 94.4095 (mp_runner.py:73) (732929)
2024-07-03 10:17:57,339 | INFO: SpawnProcess-1:59 search 300s: actual_dur=300.0082s, count=28012, qps in this process: 93.3708 (mp_runner.py:73) (732961)
2024-07-03 10:17:57,339 | INFO: SpawnProcess-1:40 search 300s: actual_dur=300.006s, count=28612, qps in this process: 95.3714 (mp_runner.py:73) (732942)
2024-07-03 10:17:57,340 | INFO: SpawnProcess-1:21 search 300s: actual_dur=300.0056s, count=28086, qps in this process: 93.6183 (mp_runner.py:73) (732923)
2024-07-03 10:17:57,340 | INFO: SpawnProcess-1:35 search 300s: actual_dur=300.0054s, count=28016, qps in this process: 93.385 (mp_runner.py:73) (732937)
2024-07-03 10:17:57,341 | INFO: SpawnProcess-1:17 search 300s: actual_dur=300.0014s, count=28462, qps in this process: 94.8729 (mp_runner.py:73) (732919)
2024-07-03 10:17:57,341 | INFO: SpawnProcess-1:55 search 300s: actual_dur=300.0039s, count=28632, qps in this process: 95.4388 (mp_runner.py:73) (732957)
2024-07-03 10:17:57,341 | INFO: SpawnProcess-1:24 search 300s: actual_dur=300.003s, count=27957, qps in this process: 93.1891 (mp_runner.py:73) (732926)
2024-07-03 10:17:57,342 | INFO: SpawnProcess-1:12 search 300s: actual_dur=300.0051s, count=28514, qps in this process: 95.0451 (mp_runner.py:73) (732914)
2024-07-03 10:17:57,342 | INFO: SpawnProcess-1:37 search 300s: actual_dur=300.0012s, count=27994, qps in this process: 93.313 (mp_runner.py:73) (732939)
2024-07-03 10:17:57,342 | INFO: SpawnProcess-1:41 search 300s: actual_dur=300.0026s, count=28504, qps in this process: 95.0125 (mp_runner.py:73) (732943)
2024-07-03 10:17:57,343 | INFO: SpawnProcess-1:58 search 300s: actual_dur=300.0057s, count=28552, qps in this process: 95.1715 (mp_runner.py:73) (732960)
2024-07-03 10:17:57,344 | INFO: SpawnProcess-1:32 search 300s: actual_dur=300.0031s, count=28428, qps in this process: 94.759 (mp_runner.py:73) (732934)
2024-07-03 10:17:57,346 | INFO: SpawnProcess-1:54 search 300s: actual_dur=300.0052s, count=28526, qps in this process: 95.085 (mp_runner.py:73) (732956)
2024-07-03 10:17:57,348 | INFO: End search in concurrency 60: dur=300.07863932699547s, total_count=1700846, qps=5668.0009 (mp_runner.py:109) (725751)
2024-07-03 10:17:58,041 | INFO: Update largest qps with concurrency 60: current max_qps=5668.0009 (mp_runner.py:113) (725751)
2024-07-03 10:17:58,042 | INFO: Performance case got result: Metric(max_load_count=0, load_duration=2405.6907, qps=5668.0009, serial_latency_p99=0.0054, recall=0.977) (task_runner.py:150) (725751)
2024-07-03 10:17:58,042 | INFO: [1/1] finish case: {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'Cohere', 'size': 1000000, 'dim': 768, 'metric_type': <MetricType.COSINE: 'COSINE'>}}, 'db': 'Milvus'}, result=Metric(max_load_count=0, load_duration=2405.6907, qps=5668.0009, serial_latency_p99=0.0054, recall=0.977), label=ResultLabel.NORMAL (interface.py:169) (725751)
2024-07-03 10:17:58,042 | INFO |Task summary: run_id=16b30, task_label=2024070309 (models.py:286)
2024-07-03 10:17:58,042 | INFO |DB     | db_label case              label      | load_dur     qps          latency(p99)    recall        max_load_count | label (models.py:286)
2024-07-03 10:17:58,042 | INFO |------ | -------- ----------------- ---------- | ------------ ------------ --------------- ------------- -------------- | ----- (models.py:286)
2024-07-03 10:17:58,042 | INFO |Milvus |          Performance768D1M 2024070309 | 2405.6907    5668.0009    0.0054          0.977         0              | :)    (models.py:286)
2024-07-03 10:17:58,042 | INFO: write results to disk /home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/results/Milvus/result_20240703_2024070309_milvus.json (models.py:144) (725751)
2024-07-03 10:17:58,042 | INFO: Succes to finish task: label=2024070309, run_id=16b30ded9e904e4d97e77d59a56c89a3 (interface.py:207) (725751)

@XuanYang-cn
Collaborator

XuanYang-cn commented Jul 4, 2024

@mattbloke99 @anrahman4 Your machine needs 2400.126074722997 seconds to load + index the dataset in 0.0.7.

We changed the process in the latest vdbbench: the index-building part has been moved into the optimize step. Since it only takes dur=226.0110656730103s to load the data into Milvus, most of the ~2400s seen in 0.0.7 is index building, so you'll probably need a 3000s timeout config for optimize.

Just change OPTIMIZE_TIMEOUT_768D_1M = 30 * 60 # 30min to 3000 in vectordb_bench/__init__.py and you'll be fine.
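A minimal sketch of that edit (the exact surrounding lines in vectordb_bench/__init__.py may differ between releases):

    # vectordb_bench/__init__.py
    # before: OPTIMIZE_TIMEOUT_768D_1M = 30 * 60   # 30min
    # after: allow up to 3000 s for the optimize (index-building) step
    OPTIMIZE_TIMEOUT_768D_1M = 3000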

We'll check whether the older timeout config still works for the latest Milvus cases and adjust it accordingly. AFAIK, in our 16c64G tests it takes around 1500s.

@anrahman4

Can confirm this worked with v0.0.11, thank you very much!

@1191510903

[image attached]

I cannot confirm this worked with v0.0.12. Why does this problem still occur? The timeout was set to 8 hours.

@XuanYang-cn
Collaborator

@1191510903 8 hours is a really long time. Could you please upload the Milvus logs so that I can see why it is taking so long?

Also I'd like to ask some quick questions:

  1. What's your Milvus version?
  2. Distributed or standalone?
  3. What are the CPU and memory resources for Milvus, especially the IndexNode?
