[Bug]: several TPC-C queries time out during stability test in distributed mode #18723
Comments
{"level":"INFO","time":"2024/09/11 18:52:08.325431 +0000","name":"cn-service","caller":"frontend/result_row_stmt.go:71","msg":"time of Exec.Run : 1m17.690626612s","service":"61373436-3363-3130-6637-356361633638","uuid":"61373436-3363-3130-6637-356361633638","session_info":"connectionId 648||account tpcc_test:admin|goRoutineId 0|migrate-goRoutineId 0|0191dc5d-5994-7c66-83ad-1422331870a5","role":"accountadmin","session_id":"0191dc5d-5994-7c66-83ad-1422331870a5","statement_id":"0191e26c-8d8a-7573-b2ed-b03e4725861d","txn_id":"c1b234c842e5f7f917f3e86cf235a71f","span":{"trace_id":"bbb7168c-b648-e355-deaa-7b8474be6897","span_id":"34eec943378f5ae4"}}
I'll look into this together with @ouyuanning tomorrow.
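I picked the connection with connection ID 648 and checked its logs. The transaction that finally timed out is c1b234c842e5f7f917f3e86cf235a71f, and I then searched the logs by this transaction ID. @iamlinjunhong, could you please take a look?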
Transaction c1b234c842e5f7f917f3e86cf235a71f is waiting for transaction b1738c2ec4f5d21917f3e86ce10bc216, and transaction b1738c2ec4f5d21917f3e86ce10bc216 is on goroutine ID 0.
Not yet worked on.
This is the goroutine ID 0 issue.
Not yet worked on.
Not yet worked on.
Repro: https://github.com/matrixorigin/matrixone/actions/runs/11290470537/job/31404655585
Timeout (1m), txn_id: f57324d43a20441f17fd5f07d1dcec86
Repro: https://github.com/matrixorigin/matrixone/actions/runs/11297350987/job/31425994909
Timeout (1m), txn_id: 51b551fd1d8b10ae17fd79a5008b5a24
exec_plan: 51b551fd1d8b10ae17fd79a5008b5a24_execplan.json
log: https://grafana.ci.matrixorigin.cn/goto/lGqARmiNg?orgId=1
holder txn_id: 44960eac9774aab117fd79a7273ecdc1
sql history:
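The fact that 51b551fd1d8b10ae17fd79a5008b5a24 took more than one minute is unrelated to lock waiting. Check orphan and lock run concurrently; 51b551fd1d8b10ae17fd79a5008b5a24 is not in the waiters list, which means the transaction had already completed. Moreover, 51b551fd1d8b10ae17fd79a5008b5a24 had already acquired the lock at 2024-10-12 03:18:44.941, earlier than 44960eac9774aab117fd79a7273d832c and 51b551fd1d8b10ae17fd79a5008e9c3d.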
Repro: https://github.com/matrixorigin/matrixone/actions/runs/11390904060/job/31696189874
Info about the timed-out connection:
Log link: https://grafana.ci.matrixorigin.cn/goto/_HkndJiNR?orgId=1
The transaction wait queue is too long.
Not handled yet.
Not handled yet.
The merge run was executed with 10 warehouses and a concurrency of 100, which increases the cross-warehouse load and raises the probability that multiple transactions lock the same (warehouse, district). The logs and trace indicate that the timed-out transactions did experience longer lock waits for this reason, and that the lock queue on the same (warehouse, district) is very long. We will close this issue for now and change the merge run to 50 warehouses with a concurrency of 50 to observe. If the issue persists, a new issue will be opened. https://doc.weixin.qq.com/doc/w3_AEkAzgYFABUkPznumb3QLa5P00ohz?scode=AJsA6gc3AA80HXWdeIAEkAzgYFABU
Is there an existing issue for the same bug?
Branch Name
main
Commit ID
c4b2445
Other Environment Information
Actual Behavior
During the stability test, several queries timed out after 60 s.
2024-09-11 21:22:25 ERROR jTPCCConnection:327 - The connection[652] has not been valid.
2024-09-11 21:22:25 FATAL jTPCCTerminal:328 - [UNEXPECTED][TT_NEW_ORDER][CONNECTION] The connection[652] has not been valid, caused by: Communications link failure
The last packet successfully received from the server was 60,208 milliseconds ago. The last packet sent successfully to the server was 60,251 milliseconds ago.
2024-09-12 02:51:51 ERROR jTPCCConnection:327 - The connection[648] has not been valid.
2024-09-12 02:51:51 FATAL jTPCCTerminal:328 - [UNEXPECTED][TT_NEW_ORDER][CONNECTION] The connection[648] has not been valid, caused by: Communications link failure
The last packet successfully received from the server was 60,508 milliseconds ago. The last packet sent successfully to the server was 60,508 milliseconds ago.
2024-09-12 06:08:36 ERROR jTPCCConnection:327 - The connection[654] has not been valid.
2024-09-12 06:08:36 FATAL jTPCCTerminal:214 - [UNEXPECTED][TT_NEW_ORDER][CONNECTION] The connection[654] has not been valid, caused by: Communications link failure
The last packet successfully received from the server was 60,095 milliseconds ago. The last packet sent successfully to the server was 60,165 milliseconds ago.
2024-09-12 06:32:34 ERROR jTPCCConnection:327 - The connection[650] has not been valid.
2024-09-12 06:32:34 FATAL jTPCCTerminal:214 - [UNEXPECTED][TT_NEW_ORDER][CONNECTION] The connection[650] has not been valid, caused by: Communications link failure
The last packet successfully received from the server was 60,067 milliseconds ago. The last packet sent successfully to the server was 60,115 milliseconds ago.
2024-09-12 06:50:39 ERROR jTPCCConnection:327 - The connection[72304936] has not been valid.
2024-09-12 06:50:39 FATAL jTPCCTerminal:328 - [UNEXPECTED][TT_NEW_ORDER][CONNECTION] The connection[72304936] has not been valid, caused by: Communications link failure
The last packet successfully received from the server was 60,096 milliseconds ago. The last packet sent successfully to the server was 60,096 milliseconds ago.
2024-09-12 07:29:47 ERROR jTPCCConnection:327 - The connection[64795267] has not been valid.
2024-09-12 07:29:47 FATAL jTPCCTerminal:328 - [UNEXPECTED][TT_NEW_ORDER][CONNECTION] The connection[64795267] has not been valid, caused by: Communications link failure
The last packet successfully received from the server was 60,061 milliseconds ago. The last packet sent successfully to the server was 60,061 milliseconds ago.
2024-09-12 07:46:22 ERROR jTPCCConnection:327 - The connection[653] has not been valid.
2024-09-12 07:46:22 FATAL jTPCCTerminal:328 - [UNEXPECTED][TT_NEW_ORDER][CONNECTION] The connection[653] has not been valid, caused by: Communications link failure
The last packet successfully received from the server was 60,061 milliseconds ago. The last packet sent successfully to the server was 60,061 milliseconds ago.
2024-09-12 08:55:16 ERROR jTPCCConnection:327 - The connection[69671902] has not been valid.
2024-09-12 08:55:16 FATAL jTPCCTerminal:328 - [UNEXPECTED][TT_NEW_ORDER][CONNECTION] The connection[69671902] has not been valid, caused by: Communications link failure
The last packet successfully received from the server was 60,063 milliseconds ago. The last packet sent successfully to the server was 60,063 milliseconds ago.
2024-09-12 10:01:52 ERROR jTPCCConnection:327 - The connection[72035695] has not been valid.
2024-09-12 10:01:52 FATAL jTPCCTerminal:214 - [UNEXPECTED][TT_NEW_ORDER][CONNECTION] The connection[72035695] has not been valid, caused by: Communications link failure
The last packet successfully received from the server was 60,061 milliseconds ago. The last packet sent successfully to the server was 60,061 milliseconds ago.
2024-09-12 10:01:55 ERROR jTPCCConnection:327 - The connection[649] has not been valid.
2024-09-12 10:01:55 FATAL jTPCCTerminal:328 - [UNEXPECTED][TT_NEW_ORDER][CONNECTION] The connection[649] has not been valid, caused by: Communications link failure
The last packet successfully received from the server was 60,060 milliseconds ago. The last packet sent successfully to the server was 60,060 milliseconds ago.
2024-09-12 10:21:32 ERROR jTPCCConnection:327 - The connection[582] has not been valid.
2024-09-12 10:21:32 FATAL jTPCCTerminal:237 - [UNEXPECTED][STOCK_LEVEL][CONNECTION] The connection[582] has not been valid, caused by: Communications link failure
The last packet successfully received from the server was 60,060 milliseconds ago. The last packet sent successfully to the server was 60,060 milliseconds ago.
2024-09-12 10:21:44 ERROR jTPCCConnection:327 - The connection[651] has not been valid.
2024-09-12 10:21:44 FATAL jTPCCTerminal:328 - [UNEXPECTED][TT_NEW_ORDER][CONNECTION] The connection[651] has not been valid, caused by: Communications link failure
The last packet successfully received from the server was 60,061 milliseconds ago. The last packet sent successfully to the server was 60,952 milliseconds ago.
2024-09-12 10:25:48 ERROR jTPCCConnection:327 - The connection[74482530] has not been valid.
2024-09-12 10:25:48 FATAL jTPCCTerminal:328 - [UNEXPECTED][TT_NEW_ORDER][CONNECTION] The connection[74482530] has not been valid, caused by: Communications link failure
The last packet successfully received from the server was 60,039 milliseconds ago. The last packet sent successfully to the server was 60,039 milliseconds ago.
2024-09-12 10:27:22 ERROR jTPCCConnection:327 - The connection[74664736] has not been valid.
2024-09-12 10:27:22 FATAL jTPCCTerminal:237 - [UNEXPECTED][STOCK_LEVEL][CONNECTION] The connection[74664736] has not been valid, caused by: Communications link failure
The last packet successfully received from the server was 60,061 milliseconds ago. The last packet sent successfully to the server was 60,061 milliseconds ago.
2024-09-12 10:27:23 ERROR jTPCCConnection:327 - The connection[581] has not been valid.
2024-09-12 10:27:23 FATAL jTPCCTerminal:237 - [UNEXPECTED][STOCK_LEVEL][CONNECTION] The connection[581] has not been valid, caused by: Communications link failure
The last packet successfully received from the server was 60,059 milliseconds ago. The last packet sent successfully to the server was 60,059 milliseconds ago.
2024-09-12 10:27:23 ERROR jTPCCConnection:327 - The connection[72503991] has not been valid.
2024-09-12 10:27:23 FATAL jTPCCTerminal:328 - [UNEXPECTED][TT_NEW_ORDER][CONNECTION] The connection[72503991] has not been valid, caused by: Communications link failure
The last packet successfully received from the server was 60,062 milliseconds ago. The last packet sent successfully to the server was 60,126 milliseconds ago.
2024-09-12 10:27:32 ERROR ConsistencyCheck:152 - Communications link failure
There were no 'long running' or 'leak' entries in the MO log; the queries simply executed very slowly at these times.
statement_info:
timeout.txt
mo-log:
https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22GYP%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-c4b2445-202409102229%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221726060945000%22,%22to%22:%221726108052000%22%7D%7D%7D&schemaVersion=1&orgId=1
Expected Behavior
No response
Steps to Reproduce
Additional information
No response