-
Notifications
You must be signed in to change notification settings - Fork 276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: bvt failed due to insert into t select * from t
timeout(reported not return result in 120000 ms)
#9447
Comments
@JackTan25 plz confirm this issue |
This case is not delete timeout, but insert 4 million rows of data timeout |
{"level":"ERROR","time":"2023/05/14 14:16:47.336467 +0000","caller":"engine/entire_engine.go:73","msg":"error: internal error: temporary engine not init yet","span":{"trace_id":"f61f15d8-f261-11ed-90e5-92c62a464ff4","kind":"statement"},"stacktrace":"github.com/matrixorigin/matrixone/pkg/vm/engine.(*EntireEngine).Database\n\t/go/src/github.com/matrixorigin/matrixone/pkg/vm/engine/entire_engine.go:73\ngithub.com/matrixorigin/matrixone/pkg/frontend.(*TxnCompilerContext).getTmpRelation\n\t/go/src/github.com/matrixorigin/matrixone/pkg/frontend/compiler_context.go:219\ngithub.com/matrixorigin/matrixone/pkg/frontend.(*TxnCompilerContext).getRelation\n\t/go/src/github.com/matrixorigin/matrixone/pkg/frontend/compiler_context.go:202\ngithub.com/matrixorigin/matrixone/pkg/frontend.(*TxnCompilerContext).Resolve\n\t/go/src/github.com/matrixorigin/matrixone/pkg/frontend/compiler_context.go:263\ngithub.com/matrixorigin/matrixone/pkg/sql/plan.setTableExprToDmlTableInfo\n\t/go/src/github.com/matrixorigin/matrixone/pkg/sql/plan/build_constraint_util.go:240\ngithub.com/matrixorigin/matrixone/pkg/sql/plan.getDmlTableInfo\n\t/go/src/github.com/matrixorigin/matrixone/pkg/sql/plan/build_constraint_util.go:335\ngithub.com/matrixorigin/matrixone/pkg/sql/plan.buildDelete\n\t/go/src/github.com/matrixorigin/matrixone/pkg/sql/plan/build_delete.go:27\ngithub.com/matrixorigin/matrixone/pkg/sql/plan.BuildPlan\n\t/go/src/github.com/matrixorigin/matrixone/pkg/sql/plan/build.go:71\ngithub.com/matrixorigin/matrixone/pkg/sql/plan.(*BaseOptimizer).Optimize\n\t/go/src/github.com/matrixorigin/matrixone/pkg/sql/plan/optimize.go:56\ngithub.com/matrixorigin/matrixone/pkg/frontend.buildPlan\n\t/go/src/github.com/matrixorigin/matrixone/pkg/frontend/mysql_cmd_executor.go:1514\ngithub.com/matrixorigin/matrixone/pkg/frontend.(*TxnComputationWrapper).Compile\n\t/go/src/github.com/matrixorigin/matrixone/pkg/frontend/computation_wrapper.go:198\ngithub.com/matrixorigin/matrixone/pkg/frontend.(*MysqlCmdExecutor).doComQuery\n\t/go/src/github.com/matrixorigin/matrixone/pkg/frontend/mysql_cmd_executor.go:2809\ngithub.com/matrixorigin/matrixone/pkg/frontend.(*MysqlCmdExecutor).ExecRequest\n\t/go/src/github.com/matrixorigin/matrixone/pkg/frontend/mysql_cmd_executor.go:3397\ngithub.com/matrixorigin/matrixone/pkg/frontend.(*Routine).handleRequest\n\t/go/src/github.com/matrixorigin/matrixone/pkg/frontend/routine.go:194\ngithub.com/matrixorigin/matrixone/pkg/frontend.(*RoutineManager).Handler\n\t/go/src/github.com/matrixorigin/matrixone/pkg/frontend/routine_manager.go:432\ngithub.com/fagongzi/goetty/v2.(*server).doConnection\n\t/go/pkg/mod/github.com/matrixorigin/goetty/[email protected]/application.go:381\ngithub.com/fagongzi/goetty/v2.(*server).doStart.func2.1\n\t/go/pkg/mod/github.com/matrixorigin/goetty/[email protected]/application.go:344"} |
After executing the insert into statement, CN cannot get any table relationship after two minutes |
in progress |
we need to wait pessimistic transaction to solve this. The hidden pk's conflict is slow. So sometimes it will result this over time occationally. |
I think we change the s3 size threshold to lift the system performace |
local insert is fast, eks environment is strange. |
Exec.Run spend about 3 minutes before txn committing. |
insert into t select * from t
timeout(reported not return result in 120000 ms)
我把对应的日志都截取到这里 |
result.zip |
@zhangxu19830126 旭哥先确定一下是dn没收到commit请求还是dn commit慢。 |
测试发现DN写log service超过rpc大小。先尝试修复这个后,在看。 |
等 @volgariver6 修复rpc的问题。 |
等 @volgariver6 修复 rpc 的问题在看 |
还没有找到问题 |
1 similar comment
还没有找到问题 |
have fixed rpc message max size limit locally, but still have other bug. Need more investigation. |
对dn发送 message做了拆分之后,运行一些测试还有问题,因为logtail没有同步到cn导致事务执行失败: 2023/06/01 12:14:05.791414 +0800 ERROR disttae/logtailPushModel.go:138 error: transaction error: new txn failed. please retry {"span": {"trace_id": "97569c62-0032-11ee-aaec-f644820d18b1", "kind": "statement"}} @cnutshell 正在帮忙分析 |
修复了一些问题之后,还存在问题: 2023/06/02 21:24:56.336178 +0800 ERROR dn-service morpc/server.go:361 write response failed {"uuid": "dd4dccb4-4d3c-41f8-b482-5251dc7a41bf", "name": "txn-server", "request-id": 68, "error": "write tcp4 127.0.0.1:22000->127.0.0.1:65435: use of closed network connection"} 这个问题还没找到原因 |
delete 失败的问题事务commit的时候,dn 的 txn server 发送 response 失败的原因找到了。原因是commit事务的时间很长,导致connection长时间没有数据发送,rpc框架中将其认定为idle connection,并close。 与 @XuPeng-SH @triump2020 沟通后,会对delete做一些优化看看效果。 该问题是 #9647 。 insert 超时dn在执行insert时,并没有拉取s3数据并检查冲突的过程,因此理论上insert时间不会太长,该issue在本地也没有复现出来。 |
insert 超时问题没有太大进展。 |
no progress. |
还是没有办法重现 |
在eks上跑了一下午,没有出现insert超时或者session挂掉的情况 |
eks regression do not appear this issue , turn it off first |
Is there an existing issue for the same bug?
Environment
Actual Behavior
job:https://github.com/matrixorigin/mo-nightly-regression/actions/runs/4972764075/jobs/8898201825
2023-05-14 14:16:47 ERROR Executor:175 - [/runner/_work/mo-nightly-regression/mo-nightly-regression/matrixone/test/distributed/cases/dml/delete/delete.test][row:205][insert into t select * from t;] was executed failed, con[id=0, user=dump, ***
2023-05-14 14:16:47 ERROR Executor:177 - [EXPECT RESULT]:
null
2023-05-14 14:16:47 ERROR Executor:178 - [ACTUAL RESULT]:
MO does not return result in 120000 ms.
2023-05-14 14:16:47 WARN Executor:181 - The mo-tester tries to re-connect to mo, con[id=0, user=dump, *** db=delete], please wait.....
2023-05-14 14:16:47 INFO ConnectionManager:67 - New conneciton from mo with[user=dump, *** has been initialized.
2023-05-14 14:16:47 ERROR Executor:212 - [/runner/_work/mo-nightly-regression/mo-nightly-regression/matrixone/test/distributed/cases/dml/delete/delete.test][row:206][delete from t where a > 1;] was executed failed, con[id=0, user=dump, ***
2023-05-14 14:16:47 ERROR Executor:214 - [EXPECT RESULT]:
null
2023-05-14 14:16:47 ERROR Executor:215 - [ACTUAL RESULT]:
no such table delete.t
2023-05-14 14:16:47 ERROR Executor:212 - [/runner/_work/mo-nightly-regression/mo-nightly-regression/matrixone/test/distributed/cases/dml/delete/delete.test][row:207][select count() from t;] was executed failed, con[id=0, user=dump, ***
2023-05-14 14:16:47 ERROR Executor:214 - [EXPECT RESULT]:
count()
512
2023-05-14 14:16:47 ERROR Executor:215 - [ACTUAL RESULT]:
SQL parser error: table "t" does not exist
2023-05-14 14:16:47 ERROR Executor:212 - [/runner/_work/mo-nightly-regression/mo-nightly-regression/matrixone/test/distributed/cases/dml/delete/delete.test][row:209][select count() from t;] was executed failed, con[id=0, user=dump, ***
2023-05-14 14:16:47 ERROR Executor:214 - [EXPECT RESULT]:
count()
2097152
2023-05-14 14:16:47 ERROR Executor:215 - [ACTUAL RESULT]:
SQL parser error: table "t" does not exist
2023-05-14 14:16:47 ERROR Executor:212 - [/runner/_work/mo-nightly-regression/mo-nightly-regression/matrixone/test/distributed/cases/dml/delete/delete.test][row:211][insert into t select * from t;] was executed failed, con[id=0, user=dump, ***
2023-05-14 14:16:47 ERROR Executor:214 - [EXPECT RESULT]:
null
2023-05-14 14:16:47 ERROR Executor:215 - [ACTUAL RESULT]:
no such table delete.t
Previous DML conflicts with existing constraints or data format. This transaction has to be aborted
2023-05-14 14:16:47 ERROR Executor:212 - [/runner/_work/mo-nightly-regression/mo-nightly-regression/matrixone/test/distributed/cases/dml/delete/delete.test][row:212][delete from t where a > 1;] was executed failed, con[id=0, user=dump, ***
2023-05-14 14:16:47 ERROR Executor:214 - [EXPECT RESULT]:
null
2023-05-14 14:16:47 ERROR Executor:215 - [ACTUAL RESULT]:
no such table delete.t
2023-05-14 14:16:47 ERROR Executor:212 - [/runner/_work/mo-nightly-regression/mo-nightly-regression/matrixone/test/distributed/cases/dml/delete/delete.test][row:213][select count() from t;] was executed failed, con[id=0, user=dump, ***
2023-05-14 14:16:47 ERROR Executor:214 - [EXPECT RESULT]:
count()
512
2023-05-14 14:16:47 ERROR Executor:215 - [ACTUAL RESULT]:
SQL parser error: table "t" does not exist
2023-05-14 14:16:47 ERROR Executor:212 - [/runner/_work/mo-nightly-regression/mo-nightly-regression/matrixone/test/distributed/cases/dml/delete/delete.test][row:215][select count() from t;] was executed failed, con[id=0, user=dump, ***
2023-05-14 14:16:47 ERROR Executor:214 - [EXPECT RESULT]:
count()
512
2023-05-14 14:16:47 ERROR Executor:215 - [ACTUAL RESULT]:
SQL parser error: table "t" does not exist
mo log path: s3://heni-test/mo-data-push/etl_20230514143207
Expected Behavior
No response
Steps to Reproduce
No response
Additional information
No response
The text was updated successfully, but these errors were encountered: