[Bug]: TPCC connection was hung on Nightly Regression Test #7951

Closed
1 task done
aressu1985 opened this issue Feb 10, 2023 · 51 comments

Comments

@aressu1985
Contributor

aressu1985 commented Feb 10, 2023

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Environment

- Version or commit-id (e.g. v0.1.0 or 8b23a93): 15baa97
- Hardware parameters:
- OS type:
- Others:

Actual Behavior

The TPCC connection (1 warehouse, 10 terminals) hung during the Nightly Regression Test at commit 15baa97.

The error log is as follows:

2023/02/10 04:21:25.451532 +0800 ERROR log-service util/log.go:133 txn send requests failed {"uuid": "7c4dccb4-4d3c-41f8-b482-5251dc7a41bf", "requests": "[0: <4d98c3e0f5334d4a8a1ba121a8816790/Active/S:1675971685420501188-0>/Read/F-0/=><2-262146-0-127.0.0.1:22000>]", "error": "context deadline exceeded"}
github.com/matrixorigin/matrixone/pkg/txn/util.LogTxnSendRequestsFailed
/data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/txn/util/log.go:133
github.com/matrixorigin/matrixone/pkg/txn/client.(*txnOperator).doSend
/data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/txn/client/operator.go:491
github.com/matrixorigin/matrixone/pkg/txn/client.(*txnOperator).Read
/data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/txn/client/operator.go:252
github.com/matrixorigin/matrixone/pkg/vm/engine/disttae.getLogTail
/data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/vm/engine/disttae/logtail.go:58
github.com/matrixorigin/matrixone/pkg/vm/engine/disttae.updatePartition
/data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/vm/engine/disttae/logtail.go:42
github.com/matrixorigin/matrixone/pkg/vm/engine/disttae.(*DB).Update
/data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/vm/engine/disttae/db.go:322
github.com/matrixorigin/matrixone/pkg/vm/engine/disttae.(*Engine).New
/data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/vm/engine/disttae/engine.go:381
github.com/matrixorigin/matrixone/pkg/vm/engine.(*EntireEngine).New
/data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/vm/engine/entire_engine.go:33
github.com/matrixorigin/matrixone/pkg/frontend.(*TxnHandler).NewTxn
/data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/session.go:1542
github.com/matrixorigin/matrixone/pkg/frontend.(*Session).TxnBegin
/data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/session.go:1209
github.com/matrixorigin/matrixone/pkg/frontend.(*MysqlCmdExecutor).doComQuery
/data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/mysql_cmd_executor.go:3524
github.com/matrixorigin/matrixone/pkg/frontend.(*BackgroundHandler).Exec
/data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/session.go:2453
github.com/matrixorigin/matrixone/pkg/frontend.determineUserHasPrivilegeSet
/data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/authenticate.go:4224
github.com/matrixorigin/matrixone/pkg/frontend.authenticateUserCanExecuteStatementWithObjectTypeDatabaseAndTable
/data1/action-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/authenticate.go:4659

The full log is very large; if you need it, please contact me.

#8327

Expected Behavior

No response

Steps to Reproduce

No response

Additional information

No response

@aressu1985 aressu1985 added kind/bug Something isn't working needs-triage severity/s0 Extreme impact: Cause the application to break down and seriously affect the use labels Feb 10, 2023
@aressu1985 aressu1985 added this to the v0.7.0 milestone Feb 10, 2023
@florashi181 florashi181 modified the milestones: v0.7.0, V0.8.0 Feb 14, 2023
@daviszhen
Contributor

I have not reproduced it.

@daviszhen
Contributor

I am not focusing on it today.

4 similar comments

@daviszhen
Contributor

I am not working on it

7 similar comments

@daviszhen
Contributor

Last Sunday I also tried to reproduce this issue, but it did not reproduce.

@daviszhen
Contributor

I am not working on it

1 similar comment

@daviszhen
Contributor

daviszhen commented Apr 3, 2023

Trying to reproduce this.

1 similar comment

@daviszhen
Contributor

Not reproduced yet.

@daviszhen
Contributor

Trying to reproduce this.

@daviszhen
Contributor

Did not reproduce on the PC today.

@daviszhen
Contributor

Still trying to reproduce.

@daviszhen
Contributor

Have not looked at it yet.

@daviszhen
Contributor

Still trying to reproduce.

1 similar comment

@gavinyue
Contributor

gavinyue commented May 3, 2023

Reproducing it may require a Docker deployment with multiple CNs.

@daviszhen
Contributor

daviszhen commented May 4, 2023

Reproduced context deadline exceeded once today. The hang did not reproduce. But after I added print statements, it no longer reproduced. I.....

@daviszhen
Contributor

No progress.

@daviszhen
Contributor

Have not looked at it yet.

@daviszhen
Contributor

Have not looked at it.

1 similar comment

@daviszhen
Contributor

daviszhen commented May 14, 2023

Running TPCC with 10 warehouses and 50 clients produced the problem below.

commitId : 5a56540

image

image

@daviszhen
Contributor

daviszhen commented May 14, 2023

On another PC, with the same TPCC setup as the previous comment, other "not found" problems also showed up.

I have the full logs.

Screenshot from 2023-05-14 11-28-46
Screenshot from 2023-05-14 11-29-29

@LeftHandCold
Contributor

On another PC, with the same TPCC setup as the previous comment, other "not found" problems also showed up.

I have the full logs.

Screenshot from 2023-05-14 11-28-46 Screenshot from 2023-05-14 11-29-29

This is the file that needs to be deleted during rollback. The "not found" error has no effect.

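@xzxiong
Contributor

xzxiong commented May 14, 2023

Running TPCC with 10 warehouses and 50 clients produced the problem below.

commitId : 5a56540

image

...

trace in #9424
This is the error message reported during MO initialization when creating a view that depends on an external table. It does not prevent create view from completing normally.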

@daviszhen
Contributor

Today, TPCC with 1 warehouse and 10 clients ran into an OOM.

Screenshot from 2023-05-15 19-04-14

@daviszhen
Contributor

Still trying to reproduce the lock-up problem.

@daviszhen
Contributor

Yesterday, on the PC with 10 warehouses and 50 clients, the hang reproduced.

State while hung:

Screenshot from 2023-05-16 11-08-38

Screenshot from 2023-05-16 11-09-01

Screenshot from 2023-05-16 11-09-58

Screenshot from 2023-05-16 11-10-25

Code location of the hang:

5901684207170_ pic

@cnutshell
Copy link
Contributor

cnutshell commented May 16, 2023

There are two goroutines that block on select.
WeChatWorkScreenshot_9e502b28-92d8-4b66-90d8-717047ae0800
WeChatWorkScreenshot_015ddf85-8054-4a82-9235-5749b8d5ddfa

@daviszhen
Contributor

daviszhen commented May 16, 2023

@reusee opened 2 PRs.
One is against main: #9475
The other is a PR for this issue: (daviszhen) daviszhen#49

I will verify the second PR tonight.

@daviszhen
Contributor

With 乐声's change it still hangs. It may be RPC.
I will enable debug logging and run again.

@daviszhen
Contributor

Enabled debug logging and ran overnight. It did not reproduce.
Switching to the main branch. Will run again tonight.

@daviszhen
Contributor

Yesterday I ran TPCC for 11 hours on main (commit 2e4ef52). The hang did not reproduce.
Will run TPCC again tonight.

@daviszhen
Contributor

daviszhen commented May 21, 2023

TPCC ran all night last night, from 20:58 to 07:58, without hanging.
Not reproduced on main.

@daviszhen
Contributor

Tested on main at commit 2e4ef52 for two days. It did not reproduce again.

@daviszhen daviszhen assigned aressu1985 and unassigned daviszhen May 21, 2023
@aressu1985
Contributor Author

fixed

7 participants