[Bug]: [1105 main tke regression] tpch 1t 4cn test report 'wait notify message timeout'. #19802

Open
1 task done
Ariznawlll opened this issue Nov 5, 2024 · 6 comments
Labels
kind/bug Something isn't working severity/s0 Extreme impact: Cause the application to break down and seriously affect the use

@Ariznawlll
Contributor

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch Name

main

Commit ID

5e4d063

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

job url: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/11666410553/job/32517908805

image

Logs from the TPC-H test run (UTC time):
https://grafana.ci.matrixorigin.cn/explore?panes=%7B%221u3%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-main-nightly-5e4d0638a-20241104%5C%22%7D%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221730790063000%22,%22to%22:%221730790183000%22%7D%7D%7D&schemaVersion=1&orgId=1

Expected Behavior

No response

Steps to Reproduce

Trigger the workflow test on TKE.

Additional information

No response

@Ariznawlll Ariznawlll added kind/bug Something isn't working needs-triage severity/s0 Extreme impact: Cause the application to break down and seriously affect the use labels Nov 5, 2024
@Ariznawlll Ariznawlll added this to the 2.0.1 milestone Nov 5, 2024
@badboynt1
Contributor

@m-schen please take a look.

@m-schen
Contributor

m-schen commented Nov 5, 2024

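First of all, this is definitely an intermittent error.
My current guess is that some pipelines failed to send because of network issues or the local client not being ready, then fell back to local execution, which caused the hang.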

@m-schen
Contributor

m-schen commented Nov 6, 2024

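> First of all, this is definitely an intermittent error. My current guess is that some pipelines failed to send because of network issues or the local client not being ready, then fell back to local execution, which caused the hang.

After checking, I found that this fallback logic was removed long ago and no longer exists.
It now looks like the failure was caused by packet loss from a network problem.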

@m-schen
Contributor

m-schen commented Nov 6, 2024

image

The root cause is a network problem: one notify message failed to send, which is why this error was reported.

@m-schen
Contributor

m-schen commented Nov 6, 2024

I will try to change the code so that this RPC timeout error is raised immediately.
