[Bug]: cache anomalies #10945

Closed
1 task done
Tracked by #11553
sukki37 opened this issue Aug 1, 2023 · 15 comments
Labels: area/memory, impact/1.1, kind/bug, severity/s1, team/c2, to-next-release

sukki37 (Contributor) commented Aug 1, 2023

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Environment

- Version or commit-id (e.g. v0.1.0 or 8b23a93): 333d855
- Hardware parameters:
- OS type:
- Others:

Actual Behavior

@nnsgmsone encountered unexpected cache behavior: in some instances the cache suddenly exceeds its configured limit. This could lead to resource-related issues in production environments.

Several memory-related issues we've recently faced have been traced back to this cache anomaly. We have temporarily circumvented these specific issues by modifying configurations, but it's a makeshift solution. It's imperative to identify the root cause and fix it. Hence, this issue is being opened explicitly for tracking this behavior.

Related Issues:
#10659
https://github.com/matrixorigin/MO-Cloud/issues/984

Expected Behavior

All caches work as expected and stay within their configured limits.
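
One way to state the expectation is as an invariant: after every insertion, the bytes held by a cache never exceed its configured capacity. The following is a minimal sketch of that invariant using a byte-bounded LRU. The type `boundedCache` and its methods are hypothetical names for illustration only; this is not MatrixOne's cache implementation.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one cached item; the size accounted for is len(value).
type entry struct {
	key   string
	value []byte
}

// boundedCache is a hypothetical byte-capacity LRU cache used only to
// illustrate the "never exceed the configured limit" invariant.
type boundedCache struct {
	capacity int        // maximum total bytes of cached values
	used     int        // bytes currently held
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func newBoundedCache(capacity int) *boundedCache {
	return &boundedCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Set stores value under key and evicts least recently used entries until
// used <= capacity. A value larger than the whole capacity is rejected.
func (c *boundedCache) Set(key string, value []byte) bool {
	if len(value) > c.capacity {
		return false
	}
	if el, ok := c.items[key]; ok {
		// Replace an existing entry: release its bytes first.
		c.used -= len(el.Value.(*entry).value)
		c.order.Remove(el)
		delete(c.items, key)
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	c.used += len(value)
	for c.used > c.capacity {
		oldest := c.order.Remove(c.order.Back()).(*entry)
		delete(c.items, oldest.key)
		c.used -= len(oldest.value)
	}
	return true
}

// Get returns the cached value and marks it as most recently used.
func (c *boundedCache) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

func main() {
	c := newBoundedCache(1 << 20) // 1 MiB limit
	for i := 0; i < 100; i++ {
		c.Set(fmt.Sprintf("block-%d", i), make([]byte, 64<<10))
		if c.used > c.capacity {
			panic("cache exceeded its configured limit")
		}
	}
	fmt.Println("within limit:", c.used <= c.capacity)
}
```

A check of this shape can be turned into a test or a runtime assertion. It only covers memory the cache accounts for itself, so data retained outside that accounting would not be caught by it.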

Steps to Reproduce

No response

Additional information

No response

sukki37 added the kind/bug and severity/s0 labels Aug 1, 2023
sukki37 added this to the 1.0.0 milestone Aug 1, 2023
nnsgmsone (Contributor) commented

I'll take care of this issue.

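jensenojs (Contributor) commented

Using a standalone MO deployment, I set the memory-capacity of metacache / fileservice.cache for cn, dn, and log to 1MB, then started running bvt repeatedly.

The image below is the pprof taken after the eighth bvt run finished.

[image: pprof after the eighth bvt run]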

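jensenojs (Contributor) commented

Adding a heap pprof comparison taken with diff_base. It shows more directly that, after several bvt runs, part of the heap growth comes from calls to HandleRowsDelete. The reason this exceeds the memory limit is that cn needs a signal from dn to clean up the deleted data; this problem is currently tracked in #10370.

[image: heap pprof comparison with diff_base]

Other memory growth will be tracked further if necessary.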

fengttt added the severity/s-1 label and removed the severity/s0 label Aug 7, 2023
fengttt (Contributor) commented Aug 7, 2023

Changed to severity S-1 because another S-1 issue, #10659, depends on this one.

nnsgmsone (Contributor) commented Aug 9, 2023

nnsgmsone (Contributor) commented

The approach will be decided after tomorrow's discussion.

nnsgmsone (Contributor) commented

Working on it.

nnsgmsone (Contributor) commented

Coding in progress.

nnsgmsone (Contributor) commented

Coding in progress.

nnsgmsone (Contributor) commented

The code is mostly complete; testing now.

nnsgmsone (Contributor) commented

Debugging.

nnsgmsone (Contributor) commented

Still testing.

nnsgmsone (Contributor) commented

I will write a design doc first, then continue development after it has been reviewed.

nnsgmsone (Contributor) commented

Continuing to work on the design document.

aressu1985 added the severity/s1 label and removed the severity/s-1 label Sep 8, 2023
aressu1985 (Contributor) commented

Tracked by #11553; this issue is downgraded to s1.

sukki37 removed this from the 1.0.0 milestone Oct 7, 2023
sukki37 added this to the 1.1.0 milestone Oct 7, 2023
LiSong0214 modified the milestones: 1.1.0, 1.2.0 Dec 15, 2023
nnsgmsone reopened this Jan 2, 2024
sukki37 closed this as completed Mar 26, 2024