[Bug]: TPMC of 1000w is 20% lower than in mo 1.2.0 #19453

Open
1 task done
aressu1985 opened this issue Oct 19, 2024 · 5 comments
Assignees
Labels
kind/bug Something isn't working phase/testing severity/s0 Extreme impact: Cause the application to break down and seriously affect the use
Milestone

Comments

@aressu1985
Contributor

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch Name

main

Commit ID

30195c0

Other Environment Information

- Hardware parameters:
3*CN: 16C 64G
1*DN: 16C 64G
3*LOG: 4C 16G
1*PROXY: 3C 7G
- OS type:
- Others:

Actual Behavior

image

main metrics:
https://grafana.ci.matrixorigin.cn/d/ee086o0rtlq0wc/txn-metrics?orgId=1&var-interval=1m&var-namespace=mo-main-nightly-30195c002-20241018&var-pod=All&from=1729281003000&to=1729283446000

mo-1.2 metrics:
https://grafana.ci.matrixorigin.cn/d/ee086o0rtlq0wc/txn-metrics?orgId=1&var-interval=1m&var-namespace=mo-branch-nightly-c3c1ce0f5-20241018&var-pod=All&from=1729281003000&to=1729284046000

Expected Behavior

No response

Steps to Reproduce

Run the TPC-C test with 1000 warehouses.

Additional information

No response

@aressu1985 aressu1985 added kind/bug Something isn't working needs-triage severity/s0 Extreme impact: Cause the application to break down and seriously affect the use labels Oct 19, 2024
@aressu1985 aressu1985 added this to the 2.0.0 milestone Oct 19, 2024
@sukki37 sukki37 modified the milestones: 2.0.0, 2.1.0 Oct 20, 2024
@badboynt1
Contributor

badboynt1 commented Oct 22, 2024

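With TPC-C at 1000 warehouses the data set is fairly large, so the memcache does not hit on every access and entries have to be replaced frequently.

Watching the main branch run, once TPS has stabilized, CPU utilization is below half. The likely cause is the cost of waiting on locks. An fgprof capture confirms that the main cost is indeed lock overhead in the memcache. This needs to be optimized.
Looking at the CPU profile as well, the main cost is in IO. In this situation the only option is sharding, to raise the cache hit rate and thereby raise TPS.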
(pprof) top
Showing nodes accounting for 370.37s, 55.70% of 664.90s total
Dropped 4010 nodes (cum <= 3.32s)
Showing top 10 nodes out of 311
      flat  flat%   sum%        cum   cum%
   119.69s 18.00% 18.00%    119.69s 18.00%  runtime.memmove
    71.97s 10.82% 28.83%     71.97s 10.82%  internal/runtime/syscall.Syscall6
    67.10s 10.09% 38.92%    138.40s 20.82%  github.com/pierrec/lz4/v4/internal/lz4block.decodeBlock
    26.38s  3.97% 42.88%     28.35s  4.26%  runtime.findObject
    23.80s  3.58% 46.46%     23.80s  3.58%  runtime.memclrNoHeapPointers
    16.98s  2.55% 49.02%     16.98s  2.55%  sync/atomic.(*Int64).Add
    15.32s  2.30% 51.32%     66.54s 10.01%  runtime.scanobject
    10.19s  1.53% 52.85%     18.04s  2.71%  sync.(*poolChain).popTail
     9.52s  1.43% 54.29%      9.70s  1.46%  runtime.(*gcBits).bitp (inline)
     9.42s  1.42% 55.70%      9.42s  1.42%  sync/atomic.(*Uint32).CompareAndSwap
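
For readers unfamiliar with why a single lock in a memory cache can leave CPUs idle, here is a minimal sketch of lock striping: the cache is split into independently locked shards so goroutines touching different keys rarely wait on the same mutex. This is an illustration only, not MatrixOne's memcache implementation. The names `ShardedCache`, `NewShardedCache`, and `shardFor` are hypothetical, and there is no eviction policy. It also addresses lock contention only; the sharding mentioned above for raising the cache hit rate is a separate matter that this sketch does not model.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// shard is one independently locked part of the cache.
type shard struct {
	mu    sync.Mutex
	items map[string][]byte
}

// ShardedCache (hypothetical name) spreads keys over a fixed number of
// shards, each guarded by its own mutex.
type ShardedCache struct {
	shards []*shard
}

// NewShardedCache builds a cache with n shards; n must be at least 1.
func NewShardedCache(n int) *ShardedCache {
	c := &ShardedCache{shards: make([]*shard, n)}
	for i := range c.shards {
		c.shards[i] = &shard{items: make(map[string][]byte)}
	}
	return c
}

// shardFor hashes the key with FNV-1a and picks the owning shard.
func (c *ShardedCache) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return c.shards[h.Sum32()%uint32(len(c.shards))]
}

// Get locks only the shard that owns the key.
func (c *ShardedCache) Get(key string) ([]byte, bool) {
	s := c.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.items[key]
	return v, ok
}

// Set locks only the shard that owns the key.
func (c *ShardedCache) Set(key string, value []byte) {
	s := c.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[key] = value
}

func main() {
	c := NewShardedCache(16)
	var wg sync.WaitGroup
	// 64 concurrent writers; keys that land in different shards do not block each other.
	for i := 0; i < 64; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.Set(fmt.Sprintf("block-%d", i), []byte{byte(i)})
		}(i)
	}
	wg.Wait()
	v, ok := c.Get("block-7")
	fmt.Println(ok, v)
}
```

The shard count of 16 is arbitrary; whether striping helps in practice depends on how evenly keys spread and on whether the lock is really the bottleneck.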

image

prof_1.zip

@badboynt1 badboynt1 assigned reusee and unassigned badboynt1 Oct 22, 2024
@reusee
Contributor

reusee commented Oct 22, 2024

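The fgprof profile above is problematic; it needs to be updated to the latest version.
Judging from the CPU profile, the lock overhead of the memory cache is not large.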
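
One independent way to check a claim about lock overhead in a Go program is the runtime's built-in mutex profile, which records time spent blocked on contended mutexes. The sketch below is an assumption about how one might cross-check, not something done in this thread. It uses only the standard library (`runtime.SetMutexProfileFraction` and `pprof.Lookup("mutex")`); the helpers `contendedWork` and `mutexProfile` are hypothetical names for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
)

// contendedWork makes several goroutines compete for one mutex so the
// mutex profile has contention to record. It returns the final counter.
func contendedWork(workers, iters int) int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	counter := 0
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				mu.Lock()
				counter++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return counter
}

// mutexProfile returns the runtime's mutex profile in text form (debug=1).
func mutexProfile() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("mutex").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// A fraction of 1 reports every contention event; restore the old value on exit.
	prev := runtime.SetMutexProfileFraction(1)
	defer runtime.SetMutexProfileFraction(prev)

	total := contendedWork(8, 100000)
	report, err := mutexProfile()
	if err != nil {
		fmt.Println("mutex profile failed:", err)
		return
	}
	fmt.Println("increments:", total)
	fmt.Println(report)
}
```

A fraction of 1 is convenient for a demo but adds overhead; a long-running service would normally sample with a larger value.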

@reusee
Contributor

reusee commented Oct 25, 2024

No progress.

@reusee
Contributor

reusee commented Oct 31, 2024

Continuing to optimize.

@reusee
Contributor

reusee commented Nov 5, 2024

Optimized.

5 participants