Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proxy: improve performance of isStmt functions #10000

Merged
merged 2 commits into from
Jun 13, 2023
Merged

proxy: improve performance of isStmt functions #10000

merged 2 commits into from
Jun 13, 2023

Conversation

Juneezee
Copy link
Contributor

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

What this PR does / why we need it:

The regexp.MatchString(pattern, str) function performs a regexp.Compile(pattern) operation internally. Since we already know the regex pattern at compile time, we can use regexp.MustCompile to avoid compiling the same regex pattern for each function invocation.

This pull request updates the isStmtBegin, isStmtCommit, and isStmtRollback functions to use pre-compiled regex patterns, resulting in better performance and less memory allocations.

Sample benchmark:

var (
	beginPattern = fmt.Sprintf("^%s(%s)%s%s%s$",
		spaceAtLeastZero, begin, spaceAtLeastZero, end, spaceAtLeastZero)

	beginRegex = regexp.MustCompile(beginPattern)
)

func isStmtBegin(c []byte) bool {
	matched, _ := regexp.MatchString(beginPattern, string(c))
	return matched
}

func isStmtBeginNew(c []byte) bool {
	return beginRegex.Match(c)
}

func BenchmarkIsStmtBegin(b *testing.B) {
	for i := 0; i < b.N; i++ {
		isStmtBegin([]byte{'b', 'e', 'g', 'i', 'n'})
	}
}

func BenchmarkIsStmtBeginNew(b *testing.B) {
	for i := 0; i < b.N; i++ {
		isStmtBeginNew([]byte{'b', 'e', 'g', 'i', 'n'})
	}
}

Result:

goos: linux
goarch: amd64
pkg: github.com/matrixorigin/matrixone/pkg/proxy
cpu: AMD Ryzen 7 PRO 4750U with Radeon Graphics
BenchmarkIsStmtBegin-16       	   68071	     22374 ns/op	    8528 B/op	     116 allocs/op
BenchmarkIsStmtBeginNew-16    	 3735127	       280.3 ns/op	       5 B/op	       1 allocs/op

This commit updates `isStmtBegin`, `isStmtCommit`, and `isStmtRollback`
functions to use pre-compiled regex for better performance.

Sample benchmark:

func isStmtBegin(c []byte) bool {
	matched, _ := regexp.MatchString(beginPattern, string(c))
	return matched
}

func isStmtBeginNew(c []byte) bool {
	return beginRegex.Match(c)
}

func BenchmarkIsStmtBegin(b *testing.B) {
	for i := 0; i < b.N; i++ {
		isStmtBegin([]byte{'b', 'e', 'g', 'i', 'n'})
	}
}

func BenchmarkIsStmtBeginNew(b *testing.B) {
	for i := 0; i < b.N; i++ {
		isStmtBeginNew([]byte{'b', 'e', 'g', 'i', 'n'})
	}
}

BenchmarkIsStmtBegin-16       	   68071	     22374 ns/op	    8528 B/op	     116 allocs/op
BenchmarkIsStmtBeginNew-16    	 3735127	       280.3 ns/op	       5 B/op	       1 allocs/op

Signed-off-by: Eng Zer Jun <[email protected]>
@matrix-meow matrix-meow added the size/S Denotes a PR that changes [10,99] lines label Jun 12, 2023
@mergify mergify bot added kind/tech-request New feature or request kind/refactor Code refactor labels Jun 12, 2023
@matrix-meow
Copy link
Contributor

@Juneezee Thanks for your contributions!

This pull request aims to improve the performance of the isStmtBegin, isStmtCommit, and isStmtRollback functions by using pre-compiled regex patterns instead of compiling the same regex pattern for each function invocation. The changes made in this pull request are simple and straightforward, and the benchmark results show a significant improvement in performance.

However, there are a few issues that need to be addressed:

  1. The title of the pull request is not very descriptive. It would be better to include more information about what this pull request does, such as "Improve performance of isStmt functions by using pre-compiled regex patterns".

  2. The body of the pull request is also not very informative. It includes a checklist of PR types, but it doesn't explain why this PR is necessary or what it does. It would be better to include a brief summary of the changes made and why they are important.

  3. The changes made in the pull request are good, but there is room for optimization. Instead of using regexp.MustCompile to compile the regex patterns at runtime, it would be better to compile them at compile time using the regexp.CompilePOSIX function. This would further improve performance and reduce memory allocations.

  4. The benchmark results are useful, but they could be presented more clearly. It would be better to include a summary of the results, such as "isStmtBeginNew is 80x faster than isStmtBegin and uses 99% less memory".

To address these issues, I suggest the following changes:

  1. Update the title of the pull request to "Improve performance of isStmt functions by using pre-compiled regex patterns".

  2. Update the body of the pull request to include a brief summary of the changes made and why they are important. For example:

    This pull request updates the isStmtBegin, isStmtCommit, and isStmtRollback functions to use pre-compiled regex patterns, resulting in better performance and less memory allocations. The regex patterns are compiled at runtime using regexp.MustCompile, but they could be compiled at compile time using regexp.CompilePOSIX for even better performance.

  3. Update the changes made in the pull request to use regexp.CompilePOSIX to compile the regex patterns at compile time. For example:

    var (
        beginRegex    = regexp.MustCompilePOSIX("^[:space:]*begin[:space:]*;[:space:]*$")
        commitRegex   = regexp.MustCompilePOSIX("^[:space:]*commit[:space:]*;[:space:]*$")
        rollbackRegex = regexp.MustCompilePOSIX("^[:space:]*rollback[:space:]*;[:space:]*$")
    )
    
  4. Update the benchmark results to include a summary of the results. For example:

    BenchmarkIsStmtBegin    1000000    2237 ns/op    8528 B/op    116 allocs/op
    BenchmarkIsStmtBeginNew 100000000  28.0 ns/op    5 B/op       1 allocs/op
    
    isStmtBeginNew is 80x faster than isStmtBegin and uses 99% less memory.
    

Copy link
Contributor

@fengttt fengttt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But seriously, why this is not

strings.TrimSpace().ToUpper().Compare("BEGIN")?

@Juneezee
Copy link
Contributor Author

Juneezee commented Jun 13, 2023

But seriously, why this is not

strings.TrimSpace().ToUpper().Compare("BEGIN")?

@fengttt Previously we were using a more simpler approach of matching the given []byte, but it was changed to regex. https://github.com/matrixorigin/matrixone/pull/9338/files#diff-8a42a753a6dfa6ae6f203fbbc57ee1af10236bdd91937ddf21c8eac1012c12f4

This is the current regex for begin statement: ^\s*([bB][eE][gG][iI][nN])\s*;?\s*$. Apart from the whitespace character match, it also match for an optional semicolon (;?)

@fengttt
Copy link
Contributor

fengttt commented Jun 13, 2023

But seriously, why this is not
strings.TrimSpace().ToUpper().Compare("BEGIN")?

@fengttt Previously we were using a more simplier approach of matching the given []byte, but it was changed to regex. https://github.com/matrixorigin/matrixone/pull/9338/files#diff-8a42a753a6dfa6ae6f203fbbc57ee1af10236bdd91937ddf21c8eac1012c12f4

This is the current regex for begin statement: ^\s*([bB][eE][gG][iI][nN])\s*;?\s*$. Apart from the whitespace character match, it also match for an optional semicolon (;?)

The regular expression does not really work, how about comment?

The true question is that WHY do we need this feature in proxy?

@Juneezee
Copy link
Contributor Author

But seriously, why this is not
strings.TrimSpace().ToUpper().Compare("BEGIN")?

@fengttt Previously we were using a more simplier approach of matching the given []byte, but it was changed to regex. https://github.com/matrixorigin/matrixone/pull/9338/files#diff-8a42a753a6dfa6ae6f203fbbc57ee1af10236bdd91937ddf21c8eac1012c12f4
This is the current regex for begin statement: ^\s*([bB][eE][gG][iI][nN])\s*;?\s*$. Apart from the whitespace character match, it also match for an optional semicolon (;?)

The regular expression does not really work, how about comment?

The true question is that WHY do we need this feature in proxy?

/cc @volgariver6 , PR reference #9338

@volgariver6
Copy link
Contributor

But seriously, why this is not
strings.TrimSpace().ToUpper().Compare("BEGIN")?

@fengttt Previously we were using a more simplier approach of matching the given []byte, but it was changed to regex. https://github.com/matrixorigin/matrixone/pull/9338/files#diff-8a42a753a6dfa6ae6f203fbbc57ee1af10236bdd91937ddf21c8eac1012c12f4
This is the current regex for begin statement: ^\s*([bB][eE][gG][iI][nN])\s*;?\s*$. Apart from the whitespace character match, it also match for an optional semicolon (;?)

The regular expression does not really work, how about comment?

The true question is that WHY do we need this feature in proxy?

这里的begin commit rollback 等是用来在proxy中判断是否可以迁移连接的,如果尚处于事务中,就不可以迁移该连接。

@fengttt
Copy link
Contributor

fengttt commented Jun 13, 2023

The we we detect active transaction is not robust enough. Is there anything more reliable from the protocol level? For example, when we plan to migrate a connection, can we issue a command to server to check if current connection is in transaction?

Assume connection migration is rare, and probably you HAVE TO migrate no matter what (due to CN crash/congestion etc), why don't we just simply abort current transaction and migrate?

@volgariver6
Copy link
Contributor

The we we detect active transaction is not robust enough. Is there anything more reliable from the protocol level? For example, when we plan to migrate a connection, can we issue a command to server to check if current connection is in transaction?

Assume connection migration is rare, and probably you HAVE TO migrate no matter what (due to CN crash/congestion etc), why don't we just simply abort current transaction and migrate?

比如当某个租户做了cn的扩容/缩容,就需要做负载均衡,需要做连接的迁移。

当时实现的时候也感觉这种方式不太好,我改一下吧,看怎么获取当前的事务状态。

@fengttt
Copy link
Contributor

fengttt commented Jun 13, 2023

The we we detect active transaction is not robust enough. Is there anything more reliable from the protocol level? For example, when we plan to migrate a connection, can we issue a command to server to check if current connection is in transaction?

Assume connection migration is rare, and probably you HAVE TO migrate no matter what (due to CN crash/congestion etc), why don't we just simply abort current transaction and migrate?

I filed a bug tracking this. This PR is OK, as it is strictly an improvement. Thank you @Juneezee

@mergify mergify bot merged commit 37bca4c into matrixorigin:main Jun 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/refactor Code refactor kind/tech-request New feature or request size/S Denotes a PR that changes [10,99] lines
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants