Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: eks bvt regression: the same sql and statement_id log records in statement_info are greater than 1 #9891

Closed
1 task done
heni02 opened this issue Jun 7, 2023 · 23 comments
Assignees
Labels
kind/bug Something isn't working severity/s-1
Milestone

Comments

@heni02
Copy link
Contributor

heni02 commented Jun 7, 2023

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Environment

- Version or commit-id (e.g. v0.1.0 or 8b23a93):nightly-3ca7054
- Hardware parameters:
- OS type:
- Others:

Actual Behavior

job: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/5189483136/jobs/9354645167
image

Expected Behavior

No response

Steps to Reproduce

No response

Additional information

No response

@heni02 heni02 added kind/bug Something isn't working needs-triage labels Jun 7, 2023
@volgariver6 volgariver6 assigned xzxiong and unassigned matrix-meow Jun 7, 2023
@volgariver6
Copy link
Contributor

@xzxiong please check this issue.

@xzxiong
Copy link
Contributor

xzxiong commented Jun 7, 2023

log status:

  1. cn0.log and cn1.log missed log before 2023-06-06 14:13:08
  2. mo data on eks already deleted
  3. at 2023/06/06 14:15:13.064264 do run ETLMerge task.

@xzxiong
Copy link
Contributor

xzxiong commented Jun 7, 2023

root cause

this is client-timeout while do txn commit:
do commit in open txn and meet timeout before commit exec, BUT the txn will commit successed.
==> then DB in pending status

op trace:

mo time
t0 // connect mo with timeout: 2 seconds
t1 begin
t2 insert into user_info(name, create_time) values ("jacksonxie", "2023-06-07 17:26:21")
t3 insert with id: 20002 // got LastInsertId: 20002
t4 commit
t5 // manually wait 3 second -- got timeout in 2 seconds

query test_txn.user_info found out that: id: 20002already in table.

mysql> select * from user_info order by id;
+-------+------------+---------------------+
| id    | name       | create_time         |
+-------+------------+---------------------+
|     1 | jacksonxie | 2023-06-07 15:10:41 |
|     2 | jacksonxie | 2023-06-07 15:10:44 |
|     3 | jacksonxie | 2023-06-07 15:10:46 |
|     4 | jacksonxie | 2023-06-07 15:11:02 |
|     5 | jacksonxie | 2023-06-07 15:13:36 |
| 10001 | jacksonxie | 2023-06-07 17:12:37 |
| 20001 | jacksonxie | 2023-06-07 17:24:30 |
| 20002 | jacksonxie | 2023-06-07 17:26:21 |
+-------+------------+---------------------+
8 rows in set (0.01 sec)

repro step

	case *tree.CommitTransaction:
+		if ses.GetDatabaseName() == "test_txn" {
+			time.Sleep(3 * time.Second)
+		}
		err = ses.TxnCommit()
		if err != nil {
			return err
		}

image

  • test code
    code.tgz
    run out with error, run: tar zxfv code.tgz && cd code && make run
2023/06/07 17:26:20.663970 +0800 INFO init log
2023/06/07 17:26:20.664166 +0800 INFO conn dsn: dump:111@tcp(127.0.0.1:6001)/test_txn?parseTime=true&loc=Local&readTimeout=1s&writeTimeout=1s&timeout=1s&maxAllowedPacket=16777216
2023/06/07 17:26:20.764385 +0800 INFO success insert id: 20002, affect: 1
2023/06/07 17:26:21.764635 +0800 ERROR mysql read tcp 127.0.0.1:65430->127.0.0.1:6001: i/o timeout
main.(*MysqlLogger).Print
        /Users/jacksonxie/go/src/github.com/jacksonxie/db/mysql.go:21
github.com/go-sql-driver/mysql.(*mysqlConn).readPacket
        /Users/jacksonxie/go/pkg/mod/github.com/go-sql-driver/[email protected]/packets.go:37
github.com/go-sql-driver/mysql.(*mysqlConn).readResultSetHeaderPacket
        /Users/jacksonxie/go/pkg/mod/github.com/go-sql-driver/[email protected]/packets.go:538
github.com/go-sql-driver/mysql.(*mysqlConn).exec
        /Users/jacksonxie/go/pkg/mod/github.com/go-sql-driver/[email protected]/connection.go:334
github.com/go-sql-driver/mysql.(*mysqlTx).Commit
        /Users/jacksonxie/go/pkg/mod/github.com/go-sql-driver/[email protected]/transaction.go:19
database/sql.(*Tx).Commit.func1
        /usr/local/go/src/database/sql/sql.go:2268
database/sql.withLock
        /usr/local/go/src/database/sql/sql.go:3405
database/sql.(*Tx).Commit
        /usr/local/go/src/database/sql/sql.go:2267
main.accessDBTxnTimeout
        /Users/jacksonxie/go/src/github.com/jacksonxie/db/main.go:367
main.main
        /Users/jacksonxie/go/src/github.com/jacksonxie/db/main.go:412
runtime.main
        /usr/local/go/src/runtime/proc.go:250
2023/06/07 17:26:21.764914 +0800 ERROR commit failed: invalid connection
main.accessDBTxnTimeout
        /Users/jacksonxie/go/src/github.com/jacksonxie/db/main.go:369
main.main
        /Users/jacksonxie/go/src/github.com/jacksonxie/db/main.go:412
runtime.main
        /usr/local/go/src/runtime/proc.go:250
2023/06/07 17:26:21.764961 +0800 INFO Done ON 127.0.0.1:6001
20230607.172621.764963

@xzxiong
Copy link
Contributor

xzxiong commented Jun 7, 2023

not sure how to fix
proposal:

  1. mo should sense the connection status
  2. mo commit op should sucess in 3 sec.
  3. etl sql write should mark those files which is meet commit timeout.

@xzxiong

This comment was marked as off-topic.

@sukki37 sukki37 added this to the V0.8.0 milestone Jun 8, 2023
@xzxiong
Copy link
Contributor

xzxiong commented Jun 8, 2023

still try to repro.

@xzxiong

This comment was marked as outdated.

@xzxiong
Copy link
Contributor

xzxiong commented Jun 9, 2023

important one CI result https://github.com/xzxiong/matrixone/actions/runs/5218093671

  • show there is NO op commit failed.

@xzxiong
Copy link
Contributor

xzxiong commented Jun 12, 2023

important CI job: https://github.com/xzxiong/matrixone/actions/runs/5239279033/jobs/9462419306
collect great infos.

image

@xzxiong
Copy link
Contributor

xzxiong commented Jun 13, 2023

@gavinyue ptal

@gavinyue
Copy link
Contributor

Will fix this along with new insert change.

@gavinyue
Copy link
Contributor

pending on #9834

@gavinyue
Copy link
Contributor

#10090 一起处理

@gavinyue
Copy link
Contributor

Looking into this today

@gavinyue
Copy link
Contributor

pending on the stmt_aggr merge

@gavinyue
Copy link
Contributor

Pending for the stmt_aggr pr merge. Stuck on dn oom

@gavinyue
Copy link
Contributor

Merged the PR for stmt_aggr. Close this issue.

@matrix-meow matrix-meow reopened this Jun 26, 2023
@xzxiong
Copy link
Contributor

xzxiong commented Jun 27, 2023

important CI job: https://github.com/xzxiong/matrixone/actions/runs/5239279033/jobs/9462419306 collect great infos.

image

@gavinyue ptal
in db_holder.go, there still has problem between calling bulkInsert and context.Done()

go bulkInsert(ctx, done, dbConn, records, tbl, MaxInsertLen, MiddleInsertLen)

@gavinyue
Copy link
Contributor

Start coding this. ETA 27th

@gavinyue
Copy link
Contributor

Pr draft is ready

@gavinyue
Copy link
Contributor

PR is ready for review

@gavinyue
Copy link
Contributor

Should be resolved after the #10302 merged

@florashi181 florashi181 modified the milestones: V0.8.0, 1.0.0 Jul 2, 2023
@heni02
Copy link
Contributor Author

heni02 commented Jul 3, 2023

The problem is not easy to reproduce in the case. After reviewing the eks regression results for the past 3 days, no issues were found. Please close it first
he past 3 days eks bvt regression:
https://github.com/matrixorigin/matrixone/actions/runs/5434137867/jobs/9882428023
https://github.com/matrixorigin/matrixone/actions/runs/5439585269/jobs/9891842672
https://github.com/matrixorigin/matrixone/actions/runs/5430403710/jobs/9876190235

@heni02 heni02 closed this as completed Jul 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working severity/s-1
Projects
None yet
Development

No branches or pull requests

7 participants