
[Bug]: tpcc batch insert data error:message body 325439679 is too large, max is 104857600 #7767

Closed
1 task done
aressu1985 opened this issue Jan 31, 2023 · 29 comments
Assignees
Labels
kind/bug Something isn't working needs-triage severity/s0 Extreme impact: Cause the application to break down and seriously affect the use
Milestone

Comments

@aressu1985
Contributor

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Environment

- Version or commit-id (e.g. v0.1.0 or 8b23a93):2bbb0b7090957920d0bb3a776735bbb97e75b6c2
- Hardware parameters:
- OS type:
- Others:

Actual Behavior

When batch inserting TPCC 1-warehouse data into MO, the following error occurred:

[sudong@VM-16-14-centos mo-tpcc]$ ./runLoader.sh props.mo warehouse 1
Starting BenchmarkSQL LoadData

props.mo
driver=com.mysql.cj.jdbc.Driver
conn=jdbc:mysql://127.0.0.1:6001/tpcc?characterSetResults=utf8&continueBatchOnError=false&useServerPrepStmts=true&alwaysSendSetIsolation=false&useLocalSessionState=true&zeroDateTimeBehavior=CONVERT_TO_NULL&failoverReadOnly=false&serverTimezone=Asia/Shanghai&enabledTLSProtocols=TLSv1.2&useSSL=false
user=dump
password=***********
warehouses=1
loadWorkers=4
fileLocation (not defined)
csvNullValue (not defined - using default '')

Worker 000: Loading ITEM
Worker 001: Loading Warehouse 1
Worker 000: Loading ITEM done
java.sql.SQLException: internal error: message body 325439679 is too large, max is 104857600
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
at com.mysql.cj.jdbc.ConnectionImpl.commit(ConnectionImpl.java:813)
at io.mo.LoadDataWorker.loadWarehouse(LoadDataWorker.java:765)
at io.mo.LoadDataWorker.run(LoadDataWorker.java:196)
at java.lang.Thread.run(Thread.java:748)
Worker 001: ERROR: internal error: message body 325439679 is too large, max is 104857600
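The limit in the error text (104857600 bytes = 100 MiB) is the server's maximum RPC message size, while the commit tried to send 325439679 bytes in one message. A minimal sketch of the arithmetic and the kind of payload chunking that would keep each message under the cap — `numChunks` and `splitPayload` are hypothetical helpers for illustration, not MatrixOne code:

```go
package main

import "fmt"

// morpc's max message size, from the error text:
// "message body 325439679 is too large, max is 104857600" (100 MiB).
const maxMessageSize = 104857600

// numChunks is a hypothetical helper showing how many sub-limit
// messages a serialized write set of a given size would need.
func numChunks(size, limit int) int {
	if size == 0 {
		return 0
	}
	return (size + limit - 1) / limit // ceiling division
}

// splitPayload cuts a payload into chunks no larger than limit.
func splitPayload(payload []byte, limit int) [][]byte {
	var chunks [][]byte
	for len(payload) > 0 {
		n := limit
		if n > len(payload) {
			n = len(payload)
		}
		chunks = append(chunks, payload[:n])
		payload = payload[n:]
	}
	return chunks
}

func main() {
	// The failing commit tried to send 325439679 bytes at once.
	fmt.Println(numChunks(325439679, maxMessageSize)) // 4
	// Small demonstration of the byte-level split.
	parts := splitPayload(make([]byte, 10), 4)
	fmt.Println(len(parts), len(parts[len(parts)-1])) // 3 2
}
```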

mo error log:
2023/01/31 21:41:11.888260 +0800 INFO frontend/util.go:515 rowCount 6
time of getDataFromPipeline : 19.206µs
processBatchTime 17.653µs
row2colTime 0s
allocateOutBufferTime 281ns
outputQueue.flushTime 0s
processBatchTime - row2colTime - allocateOutbufferTime - flushTime 17.372µs
restTime(=tTime - row2colTime - allocateOutBufferTime) 18.925µs
protoStats flushCount 0 writeCount 1
writeBytes 0 0 MB
connectionId 1097
2023/01/31 21:41:11.888289 +0800 INFO frontend/util.go:515 time of Exec.Run : 215.664µs connectionId 1097
2023/01/31 21:41:11.888345 +0800 INFO frontend/util.go:500 query trace status {"connection_id": 1097, "statement": "select cron_task_id, task_metadata_id, task_metadata_executor, task_metadata_context, task_metadata_option, cron_expr, next_time, trigger_times, create_at, update_at from mo_task.sys_cron_task", "status": "success", "span": {"trace_id": "ecbf78a8-a16c-11ed-8d9f-525400777c3a", "kind": "statement"}, "session_info": "connectionId 1097"}
2023/01/31 21:41:13.133487 +0800 ERROR frontend/mysql_cmd_executor.go:4662 error: internal error: serialize plan to json error: json: unsupported value: NaN {"span": {"trace_id": "265edc09-8ac0-9206-d061-bd9e25bac576", "span_id": "73856164f3a208d4"}}
github.com/matrixorigin/matrixone/pkg/frontend.serializePlanToJson
/mnt/datadisk0/actions-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/mysql_cmd_executor.go:4662
github.com/matrixorigin/matrixone/pkg/frontend.glob..func17
/mnt/datadisk0/actions-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/frontend/mysql_cmd_executor.go:4698
github.com/matrixorigin/matrixone/pkg/util/trace/impl/motrace.(*StatementInfo).ExecPlan2Json
/mnt/datadisk0/actions-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/util/trace/impl/motrace/report_statement.go:169
github.com/matrixorigin/matrixone/pkg/util/trace/impl/motrace.(*StatementInfo).CsvFields
/mnt/datadisk0/actions-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/util/trace/impl/motrace/report_statement.go:137
github.com/matrixorigin/matrixone/pkg/util/trace/impl/motrace.genCsvData.func1
/mnt/datadisk0/actions-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/util/trace/impl/motrace/buffer_pipe.go:181
github.com/matrixorigin/matrixone/pkg/util/trace/impl/motrace.genCsvData
/mnt/datadisk0/actions-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/util/trace/impl/motrace/buffer_pipe.go:196
github.com/matrixorigin/matrixone/pkg/util/export.(*MOCollector).doGenerate
/mnt/datadisk0/actions-runner/_work/mo-nightly-regression/mo-nightly-regression/head/pkg/util/export/batch_processor.go:343
2023/01/31 21:41:14.200180 +0800 INFO frontend/util.go:500 query trace {"connection_id": 1097, "query": "use mo_task", "session_info": "connectionId 1097"}
2023/01/31 21:41:14.201082 +0800 INFO frontend/util.go:515 User 9a4c0b8d-3f24-4c5e-84a2-cb73d0eeb36b change database from [mo_task] to [mo_task] connectionId 1097
2023/01/31 21:41:14.201137 +0800 INFO frontend/util.go:500 query trace status {"connection_id": 1097, "statement": "use mo_task", "status": "success", "span": {"trace_id": "ee208610-a16c-11ed-8d9f-525400777c3a", "kind": "statement"}, "session_info": "connectionId 1097"}
2023/01/31 21:41:14.201170 +0800 INFO frontend/util.go:500 query trace {"connection_id": 1097, "query": "show tables", "session_info": "connectionId 1097"}
2023/01/31 21:41:14.202921 +0800 INFO frontend/util.go:515 time of Exec.Build : 1.723985ms connectionId 1097
2023/01/31 21:41:14.203247 +0800 INFO frontend/util.go:515 rowCount 2
time of getDataFromPipeline : 4.438µs
processBatchTime 3.166µs
row2colTime 0s
allocateOutBufferTime 240ns
outputQueue.flushTime 0s
processBatchTime - row2colTime - allocateOutbufferTime - flushTime 2.926µs
restTime(=tTime - row2colTime - allocateOutBufferTime) 4.198µs
protoStats flushCount 0 writeCount 1
writeBytes 0 0 MB
connectionId 1097
2023/01/31 21:41:14.203268 +0800 INFO frontend/util.go:515 time of Exec.Run : 282.269µs connectionId 1097
2023/01/31 21:41:14.203324 +0800 INFO frontend/util.go:500 query trace status {"connection_id": 1097, "statement": "show tables", "status": "success", "span": {"trace_id": "ee20ac09-a16c-11ed-8d9f-525400777c3a", "kind": "statement"}, "session_info": "connectionId 1097"}

Expected Behavior

No response

Steps to Reproduce

1. git clone https://github.com/matrixorigin/mo-tpcc.git
2. create database and table by sql/tableCreates.sql
3. ./runLoader.sh props.mo warehouse 1

Additional information

No response

@aressu1985 aressu1985 added kind/bug Something isn't working needs-triage severity/s0 Extreme impact: Cause the application to break down and seriously affect the use labels Jan 31, 2023
@aressu1985 aressu1985 added this to the v0.7.0 milestone Jan 31, 2023
@iceTTTT iceTTTT self-assigned this Feb 1, 2023
@jianwan0214
Contributor

Working on it.

@jianwan0214
Contributor

It is because large batch inserts do not go through S3. Please help to solve it. @jensenojs

@jianwan0214 jianwan0214 assigned jensenojs and unassigned jianwan0214 and iceTTTT Feb 2, 2023
@jensenojs
Contributor

It is because large batch inserts do not go through S3. Please help to solve it. @jensenojs

Still looking for the exact cause.

@jensenojs
Contributor

jensenojs commented Feb 6, 2023

Current Situation

  • INSERT ... VALUES does not go through S3; that is correct. But when a particularly large amount of data is inserted, it is automatically split into batches (4096 rows per batch), so that is not the cause of this issue.
  • However, the commit path in disttae (genWriteReqs) does write more than 200,000 rows (that is, it should trigger S3), and serializing all of it may make morpc's message too large.
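The 4096-rows-per-batch splitting mentioned above can be sketched as follows. This is a hypothetical illustration of the row-chunking, with an invented helper name, not MatrixOne's actual implementation:

```go
package main

import "fmt"

// batchRows mirrors the 4096-rows-per-batch figure quoted in the comment.
const batchRows = 4096

// batchBounds returns [start, end) row ranges, each at most batchRows long,
// covering a table of `total` rows.
func batchBounds(total int) [][2]int {
	var bounds [][2]int
	for start := 0; start < total; start += batchRows {
		end := start + batchRows
		if end > total {
			end = total
		}
		bounds = append(bounds, [2]int{start, end})
	}
	return bounds
}

func main() {
	b := batchBounds(10000)
	// 10000 rows split into 3 batches; the last covers rows [8192, 10000).
	fmt.Println(len(b), b[len(b)-1]) // 3 [8192 10000]
}
```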

Conclusion

  • This issue needs to wait for CN support for writing updates and deletes directly to S3, after further testing; i.e., this issue cannot be completed in 0.7.
  • Routing INSERT ... VALUES through S3 may still be needed, considering the performance improvement.

@jensenojs
Contributor

Working on #7748.

@aressu1985
Contributor Author

2023/02/07 11:33:30.257769 +0800 ERROR frontend/util.go:510 query trace status {"connection_id": 1016, "statement": "load data infile /data1/sudong/mo-load-data/data/tpch_100/lineitem.tbl into table tpch_100g.lineitem fields terminated by "|" lines terminated by "\n"", "status": "fail", "error": "context deadline exceeded", "span": {"trace_id": "3c989dc0-a696-11ed-aca8-b07b25f8b524", "kind": "statement"}, "session_info": "connectionId 1016"}

@jensenojs
Contributor

Not processed yet.

@nnsgmsone nnsgmsone removed this from the V0.8.0-Backlog milestone Feb 13, 2023
@JackTan25
Contributor

See #8325

@JackTan25
Contributor

#8325 is merged

@JackTan25 JackTan25 assigned aressu1985 and unassigned w-zr and JackTan25 Mar 23, 2023
@aressu1985
Contributor Author

Testing now.

@aressu1985
Contributor Author

Will test today.

@aressu1985
Contributor Author

This operation still cannot succeed; the error is:
[sudong@VM-16-14-centos sql]$ cd ..
[sudong@VM-16-14-centos mo-tpcc]$ ./runLoader.sh props.mo warehouse 1
Starting BenchmarkSQL LoadData

props.mo
driver=com.mysql.cj.jdbc.Driver
conn=jdbc:mysql://127.0.0.1:6001/tpcc?characterSetResults=utf8&continueBatchOnError=false&useServerPrepStmts=true&alwaysSendSetIsolation=false&useLocalSessionState=true&zeroDateTimeBehavior=CONVERT_TO_NULL&failoverReadOnly=false&serverTimezone=Asia/Shanghai&enabledTLSProtocols=TLSv1.2&useSSL=false
user=dump
password=***********
warehouses=1
loadWorkers=4
fileLocation (not defined)
csvNullValue (not defined - using default '')

Worker 000: Loading ITEM
Worker 001: Loading Warehouse 1
Worker 000: Loading ITEM done

java.sql.SQLException: context deadline exceeded
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
at com.mysql.cj.jdbc.ConnectionImpl.commit(ConnectionImpl.java:813)
at io.mo.LoadDataWorker.loadWarehouse(LoadDataWorker.java:765)
at io.mo.LoadDataWorker.run(LoadDataWorker.java:196)
at java.lang.Thread.run(Thread.java:748)
Worker 001: ERROR: context deadline exceeded

@JackTan25 please check again.

@aressu1985 aressu1985 assigned JackTan25 and unassigned aressu1985 Mar 28, 2023
@JackTan25
Contributor

Working on s-1.

@JackTan25
Contributor

Context deadline bug; will check next Monday.

@JackTan25
Contributor

Will check this tomorrow.

@gavinyue
Contributor

[mysql] 2023/04/27 22:37:35 packets.go:37: read tcp 127.0.0.1:49839->127.0.0.1:16001: i/o timeout
[mysql] 2023/04/27 22:37:35 packets.go:37: read tcp 127.0.0.1:49840->127.0.0.1:16002: i/o timeout
[mysql] 2023/04/27 22:37:40 packets.go:37: read tcp 127.0.0.1:49844->127.0.0.1:16001: i/o timeout
[mysql] 2023/04/27 22:37:50 packets.go:37: read tcp 127.0.0.1:49862->127.0.0.1:16001: i/o timeout
[mysql] 2023/04/27 22:37:55 packets.go:37: read tcp 127.0.0.1:49871->127.0.0.1:16001: i/o timeout
[mysql] 2023/04/27 22:38:00 packets.go:37: read tcp 127.0.0.1:49898->127.0.0.1:16001: read: connection reset by peer
[mysql] 2023/04/27 22:38:00 packets.go:37: read tcp 127.0.0.1:49891->127.0.0.1:16001: read: connection reset by peer
[mysql] 2023/04/27 22:38:00 packets.go:122: closing bad idle connection: EOF
[mysql] 2023/04/27 22:38:00 packets.go:37: unexpected EOF
[mysql] 2023/04/27 22:38:00 packets.go:37: unexpected EOF
[mysql] 2023/04/27 22:38:00 packets.go:37: unexpected EOF

@gavinyue
Contributor

After running the TPCC load for a few seconds, the connections start to time out.

@gavinyue
Contributor

gavinyue commented May 3, 2023

The above errors are from the task service. The reason is that a long-running insert can prevent the CN from accepting new connections.

@gavinyue
Contributor

Related: #9647

@gavinyue
Contributor

gavinyue commented Jun 6, 2023

I have not seen this type of error recently. This can probably be closed.

@fengttt
Contributor

fengttt commented Jun 6, 2023

@gavinyue please reassign to test. Thank you.

@gavinyue
Contributor

gavinyue commented Jun 8, 2023

Reassigning to test. TPCC batch insert may have other issues, but I do not see this one after a few tries.

@gavinyue
Contributor

gavinyue commented Jun 8, 2023

Fixed in PR #9866.

@gavinyue gavinyue removed their assignment Jun 9, 2023
@aressu1985
Contributor Author

Fixed.


10 participants