Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: The existing data can not be found during tpcc new order txn #8871

Closed
1 task done
aressu1985 opened this issue Apr 6, 2023 · 54 comments
Closed
1 task done
Assignees
Labels
kind/bug Something isn't working severity/s-1
Milestone

Comments

@aressu1985
Copy link
Contributor

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Environment

- Version or commit-id (e.g. v0.1.0 or 8b23a93):f5e458d596b37ce62587bb876401f008010b1adb
- Hardware parameters:
- OS type:
- Others:

Actual Behavior

tools log:
Term-00, Running Average tpmTOTAL: 210.66 Current tpmTOTAL: 1456920 Memory Usage: 44MB / 708MB ^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^HTerm-00, Running Average tpmTOTAL: 210.65 Current tpmTOTAL: 1456932 Memory Usage: 44MB / 708MB ^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H2023-04-04 23:00:53 FATAL jTPCCTerminal:282 - STOCK with S_W_ID=8 S_I_ID=64375 not fount
2023-04-04 23:00:53 FATAL jTPCCTerminal:282 - STOCK with S_W_ID=9 S_I_ID=33327 not fount
2023-04-04 23:00:53 FATAL jTPCCTerminal:282 - STOCK with S_W_ID=2 S_I_ID=25463 not fount
2023-04-04 23:00:53 FATAL jTPCCTerminal:282 - STOCK with S_W_ID=8 S_I_ID=41318 not fount
Term-00, Running Average tpmTOTAL: 210.66 Current tpmTOTAL: 1457028 Memory Usage: 45MB / 708MB ^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H2023-04-04 23:00:54 FATAL jTPCCTerminal:282 - STOCK with S_W_ID=6 S_I_ID=17269 not fount
2023-04-04 23:00:54 FATAL jTPCCTerminal:282 - STOCK with S_W_ID=6 S_I_ID=66407 not fount
Term-00, Running Average tpmTOTAL: 210.66 Current tpmTOTAL: 1457088 Memory Usage: 46MB / 708MB ^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H2023-04-04 23:00:54 FATAL jTPCCTerminal:282 - STOCK with S_W_ID=6 S_I_ID=7975 not fount
2023-04-04 23:00:55 FATAL jTPCCTerminal:282 - STOCK with S_W_ID=7 S_I_ID=8053 not fount
2023-04-04 23:00:55 FATAL jTPCCTerminal:282 - STOCK with S_W_ID=7 S_I_ID=8053 not fount
2023-04-04 23:00:55 FATAL jTPCCTerminal:282 - STOCK with S_W_ID=9 S_I_ID=17270 not fount
2023-04-04 23:00:55 FATAL jTPCCTerminal:282 - STOCK with S_W_ID=8 S_I_ID=8813 not fount
2023-04-04 23:00:55 FATAL jTPCCTerminal:282 - STOCK with S_W_ID=1 S_I_ID=49527 not fount
2023-04-04 23:00:55 FATAL jTPCCTerminal:169 - TT_PAYMENT: Unexpected SQLException on transaction execution: w-w conflict
Term-00, Running Average tpmTOTAL: 210.66 Current tpmTOTAL: 1457232 Memory Usage: 48MB / 708MB ^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^HTerm-00, Running Average tpmTOTAL: 210.66 Current tpmTOTAL: 1457232 Memory Usage: 49MB / 708MB ^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H2023-04-04 23:00:58 FATAL jTPCCTerminal:282 - STOCK with S_W_ID=8 S_I_ID=57847 not fount
Term-00, Running Average tpmTOTAL: 210.66 Current tpmTOTAL: 1457244 Memory Usage: 49MB / 708MB ^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^HTerm-00, Running Average tpmTOTAL: 210.66 Current tpmTOTAL: 1457256 Memory Usage: 49MB / 708MB ^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H2023-04-04 23:00:59 FATAL jTPCCTerminal:282 - STOCK with S_W_ID=1 S_I_ID=41847 not fount
2023-04-04 23:00:59 FATAL jTPCCTerminal:282 - STOCK with S_W_ID=1 S_I_ID=12911 not fount
2023-04-04 23:00:59 FATAL jTPCCTerminal:282 - STOCK with S_W_ID=5 S_I_ID=9070 not fount

But this data really exists in database:
Database changed
mysql> select * from bmsql_stock where S_W_ID = 5 and S_I_ID=9070;
+--------+--------+------------+-------+-------------+--------------+----------------------------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+
| s_w_id | s_i_id | s_quantity | s_ytd | s_order_cnt | s_remote_cnt | s_data | s_dist_01 | s_dist_02 | s_dist_03 | s_dist_04 | s_dist_05 | s_dist_06 | s_dist_07 | s_dist_08 | s_dist_09 | s_dist_10 |
+--------+--------+------------+-------+-------------+--------------+----------------------------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+
| 5 | 9070 | 46 | 41 | 9 | 0 | X8SIwg4qc2iDbtkR8cX5qvHJkv6J5UExFYsAC1Ba3OzJ | LneuMvINospnE1EpMvq7yjNp | elIQ1xUXukbxvpauCjy7KJsU | xFbwwgyjJHkhg4640OwHKc2g | AgtvWjyvfwneSvfYKERFUeQI | bcC5v7dKgX4jMHQt6Si2s3oh | JT4JoiGXf7P9y43pKhFOqd61 | IJMjA3VCDes3igI68R5uuO0h | LBqI8Y9DINu2ojAt62PW0xez | ublosq0GRWoopVkf6i5i7uE8 | Gm7hYeLekcKhSrBSeZRu5yO1 |
+--------+--------+------------+-------+-------------+--------------+----------------------------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+
1 row in set (0.11 sec)

mysql> select * from bmsql_stock where S_W_ID = 1 and S_I_ID=12911;
+--------+--------+------------+-------+-------------+--------------+------------------------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+
| s_w_id | s_i_id | s_quantity | s_ytd | s_order_cnt | s_remote_cnt | s_data | s_dist_01 | s_dist_02 | s_dist_03 | s_dist_04 | s_dist_05 | s_dist_06 | s_dist_07 | s_dist_08 | s_dist_09 | s_dist_10 |
+--------+--------+------------+-------+-------------+--------------+------------------------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+
| 1 | 12911 | 49 | 29 | 5 | 0 | MclNeez4gp42Cx7xthrl8CGhEIZVoBiaGsxnHaSL | zTlTLtifA0Bq4OON1BItCN0i | JlJ0OmIpR6BfTuQKkUvIS3Bf | VMorZrhVB9bkO7beDuK7FWt0 | eYLS9dGtC70tNmG8my3RAwxI | M19Wwwlshr4hwS98UeEQ8zcY | yVOGmgZtr6dCsxerT1Qims9K | c1wV3AqRjRCixzwlLhMU2rrm | ZVQZjRF8B1QgVbehH9YM4SRk | YIhmlHsFzUAQkm2Cfr4nubUA | JUJnk3vXNtQnQYckPpj7vyHU |
+--------+--------+------------+-------+-------------+--------------+------------------------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+--------------------------+
1 row in set (0.11 sec)

mysql>

Expected Behavior

No response

Steps to Reproduce

1、git clone https://github.com/matrixorigin/mo-tpcc.git
2、modify props.mo ,set warehouse = 10, terminals = 10, runMins=5000
3、run ./runLoader.sh props.mo warehouse 10 filelocatioin XXXX, to generate 10 warehouse data
4、load data to mo
5、run ./runBenchmark.sh props.mo for about several hours.

Additional information

No response

@aressu1985
Copy link
Contributor Author

prepareStmt sql:
stmtNewOrderSelectStock = dbConn.prepareStatement(
"SELECT s_quantity, s_data, " +
" s_dist_01, s_dist_02, s_dist_03, s_dist_04, " +
" s_dist_05, s_dist_06, s_dist_07, s_dist_08, " +
" s_dist_09, s_dist_10 " +
" FROM bmsql_stock " +
" WHERE s_w_id = ? AND s_i_id = ? " );

The new order txn is:

a.
SELECT d_tax, d_next_o_id " +
" FROM bmsql_district " +
" WHERE d_w_id = ? AND d_id = ? " +
" FOR UPDATE"
b.
//需要用到索引
SELECT c_discount, c_last, c_credit, w_tax " +
" FROM bmsql_customer " +
" JOIN bmsql_warehouse ON (w_id = c_w_id) " +
" WHERE c_w_id = ? AND c_d_id = ? AND c_id = ?"
c.
UPDATE bmsql_district " +
" SET d_next_o_id = d_next_o_id + 1 " +
" WHERE d_w_id = ? AND d_id = ?");
d.
INSERT INTO bmsql_oorder (" +
" o_id, o_d_id, o_w_id, o_c_id, o_entry_d, " +
" o_ol_cnt, o_all_local) " +
"VALUES (?, ?, ?, ?, ?, ?, ?)"
e.
INSERT INTO bmsql_new_order (" +
" no_o_id, no_d_id, no_w_id) " +
"VALUES (?, ?, ?)"

f.
SELECT i_price, i_name, i_data " +
" FROM bmsql_item " +
" WHERE i_id = ?"

g.SELECT s_quantity, s_data, " +
" s_dist_01, s_dist_02, s_dist_03, s_dist_04, " +
" s_dist_05, s_dist_06, s_dist_07, s_dist_08, " +
" s_dist_09, s_dist_10 " +
" FROM bmsql_stock " +
" WHERE s_w_id = ? AND s_i_id = ? " +
" FOR UPDATE")
h.
//batch update
UPDATE bmsql_stock " +
" SET s_quantity = ?, s_ytd = s_ytd + ?, " +
" s_order_cnt = s_order_cnt + 1, " +
" s_remote_cnt = s_remote_cnt + ? " +
" WHERE s_w_id = ? AND s_i_id = ?

i.
//batch insert
INSERT INTO bmsql_order_line (" +
" ol_o_id, ol_d_id, ol_w_id, ol_number, " +
" ol_i_id, ol_supply_w_id, ol_quantity, " +
" ol_amount, ol_dist_info) " +
"VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)

@sukki37 sukki37 assigned sukki37 and unassigned matrix-meow Apr 6, 2023
@sukki37
Copy link
Contributor

sukki37 commented Apr 6, 2023

@sukki37 sukki37 assigned aptend and unassigned sukki37 Apr 6, 2023
@aptend
Copy link
Contributor

aptend commented Apr 7, 2023

I've made some fix, self testing

@aptend aptend mentioned this issue Apr 10, 2023
7 tasks
@aptend
Copy link
Contributor

aptend commented Apr 10, 2023

some related logs:

not_found_failed_sql.log

d0b1ded1-d289-11ed-b9df-525400777c3a-367.log

d0b1ded1-d289-11ed-b9df-525400777c3a-315.log

the file not found error in these log files has been fixed in #8919.

but the STOCK with S_W_ID=9 S_I_ID=33327 not fount errors still keep coming out. I have reproduced the error in a single warehouse tpcc setting and I'm analyzing

@aptend
Copy link
Contributor

aptend commented Apr 11, 2023

repro script, the second select * from t will be empty in the transaction. It seems that the data in cn memtable is cleared by flush

drop database if exists db1;
create database db1;
use db1;
drop table if exists t;
create table t (a int);
insert into t values (1), (2), (3);

begin;
-- read, pull insert logtail
select * from t;
-- @session:id=1{
-- @separator:table
select mo_ctl('dn', 'flush', 'db1.t');
select a from db1.t where a = 1;
-- flush and pull meta logtail
-- @session}
-- read again
select * from t;
commit;
  1. ./mo-service -launch etc/launch-tae-CN-tae-DN/launch.toml 2>&1
  2. run this in mo-tester ./run.sh -m genrs -n -g -p <path-to-the-script> 2>&1

@aptend
Copy link
Contributor

aptend commented Apr 11, 2023

image

In the call stack, it seems that before the second select * from t, the table metadata has been updated to a new snapshot by Stats

@aptend aptend assigned m-schen and unassigned aptend Apr 11, 2023
@m-schen
Copy link
Contributor

m-schen commented Apr 11, 2023

begin;
select * from t;
select * from t;
end;

the second query, the plan builder will call the ReCalcNodeStats to get the new table meta.
if the table was flushed between first select and second select, It may cause the empty result error.
(too much place to call the ReCalcNodeStats, I can not removed code to verify if it was the root cause, sorry.)

repro step:

session1: 
drop table if exists t;
create table t (a int);
insert into t values (1), (2), (3);
begin;
select * from t;

session2:
select mo_ctl('dn', 'flush', 'db.t');
select a from db.t where a = 1;

session1:
select * from t;

and you will get empty result but required [1] [2] [3]

@m-schen
Copy link
Contributor

m-schen commented Apr 11, 2023

@ouyuanning plz assign it to a suitable man. thx.

@m-schen m-schen assigned ouyuanning and unassigned m-schen Apr 11, 2023
@ouyuanning ouyuanning assigned m-schen and unassigned ouyuanning Apr 11, 2023
@m-schen
Copy link
Contributor

m-schen commented Apr 11, 2023

begin;
select * from t;
select * from t;
end;

the second query, the plan builder will call the ReCalcNodeStats to get the new table meta.
if the table was flushed between first select and second select, It may cause the empty result error.
(too much place to call the ReCalcNodeStats, I can not removed code to verify if it was the root cause, sorry.)

repro step:

session1: 
drop table if exists t;
create table t (a int);
insert into t values (1), (2), (3);
begin;
select * from t;

session2:
select mo_ctl('dn', 'flush', 'db.t');
select a from db.t where a = 1;

session1:
select * from t;

and you will get empty result but required [1] [2] [3]

my fault. it's a wrong analysis.

@m-schen
Copy link
Contributor

m-schen commented Apr 11, 2023

repro step:

session1: 
drop table if exists t;
create table t (a int);
insert into t values (1), (2), (3);
begin;
select * from t;

session2:
select mo_ctl('dn', 'flush', 'db.t');
select a from db.t where a = 1;

session1:
select * from t; #get empty but should [1],[2],[3]

I guess that
https://github.com/matrixorigin/matrixone/blob/main/pkg/vm/engine/disttae/txn_table.go#L201
the second query select * from t get a wrong snapshot of partitions here.

maybe we should deliver a txn timestamp to pick up the partitions we want?

@reusee
plz take a look.

@m-schen m-schen assigned reusee and unassigned m-schen Apr 11, 2023
@reusee
Copy link
Contributor

reusee commented Apr 12, 2023

looking into it.

@fengttt fengttt reopened this Jul 7, 2023
@fengttt fengttt assigned zhangxu19830126 and unassigned aressu1985 Jul 7, 2023
@fengttt fengttt added severity/s-1 and removed severity/s1 High impact: Logical errors or data errors that must occur labels Jul 7, 2023
@fengttt fengttt modified the milestones: V0.8.0, 1.0.0 Jul 7, 2023
@fengttt
Copy link
Contributor

fengttt commented Jul 7, 2023

Repro on mojo, https://github.com/matrixorigin/mojo/tree/main/go, simply run

!repro mo-8871

local launch-tae-mulit-cn-tae-dn, pessimistic.

@zhangxu19830126
Copy link
Contributor

fixed

@aressu1985
Copy link
Contributor Author

tracking

@fengttt
Copy link
Contributor

fengttt commented Jul 14, 2023

fixed

@zhangxu19830126
I think my repro is fixed by YuanNing's select for update projection list. We are still seeing dup key in tpcc, correct? So it is not fixed yet ...

@aressu1985
Copy link
Contributor Author

so what is status?

@aressu1985 aressu1985 assigned ouyuanning and unassigned aressu1985 Jul 22, 2023
@ouyuanning
Copy link
Contributor

not working on it

@ouyuanning
Copy link
Contributor

not working on this today

@ouyuanning
Copy link
Contributor

not working on it

5 similar comments
@ouyuanning
Copy link
Contributor

not working on it

@ouyuanning
Copy link
Contributor

not working on it

@ouyuanning
Copy link
Contributor

not working on it

@ouyuanning
Copy link
Contributor

not working on it

@ouyuanning
Copy link
Contributor

not working on it

@ouyuanning
Copy link
Contributor

ouyuanning commented Jul 31, 2023

乐观事务下的。应该之前乐声已经修复。

悲观事务下,有发现的几个问题。
1、select for update的bug,我有个PR处理了
2、测试工具的部分,有地方没有加select for update造成的问题。苏动处理了。
3、最后一个。由于我一个PR优化部分sink/sinkScan节点后引起了代价估算错误,走了s3。然后workspace不支持s3的重试。这个应该是后来龙冉团队哪位同学修复掉了。

目前应该已经没有丢失问题了。辛苦QA同学协助验证。

@ouyuanning ouyuanning assigned aressu1985 and unassigned ouyuanning Jul 31, 2023
@aressu1985
Copy link
Contributor Author

fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working severity/s-1
Projects
None yet
Development

No branches or pull requests