Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](json) fix invalid json which should not be insert #39932

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

amorynan
Copy link
Contributor

@amorynan amorynan commented Aug 26, 2024

Proposed changes

fix invalid json which should not be insert
add simdjson lib check micro and fix fe fastjson param
Issue Number: close #xxx

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@amorynan
Copy link
Contributor Author

run buildall

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TPC-H: Total hot run time: 38192 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d422c7db1676281b191e8f51b2c6561aaf5009f1, data reload: false

------ Round 1 ----------------------------------
q1	17693	4546	4361	4361
q2	2021	191	172	172
q3	11221	936	1064	936
q4	10165	759	711	711
q5	7762	2895	2823	2823
q6	227	136	133	133
q7	989	616	622	616
q8	9331	2156	2092	2092
q9	7026	6592	6567	6567
q10	6991	2253	2229	2229
q11	451	251	241	241
q12	407	230	232	230
q13	17830	3068	3087	3068
q14	283	244	240	240
q15	526	487	484	484
q16	506	407	390	390
q17	1010	674	765	674
q18	7496	6779	6911	6779
q19	1383	1060	1074	1060
q20	664	332	327	327
q21	4307	3059	3171	3059
q22	1135	1035	1000	1000
Total cold run time: 109424 ms
Total hot run time: 38192 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4369	4314	4280	4280
q2	373	278	258	258
q3	2895	2654	2738	2654
q4	1961	1741	1680	1680
q5	5611	5735	5763	5735
q6	240	136	136	136
q7	2210	1850	1831	1831
q8	3379	3450	3491	3450
q9	8886	8931	8870	8870
q10	3654	3452	3397	3397
q11	608	504	502	502
q12	861	674	674	674
q13	13026	3200	3273	3200
q14	336	324	315	315
q15	554	506	490	490
q16	501	456	470	456
q17	1855	1567	1517	1517
q18	8186	7866	7798	7798
q19	1757	1515	1724	1515
q20	2157	1927	1889	1889
q21	5726	5584	5577	5577
q22	1171	1050	1012	1012
Total cold run time: 70316 ms
Total hot run time: 57236 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 191164 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d422c7db1676281b191e8f51b2c6561aaf5009f1, data reload: false

query1	1250	896	873	873
query2	6246	1922	1796	1796
query3	10600	4044	4101	4044
query4	59724	23919	23173	23173
query5	5381	500	508	500
query6	407	160	182	160
query7	5760	300	300	300
query8	280	202	195	195
query9	8894	2470	2466	2466
query10	482	281	253	253
query11	16748	15049	15062	15049
query12	154	102	109	102
query13	1560	381	363	363
query14	11320	7274	6671	6671
query15	235	176	177	176
query16	7149	477	477	477
query17	1168	595	587	587
query18	1399	329	295	295
query19	300	150	154	150
query20	127	116	113	113
query21	210	108	106	106
query22	4534	4215	4516	4215
query23	34322	33908	33666	33666
query24	5927	2833	2858	2833
query25	542	395	399	395
query26	678	157	159	157
query27	1777	281	285	281
query28	3764	2040	2025	2025
query29	662	420	416	416
query30	238	155	149	149
query31	958	776	753	753
query32	86	55	59	55
query33	447	302	291	291
query34	891	471	486	471
query35	839	720	742	720
query36	1083	922	958	922
query37	143	83	87	83
query38	3967	3857	3859	3857
query39	1433	1414	1439	1414
query40	205	116	127	116
query41	46	47	45	45
query42	110	99	99	99
query43	523	468	466	466
query44	1073	737	750	737
query45	198	164	170	164
query46	1096	743	722	722
query47	1855	1800	1813	1800
query48	363	291	291	291
query49	779	435	444	435
query50	820	419	420	419
query51	7124	7008	7000	7000
query52	102	90	88	88
query53	255	184	178	178
query54	567	447	456	447
query55	78	79	77	77
query56	282	273	260	260
query57	1185	1073	1092	1073
query58	223	241	246	241
query59	2897	2897	2824	2824
query60	305	274	280	274
query61	122	207	100	100
query62	765	661	640	640
query63	217	183	184	183
query64	3174	1772	1749	1749
query65	3223	3094	3133	3094
query66	674	330	346	330
query67	15289	15152	15297	15152
query68	4487	560	542	542
query69	418	263	277	263
query70	1199	1101	1138	1101
query71	376	274	275	274
query72	6456	2325	2071	2071
query73	759	315	321	315
query74	9408	8757	8809	8757
query75	3358	2659	2672	2659
query76	1763	1030	995	995
query77	549	312	309	309
query78	11473	10594	9152	9152
query79	2623	537	549	537
query80	912	495	490	490
query81	557	225	229	225
query82	377	132	132	132
query83	225	148	143	143
query84	266	76	76	76
query85	734	289	293	289
query86	446	284	299	284
query87	4399	4242	4253	4242
query88	3814	2296	2281	2281
query89	381	289	291	289
query90	2027	196	189	189
query91	128	97	99	97
query92	66	52	50	50
query93	1728	540	533	533
query94	767	290	273	273
query95	362	261	260	260
query96	591	278	284	278
query97	3197	3065	3040	3040
query98	223	216	201	201
query99	1578	1312	1261	1261
Total cold run time: 310882 ms
Total hot run time: 191164 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 30.67 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d422c7db1676281b191e8f51b2c6561aaf5009f1, data reload: false

query1	0.05	0.04	0.04
query2	0.08	0.04	0.04
query3	0.23	0.04	0.05
query4	1.68	0.07	0.08
query5	0.49	0.49	0.49
query6	1.13	0.74	0.72
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.54	0.48	0.49
query10	0.55	0.53	0.55
query11	0.16	0.12	0.11
query12	0.14	0.12	0.13
query13	0.62	0.60	0.58
query14	0.76	0.80	0.77
query15	0.84	0.81	0.82
query16	0.36	0.37	0.34
query17	0.98	1.00	1.00
query18	0.21	0.20	0.22
query19	1.89	1.77	1.79
query20	0.01	0.02	0.01
query21	15.39	0.67	0.66
query22	4.40	6.89	1.77
query23	18.26	1.53	1.40
query24	2.11	0.24	0.22
query25	0.16	0.07	0.08
query26	0.27	0.18	0.18
query27	0.07	0.08	0.08
query28	13.26	1.00	1.00
query29	12.55	3.38	3.36
query30	0.24	0.06	0.05
query31	2.87	0.41	0.41
query32	3.24	0.48	0.48
query33	2.96	3.03	2.97
query34	17.13	4.36	4.41
query35	4.45	4.42	4.44
query36	0.65	0.50	0.48
query37	0.19	0.16	0.16
query38	0.16	0.16	0.15
query39	0.04	0.03	0.04
query40	0.15	0.13	0.12
query41	0.10	0.05	0.05
query42	0.06	0.04	0.04
query43	0.04	0.04	0.04
Total cold run time: 109.54 s
Total hot run time: 30.67 s

@@ -32,6 +34,13 @@ public class JsonLiteral extends Literal {

private static final ObjectMapper MAPPER = new ObjectMapper();

static {
// 启用严格的 JSON 验证配置
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use english

static {
// 启用严格的 JSON 验证配置
MAPPER.enable(DeserializationFeature.FAIL_ON_TRAILING_TOKENS);
MAPPER.enable(JsonParser.Feature.STRICT_DUPLICATE_DETECTION);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be remove since be accept duplicate key in json

Comment on lines +40 to +41
MAPPER.enable(JsonParser.Feature.STRICT_DUPLICATE_DETECTION);
MAPPER.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be removed because it is used for java object deserialization

@xiaokang xiaokang changed the title [fix](json)add simdjson lib check micro and fix fe fastjson param [fix](json) fix invalid json which should not be insert Sep 1, 2024
@@ -69,6 +69,9 @@ endif()
# enable glog custom prefix
add_definitions(-DGLOG_CUSTOM_PREFIX_SUPPORT)

# enable simdjson check eof
add_definitions(-DSIMDJSON_CHECK_EOF)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The official comment in simdjson does not suggest turn it on.

option(SIMDJSON_CHECK_EOF "Check for the end of the input buffer. The setting is unnecessary since we require padding of the inputs. You should expect tests to fail with this option turned on." OFF)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we do not add this, we should check in our logic for input string buffer which has error ending

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants