Add option to regexp_replace function #4375

Merged 7 commits on Oct 17, 2024

acquamarin (Collaborator) commented Oct 16, 2024

Description

This PR adds an option parameter to the regexp_replace function.

Only the global replace option ('g') is currently supported.

Closes #4331
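
To illustrate the intended semantics, here is a minimal Python sketch. It is not the code in this PR: the helper below is a hypothetical stand-in built on Python's `re.sub`, and it assumes that omitting the option replaces only the first match and that any option other than 'g' is rejected. Neither assumption is stated in the description above.

```python
import re


def regexp_replace(text: str, pattern: str, replacement: str, options: str = "") -> str:
    """Hypothetical stand-in for the behaviour described in this PR."""
    if options not in ("", "g"):
        # Assumption: unsupported options raise an error; the PR only says 'g' is supported.
        raise ValueError(f"unsupported regexp_replace option: {options!r}")
    # re.sub replaces every match when count=0 and only the first match when count=1.
    count = 0 if options == "g" else 1
    return re.sub(pattern, replacement, text, count=count)


print(regexp_replace("ab.cd.ef", r"\.", "-"))       # prints ab-cd.ef
print(regexp_replace("ab.cd.ef", r"\.", "-", "g"))  # prints ab-cd-ef
```

Passing the option as a string rather than a boolean leaves room for further flags later without changing the function signature.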

codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 87.17949% with 5 lines in your changes missing coverage. Please review.

Project coverage is 88.21%. Comparing base (fcee678) to head (7743236).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/function/vector_string_functions.cpp | 83.33% | 3 Missing ⚠️ |
| ...unction/string/functions/regexp_replace_function.h | 83.33% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4375      +/-   ##
==========================================
- Coverage   88.21%   88.21%   -0.01%     
==========================================
  Files        1355     1355              
  Lines       53640    53674      +34     
  Branches     7108     7111       +3     
==========================================
+ Hits        47317    47346      +29     
- Misses       6149     6153       +4     
- Partials      174      175       +1     

Benchmark Result

Master commit hash: 563f246ace055f4fb00075a66eabe6d8becb4dee
Branch commit hash: 5f5335e2f1aba8cdf71b2337ae7f8e6b242f12f0

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 643.75 | 657.49 | -13.75 (-2.09%) |
| aggregation | q28 | 11805.94 | 11634.69 | 171.25 (1.47%) |
| copy | node-Comment | 65893.50 | N/A | N/A |
| copy | node-Forum | 4939.34 | N/A | N/A |
| copy | node-Organisation | 1264.29 | N/A | N/A |
| copy | node-Person | 2068.08 | N/A | N/A |
| copy | node-Place | 1210.88 | N/A | N/A |
| copy | node-Post | 27514.31 | N/A | N/A |
| copy | node-Tag | 1238.32 | N/A | N/A |
| copy | node-Tagclass | 1150.75 | N/A | N/A |
| copy | rel-comment-hasCreator | 52382.10 | N/A | N/A |
| copy | rel-comment-hasTag | 74099.21 | N/A | N/A |
| copy | rel-comment-isLocatedIn | 59732.01 | N/A | N/A |
| copy | rel-containerOf | 13582.59 | N/A | N/A |
| copy | rel-forum-hasTag | 3371.32 | N/A | N/A |
| copy | rel-hasInterest | 2331.08 | N/A | N/A |
| copy | rel-hasMember | 45284.19 | N/A | N/A |
| copy | rel-hasModerator | 1185.39 | N/A | N/A |
| copy | rel-hasType | 332.76 | N/A | N/A |
| copy | rel-isPartOf | 293.88 | N/A | N/A |
| copy | rel-isSubclassOf | 203.92 | N/A | N/A |
| copy | rel-knows | 5415.20 | N/A | N/A |
| copy | rel-likes-comment | 74877.51 | N/A | N/A |
| copy | rel-likes-post | 29451.20 | N/A | N/A |
| copy | rel-organisation-isLocatedIn | 309.64 | N/A | N/A |
| copy | rel-person-isLocatedIn | 458.18 | N/A | N/A |
| copy | rel-post-hasCreator | 14048.06 | N/A | N/A |
| copy | rel-post-hasTag | 18799.44 | N/A | N/A |
| copy | rel-post-isLocatedIn | 15307.30 | N/A | N/A |
| copy | rel-replyOf-comment | 50001.42 | N/A | N/A |
| copy | rel-replyOf-post | 37198.99 | N/A | N/A |
| copy | rel-studyAt | 433.33 | N/A | N/A |
| copy | rel-workAt | 611.87 | N/A | N/A |
| filter | q14 | 129.78 | 136.67 | -6.88 (-5.04%) |
| filter | q15 | 131.93 | 133.13 | -1.20 (-0.90%) |
| filter | q16 | 301.45 | 309.28 | -7.83 (-2.53%) |
| filter | q17 | 447.36 | 465.94 | -18.58 (-3.99%) |
| filter | q18 | 1922.46 | 1947.58 | -25.12 (-1.29%) |
| fixed_size_expr_evaluator | q07 | 541.39 | 560.75 | -19.35 (-3.45%) |
| fixed_size_expr_evaluator | q08 | 760.19 | 773.26 | -13.07 (-1.69%) |
| fixed_size_expr_evaluator | q09 | 760.80 | 776.72 | -15.92 (-2.05%) |
| fixed_size_expr_evaluator | q10 | 239.82 | 253.36 | -13.54 (-5.34%) |
| fixed_size_expr_evaluator | q11 | 233.08 | 247.96 | -14.88 (-6.00%) |
| fixed_size_expr_evaluator | q12 | 231.80 | 246.81 | -15.01 (-6.08%) |
| fixed_size_expr_evaluator | q13 | 1468.03 | 1483.94 | -15.92 (-1.07%) |
| fixed_size_seq_scan | q23 | 116.38 | 128.14 | -11.76 (-9.18%) |
| join | q29 | 656.15 | N/A | N/A |
| join | q30 | 1501.52 | N/A | N/A |
| join | q31 | 12.54 | 7.12 | 5.42 (76.12%) |
| ldbc_snb_ic | q35 | 418.84 | 504.32 | -85.49 (-16.95%) |
| ldbc_snb_ic | q36 | 38.78 | 15.63 | 23.15 (148.14%) |
| ldbc_snb_is | q32 | 4.60 | 4.69 | -0.09 (-1.97%) |
| ldbc_snb_is | q33 | 18.58 | 14.22 | 4.35 (30.61%) |
| ldbc_snb_is | q34 | 3.50 | 1.81 | 1.70 (93.88%) |
| multi-rel | multi-rel-large-scan | 1784.39 | 2025.27 | -240.89 (-11.89%) |
| multi-rel | multi-rel-lookup | 51.62 | 22.49 | 29.13 (129.56%) |
| multi-rel | multi-rel-small-scan | 64.55 | 38.56 | 25.99 (67.40%) |
| order_by | q25 | 132.73 | 142.39 | -9.66 (-6.78%) |
| order_by | q26 | 454.26 | 457.22 | -2.97 (-0.65%) |
| order_by | q27 | 1463.38 | 1413.03 | 50.35 (3.56%) |
| scan_after_filter | q01 | 171.36 | 186.74 | -15.38 (-8.24%) |
| scan_after_filter | q02 | 157.30 | 174.35 | -17.05 (-9.78%) |
| shortest_path_ldbc100 | q37 | 3464.09 | 3480.92 | -16.84 (-0.48%) |
| shortest_path_ldbc100 | q38 | 51.69 | N/A | N/A |
| shortest_path_ldbc100 | q39 | 81.18 | 53.92 | 27.26 (50.57%) |
| shortest_path_ldbc100 | q40 | 63.54 | 84.15 | -20.61 (-24.49%) |
| var_size_expr_evaluator | q03 | 2069.30 | 2086.15 | -16.85 (-0.81%) |
| var_size_expr_evaluator | q04 | 2210.16 | 2258.94 | -48.78 (-2.16%) |
| var_size_expr_evaluator | q05 | 2633.66 | 2612.81 | 20.85 (0.80%) |
| var_size_expr_evaluator | q06 | 1333.00 | 1381.81 | -48.80 (-3.53%) |
| var_size_seq_scan | q19 | 1470.26 | 1480.67 | -10.41 (-0.70%) |
| var_size_seq_scan | q20 | 2776.02 | 3050.34 | -274.31 (-8.99%) |
| var_size_seq_scan | q21 | 2289.76 | 2431.81 | -142.06 (-5.84%) |
| var_size_seq_scan | q22 | 129.43 | 131.22 | -1.79 (-1.36%) |

Benchmark Result

Master commit hash: fcee67835c441f943c5a44ac4b230be92aec0bd4
Branch commit hash: f3c013b04ab77b22b1f0e5bca28cb4ea1c411bda

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 644.78 | 658.31 | -13.54 (-2.06%) |
| aggregation | q28 | 12002.71 | 11672.58 | 330.13 (2.83%) |
| filter | q14 | 128.96 | 144.75 | -15.79 (-10.91%) |
| filter | q15 | 130.34 | 142.37 | -12.03 (-8.45%) |
| filter | q16 | 307.82 | 323.43 | -15.61 (-4.83%) |
| filter | q17 | 444.00 | 460.81 | -16.81 (-3.65%) |
| filter | q18 | 1912.28 | 1951.92 | -39.64 (-2.03%) |
| fixed_size_expr_evaluator | q07 | 541.86 | 559.81 | -17.96 (-3.21%) |
| fixed_size_expr_evaluator | q08 | 759.66 | 771.57 | -11.92 (-1.54%) |
| fixed_size_expr_evaluator | q09 | 760.24 | 771.69 | -11.46 (-1.48%) |
| fixed_size_expr_evaluator | q10 | 239.75 | 254.36 | -14.61 (-5.74%) |
| fixed_size_expr_evaluator | q11 | 232.94 | 249.96 | -17.02 (-6.81%) |
| fixed_size_expr_evaluator | q12 | 232.23 | 249.54 | -17.32 (-6.94%) |
| fixed_size_expr_evaluator | q13 | 1469.19 | 1492.51 | -23.32 (-1.56%) |
| fixed_size_seq_scan | q23 | 114.29 | 133.95 | -19.65 (-14.67%) |
| join | q29 | 676.59 | 675.68 | 0.91 (0.13%) |
| join | q30 | 1461.98 | 1466.46 | -4.48 (-0.31%) |
| join | q31 | 8.06 | 10.20 | -2.14 (-20.95%) |
| ldbc_snb_ic | q35 | 369.41 | 391.26 | -21.85 (-5.58%) |
| ldbc_snb_ic | q36 | 39.40 | 37.72 | 1.69 (4.47%) |
| ldbc_snb_is | q32 | 9.13 | 8.43 | 0.70 (8.35%) |
| ldbc_snb_is | q33 | 12.22 | 14.23 | -2.02 (-14.16%) |
| ldbc_snb_is | q34 | 4.01 | 1.24 | 2.77 (223.52%) |
| multi-rel | multi-rel-large-scan | 1580.45 | 1867.57 | -287.12 (-15.37%) |
| multi-rel | multi-rel-lookup | 56.94 | 50.49 | 6.45 (12.78%) |
| multi-rel | multi-rel-small-scan | 112.96 | 87.30 | 25.66 (29.40%) |
| order_by | q25 | 140.76 | 150.75 | -9.99 (-6.63%) |
| order_by | q26 | 456.26 | 478.20 | -21.94 (-4.59%) |
| order_by | q27 | 1470.74 | 1500.78 | -30.04 (-2.00%) |
| scan_after_filter | q01 | 172.63 | 186.10 | -13.46 (-7.23%) |
| scan_after_filter | q02 | 156.70 | 174.11 | -17.40 (-10.00%) |
| shortest_path_ldbc100 | q37 | 3715.66 | 3378.53 | 337.13 (9.98%) |
| shortest_path_ldbc100 | q38 | 57.72 | 61.45 | -3.72 (-6.06%) |
| shortest_path_ldbc100 | q39 | 48.18 | 46.99 | 1.19 (2.53%) |
| shortest_path_ldbc100 | q40 | 66.48 | 66.72 | -0.24 (-0.36%) |
| var_size_expr_evaluator | q03 | 2070.77 | 2089.84 | -19.07 (-0.91%) |
| var_size_expr_evaluator | q04 | 2224.76 | 2291.97 | -67.21 (-2.93%) |
| var_size_expr_evaluator | q05 | 2627.34 | 2685.68 | -58.34 (-2.17%) |
| var_size_expr_evaluator | q06 | 1332.87 | 1335.74 | -2.87 (-0.22%) |
| var_size_seq_scan | q19 | 1465.09 | 1480.31 | -15.22 (-1.03%) |
| var_size_seq_scan | q20 | 2687.47 | 2675.48 | 11.99 (0.45%) |
| var_size_seq_scan | q21 | 2284.81 | 2288.20 | -3.40 (-0.15%) |
| var_size_seq_scan | q22 | 127.45 | 132.52 | -5.06 (-3.82%) |

Review thread on src/function/vector_string_functions.cpp (outdated, resolved)
Review thread on src/function/vector_string_functions.cpp (resolved)

Benchmark Result

Master commit hash: b3bfc6e378a8c446ee896074235f8e46c9a035e9
Branch commit hash: 39bee653e281eb61a27dec5b58313dc617edbd28

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 659.14 | 658.39 | 0.75 (0.11%) |
| aggregation | q28 | 12010.15 | 12695.67 | -685.52 (-5.40%) |
| filter | q14 | 142.21 | 141.72 | 0.49 (0.35%) |
| filter | q15 | 141.34 | 146.03 | -4.69 (-3.21%) |
| filter | q16 | 315.55 | 322.41 | -6.85 (-2.13%) |
| filter | q17 | 462.07 | 463.40 | -1.33 (-0.29%) |
| filter | q18 | 1983.86 | 1956.88 | 26.98 (1.38%) |
| fixed_size_expr_evaluator | q07 | 561.24 | 562.35 | -1.10 (-0.20%) |
| fixed_size_expr_evaluator | q08 | 777.36 | 772.71 | 4.65 (0.60%) |
| fixed_size_expr_evaluator | q09 | 774.51 | 778.98 | -4.47 (-0.57%) |
| fixed_size_expr_evaluator | q10 | 257.35 | 254.97 | 2.38 (0.93%) |
| fixed_size_expr_evaluator | q11 | 252.81 | 249.08 | 3.73 (1.50%) |
| fixed_size_expr_evaluator | q12 | 251.77 | 248.16 | 3.61 (1.45%) |
| fixed_size_expr_evaluator | q13 | 1490.81 | 1485.42 | 5.39 (0.36%) |
| fixed_size_seq_scan | q23 | 135.78 | 132.48 | 3.30 (2.49%) |
| join | q29 | 680.33 | 651.99 | 28.33 (4.35%) |
| join | q30 | 1385.36 | 1487.00 | -101.64 (-6.84%) |
| join | q31 | 8.92 | 9.82 | -0.90 (-9.14%) |
| ldbc_snb_ic | q35 | 384.89 | 399.29 | -14.39 (-3.60%) |
| ldbc_snb_ic | q36 | 38.82 | 38.57 | 0.25 (0.66%) |
| ldbc_snb_is | q32 | 7.75 | 6.33 | 1.42 (22.45%) |
| ldbc_snb_is | q33 | 18.06 | 17.75 | 0.31 (1.76%) |
| ldbc_snb_is | q34 | 5.98 | 6.00 | -0.02 (-0.33%) |
| multi-rel | multi-rel-large-scan | 1604.30 | 1655.51 | -51.21 (-3.09%) |
| multi-rel | multi-rel-lookup | 41.83 | 68.09 | -26.25 (-38.56%) |
| multi-rel | multi-rel-small-scan | 68.45 | 72.22 | -3.77 (-5.22%) |
| order_by | q25 | 150.01 | 145.13 | 4.88 (3.36%) |
| order_by | q26 | 467.09 | 492.60 | -25.51 (-5.18%) |
| order_by | q27 | 1505.12 | 1465.20 | 39.92 (2.72%) |
| scan_after_filter | q01 | 188.08 | 185.38 | 2.70 (1.46%) |
| scan_after_filter | q02 | 174.43 | 175.27 | -0.84 (-0.48%) |
| shortest_path_ldbc100 | q37 | 3420.95 | 3752.37 | -331.41 (-8.83%) |
| shortest_path_ldbc100 | q38 | 58.57 | 55.74 | 2.83 (5.08%) |
| shortest_path_ldbc100 | q39 | 53.89 | 54.64 | -0.75 (-1.38%) |
| shortest_path_ldbc100 | q40 | 65.35 | 60.59 | 4.77 (7.87%) |
| var_size_expr_evaluator | q03 | 2122.85 | 2082.64 | 40.21 (1.93%) |
| var_size_expr_evaluator | q04 | 2316.88 | 2271.93 | 44.95 (1.98%) |
| var_size_expr_evaluator | q05 | 2635.29 | 2728.49 | -93.20 (-3.42%) |
| var_size_expr_evaluator | q06 | 1366.84 | 1333.91 | 32.93 (2.47%) |
| var_size_seq_scan | q19 | 1497.57 | 1461.49 | 36.09 (2.47%) |
| var_size_seq_scan | q20 | 2470.07 | 2484.67 | -14.59 (-0.59%) |
| var_size_seq_scan | q21 | 2285.91 | 2283.08 | 2.83 (0.12%) |
| var_size_seq_scan | q22 | 134.47 | 132.03 | 2.44 (1.85%) |

@acquamarin acquamarin merged commit a181120 into master Oct 17, 2024
@acquamarin acquamarin deleted the regex-function branch October 17, 2024 15:54
Development

Successfully merging this pull request may close these issues.

Feature: Add "global" option to regexp_replace
2 participants