[Subtask]: Implement DEDUP JOIN operator #17540

Closed
aunjgr opened this issue Jul 15, 2024 · 0 comments

Parent Issue

#17500

Detail of Subtask

To detect duplicate rows in the data being inserted, we will implement a DEDUP JOIN operator that supports three on-duplicate policies: error, ignore, and update.

To help with debugging, one or more new SQL keywords should also be added. The easiest approach is to support queries like "select tNew.* from tNew DEDUP JOIN tOld on tNew.pk = tOld.pk".

Describe implementation you've considered

Always build on the new data and probe with the old data.

  • on duplicate error: report an error for the duplicated rows
  • on duplicate ignore: drop the duplicated rows
  • on duplicate update: for each duplicated row, update the existing row accordingly
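
The build/probe flow and the three policies above can be sketched roughly as follows. This is only an illustrative in-memory model in Python, not the operator's actual implementation: the names `dedup_join`, `OnDuplicate`, and `DuplicateKeyError` are hypothetical, rows are modeled as plain dicts, keys are assumed to be unique within the new data, and "update accordingly" is assumed to mean that the new row's values overwrite the old row's.

```python
from enum import Enum


class OnDuplicate(Enum):
    ERROR = "error"
    IGNORE = "ignore"
    UPDATE = "update"


class DuplicateKeyError(Exception):
    """Raised under the 'error' policy when a new key already exists."""


def dedup_join(new_rows, old_rows, key, policy):
    """Return (rows_to_insert, rows_to_update) for the given policy.

    Hypothetical sketch: rows are dicts and `key` names the key column.
    """
    # Build phase: hash the new (to-be-inserted) rows by key.
    # Assumption: keys are unique within new_rows.
    build = {row[key]: row for row in new_rows}

    # Probe phase: look up every old row's key in the build table.
    matched = {}
    for old in old_rows:
        k = old[key]
        if k not in build:
            continue
        if policy is OnDuplicate.ERROR:
            raise DuplicateKeyError(f"duplicate key: {k!r}")
        matched[k] = old

    # New rows that matched nothing are always inserted.
    inserts = [row for k, row in build.items() if k not in matched]

    if policy is OnDuplicate.UPDATE:
        # Assumption: new values overwrite the existing row's values.
        updates = [{**old, **build[k]} for k, old in matched.items()]
    else:
        # IGNORE drops the duplicated new rows; nothing is updated.
        updates = []
    return inserts, updates
```

With `old = [{"pk": 1}, {"pk": 2}]` and `new = [{"pk": 2}, {"pk": 3}]`, the ignore policy would insert only the row with `pk = 3`, the update policy would additionally produce an update for `pk = 2`, and the error policy would raise on `pk = 2`.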

Additional information

https://github.com/matrixorigin/docs/pull/299
