-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DBZ] Read offset from Kafka for every commit callback #348
Merged
vaibhav-yb
merged 22 commits into
yugabyte:main
from
vaibhav-yb:override-commit-callback
Aug 30, 2024
Merged
[DBZ] Read offset from Kafka for every commit callback #348
vaibhav-yb
merged 22 commits into
yugabyte:main
from
vaibhav-yb:override-commit-callback
Aug 30, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
vaibhav-yb
commented
Aug 15, 2024
src/test/java/io/debezium/connector/yugabytedb/YugabyteDBSnapshotTest.java
Outdated
Show resolved
Hide resolved
vaibhav-yb
commented
Aug 22, 2024
@@ -940,7 +940,7 @@ public void commitOffset(Map<String, ?> offset) { | |||
// than one already present would throw the error: CDCSDK: Trying to fetch already GCed intents | |||
if (this.tabletToExplicitCheckpoint.get(entry.getKey()) != null && | |||
tempOpId.getIndex() < this.tabletToExplicitCheckpoint.get(entry.getKey()).getIndex()) { | |||
LOGGER.warn("The received OpId {} is less than the older checkpoint {} for tablet {}", | |||
LOGGER.debug("The received OpId {} is less than the older checkpoint {} for tablet {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is converted to debug
level considering the following scenario:
- Last record published by the connector has OpId
1.5
- Connector received a callback on the above OpId so explicit checkpoint is also at
1.5
- Now there were couple of NO_OP on service so connector received empty batches and explicit checkpoint was advanced to
1.8
- Assume that for sometime there are no records, or there are no records after the record in step 1
- This will result in a commit callback with checkpoint
1.5
since that was the last published record - Subsequently, we will always end up printing this warning log which could be confusing to users.
suranjan
approved these changes
Aug 26, 2024
src/main/java/io/debezium/connector/yugabytedb/YugabyteDBConnectorTask.java
Show resolved
Hide resolved
src/main/java/io/debezium/connector/yugabytedb/YugabyteDBConnectorTask.java
Show resolved
Hide resolved
suranjan
requested changes
Aug 28, 2024
src/main/java/io/debezium/connector/yugabytedb/YugabyteDBChangeEventSourceCoordinator.java
Outdated
Show resolved
Hide resolved
src/main/java/io/debezium/connector/yugabytedb/YugabyteDBConnectorTask.java
Outdated
Show resolved
Hide resolved
src/main/java/io/debezium/connector/yugabytedb/YugabyteDBConnectorTask.java
Outdated
Show resolved
Hide resolved
suranjan
approved these changes
Aug 28, 2024
This was referenced Sep 3, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Note: This PR breaks explicit checkpointing in the tablet splitting case, it will be fixed in a follow-up PR.
Problem
There are two issues with the current checkpointing mechanism using the callbacks:
They both are potential cause of data loss and have been reproduced manually as well.
DBZ-6026 -
The current code uses a method BaseSourceTask#logStatistics to log some information, however, it also ends up updating the offset map which is then used for a callback.
However, with this, there's a possibility that if there's a
commit()
callback after the statistics are logged and before the records are returned from the BaseSourceTask#poll, it will end up marking the checkpoint on service but if the connector restarts in that window, there will be a data loss.Steps to reproduce:
logStatistics()
is called and before records are returned from BaseSourceTask#poll.a. Add a log so that we see when the method is getting called.
DBZ-7816 -
According to Kafka docs, callbacks for the same Kafka partition are guaranteed to be in order but callbacks for different Kafka partitions can come out of order which can lead to a potential data loss window as mentioned in DBZ-7816
Steps to reproduce:
snapshot.mode=initial
Note that out of all the experiments performed for this issue, we were able to reproduce it 100% of the times but there's a possibility that it might not reproduce if callbacks are in order (1 out of 10 times maybe?)
Solution
This PR includes the changes to override the
commit()
method inYugabyteDBConnectorTask
and reads the offsets from Kafka partitions and uses the same offsets to send via commit callbacks so that they can be marked on service for checkpointing.