Since we add timestamps to all messages, I'd opt for at-least-once processing to start: we can always establish uniqueness via the timestamp, whereas lost data may be harder to retrieve.
We would use a consumer group to maximise throughput. With groups, Kafka periodically and automatically re-assigns partitions across the consumers of a single group to balance the workload; each rebalance produces a new generation of the group. After a rebalance, Consumer_X may manage partitions previously handled by Consumer_Y. The last-seen timestamp per key therefore cannot be stored locally (i.e. inside each consumer); it has to live in a data structure shared among all group members. A NonBlockingHashMap could solve this issue.
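As a sketch of that shared last-seen-timestamp structure, using `java.util.concurrent.ConcurrentHashMap` in place of `NonBlockingHashMap` (the concurrency semantics we rely on here are the same), and assuming all group members run in the same JVM — across separate processes the map would need to live in an external store instead:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch, not the final implementation: keeps the newest
// timestamp seen per key, so records with an older-or-equal timestamp
// can be treated as duplicates after a rebalance.
class LastSeenFilter {
    private final ConcurrentMap<String, Long> lastSeen = new ConcurrentHashMap<>();

    /** Returns true if this (key, timestamp) is strictly newer than anything seen before. */
    boolean accept(String key, long timestamp) {
        boolean[] accepted = new boolean[1];
        // compute() runs atomically per key in ConcurrentHashMap,
        // so concurrent consumers cannot both accept the same timestamp.
        lastSeen.compute(key, (k, prev) -> {
            if (prev == null || timestamp > prev) {
                accepted[0] = true;
                return timestamp;
            }
            return prev; // duplicate or out-of-order record: keep old value
        });
        return accepted[0];
    }
}
```

A consumer would call `accept(record.key(), record.timestamp())` before processing and skip the record when it returns false.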
Kafka messages can be consumed according to two different strategies:

- at most once: commit the offset before processing, so a crash may lose messages;
- at least once: commit the offset after processing, so a crash may reprocess messages.
This design choice drives the consumer implementation.
We may plan to code both approaches and select the most suitable one after testing.
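The practical difference between the two strategies is when the offset is committed relative to processing: at most once commits before, at least once commits after. A toy simulation of a consumer crashing mid-stream and restarting from the last committed offset (illustrative names only, not the Kafka API) shows the resulting loss vs. duplication:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: the consumer crashes while handling msgs[crashIndex]
// (after the commit for at-most-once, before the commit for at-least-once),
// then restarts from the last committed offset.
class DeliverySim {
    static List<String> simulate(List<String> msgs, int crashIndex, boolean atMostOnce) {
        List<String> processed = new ArrayList<>();
        int committedOffset = 0;
        for (int i = 0; i < msgs.size(); i++) {
            if (atMostOnce) {
                committedOffset = i + 1;      // commit first
                if (i == crashIndex) break;   // crash before processing: message lost
                processed.add(msgs.get(i));
            } else {
                processed.add(msgs.get(i));   // process first
                if (i == crashIndex) break;   // crash before commit: will reprocess
                committedOffset = i + 1;
            }
        }
        // Restart: resume from the last committed offset, no further crash.
        for (int i = committedOffset; i < msgs.size(); i++) {
            processed.add(msgs.get(i));
        }
        return processed;
    }
}
```

With messages `[a, b, c]` and a crash while handling `b`, at most once processes `[a, c]` (b is lost), while at least once processes `[a, b, b, c]` (b is duplicated) — which is exactly why the timestamp-based deduplication above matters for the at-least-once choice.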
EDIT @blootsvoets: reversed at most and at least