Extra replay command when replaying "partial histories" #1670

Open
RamyElkest opened this issue Oct 14, 2024 · 3 comments
Comments

@RamyElkest

This is more of a request for information than a bug.

Expected Behavior

Replaying a downloaded workflow history that ends with a workflow task (scheduled/started) should not fail with [TMPRL1100] nondeterministic workflow: extra replay command.

Actual Behavior

Replaying a downloaded workflow history that ends with a workflow task (scheduled/started) fails with [TMPRL1100] nondeterministic workflow: extra replay command.

Steps to Reproduce the Problem

Reproducing test and detailed explanation: RamyElkest#1

Specifications

  • Version: v1.26.0
  • Platform: v1.23.1
@RamyElkest changed the title from "Replaying partial histories" to "Extra replay command when replaying "partial histories"" on Oct 14, 2024
@RamyElkest
Author

Solution
The proposed solution is to trim scheduled/started/completed workflow task events that have no follow-up events. This guarantees the workflow history is in a safely replayable state. There are three possible approaches:

  1. Trim the history in GetWorkflowHistory (to be discussed with upstream)
  2. Trim the history in our code before passing it to the Replayer
  3. Trim the history in the Replayer (to be discussed with upstream)

Curious if you have any thoughts / preferences here.
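
To make the trimming idea concrete, here is a minimal sketch of approach 2 (trimming before the history reaches the Replayer). It is only an illustration under stated assumptions: `Event`, `EventType`, and `TrimTrailingWorkflowTask` are hypothetical stand-ins written for this example, not the SDK's real history types, and the rule it applies is simply the one described above (drop scheduled/started/completed workflow task events at the tail of the history).

```go
package main

import "fmt"

// EventType is a hypothetical stand-in for the history event type enum.
// It is NOT the real SDK/API type; it exists only for this sketch.
type EventType int

const (
	WorkflowExecutionStarted EventType = iota
	WorkflowTaskScheduled
	WorkflowTaskStarted
	WorkflowTaskCompleted
	TimerStarted
)

// Event is a hypothetical, simplified history event.
type Event struct {
	ID   int64
	Type EventType
}

// isWorkflowTaskEvent reports whether t is one of the workflow task
// events the proposal wants to trim from the end of a history.
func isWorkflowTaskEvent(t EventType) bool {
	switch t {
	case WorkflowTaskScheduled, WorkflowTaskStarted, WorkflowTaskCompleted:
		return true
	}
	return false
}

// TrimTrailingWorkflowTask returns the prefix of events that remains
// after dropping every workflow task event at the tail of the history.
// The input slice is not modified; the result shares its backing array.
func TrimTrailingWorkflowTask(events []Event) []Event {
	end := len(events)
	for end > 0 && isWorkflowTaskEvent(events[end-1].Type) {
		end--
	}
	return events[:end]
}

func main() {
	// A "partial history": it was downloaded while a workflow task
	// was scheduled and started but not yet completed.
	history := []Event{
		{ID: 1, Type: WorkflowExecutionStarted},
		{ID: 2, Type: WorkflowTaskScheduled},
		{ID: 3, Type: WorkflowTaskStarted},
		{ID: 4, Type: WorkflowTaskCompleted},
		{ID: 5, Type: TimerStarted},
		{ID: 6, Type: WorkflowTaskScheduled},
		{ID: 7, Type: WorkflowTaskStarted},
	}

	trimmed := TrimTrailingWorkflowTask(history)
	fmt.Printf("kept %d of %d events\n", len(trimmed), len(history))
}
```

In real code the same loop would run over the events of the downloaded history before handing it to the Replayer. Whether failed or timed-out workflow tasks should be trimmed as well is left open in this sketch.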

@cretz
Member

cretz commented Oct 15, 2024

Thanks for the report! Will confer with the team on replaying of mid-task history captures. While it makes sense to only replay up to the last completed or failed task, we may need to double check that people aren't running replays on the active task without the task failure to replicate failures (e.g. to replicate deadlock detection).

@cretz
Member

cretz commented Oct 16, 2024

Conferred with the team; we consider this a bug. If we are in fact failing a replay with a history that should succeed, we need to fix it. It is likely we should not be performing history matching for non-determinism checks after the last task start (that doesn't have an end). This issue will be updated when we have a solution.
