Extend workflowcheck tool to detect reads from (WorkflowExecution).RunId and suggest alternatives. #1385

askreet · 2024-02-09T20:56:42Z

Is your feature request related to a problem? Please describe.

I've recently learned that using the value of RunId within workflow task code is non-deterministic under workflow ~~retry~~ reset scenarios.

Describe the solution you'd like

I'd love for the workflowcheck tool to detect reads of this particular property within workflow functions and suggest use of the WorkflowInfo.OriginalRunId (or FirstRunId) field instead.

Describe alternatives you've considered

N/A.

Additional context

N/A.

Quinn-With-Two-Ns · 2024-02-09T20:58:24Z

Can you clarify the scenario? Workflow retry starts a whole new workflow so the fact that RunID changes should not lead to non determinism

askreet · 2024-02-09T21:35:24Z

@Quinn-With-Two-Ns It leads to non-determinism if you use the value of RunID as an input to an activity or child workflow. For example, we were rendering the URL to the Temporal UI before kicking off a child workflow that creates a pull request, linking our developers back to the workflow that was created for them. In my scenario, the workflow had previously passed this step successfully so during replay the input value to the child workflow had changed.

Edit: In fact this journey led me this field, which documents itself as a more viable alternative.

Quinn-With-Two-Ns · 2024-02-09T21:43:45Z

Activities and Child workflows are not carried over retries. Non determinism only applies to a single workflow execution. A retried workflow can take a totally different path then the parent. It sounds like your trying to use RunID as an idempotency token across workflow retries, which is an error in your code, but is not non determinism.

The workflow check is designed to catch non determinism errors not all possible programming errors,

Quinn-With-Two-Ns · 2024-02-09T21:44:28Z

Note: in general we do not recommend workflow retries, we recommend handling errors in your workflow.

askreet · 2024-02-09T22:00:20Z

Activities and Child workflows are not carried over retries.

I don't follow what you're saying here.

The input values need to match during the execution of a workflow task, or you get a non-determinism error when you reach the point in the workflow task when it attempts to run the activity or child workflow, right? In my workflow I execute a child workflow at a point in the event stream, and during the retry the input value is not the same as it was during the previous workflow task because it refers to a Run ID, which has changed due to the retry.

To be clear, I don't expect (or desire) that the child workflow is actually run again - we're talking about replaying the event stream to determine the next action for the workflow to take.

Note: in general we do not recommend workflow retries, we recommend handling errors in your workflow.

In my case I had a workflow fail near the end of it's run due to a program defect. I've corrected the defect and would like to replay the workflow from the point of failure. Are you suggesting that the workflow retry command is not intended for me to do that? What alternative methods are suggested for resuming my terminated, failed workflow execution so that I can complete the intended actions?

Edit: Wait, I now see the confusion in the second scenario - I've been saying "retry", I mean workflow "reset". That's what I'm trying to do 😓. I am not using workflow retries at all.

Quinn-With-Two-Ns · 2024-02-09T22:04:00Z

Ah OK, reset makes more sense

The input values need to match during the execution of a workflow task, or you get a non-determinism error when you reach the point in the workflow task when it attempts to run the activity or child workflow, right?

No, input is not checked when the SDK is validating a workflow is deterministic.

askreet · 2024-02-09T22:05:18Z

No, input is not checked when the SDK is validating a workflow is deterministic.

Oh? If that's so that I actually am back at square one and don't understand this determinism error! Will keep digging.

Quinn-With-Two-Ns · 2024-02-09T22:08:58Z

Only the Activity ID and Type is checked I believe

askreet · 2024-02-09T22:16:58Z

@Quinn-With-Two-Ns You're definitely pointing me in the right direction here, I updated my code to use OriginalRunId and I'm still seeing this error:

unknown command CommandType: ChildWorkflow, ID: 70b7166d-0175-40bb-9ec0-e0f6473625c1_29, possible causes are nondeterministic workflow definition code or incompatible change in the workflow definition

That ID is the original Run ID, plus the Event ID where the child was spawned. My current thesis is that since I'm not setting a workflow ID on the child workflow execution, it's comparing a newly generated child workflow ID that does not match that value.

askreet · 2024-02-09T22:35:42Z

I think the interesting piece of code is: https://github.com/temporalio/sdk-go/blob/master/internal/internal_event_handlers.go#L538-L540.

func (wc *workflowEnvironmentImpl) ExecuteChildWorkflow(
	params ExecuteWorkflowParams, callback ResultHandler, startedHandler func(r WorkflowExecution, e error),
) {
	if params.WorkflowID == "" {
		params.WorkflowID = wc.workflowInfo.WorkflowExecution.RunID + "_" + wc.GenerateSequenceID()
	}

If I understand this correctly, any child workflow execution without a deterministically set Workflow ID will make the parent execution not "reset safe".

Quinn-With-Two-Ns · 2024-02-09T23:09:23Z

Ah if your using child workflow without an ID then this is a know issue #723.

I would recommend setting your own ID for now since as the linked issue says the SDK generated ID needs to not conflict with future server work that is not currently prioritized.

askreet · 2024-02-09T23:12:31Z

Awesome, thanks @Quinn-With-Two-Ns! I'll update our internal guidelines to always set child workflow IDs in the meantime. I wonder if that is a useful check for workflowcheck in the meantime if we don't have a timeline for the blocking server work? Anyway, I'll close this out for now.

askreet added the enhancement New feature or request label Feb 9, 2024

askreet closed this as completed Feb 9, 2024

askreet mentioned this issue Feb 9, 2024

Unable to reset workflow with completed childworkflow whose child workflowID is generated by SDK #723

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extend workflowcheck tool to detect reads from (WorkflowExecution).RunId and suggest alternatives. #1385

Extend workflowcheck tool to detect reads from (WorkflowExecution).RunId and suggest alternatives. #1385

askreet commented Feb 9, 2024 •

edited

Loading

Quinn-With-Two-Ns commented Feb 9, 2024

askreet commented Feb 9, 2024 •

edited

Loading

Quinn-With-Two-Ns commented Feb 9, 2024

Quinn-With-Two-Ns commented Feb 9, 2024

askreet commented Feb 9, 2024 •

edited

Loading

Quinn-With-Two-Ns commented Feb 9, 2024

askreet commented Feb 9, 2024

Quinn-With-Two-Ns commented Feb 9, 2024

askreet commented Feb 9, 2024

askreet commented Feb 9, 2024

Quinn-With-Two-Ns commented Feb 9, 2024

askreet commented Feb 9, 2024

Extend workflowcheck tool to detect reads from (WorkflowExecution).RunId and suggest alternatives. #1385

Extend workflowcheck tool to detect reads from (WorkflowExecution).RunId and suggest alternatives. #1385

Comments

askreet commented Feb 9, 2024 • edited Loading

Quinn-With-Two-Ns commented Feb 9, 2024

askreet commented Feb 9, 2024 • edited Loading

Quinn-With-Two-Ns commented Feb 9, 2024

Quinn-With-Two-Ns commented Feb 9, 2024

askreet commented Feb 9, 2024 • edited Loading

Quinn-With-Two-Ns commented Feb 9, 2024

askreet commented Feb 9, 2024

Quinn-With-Two-Ns commented Feb 9, 2024

askreet commented Feb 9, 2024

askreet commented Feb 9, 2024

Quinn-With-Two-Ns commented Feb 9, 2024

askreet commented Feb 9, 2024

askreet commented Feb 9, 2024 •

edited

Loading

askreet commented Feb 9, 2024 •

edited

Loading

askreet commented Feb 9, 2024 •

edited

Loading