feat: Only send needed data to task runner (no-changelog) #11487

tomi · 2024-10-31T13:05:54Z

Summary

When executing JS code in the Code Node, the task runner currently fetches the entire workflow execution context data. This can be a lot of data and can cause OOMs on large workflows. Most often the Code Node only uses its input data and maybe some other node's data, so sending all the data is overkill.

This PR changes the behaviour to send only the needed data. A static analysis of the code identifies which built-in variables it accesses, and only the data those variables require is sent.

There can be corner cases where we can't statically determine, for example, which nodes' data is needed. This can happen when a variable is passed as the parameter to the $() function. In these cases we send all the data.
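To illustrate the idea, here is a minimal sketch, not the parser from this PR. It uses a regex scan instead of a real AST walk, and the names `NeededData` and `analyzeCode` are hypothetical. It records node names passed to `$()` as string literals and falls back to "send everything" when the argument is anything else.

```typescript
// Hypothetical result shape, for illustration only.
interface NeededData {
  neededNodeNames: Set<string>;
  needsAllNodes: boolean;
  needsInput: boolean;
}

// Simplified regex-based scan. A real analysis would walk an AST so that
// string contents and comments are not mistaken for code.
function analyzeCode(code: string): NeededData {
  const result: NeededData = {
    neededNodeNames: new Set<string>(),
    needsAllNodes: false,
    needsInput: false,
  };

  // Matches each `$(...)` call and captures the raw argument text.
  const callPattern = /\$\(\s*([^)]*?)\s*\)/g;
  // Matches a plain quoted string literal without escapes or nested quotes.
  const literalPattern = /^(['"])([^'"\\]*)\1$/;

  let match: RegExpExecArray | null;
  while ((match = callPattern.exec(code)) !== null) {
    const literal = literalPattern.exec(match[1] ?? '');
    if (literal) {
      // Node name is known statically: only this node's data is needed.
      result.neededNodeNames.add(literal[2] ?? '');
    } else {
      // Argument is a variable or expression: be conservative.
      result.needsAllNodes = true;
    }
  }

  if (/\$input\b/.test(code)) {
    result.needsInput = true;
  }

  return result;
}
```

With this sketch, `analyzeCode("return $('HTTP Request').first().json;")` records only `HTTP Request`, while `analyzeCode("const n = 'A'; return $(n).all();")` sets `needsAllNodes` because the node name is not known until runtime.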

Based on some naive measurements, the message size is reduced by roughly 30-70%.

Next steps:
After this change there is still room for optimization. For example, there is still duplication in the message sent to the task runner. The input data of the Code Node is located in multiple parts of the message, which all end up being separate objects when deserialized in the task runner.
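The duplication effect can be reproduced in a few lines. This sketch assumes the message goes through a JSON round trip, and the field names `inputData` and `runData` are made up for illustration, not taken from the actual message format.

```typescript
// The same array is referenced from two places in the (hypothetical) message.
const inputItems = [{ json: { id: 1 } }];
const message = {
  inputData: inputItems,
  runData: { 'Previous Node': inputItems },
};

// Before serialization both fields point at one object in memory.
const sharedBefore = message.inputData === message.runData['Previous Node'];

// JSON has no notion of shared references, so the data is written twice
// and comes back as two independent copies.
const received = JSON.parse(JSON.stringify(message));
const sharedAfter = received.inputData === received.runData['Previous Node'];
```

Here `sharedBefore` is `true` and `sharedAfter` is `false`: the receiver holds two copies of the same items, which costs both message size and memory.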

Related Linear tickets, GitHub issues, and Community forum posts

https://linear.app/n8n/issue/PAY-2174/send-only-needed-data-to-the-runner

Review / Merge checklist

PR title and summary are descriptive. (conventions)
Docs updated or follow-up ticket created.
Tests included.
PR Labeled with release/backport (if the PR is an urgent fix that needs to be backported)

ivov

Amazing work! 🔥

Will come back later to test it manually.

ivov · 2024-11-01T13:01:01Z

packages/@n8n/task-runner/src/js-task-runner/built-ins-parser/built-ins-parser.ts

+					if (accessedProperty.value === 'item') {
+						state.markNeedsAllNodes();
+					}


I think pairedItem and itemMatching also need to trace back the chain.

n8n/packages/workflow/src/WorkflowDataProxy.ts

Line 1027 in 643d66c

if (['pairedItem', 'itemMatching', 'item'].includes(property as string)) {

Good catch! Will add that
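A sketch of the check being discussed, assuming the three property names from the `WorkflowDataProxy.ts` line quoted above. The constant and function names are hypothetical, not the parser's actual API.

```typescript
// Properties that resolve items by tracing back through earlier nodes,
// taken from the WorkflowDataProxy line quoted in the review.
const CHAIN_TRACING_PROPERTIES: readonly string[] = ['item', 'pairedItem', 'itemMatching'];

// Returns true when accessing `$('node').<property>` means the analysis
// cannot limit the data to a single node and must request all nodes.
function requiresAllNodes(accessedProperty: string): boolean {
  return CHAIN_TRACING_PROPERTIES.includes(accessedProperty);
}
```

Under this sketch, `$('A').pairedItem()` and `$('A').itemMatching(0)` would be treated like `$('A').item`, while `$('A').first()` would still need only node `A`.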

ivov · 2024-11-01T13:20:34Z

packages/cli/src/runners/task-managers/data-request-response-builder.ts

+	}
+
+	/**
+	 * Assuming the given `obj` is an object where the keys are node names,


Would something break if the string is not a valid node name? e.g. user typo

Good question. In that case there is no node with that name, so filtering works as expected.
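A minimal sketch of why a typo is harmless, assuming the filter simply keeps the keys that appear in the requested list. `pickByNodeNames` is a hypothetical name, not the method in `data-request-response-builder.ts`.

```typescript
// Keeps only the entries of `obj` whose key is one of the requested node names.
// A name that matches no key (for example, a user typo) simply selects nothing.
function pickByNodeNames<T>(
  obj: Record<string, T>,
  nodeNames: string[],
): Record<string, T> {
  const wanted = new Set(nodeNames);
  const filtered: Record<string, T> = {};
  for (const [name, value] of Object.entries(obj)) {
    if (wanted.has(name)) {
      filtered[name] = value;
    }
  }
  return filtered;
}
```

For example, `pickByNodeNames({ A: 1, B: 2 }, ['A', 'Typo'])` returns `{ A: 1 }`: the unknown name is ignored rather than raising an error.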

tomi changed the title ~~feat: Only send needed data to task runner~~ feat: Only send needed data to task runner (no-changelog) Oct 31, 2024

tomi added 2 commits October 31, 2024 15:19

Fix parameter

39422d9

Fix test

c1c717c

n8n-assistant bot added the core (Enhancement outside /nodes-base and /editor-ui) and n8n team (Authored by the n8n team) labels Oct 31, 2024

Remove optimization from $("node").item case

3c797ce

ivov reviewed Nov 1, 2024

tomi added 8 commits November 1, 2024 16:38

Make markNeedsAllNodes imply also input data is needed

13d2eae

Handle .pairedItem() and .itemMatching() properly

d75cbd8

Add test that breaks if new properties are added to data proxy

7456880

Document why we can't throw on missing execution data

d69c339

Add documentation

a11f71e

Fix test name

213c8f1

Mark methods as private

19c17b9

Fix test

c11b0b4
