feat: Only send needed data to task runner (no-changelog) #11487
Amazing work! 🔥
Will come back later to test it manually.
if (accessedProperty.value === 'item') {
  state.markNeedsAllNodes();
}
I think pairedItem and itemMatching also need to trace back the chain.
n8n/packages/workflow/src/WorkflowDataProxy.ts
Line 1027 in 643d66c
if (['pairedItem', 'itemMatching', 'item'].includes(property as string)) {
Good catch! Will add that
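A minimal sketch of the check this thread converges on, assuming the parser walks member accesses and records what it finds in a state object. `AnalysisState` and `visitAccessedProperty` are hypothetical names for illustration; only `markNeedsAllNodes` and the three property names come from the snippets quoted above.

```typescript
// Properties that trace back through the execution chain, so any upstream
// node's data may be needed (list taken from the WorkflowDataProxy.ts line above).
const CHAIN_TRACING_PROPERTIES: string[] = ['pairedItem', 'itemMatching', 'item'];

// Hypothetical stand-in for the parser's state object.
class AnalysisState {
  needsAllNodes = false;

  markNeedsAllNodes(): void {
    this.needsAllNodes = true;
  }
}

// Hypothetical visitor hook: called with the name of each accessed property.
function visitAccessedProperty(state: AnalysisState, property: string): void {
  if (CHAIN_TRACING_PROPERTIES.indexOf(property) !== -1) {
    state.markNeedsAllNodes();
  }
}
```

With this shape, accessing `pairedItem` or `itemMatching` falls back to sending all node data, the same as `item` does in the diff under review.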
packages/cli/src/runners/task-managers/data-request-response-builder.ts
}

/**
 * Assuming the given `obj` is an object where the keys are node names,
Would something break if the string is not a valid node name? e.g. user typo
Good question. In that case there is no node with that name and filtering works as expected
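To illustrate why a mistyped node name is harmless, here is a small sketch of filtering an object keyed by node name. `filterByNodeNames` is a hypothetical helper written for this example and is not the function from the PR; it only assumes the shape described in the doc comment above (keys are node names).

```typescript
// Keep only the entries of `obj` whose key is one of the requested node names.
// Names that match no key (for example, a user typo) are skipped.
function filterByNodeNames<T>(
  obj: Record<string, T>,
  nodeNames: string[],
): Record<string, T> {
  const filtered: Record<string, T> = {};
  for (const name of nodeNames) {
    if (Object.prototype.hasOwnProperty.call(obj, name)) {
      filtered[name] = obj[name];
    }
  }
  return filtered;
}

// Usage: 'Sett' is a typo, so it contributes nothing to the result.
const runData = { 'HTTP Request': [1, 2], Set: [3] };
const needed = filterByNodeNames(runData, ['Set', 'Sett']);
```

Here `needed` contains only the `Set` entry. The typo does not throw; the misspelled node has no data to send.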
Summary
When executing JS code in the Code Node, the task runner currently fetches the entire workflow execution context data. This can be a lot of data and can cause OOMs on large workflows. Most often the Code Node only uses its input data and maybe some other node's data, so sending all the data is overkill.
This PR changes the behaviour to send only the needed data. It runs a static analysis of the code to identify which built-in variables are accessed, and sends only the data those variables require.
There are corner cases where we can't statically determine which data is needed, such as which nodes' data the code reads. This happens, for example, when a variable is passed as the parameter to the $() function. In these cases we send all the data.
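The idea can be sketched roughly as follows. This is not the PR's implementation: it uses a naive regular expression instead of a proper parser, and `NeededData` and `analyzeCode` are hypothetical names. It only shows the decision described above: a string literal passed to $() names a specific node, while anything else forces the fallback of sending all data.

```typescript
// Hypothetical result shape for illustration only.
interface NeededData {
  needsAllNodes: boolean;
  nodeNames: string[];
}

// Naive regex-based stand-in for the static analysis. It will misread
// node names that contain ")" and calls inside comments or strings.
function analyzeCode(code: string): NeededData {
  const result: NeededData = { needsAllNodes: false, nodeNames: [] };
  // Every `$(` call, capturing the raw argument text up to the first `)`.
  const callPattern = /\$\(\s*([^)]*?)\s*\)/g;
  // A plain single- or double-quoted string literal without escapes.
  const literalPattern = /^(['"])([^'"\\]*)\1$/;

  let match: RegExpExecArray | null;
  while ((match = callPattern.exec(code)) !== null) {
    const literal = literalPattern.exec(match[1]);
    if (literal === null) {
      // Variable or expression: the node name is unknown until runtime.
      result.needsAllNodes = true;
    } else if (result.nodeNames.indexOf(literal[2]) === -1) {
      result.nodeNames.push(literal[2]);
    }
  }
  return result;
}
```

For code such as `$('HTTP Request').all()` the sketch records only that node name, whereas `$(name).all()` sets `needsAllNodes`, mirroring the corner case above.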
Based on some naive measurements, the message size is reduced by roughly 30-70%.
Next steps:
After this change there is still room for optimization. For example, there is still duplication in the message sent to the task runner: the input data of the Code Node appears in multiple parts of the message, and each copy ends up as a separate object when deserialized in the task runner.
Related Linear tickets, Github issues, and Community forum posts
https://linear.app/n8n/issue/PAY-2174/send-only-needed-data-to-the-runner