Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Only send needed data to task runner (no-changelog) #11487

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

tomi
Copy link
Contributor

@tomi tomi commented Oct 31, 2024

Summary

When executing JS code in the Code Node, the task runner has currently been fetching the entire workflow execution context data. This can be a lot of data, and can cause OOMs on large workflows. Most often Code Node is used in such a way that it only uses the input data and maaaaybe some other node's data. Hence sending all the data is an overkill.

This PR changes the behaviour to send only the needed data. This is implemented by running a static analysis of the code and identifying which built-in variables are accessed within the code. Based on that analysis we only send the needed data.

There can be corner cases where we can't statically analyse for example which nodes' data is needed. This can happen for example when a variable is used as parameter to $() function. In these cases we send all the data.

Based on some naive measurements, the message size is reduced something between 30-70%.

Next steps:
After this change there is room for optimization. For example, there is still duplication in the message send to the task runner. The input data of the Code Node is located in multiple parts of the message, which all end up being separate objects when deserialized in the task runner.

Related Linear tickets, Github issues, and Community forum posts

https://linear.app/n8n/issue/PAY-2174/send-only-needed-data-to-the-runner

Review / Merge checklist

  • PR title and summary are descriptive. (conventions)
  • Docs updated or follow-up ticket created.
  • Tests included.
  • PR Labeled with release/backport (if the PR is an urgent fix that needs to be backported)

When executing JS code in the Code Node, the task runner has currently
been fetching the entire workflow execution context data. This can
be a lot of data, and can cause OOMs on large workflows. Most often
Code Node is used in such a way that it only uses the input data and
maaaaybe some other node's data. Hence sending all the data is an
overkill.

This PR changes the behaviour to send only the needed data. This is
implemented by running a static analysis of the code and identifying
which built-in variables are accessed within the code. Based on that
analysis we only send the needed data.

There can be corner cases where we can't statically analyse for
example which nodes' data is needed. This can happen for example
when a variable is used as parameter to $() function. In these
cases we send all the data.

Based on some naive measurements, the message size is reduced
something between 30-70%.

Next steps:
After this change there is room for optimization. For example,
there is still duplication in the message send to the task runner.
The input data of the Code Node is located in multiple parts of
the message, which all end up being separate objects when
deserialized in the task runner.
@tomi tomi changed the title feat: Only send needed data to task runner feat: Only send needed data to task runner (no-changelog) Oct 31, 2024
@n8n-assistant n8n-assistant bot added core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team labels Oct 31, 2024
Copy link
Contributor

@ivov ivov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work! 🔥

Will come back later to test it manually.

Comment on lines 96 to 98
if (accessedProperty.value === 'item') {
state.markNeedsAllNodes();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think also pairedItem and itemMatching need to trace back the chain.

if (['pairedItem', 'itemMatching', 'item'].includes(property as string)) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Will add that

}

/**
* Assuming the given `obj` is an object where the keys are node names,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would something break if the string is not a valid node name? e.g. user typo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. In that case there is no node with that name and filtering works as expected

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants