feat: node starts without checking internal hostname #12112

Open
wants to merge 28 commits into
base: main

Conversation

edward-swirldslabs
Contributor

@edward-swirldslabs edward-swirldslabs commented Mar 13, 2024

Must not merge until https://github.com/swirlds/swirlds-platform-regression/pull/4032 is merged

Description:
This PR removes the use of the internal IP address for determining which nodes to start. The nodes to start locally must now be specified on the command line or through an environment variable.

  • If no nodes are specified:
    • Browser starts all nodes locally.
    • ServicesMain logs an error and exits.
  • If multiple nodes are specified:
    • Browser starts only the specified nodes locally.
    • ServicesMain exits with an error indicating that exactly 1 node must be started.
  • If specified, command-line arguments take precedence over environment variables.

The methods for loading the cryptography now take parameters indicating which nodes will be started locally.

CLI: comma-separated list at the end of the java command.

java <jar> -local 0,1,2

Environment variable: comma-separated list in the nodesToRun environment variable.

nodesToRun=0,1,2
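
As an illustration only, the resolution rule described above (the command line wins, the environment variable is the fallback) could be sketched as follows. The class and method names (LocalNodeSelection, parseIds, resolve) are hypothetical and are not taken from this PR; only the -local switch and the nodesToRun variable come from the description. The sketch assumes the ID list is the argument immediately after -local.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not the code in this PR. */
final class LocalNodeSelection {
    private LocalNodeSelection() {}

    /** Parses "0,1,2" into [0, 1, 2]; null or blank input yields an empty list. */
    static List<Long> parseIds(String csv) {
        List<Long> ids = new ArrayList<>();
        if (csv == null || csv.isBlank()) {
            return ids;
        }
        for (String part : csv.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                ids.add(Long.parseLong(trimmed));
            }
        }
        return ids;
    }

    /** Uses the value after -local if present; otherwise falls back to nodesToRun. */
    static List<Long> resolve(String[] args, Map<String, String> env) {
        for (int i = 0; i < args.length - 1; i++) {
            if ("-local".equals(args[i])) {
                return parseIds(args[i + 1]);
            }
        }
        return parseIds(env.get("nodesToRun"));
    }

    public static void main(String[] args) {
        // Prints the node IDs this process would start under the rule above.
        System.out.println(resolve(args, System.getenv()));
    }
}
```

In this sketch an empty result means "no nodes specified", which the Browser would treat as "start all nodes" and ServicesMain would treat as an error.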

!!!! This PR impacts all deployments of nodes, from JRS, to SOLO, to DevOps managed test and production networks.

  • To start a single node, provide a single node ID through the command line or an environment variable.

I've removed ServicesMainTest.java because it only tested the behavior of mismatched addresses, and that logic has changed significantly. These unit tests were also misbehaving: legacyConfigProperties returned a null address book instead of an empty address book. All of the new code paths are exercised in testing and production and will fail critically if they do not work as expected, so maintaining complicated static mocked scenarios is not necessary.

  1. HAPI tests specify nodes to run through the command line.
  2. SOLO uses environment variables to configure nodes to run.
  3. The Browser runs all nodes locally.
  4. ServicesMain later ensures there is exactly 1 node running.

Related issue(s):

Fixes #11751

Testing

This was manually tested locally with the Browser and verified to work with both the CLI and ENV methods. The same logic is used in ServicesMain. The real proof will be the code running successfully in the testing environments.

Checklist

Possible Impacts?

  • NMT
  • JRS
  • Solo / FST
  • HAPI tests
  • Local Node Testing
  • Gradle Build
  • Perfnet Testing

Sign Offs

  • Services
  • Testing
  • Release Engineering
  • DevOps

@edward-swirldslabs edward-swirldslabs added this to the v0.49 milestone Mar 13, 2024
@edward-swirldslabs edward-swirldslabs self-assigned this Mar 13, 2024
@edward-swirldslabs edward-swirldslabs requested a review from a team as a code owner March 13, 2024 19:59
@edward-swirldslabs edward-swirldslabs requested a review from a team March 13, 2024 19:59
@edward-swirldslabs edward-swirldslabs requested a review from a team as a code owner March 13, 2024 19:59
Signed-off-by: Edward Wertz <[email protected]>

github-actions bot commented Mar 13, 2024

Node: HAPI Test (Restart) Results

2 tests   2 ✔️  5m 24s ⏱️
2 suites  0 💤
2 files    0

Results for commit a2a49cc.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 13, 2024

Node: HAPI Test (Token) Results

190 tests   190 ✔️  19m 7s ⏱️
  14 suites      0 💤
  14 files        0

Results for commit a2a49cc.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 13, 2024

Node: HAPI Test (Crypto) Results

229 tests   229 ✔️  28m 35s ⏱️
  23 suites      0 💤
  23 files        0

Results for commit a2a49cc.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 13, 2024

Node: HAPI Test (Misc) Results

444 tests   434 ✔️  38m 58s ⏱️
  77 suites    10 💤
  77 files        0

Results for commit a2a49cc.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 13, 2024

Node: HAPI Test (Time Consuming) Results

21 tests   21 ✔️  53m 32s ⏱️
  3 suites    0 💤
  3 files      0

Results for commit a2a49cc.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 13, 2024

Node: HAPI Test (Smart Contract) Results

494 tests   491 ✔️  1h 3m 10s ⏱️
  55 suites      3 💤
  55 files        0

Results for commit a2a49cc.

♻️ This comment has been updated with latest results.

github-actions bot commented Mar 13, 2024

Node: Unit Test Results

    2 266 files  +1             1 errors  2 265 suites  +1   3h 51m 16s ⏱️ + 1h 30m 54s
111 102 tests +5  111 031 ✔️ +5  71 💤 ±0  0 ±0 
119 544 runs  +5  119 473 ✔️ +5  71 💤 ±0  0 ±0 

For more details on these parsing errors, see this check.

Results for commit a2a49cc. ± Comparison against base commit 3d09dbf.

♻️ This comment has been updated with latest results.

Node: HAPI Test (Node Death Reconnect) Results

2 tests   2 ✔️  7m 28s ⏱️
2 suites  0 💤
2 files    0

Results for commit a2a49cc.

@edward-swirldslabs
Contributor Author

edward-swirldslabs commented Mar 14, 2024

Lots of work to do before this can be merged. This PR removes a capability without introducing the capability's replacement. JRS and other tools such as NMT will need to be re-worked to support either (1) command line identification of which node to start or (2) setting of an environment variable which indicates which node to start.

Contributor

@cody-littley cody-littley left a comment

LGTM, although this should not be merged until all downstream dependents give the thumbs up.

lpetrovic05
lpetrovic05 previously approved these changes Dec 26, 2024
timo0
timo0 previously approved these changes Dec 26, 2024
kimbor
kimbor previously approved these changes Dec 27, 2024
@jeromy-cannon
Contributor

@edward-swirldslabs , this is confusing:
CLI: comma separated list at end of java command.

java <jar> -local 0,1,2

Environment Variable: comma separated list for the nodesToRun environment variable.

nodesToRun=0,1,2

It looks like the code and expectation is that you only supply a single node ID, but your example here shows multiple.

@edward-swirldslabs
Contributor Author

I have updated the PR text to clarify that ServicesMain can only start 1 node per process. Browser can start multiple nodes per process.

@edward-swirldslabs edward-swirldslabs modified the milestones: v0.58, v0.59 Dec 30, 2024
@nathanklick nathanklick dismissed stale reviews from kimbor, timo0, and lpetrovic05 via 0fb9830 December 31, 2024 20:24
@nathanklick nathanklick requested review from a team as code owners December 31, 2024 20:24
@nathanklick nathanklick requested review from andrewb1269hg and removed request for litt3 and cody-littley December 31, 2024 20:24

codacy-production bot commented Dec 31, 2024

Coverage summary from Codacy

Coverage variation: -0.06% (target: -1.00%)
Diff coverage: 19.12%

Coverage variation details

Commit                            Coverable lines   Covered lines   Coverage
Common ancestor commit (0d78221)  95934             65324           68.09%
Head commit (ccc462a)             95909 (-25)       65249 (-75)     68.03% (-0.06%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

Scope                   Coverable lines   Covered lines   Diff coverage
Pull request (#12112)   68                13              19.12%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

@nathanklick nathanklick added Feature Enhancement Enhancing an existing feature driven by business requirements. Typically backwards compatible. P0 An issue impacting production environments or impacting multiple releases or multiple individuals. labels Jan 3, 2025
@@ -100,6 +100,10 @@ if [[ "${JCP_OVERRIDDEN}" != true && "${JAVA_MAIN_CLASS}" == "com.swirlds.platfo
JAVA_CLASS_PATH="data/lib/*"
fi

# Setup Consensus Node Arguments
CONSENSUS_NODE_ARGS=""
[[ -n "${CONSENSUS_NODE_ID}" && "${CONSENSUS_NODE_ID}" -ge 0 ]] && CONSENSUS_NODE_ARGS="-local ${CONSENSUS_NODE_ID}"
Contributor

If CONSENSUS_NODE_ID is not specified or is less than 0, should this be treated as an error condition that prevents startup?

Member

@dalvizu It ultimately will be at the NMT level. We could add a failure condition here; however, these images are also used for older software releases which do not have the -local option.

We made it optional here for backwards compatibility with NMT 1.2.9 and 0.58.x releases.

Member

Ultimately, the ServicesMain entrypoint should fail fast if the required nodeId is not specified.

@edward-swirldslabs Does the ServicesMain entrypoint fail if neither the environment variable nor the -local switch is supplied?

Contributor Author

@edward-swirldslabs edward-swirldslabs Jan 3, 2025

@nathanklick Yes. The ServicesMain call to ensureSingleNode makes sure there is no more and no less than 1 node being started.

If no -local switch is supplied, then we're relying on the environment variable to determine the node to start. If no environment variable is set, then the node will fail to start (from ServicesMain) because its singular identity was not specified.
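
A minimal sketch of the "no more and no less than 1 node" check described above, for readers who want to see its shape. The real ensureSingleNode signature and error handling are not shown in this thread, so the class name, parameter type, and exception used here are assumptions; the PR description says ServicesMain logs an error and exits, whereas this sketch throws instead.

```java
import java.util.List;

/** Hypothetical stand-in for the check performed in ServicesMain. */
final class SingleNodeCheck {
    private SingleNodeCheck() {}

    /** Returns the only requested node ID, or throws if zero or several were requested. */
    static long ensureSingleNode(List<Long> nodesToRun) {
        if (nodesToRun.size() != 1) {
            // Assumption: the real code logs and exits; an exception keeps the sketch testable.
            throw new IllegalStateException(
                    "Exactly 1 node must be started, but " + nodesToRun.size()
                            + " were specified: " + nodesToRun);
        }
        return nodesToRun.get(0);
    }
}
```

With this shape, an empty list (no -local switch and no environment variable) and a multi-node list such as 0,1,2 both fail, and only a single ID passes.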

Member

@dalvizu Does this satisfy your concerns?

Development

Successfully merging this pull request may close these issues.

Use commandline or environment variable to determine node ids to start locally.