
[FEAT] Place applications in unobserved lattice in an "initializing" state #346

Open
brooksmtownsend opened this issue Jul 24, 2024 · 5 comments
Labels
help wanted Extra attention is needed

Comments

@brooksmtownsend
Member

Often, after restarting wadm and wasmCloud hosts, you may run a query like wash app list to see which applications are deployed. Wadm, however, performs no work and makes no status updates until a lattice is actually observed, which begins when a host_started or host_heartbeat event is received for that lattice.

This is largely for efficiency, to keep wadm from observing every single lattice simultaneously. However, it results in an inaccurate status for up to 30 seconds when a host_started event is missed, or indefinitely if no hosts are running in that lattice.

To get around this issue, I'd like to propose that applications in an unobserved lattice are placed in an "initializing" state, or some other state indicating that they have failed. We need to tell the user that the application isn't actually deployed yet; rather, it's waiting for a host to schedule on.

I could see an argument for Failed, since there are no hosts available. I could also see an argument for Reconciling with a status message like "waiting for hosts to schedule on, 0 hosts available". Otherwise, we could introduce a new status called Waiting or Initializing (etc.) that indicates we haven't even started reconciling yet.
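To make the options above concrete, here is a minimal sketch of how a status could be derived from what wadm knows about a lattice. It assumes a new Waiting status carrying a reason message, one of the names floated in this thread. The AppStatus and LatticeView types and the status_for function are hypothetical illustrations, not wadm's actual types or API.

```rust
/// Hypothetical application status values; NOT wadm's real status type.
#[derive(Debug, Clone, PartialEq, Eq)]
enum AppStatus {
    /// Proposed: wadm can't yet act on this app, with a human-readable reason.
    Waiting { reason: String },
    Reconciling,
    Deployed,
}

/// Hypothetical snapshot of what wadm knows about one lattice.
struct LatticeView {
    /// True once a host_started or host_heartbeat event has been seen.
    observed: bool,
    /// Number of hosts currently known in the lattice.
    host_count: usize,
}

/// Pick the status to report for an app, given the lattice view and the
/// last status wadm recorded for it (which may be stale after a restart).
fn status_for(lattice: &LatticeView, last_known: Option<AppStatus>) -> AppStatus {
    if !lattice.observed {
        // Don't trust a possibly stale stored status while the lattice is unobserved.
        return AppStatus::Waiting {
            reason: "lattice not yet observed; waiting for a host event".to_string(),
        };
    }
    if lattice.host_count == 0 {
        return AppStatus::Waiting {
            reason: "waiting for hosts to schedule on, 0 hosts available".to_string(),
        };
    }
    // Lattice is observed and has hosts: fall back to normal reconciliation.
    last_known.unwrap_or(AppStatus::Reconciling)
}

fn main() {
    let unobserved = LatticeView { observed: false, host_count: 0 };
    println!("{:?}", status_for(&unobserved, Some(AppStatus::Deployed)));
}
```

Because the unobserved check comes first, a stored Deployed status from before a restart is not reported until the lattice has actually been observed again, which is the inaccuracy this issue describes.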

@brooksmtownsend brooksmtownsend added the help wanted Extra attention is needed label Jul 24, 2024
@brooksmtownsend
Member Author

This is also a problem when wadm shuts down, then the wasmCloud hosts shut down, and then wadm starts back up again. It can take up to 60 seconds for the old hosts to decay out of the lattice, and therefore for the applications to be rebalanced.

@lxfontes
Member

👍, and it feels like "lattice is unobserved" is the 🔑 point.
Leaning towards Waiting / Scheduling / Unknown as Initializing might give the impression the lattice is reporting and doing pre-flight work.

@lachieh
Contributor

lachieh commented Jul 24, 2024

Agreed that a new status is beneficial. Also agreed that Initializing is not quite right, because the status may recur after a successful deployment. Leaning towards Waiting combined with a status reason.

stale bot commented Sep 22, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this has been closed too eagerly, please feel free to tag a maintainer so we can keep working on the issue. Thank you for contributing to wasmCloud!

@stale stale bot added the stale label Sep 22, 2024
@brooksmtownsend
Member Author

This might not be a wholly useful status in an alternative setup where we instruct wadm to monitor a specific set of lattices, but that's TBD. Bumping with a comment for the stale bot, but if we get back to this by the next stale marker, we may want to drop this issue.

@stale stale bot removed the stale label Sep 23, 2024