This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

fleet: add replace unit support #1509

Merged
merged 12 commits into from
Apr 25, 2016

Conversation

tixxdz
Contributor

@tixxdz tixxdz commented Mar 16, 2016

This PR allows units to be replaced with the "submit", "load", and "start" commands; just add the new "--replace" switch.

The previous discussion, about overwriting, took place in PR #1295.

This PR tries to fix #760.

@tixxdz
Contributor Author

tixxdz commented Mar 16, 2016

We need further testing for this one. Thanks!

@antrik
Contributor

antrik commented Mar 16, 2016

Regarding the test issue: I don't know whether this is a bug or expected behaviour; but there seems to be no guarantee that the units have reached the final states by the time the command returns. That's why all the existing tests use WaitForNActiveUnits(); and that's also why I created a similar loop in my new connectivity loss test: https://github.com/endocode/fleet/blob/antrik/tests-loss_of_connectivity/functional/connectivity-loss_test.go#L134 (I couldn't use WaitForNActiveUnits() in this case, as not all of the units are supposed to become active -- which I think also applies here, depending on the case?) I guess you need something similar.

@antrik
Contributor

antrik commented Mar 16, 2016

According to the commit message of the first commit (and from a quick glance at the code), a command with --replace will only take effect if the unit is already in the respective state. As discussed earlier (out-of-band), I'm pretty sure this is not the desired behaviour. The commands should behave exactly the same as they would without --replace, except that if a matching unit already exists, it is replaced by the new version, rather than retaining the old one. So if a unit is inactive for example, start --replace should submit the new unit file (replacing the previous one), load it, and start it.

Slightly less clear is what to do for example if a unit is already launched, and we do submit --replace. In my understanding reconciliation will automatically reload and restart the updated unit as soon as it's submitted into the registry? That might be slightly unintuitive -- but I guess there is not much we can do about that...
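The semantics argued for here can be sketched as a tiny decision table in Go. `replaceSteps` and the step names are purely illustrative (they are not fleet's internals); the point is that `--replace` adds a file swap up front but never changes which state transitions a command performs:

```go
package main

import "fmt"

// replaceSteps returns the ordered operations a fleetctl command should
// perform under the behaviour described above: with --replace, an existing
// unit's file is first swapped for the new version, and then the command
// proceeds exactly as it would without the flag. Illustrative names only.
func replaceSteps(cmd string, replace bool) []string {
	steps := []string{}
	if replace {
		steps = append(steps, "replace-unit-file")
	}
	switch cmd {
	case "submit":
		steps = append(steps, "submit")
	case "load":
		steps = append(steps, "submit", "load")
	case "start":
		steps = append(steps, "submit", "load", "start")
	}
	return steps
}

func main() {
	// An inactive unit started with --replace is submitted, loaded, and
	// started, rather than requiring it to already be in the target state.
	fmt.Println(replaceSteps("start", true))
}
```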

@antrik
Contributor

antrik commented Mar 16, 2016

Without actually digging into the code, I see some formal issues with this PR from looking through the commit messages. Some of the commits are just fixing things introduced in earlier commits of the same PR (such as the gofmt fix) -- these should be squashed into the respective commit(s) that introduced the problematic code in the first place.

Also, it looks like the order of the commits is wrong: the commit introducing the --replace switch for example can't possibly work without some of the internal changes introduced by the other commits -- so it should be committed after these changes, not before...

@dongsupark dongsupark force-pushed the tixxdz/fleetctl-replace-unit-v2 branch from f2412d0 to c72428e Compare March 17, 2016 08:52
@tixxdz tixxdz force-pushed the tixxdz/fleetctl-replace-unit-v2 branch from c72428e to 6d38716 Compare March 17, 2016 09:25
@tixxdz tixxdz mentioned this pull request Mar 17, 2016
@tixxdz
Contributor Author

tixxdz commented Mar 17, 2016

@antrik yes, actually only one commit (the gofmt one) needed fixing, thanks! OK, could you please dig into the code then and ping us if we missed something? This is new functionality. On the note of how to replace units: yes, start --replace should cover submit and load, but not the other way around. Later, fleet should be made to enforce the life cycle and states of units...

@jonboulle we have added functional tests that replace many units, etc.; the functionality is there... To be honest, the thing that bothers me is the ordering of systemd directives, but that also depends on the units. Since fleet is declarative, putting in StopPost directives that are buggy or never finish, and then updating, is asking for trouble...

@klausenbusk

Slightly less clear is what to do for example if a unit is already launched, and we do submit --replace. In my understanding reconciliation will automatically reload and restart the updated unit as soon as it's submitted into the registry?

I think we should provide 2 things:

  1. A way to replace a unit and restart it.
  2. A way to replace a unit and reload it (systemctl daemon-reload).

So I think we could use submit/load for the latter and start for the former. Is that a solution, or what do you think?

@dongsupark
Contributor

@antrik

Regarding the test issue: I don't know whether this is a bug or expected behaviour; but there seems to be no guarantee that the units have reached the final states by the time the command returns. That's why all the existing tests use WaitForNActiveUnits()

Makes sense. So I added such checks to the functional tests. Maybe during the next update, this patch will be included:
endocode#21
endocode@4e4cc94

For now this test seems to work. However, I'm a little cautious about adding more tests to this PR. I'd rather create a new PR afterwards for further comprehensive tests.

@tixxdz
Contributor Author

tixxdz commented Mar 17, 2016

@antrik

Slightly less clear is what to do for example if a unit is already launched, and we do submit --replace. In my understanding reconciliation will automatically reload and restart the updated unit as soon as it's submitted into the registry? That might be slightly unintuitive -- but I guess there is not much we can do about that...

submit is for submitting units into the cluster, so the desired state is submitted and inactive:
https://coreos.com/fleet/docs/latest/states.html

Now, if you bring "--replace" into the game and, quoting you, it "will automatically reload and restart", then you are mixing the submit, load, and start desired states. If you want to achieve that, just do "start --replace"; why "submit --replace"? We should not mix desired states, and we should follow the life cycle of units and services; otherwise we introduce confusion, not to mention that these operations are also handled by systemd...

So here, just use the appropriate command.

@tixxdz
Contributor Author

tixxdz commented Mar 17, 2016

@klausenbusk

I think we should provide 2 things:

  1. A way to replace a unit and restart it.
  2. A way to replace a unit and reload it (systemctl daemon-reload).

So I think we could use submit/load for the latter and start for the former. Is that a solution, or what do you think?

The way to replace a unit and restart it is what this PR does! We re-execute the systemd directives. Could you please try it? Edit your unit, then fleetctl start --replace.

And the second way you mention is also handled: the manager instructs systemd to reload. submit, however, does not involve a systemd reload at all!

Thank you!

@tixxdz tixxdz added this to the v0.12.0 milestone Mar 17, 2016
@tixxdz tixxdz self-assigned this Mar 17, 2016
@antrik
Contributor

antrik commented Mar 17, 2016

@klausenbusk AIUI fleet doesn't presently have any notion of reloading a unit? While IIRC there have been requests for that, I don't think we should try to address this question here. This PR is quite complicated as it is...

@klausenbusk

And the second way you mention is also handled: the manager instructs systemd to reload. submit, however, does not involve a systemd reload at all!

I want a way to replace a unit without restarting it! But as @antrik said, that should be another PR.
Use case: I want to replace my MariaDB Galera unit file. Currently I need to do:

  1. fleetctl destroy mariadb-galera
  2. fleetctl load mariadb-galera
  3. Bootstrap mariadb-galera manually on one of the nodes (docker run...).
  4. Start MariaDB with systemctl start mariadb-galera on node 2.
  5. Wait for it to finish SST and do the same on node 3.
  6. Stop the manually started mariadb-galera.
  7. fleetctl start mariadb-galera

It's a rather long process, and it takes some time. If I could replace the unit, I could just ssh into every db server and run sudo systemctl restart mariadb-galera (without downtime).

@tixxdz
Contributor Author

tixxdz commented Mar 18, 2016

@klausenbusk OK, could you also explain in more detail what steps 3), 4), 5), and 6) are, and the content of your units?

It seems you have the case of template units here? Actually, with this PR you can currently "submit --replace" a template unit, and units already started from the previous template will not be restarted. To restart them you have to add an extra "start --replace template@{x..y}". So the question I see here: in the meantime, could you think of a solution for your use case?

"fleetctl start --replace" triggers a systemd daemon-reload and then the same process as "systemctl restart".

And a note so we don't forget, we have these transitions:
inactive -> loaded -> launched
submit -> {load|unload} -> {start | stop}
https://coreos.com/fleet/docs/latest/states.html

So unless I'm missing something here, if your units are in the launched state and you want to put them in the loaded state, it will always involve a stop (Stop and StopPost directives) and then a daemon-reload, hence downtime. Perhaps there is a solution for your use case if you could tell us more?

Thank you!
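The transition ladder from the states documentation can be encoded as a tiny Go state table. This is a sketch for illustration (not fleet's actual state-machine code), but it makes the downtime argument concrete: launched can only step down to loaded, never jump straight to inactive:

```go
package main

import "fmt"

// validNext encodes the fleet job-state ladder from
// https://coreos.com/fleet/docs/latest/states.html:
// inactive -> loaded -> launched, reversible one rung at a time.
var validNext = map[string][]string{
	"inactive": {"loaded"},
	"loaded":   {"inactive", "launched"},
	"launched": {"loaded"},
}

// canTransition reports whether a single step from one state to
// another is allowed by the ladder above.
func canTransition(from, to string) bool {
	for _, s := range validNext[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	// Launched -> loaded is one valid step (a stop), but launched ->
	// inactive must pass through loaded first, which is why replacing
	// a launched unit always involves a stop and thus downtime.
	fmt.Println(canTransition("launched", "loaded"))
	fmt.Println(canTransition("launched", "inactive"))
}
```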

@klausenbusk

@klausenbusk OK, could you also explain in more detail what steps 3), 4), 5), and 6) are, and the content of your units?

Note: I have 3 db nodes.
3: Galera needs a cluster to connect to, but as I have shut down the whole cluster, I need to start 1 instance of Galera with --wsrep_cluster_address=gcomm:// (create a new cluster), which I do with a plain docker run ..
4: I start mariadb-galera on node 2 with sudo systemctl start mariadb-galera.
5: I wait for it to sync up with Galera, and do the same on node 3.
6: I stop the docker run .. on node 1.
7: I do fleetctl start mariadb-galera so mariadb-galera is automatically started if a machine is rebooted.

It seems you have the case of template units here?

I currently use a global unit; I see a template unit as a stupid workaround.

@kayrus
Contributor

kayrus commented Mar 18, 2016

@klausenbusk why do you use systemctl to start units?

Using templates, you can ask fleet to start a unit on the corresponding machine, i.e.

fleetctl start mariadb-galera-initial@node1
fleetctl start mariadb-galera@node{2..3}
fleetctl stop mariadb-galera-initial@node2
fleetctl start mariadb-galera@node1

These steps could be automated with sidekicks

@antrik
Contributor

antrik commented Mar 18, 2016

@klausenbusk it seems what you are essentially asking for is a way to temporarily inhibit part of fleet's reconciliation logic, so it will submit (and load) the new unit, but ignore the fact that the existing unit doesn't match the new requested state...

While I can see the use case for this, I have no idea right now whether this could be done without breaking some fundamental assumptions...

@klausenbusk

Using templates, you can ask fleet to start a unit on the corresponding machine, i.e.

I don't see any big benefit to using templates. I would only need to ssh to 1 node instead of 3 (that's the only benefit), and I can't add a new db node without doing fleetctl start mariadb-galera@<some-new-number>

The problem is that I need to stop the whole cluster to change the unit file. What I'm asking for is a --replace which only replaces the unit on disk and does a sudo systemctl daemon-reload, without restarting the unit.
Then I can ssh to all my db nodes and restart mariadb-galera one by one, without stopping the cluster.

@kayrus
Contributor

kayrus commented Mar 18, 2016

@klausenbusk what is the problem with using the current implementation, fleetctl start --replace mariadb-galera@node1? It will update only the one unit on node1 and restart only the one unit that is scheduled on node1.

Then you can run fleetctl start --replace mariadb-galera@node2 and fleetctl start --replace mariadb-galera@node3.

This solution doesn't require stopping the cluster.

@antrik
Contributor

antrik commented Mar 18, 2016

@dongsupark if you have test cases in mind that further test the functionality introduced in this PR, they clearly belong here, not in some follow-up.

@klausenbusk

@klausenbusk what is the problem with using the current implementation, fleetctl start --replace mariadb-galera@node1?

Oh, I didn't know it worked that way. :)

I still prefer a global unit, but template units could be a solution until then.
A flag which restarts a global unit node by node could be useful.

@antrik
Contributor

antrik commented Mar 18, 2016 via email

@@ -78,6 +78,7 @@ var (
Debug bool
Version bool
Help bool
currentCommand string
Contributor


This is not really a flag -- and I don't think it's a good idea to treat it as one. While it might look slightly easier than explicitly passing through the command to the functions which need to discriminate on it, it's also harder to follow the code. Such things are generally frowned upon -- for good reason.

@tixxdz tixxdz force-pushed the tixxdz/fleetctl-replace-unit-v2 branch from cd3a786 to 011a5a3 Compare April 19, 2016 09:08
@tixxdz
Contributor Author

tixxdz commented Apr 19, 2016

@jonboulle it's ready. Basically I ended up implementing it in another way, please see comment: #1509 (comment). It also makes sure that the ExecStartPre of the new unit is serialized after the ExecStopPost of the old version of the unit, so we don't have further conflicts between the two versions; at the same time it allows us to try to avoid races which may bring us to #1000 and systemd/systemd#518.

Another advantage: we neither add replace-target hashes nor, from the fleetctl client perspective, wait for the hashes to be updated to the new version. We just wait for the new unit (new hash) to reach its own desired state: submitted, loaded, or started, as is currently done. The old unit will be stopped or unloaded before that, when we set the inactive state. So instead of waiting for the reconciler, we just do what it does, but without waiting and without adding much code.

Thank you!
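The ordering guarantee described above can be sketched in a few lines of Go. The `unitVersion` type and hook names are hypothetical (not fleet's real API); the point is only that the old version's stop-side directives, ExecStopPost included, run to completion before the new version's ExecStartPre:

```go
package main

import "fmt"

// unitVersion is an illustrative stand-in for one version of a unit,
// identified by its hash, with its stop- and start-side directives.
type unitVersion struct {
	hash       string
	stopHooks  []string // e.g. ExecStop, ExecStopPost
	startHooks []string // e.g. ExecStartPre, ExecStart
}

// replaceOrder returns the serialized directive order for a replace:
// the old unit is fully stopped first (it is set inactive), and only
// then is the new unit driven toward its own desired state.
func replaceOrder(oldV, newV unitVersion) []string {
	var order []string
	for _, h := range oldV.stopHooks {
		order = append(order, oldV.hash+":"+h)
	}
	for _, h := range newV.startHooks {
		order = append(order, newV.hash+":"+h)
	}
	return order
}

func main() {
	oldV := unitVersion{hash: "oldhash", stopHooks: []string{"ExecStop", "ExecStopPost"}}
	newV := unitVersion{hash: "newhash", startHooks: []string{"ExecStartPre", "ExecStart"}}
	fmt.Println(replaceOrder(oldV, newV))
}
```

Serializing the two versions this way is what avoids the directive-interleaving races mentioned above, since the two versions of the unit never run hooks concurrently.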

@tixxdz
Contributor Author

tixxdz commented Apr 22, 2016

For the record, patches 9, 10, and 11 LGTM. I just added a complex test to cover what I described in this comment: #1509 (comment). The test is documented, and it allows us to make sure that systemd directives are serialized, but only in the current replace context; external operations are not our responsibility.

Waiting for some LGTMs, then I will merge it; we have been rebasing this branch for some time now...

Thanks!

@tixxdz tixxdz force-pushed the tixxdz/fleetctl-replace-unit-v2 branch from 12b7db5 to b4ee8d5 Compare April 22, 2016 10:53
@dongsupark
Contributor

TestReplaceSerialization test looks good. Thanks!

@kayrus
Contributor

kayrus commented Apr 22, 2016

@tixxdz there is one issue I've just found: when you replace a templated unit, the updated unit gets scheduled on a different machine. Steps to reproduce:

fleetctl submit hello@.service
fleetctl start hello@{1..10}.service

Then you'll get the units list:

fleetctl list-unit-files
UNIT                    HASH    DSTATE          STATE           TARGET
hello@.service          4e11fe3 inactive        inactive        -
hello@1.service         4e11fe3 launched        launched        63a6ca6f.../coreos1
hello@10.service        79699e3 launched        launched        5e719f91.../coreos2
hello@2.service         79699e3 launched        launched        5e719f91.../coreos2
hello@3.service         79699e3 loaded          loaded          8c849f40.../coreos3
hello@4.service         79699e3 launched        launched        8c849f40.../coreos3
hello@5.service         79699e3 launched        launched        5e719f91.../coreos2
hello@6.service         79699e3 launched        launched        8c849f40.../coreos3
hello@7.service         79699e3 launched        launched        5e719f91.../coreos2
hello@8.service         79699e3 launched        launched        8c849f40.../coreos3
hello@9.service         79699e3 launched        launched        5e719f91.../coreos2

Then perform the replace:

fleetctl start --replace hello@4.service
Unit hello@4.service inactive
Unit hello@4.service launched on 63a6ca6f.../coreos1

And you'll see that the unit has been moved from coreos3 to coreos1:

fleetctl list-unit-files
UNIT                    HASH    DSTATE          STATE           TARGET
hello@.service          4e11fe3 inactive        inactive        -
hello@1.service         4e11fe3 launched        launched        63a6ca6f.../coreos1
hello@10.service        79699e3 launched        launched        5e719f91.../coreos2
hello@2.service         79699e3 launched        launched        5e719f91.../coreos2
hello@3.service         79699e3 loaded          loaded          8c849f40.../coreos3
hello@4.service         4e11fe3 launched        inactive        63a6ca6f.../coreos1
hello@5.service         79699e3 launched        launched        5e719f91.../coreos2
hello@6.service         79699e3 launched        launched        8c849f40.../coreos3
hello@7.service         79699e3 launched        launched        5e719f91.../coreos2
hello@8.service         79699e3 launched        launched        8c849f40.../coreos3
hello@9.service         79699e3 launched        launched        5e719f91.../coreos2

Should we document that behavior, or should we reschedule the unit on the same machine it was started on before and move it only when there is an explicit need or requirement? Otherwise, when you use extra X-Fleet options, it works as expected, i.e.:

[X-Fleet]
MachineMetadata="hostname=coreos%i"

@tixxdz
Contributor Author

tixxdz commented Apr 22, 2016

@kayrus fleetctl submit, load, and start deal with the notion of a cluster, not a single specified machine. If we go down that path, then IMO we will reduce fleet's functionality and constrain further changes. If we always schedule on the same machine:

  1. We are explicitly setting "[X-Fleet] MachineMetadata="hostname=coreos%i"" for users that do not want it.

  2. What happens if a user wants to replace a unit and schedule it based on fleet's decisions, or any other scheduling decision? If we do that, then we explicitly choose for them, and it will surely end up re-scheduling units again and again on the same node, reducing the notion of migration and automatic replacement that we are trying to add here.

  3. We would hardcode that inside fleet's logic, which I really do not like. In the long term it will be a burden, and we may see requests to change it.

  4. It would conflict with [X-Fleet] MachineMetadata="hostname=coreos-another-one"; then we would have to add code to make one take precedence over the other...

Now, if users want to replace on the same machine, there is already the option:
[X-Fleet] MachineMetadata="hostname=coreos%i"

This allows us to resolve the previous points:

  1. Users who want it will have to set it.

  2. Users who want fleet to do the right thing without bothering are satisfied.

  3. Nothing is hardcoded inside the code; everything is optional, and one can even move units and services from one cluster with X machines to another cluster with Y machines.

  4. We have only one option, and we do not add extra logic or code.

Now, for the record, we have to update the fleetctl help message to let users know that it replaces a unit in the cluster and not on a specific node; if you want it to be on node X, then add the MachineMetadata option, which you can do while previous versions of the unit are still running.

Will update the "--replace" help message right now. Thanks!

Djalal Harouni and others added 12 commits April 22, 2016 15:22
This patch adds some variables that will be used in the next patch to implement the replace-units feature: the replace flag and the current command being executed.
Add checkUnitCreation() to check whether the unit should be created or not. This function handles the new replace logic.

Add isLocalUnitDifferent(), since we don't really want to warn that the units differ when the "--replace" switch was set. At the same time, factor out our unit matching logic. The function handles both cases, with and without '--replace'.
Just use isLocalUnitDifferent() instead of the old warnOnDifferentLocalUnit().
Move MatchUnitFile() to the unit package; we will use it inside fleetd to check for unit matching.
If there is a unit with the same name, check whether the content of the two differs; if so, create a new one in the registry.
Since we now support replacing units and updating their job entries, instruct the etcd driver to allow updating the job object key with the newly provided unit, and to ignore 'job already exists' errors.
Add new helpers util.CopyFile() and util.GenNewFleetService() to prepare for the new functional tests for the replace options.

util.CopyFile() is a helper that copies one file to another.
util.GenNewFleetService() is a helper that replaces a string with a new one. It's needed for the next functional tests.
TestUnit{Submit,Load,Start}Replace() tests whether the command "fleetctl {submit,load,start} --replace hello.service" works, respectively.
As most of the test sequences are identical, the common part is split out into replaceUnitCommon().
For the fleetctl {submit,load,start} commands, also test loading multiple units at the same time and replacing each of them one after another.
… of systemd directives

This test asserts that systemd directives are serialized when we transition from the old version of the unit to the new one: make sure that the ExecStartPre directives of the new one are executed after the ExecStopPost of the previous one.
@tixxdz tixxdz force-pushed the tixxdz/fleetctl-replace-unit-v2 branch from b4ee8d5 to 23750a1 Compare April 22, 2016 13:52
@kayrus
Contributor

kayrus commented Apr 22, 2016

@tixxdz I've tried not unloading units but just stopping them. This doesn't work as expected; fleet for some reason hangs. Another solution is to implement a "desiredMachine"/"rescheduleOnTheSameMachine" option, which could be ignored in cases where the unit cannot be replaced on the same machine (i.e. [X-Fleet] options were added). But this is another issue and could be resolved in a new PR if necessary.

@tixxdz
Contributor Author

tixxdz commented Apr 22, 2016

@kayrus actually yes, we have to unload, free resources, reset the status of failed units through the systemd manager, and perform other clean-up operations.

As for rescheduleOnSameMachine, that would be a nice addition, not only for this PR but probably for other parts or PRs.

Thank you!

@tixxdz
Contributor Author

tixxdz commented Apr 25, 2016

So this one looks good: all tests pass, and it has had many review comments. I'm merging it now to unblock the situation, so that neither it nor the upcoming PRs will be lost.

Users who want to replace units on the same machine can use what's already available:
[X-Fleet] MachineMetadata="hostname=coreos%i"

Later, someone may add rescheduleOnSameMachine.

@jonboulle either way, we have to unload and then load again, hence the rescheduling.

Thank you!

@tixxdz tixxdz merged commit 6132bb0 into coreos:master Apr 25, 2016

Successfully merging this pull request may close these issues.

Redeploy unit in place
6 participants