[BUG] Parallel orchestration needs improvement #59959

Open
gabrielgt opened this issue Apr 2, 2021 · 6 comments · May be fixed by #66956
Labels: Bug (broken, incorrect, or confusing behavior), severity-medium (3rd level, incorrect or bad functionality, confusing and lacks a workaround)
Milestone: Approved

Comments

@gabrielgt

Description
Execution of orchestration steps with parallel: True and dependencies among them using require does not work as expected.

Setup
I am trying to execute states B1 and B2 on one minion and C on another one, using parallel: true, that is:

A __ B1 __ B2 __ D
  \_ C _________/

Dependencies among steps are expressed with require. The dependencies could be more complicated, and then the results become even more unpredictable, but this example is simple enough to observe the problem.

Salt files can be generated with the following script: https://gist.github.com/gabrielgt/e6c5337e2bd69dfefc61cee2c06cc12c

Steps to Reproduce the behavior

  • Download script https://gist.github.com/gabrielgt/e6c5337e2bd69dfefc61cee2c06cc12c
  • Modify the script to use three different minion nodes: 'mn001', 'mn002' and 'mn003'
  • Run the script ./make_parallel_test_structure_2.sh
  • Run the orchestration: salt-run state.orch delme.service_orch -l debug 2>&1 | egrep 'Running state|Completed state'

Expected behavior
The expected behavior is that B1 and C start at approximately the same time. What actually happens is that C starts at about the same time as B2.

The duration of A, B1, B2 and D is 30 seconds each, and the duration of C is 60 seconds, so the run should last 120 seconds in total, but instead it lasts 150 seconds.

30 __ 30 __ 30 __________30
         \_ 60 _________/
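
As a quick sanity check of the timings (illustrative snippet; the durations are the ones stated above):

A = B1 = B2 = D = 30  # seconds
C = 60

expected = A + max(B1 + B2, C) + D   # C starts alongside B1 -> 120
observed = A + B1 + max(B2, C) + D   # C only starts alongside B2 -> 150
print(expected, observed)            # 120 150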

Result:

$ time sudo salt-run state.orch delme.service_orch -l debug 2>&1 | egrep 'Running state|Completed state'
[INFO    ] Running state [state.orchestrate] at time 18:32:14.612501
[INFO    ] Completed state [state.orchestrate] at time 18:32:14.620257 (duration_in_ms=7.748)
[INFO    ] Running state [orch.serviceA] at time 18:32:17.827567
[INFO    ] Completed state [orch.serviceA] at time 18:32:49.561936 (duration_in_ms=31734.369)
[INFO    ] Running state [state.orchestrate] at time 18:32:49.584701
[INFO    ] Completed state [state.orchestrate] at time 18:32:49.600972 (duration_in_ms=16.271)
[INFO    ] Running state [orch.serviceB1] at time 18:32:53.885257
[INFO    ] Completed state [orch.serviceB1] at time 18:33:25.768963 (duration_in_ms=31883.705)
[INFO    ] Running state [state.orchestrate] at time 18:33:25.789756
[INFO    ] Completed state [state.orchestrate] at time 18:33:25.804843 (duration_in_ms=15.088)
[INFO    ] Running state [state.orchestrate] at time 18:33:25.807449
[INFO    ] Completed state [state.orchestrate] at time 18:33:25.831102 (duration_in_ms=23.653)
[INFO    ] Running state [orch.serviceC] at time 18:33:30.419105
[INFO    ] Running state [orch.serviceB2] at time 18:33:31.251604
[INFO    ] Completed state [orch.serviceB2] at time 18:34:03.667342 (duration_in_ms=32415.737)
[INFO    ] Completed state [orch.serviceC] at time 18:34:32.920972 (duration_in_ms=62501.866)
[INFO    ] Running state [state.orchestrate] at time 18:34:32.943566
[INFO    ] Completed state [state.orchestrate] at time 18:34:32.956901 (duration_in_ms=13.336)
[INFO    ] Running state [orch.serviceD] at time 18:34:36.844001
[INFO    ] Completed state [orch.serviceD] at time 18:35:08.826227 (duration_in_ms=31982.226)

real    2m59.788s

Versions Report

Salt Version:
           Salt: 3003rc1

Additional context
This issue is a continuation of issue #55121, partially solved by issue #58976.

gabrielgt added the Bug (broken, incorrect, or confusing behavior) and needs-triage labels on Apr 2, 2021

@garethgreenaway (Contributor) commented:

@gabrielgt Thanks for the report. I'm still digging into this one, but it looks like (at the moment) the order of the orchestration states matters. Running the provided script with the states in the order you had, I am able to reproduce this issue. Changing the order to look like this (excluding serviceD), it appears to work as expected:

orch.runner.serviceA:
  salt.runner:
      - name: state.orchestrate
      - parallel: True
      - kwarg:
          mods: {{ sls_path }}/serviceA
orch.runner.serviceB1:
  salt.runner:
      - name: state.orchestrate
      - parallel: True
      - kwarg:
          mods: {{ sls_path }}/serviceB1
      - require:
          - salt: orch.runner.serviceA
orch.runner.serviceC:
  salt.runner:
      - name: state.orchestrate
      - parallel: True
      - kwarg:
          mods: {{ sls_path }}/serviceC
      - require:
          - salt: orch.runner.serviceA
orch.runner.serviceB2:
  salt.runner:
      - name: state.orchestrate
      - parallel: True
      - kwarg:
          mods: {{ sls_path }}/serviceB2
      - require:
          - salt: orch.runner.serviceB1

I suspect what is happening is that serviceA runs, then serviceB1 (which requires serviceA); the state system then tries to run serviceB2 (which requires serviceB1, which has already run) and at the same time runs serviceC, which only requires serviceA (which has also already run). There is likely a bug somewhere in the logic that determines when the states should run.

sagetherage added the severity-medium (3rd level, incorrect or bad functionality, confusing and lacks a workaround) label and removed the needs-triage label on May 27, 2021
sagetherage added this to the Approved milestone on May 27, 2021
@Rudd-O commented Dec 16, 2021:

I have the same issue: parallel execution of orch SLSes with salt-run does not parallelize. One thing worth noting in my case is that all my states run with ssh=True (as well as parallel=True), and yet all the states marked parallel execute serially, one by one.

Small sample of tasks that should execute in parallel, but don't:

    Configure test on financial.laptop:
        ----------
        salt:
            |_
              ----------
              parallel:
                  True
            |_
              ----------
              pillar:
                  ----------
                  skip_package_installs:
            |_
              ----------
              sls:
                  test
            |_
              ----------
              ssh:
                  True
            |_
              ----------
              tgt:
                  financial.laptop
            |_
              ----------
              tgt_type:
                  list
            - state
            |_
              ----------
              order:
                  10005
        __sls__:
            orch.test
        __env__:
            base
    Configure test on social.laptop:
        ----------
        salt:
            |_
              ----------
              parallel:
                  True
            |_
              ----------
              pillar:
                  ----------
                  skip_package_installs:
            |_
              ----------
              sls:
                  test
            |_
              ----------
              ssh:
                  True
            |_
              ----------
              tgt:
                  social.laptop
            |_
              ----------
              tgt_type:
                  list
            - state
            |_
              ----------
              order:
                  10003
        __sls__:
            orch.test
        __env__:
            base

garethgreenaway self-assigned this on Mar 1, 2022
@Oloremo (Contributor) commented Jun 27, 2022:

@garethgreenaway Any updates on this?

@uspen commented Jun 28, 2022

@lkubb (Contributor) commented Apr 13, 2024:

I did a small investigation. As demonstrated above, this issue is a result of the rather intuitive (rather than systematic) way predictable ordering works in the state compiler. In the absence of other factors, Salt tries to run states in definition order, one by one. Before a state is executed, all of its requisites are checked; if a required state has not been run yet, it is run at that point. Parallel states return immediately, but since the requisites are checked synchronously (before the current state can be executed in parallel), they are waited upon:

salt/salt/state.py, lines 2982 to 2985 in aad71fd:

while True:
    if self.reconcile_procs(run_dict):
        break
    time.sleep(0.01)

Only after all requisites are met will the current state be run in parallel. This means the definition order of states with parallel: true matters.
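
To make this concrete, here is a tiny standalone model of that control flow (an illustrative sketch, not Salt's actual code or data structures): chunks are walked in definition order, unmet requisites are joined synchronously, and only then is the chunk itself started in a thread. With the definition order A, B1, B2, C this reproduces the reported schedule, where C cannot start before B1 has finished:

import threading
import time

# Illustrative chunks, not Salt's real low-chunk structure.
chunks = [
    {"id": "A",  "require": [],     "duration": 1},
    {"id": "B1", "require": ["A"],  "duration": 1},
    {"id": "B2", "require": ["B1"], "duration": 1},
    {"id": "C",  "require": ["A"],  "duration": 2},
]

start = time.monotonic()
threads = {}

def run(chunk):
    print(f"{time.monotonic() - start:4.1f}s  start {chunk['id']}")
    time.sleep(chunk["duration"])
    print(f"{time.monotonic() - start:4.1f}s  done  {chunk['id']}")

for chunk in chunks:
    # Requisites are checked (and waited upon) synchronously ...
    for req in chunk["require"]:
        threads[req].join()
    # ... and only then is the chunk itself started "in parallel".
    t = threading.Thread(target=run, args=(chunk,))
    threads[chunk["id"]] = t
    t.start()

for t in threads.values():
    t.join()
# C starts at ~2s (together with B2) instead of ~1s (together with B1).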

Without introducing requisite check parallelization via some kind of IPC, this could possibly be improved by checking whether the current chunk with pending dependencies has parallel set and returning a new kind of result (such as pending), which would need to be accounted for in call_chunks by adding the chunk to a new list of outstanding ones and trying to execute them again during each iteration:

salt/salt/state.py, lines 2737 to 2754 in aad71fd:

for low in chunks:
    if "__FAILHARD__" in running:
        running.pop("__FAILHARD__")
        return running
    tag = _gen_tag(low)
    if tag not in running:
        # Check if this low chunk is paused
        action = self.check_pause(low)
        if action == "kill":
            break
        running = self.call_chunk(low, running, chunks)
        if self.check_failhard(low, running):
            return running
self.active = set()
while True:
    if self.reconcile_procs(running):
        break
    time.sleep(0.01)

Parallel states would then no longer be run in linear order, but I assume that is the general expectation, as evidenced by this issue. I'm not sure how or whether this could be simplified with #66135; that is not my forte.
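
For illustration, the same toy model as above, but with the deferral idea applied (again an illustrative sketch, separate from the actual PoC patch below): a chunk whose requisites are not yet met is put on a pending list instead of being waited upon, and the pending list is retried on each iteration and drained afterwards, so C is no longer held up by B2:

import threading
import time

# Illustrative chunks, not Salt's real low-chunk structure.
chunks = [
    {"id": "A",  "require": [],     "duration": 1},
    {"id": "B1", "require": ["A"],  "duration": 1},
    {"id": "B2", "require": ["B1"], "duration": 1},
    {"id": "C",  "require": ["A"],  "duration": 2},
]

start = time.monotonic()
threads = {}

def run(chunk):
    print(f"{time.monotonic() - start:4.1f}s  start {chunk['id']}")
    time.sleep(chunk["duration"])
    print(f"{time.monotonic() - start:4.1f}s  done  {chunk['id']}")

def try_start(chunk):
    # Start the chunk only if all of its requisites have finished.
    for req in chunk["require"]:
        t = threads.get(req)
        if t is None or t.is_alive():
            return False  # requisite still pending or running -> defer
    t = threading.Thread(target=run, args=(chunk,))
    threads[chunk["id"]] = t
    t.start()
    return True

pending = []
for chunk in chunks:
    pending = [c for c in pending if not try_start(c)]
    if not try_start(chunk):
        pending.append(chunk)

while pending:  # drain the chunks still pending after the main loop
    pending = [c for c in pending if not try_start(c)]
    time.sleep(0.01)

for t in threads.values():
    t.join()
# A starts at 0s; B1 and C both start at ~1s; B2 at ~2s; total ~3s.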

Edit: As a quick and horrible PoC, I created the following patch based on the mentioned PR (since it's much easier to work with):

diff --git a/salt/state.py b/salt/state.py
index 9865e60259..2ec3cd91a0 100644
--- a/salt/state.py
+++ b/salt/state.py
@@ -2468,6 +2468,20 @@ class State:
         """
         Iterate over a list of chunks and call them, checking for requires.
         """
+
+        def _call_pending(
+            pending: dict[str, LowChunk], running: dict[str, dict]
+        ) -> tuple[dict[str, LowChunk], dict[str, dict], bool]:
+            still_pending = {}
+            for tag, pend in pending.items():
+                if tag not in running:
+                    running, is_pending = self.call_chunk(pend, running, chunks)
+                    if is_pending:
+                        still_pending[tag] = pend
+                    if self.check_failhard(pend, running):
+                        return still_pending, running, True
+            return still_pending, running, False
+
         if disabled_states is None:
             # Check for any disabled states
             disabled = {}
@@ -2476,7 +2490,11 @@ class State:
         else:
             disabled = disabled_states
         running = {}
+        pending_chunks = {}
         for low in chunks:
+            pending_chunks, running, failhard = _call_pending(pending_chunks, running)
+            if failhard:
+                return running
             if "__FAILHARD__" in running:
                 running.pop("__FAILHARD__")
                 return running
@@ -2486,9 +2504,15 @@ class State:
                 action = self.check_pause(low)
                 if action == "kill":
                     break
-                running = self.call_chunk(low, running, chunks)
+                running, pending = self.call_chunk(low, running, chunks)
+                if pending:
+                    pending_chunks[tag] = low
                 if self.check_failhard(low, running):
                     return running
+        while pending_chunks:
+            pending_chunks, running, failhard = _call_pending(pending_chunks, running)
+            if failhard:
+                return running
         while True:
             if self.reconcile_procs(running):
                 break
@@ -2599,6 +2623,7 @@ class State:
         states.
         """
         reqs = {}
+        pending = False
         for req_type, chunk in self.dependency_dag.get_dependencies(low):
             reqs.setdefault(req_type, []).append(chunk)
         fun_stats = set()
@@ -2617,17 +2642,15 @@ class State:
                 if run_dict_chunk:
                     filtered_run_dict[tag] = run_dict_chunk
             run_dict = filtered_run_dict
-
-            while True:
-                if self.reconcile_procs(run_dict):
-                    break
-                time.sleep(0.01)
+            pending = bool(not self.reconcile_procs(run_dict) and low.get("parallel"))
 
             for chunk in chunks:
                 tag = _gen_tag(chunk)
                 if tag not in run_dict:
                     req_stats.add("unmet")
                     continue
+                if pending:
+                    continue
                 # A state can include a "skip_req" key in the return dict
                 # with a True value to skip triggering onchanges, watch, or
                 # other requisites which would result in a only running on a
@@ -2684,6 +2707,8 @@ class State:
 
         if "unmet" in fun_stats:
             status = "unmet"
+        elif pending:
+            status = "pending"
         elif "fail" in fun_stats:
             status = "fail"
         elif "skip_req" in fun_stats and (fun_stats & {"onchangesmet", "premet"}):
@@ -2765,7 +2790,7 @@ class State:
         running: dict[str, dict],
         chunks: Sequence[LowChunk],
         depth: int = 0,
-    ) -> dict[str, dict]:
+    ) -> tuple[dict[str, dict], bool]:
         """
         Execute the chunk if the requisites did not fail
         """
@@ -2781,8 +2806,13 @@ class State:
 
         status, reqs = self._check_requisites(low, running)
         if status == "unmet":
-            if self._call_unmet_requisites(low, running, chunks, tag, depth):
-                return running
+            running_failhard, pending = self._call_unmet_requisites(
+                low, running, chunks, tag, depth
+            )
+            if running_failhard or pending:
+                return running, pending
+        elif status == "pending":
+            return running, True
         elif status == "met":
             if low.get("__prereq__"):
                 self.pre[tag] = self.call(low, chunks, running)
@@ -2910,7 +2940,7 @@ class State:
                 for key in ("__sls__", "__id__", "name"):
                     running[sub_tag][key] = low.get(key)
 
-        return running
+        return running, False
 
     def _assign_not_run_result_dict(
         self,
@@ -2944,16 +2974,19 @@ class State:
         chunks: Sequence[LowChunk],
         tag: str,
         depth: int,
-    ) -> dict[str, dict]:
+    ) -> tuple[dict[str, dict], bool]:
+        pending = False
         for _, chunk in self.dependency_dag.get_dependencies(low):
             # Check to see if the chunk has been run, only run it if
             # it has not been run already
             ctag = _gen_tag(chunk)
             if ctag not in running:
-                running = self.call_chunk(chunk, running, chunks)
+                running, pending = self.call_chunk(chunk, running, chunks)
+                if pending:
+                    return running, pending
                 if self.check_failhard(chunk, running):
                     running["__FAILHARD__"] = True
-                    return running
+                    return running, pending
         if low.get("__prereq__"):
             status, _ = self._check_requisites(low, running)
             self.pre[tag] = self.call(low, chunks, running)
@@ -2975,11 +3008,11 @@ class State:
                 for key in ("__sls__", "__id__", "name"):
                     running[tag][key] = low.get(key)
             else:
-                running = self.call_chunk(low, running, chunks, depth)
+                running, pending = self.call_chunk(low, running, chunks, depth)
         if self.check_failhard(low, running):
             running["__FAILHARD__"] = True
-            return running
-        return {}
+            return running, pending
+        return {}, pending
 
     def call_beacons(self, chunks: Iterable[LowChunk], running: dict) -> dict:
         """

Running the following SLS:

orch.runner.serviceA:
  cmd.run:
      - name: sleep 10
      - parallel: True

orch.runner.serviceB1:
  cmd.run:
      - name: sleep 10
      - parallel: True
      - require:
          - orch.runner.serviceA

orch.runner.serviceB2:
  cmd.run:
      - name: sleep 10
      - parallel: True
      - require:
          - orch.runner.serviceB1

orch.runner.serviceC:
  cmd.run:
      - name: sleep 10
      - parallel: True
      - require:
          - orch.runner.serviceA

then yields (as expected, apart from the total runtime, but that's a different bug):

local:
----------
          ID: orch.runner.serviceA
    Function: cmd.run
        Name: sleep 10
      Result: True
     Comment: Command "sleep 10" run
     Started: 10:15:19.602806
    Duration: 10016.807 ms
     Changes:
              ----------
              pid:
                  52913
              retcode:
                  0
              stderr:
              stdout:
----------
          ID: orch.runner.serviceB1
    Function: cmd.run
        Name: sleep 10
      Result: True
     Comment: Command "sleep 10" run
     Started: 10:15:30.847311
    Duration: 10013.301 ms
     Changes:
              ----------
              pid:
                  53377
              retcode:
                  0
              stderr:
              stdout:
----------
          ID: orch.runner.serviceC
    Function: cmd.run
        Name: sleep 10
      Result: True
     Comment: Command "sleep 10" run
     Started: 10:15:30.845488
    Duration: 10019.78 ms
     Changes:
              ----------
              pid:
                  53376
              retcode:
                  0
              stderr:
              stdout:
----------
          ID: orch.runner.serviceB2
    Function: cmd.run
        Name: sleep 10
      Result: True
     Comment: Command "sleep 10" run
     Started: 10:15:42.051838
    Duration: 10022.191 ms
     Changes:
              ----------
              pid:
                  53567
              retcode:
                  0
              stderr:
              stdout:

Summary for local
------------
Succeeded: 4 (changed=4)
Failed:    0
------------
Total states run:     4
Total run time:  10.022 s
