[BUG] Parallel orchestration needs improvement #59959

Open
gabrielgt opened this issue Apr 2, 2021 · 6 comments · May be fixed by #66956
Labels: Bug (broken, incorrect, or confusing behavior), severity-medium (3rd level, incorrect or bad functionality, confusing and lacks a workaround)
Milestone: Approved

Comments

@gabrielgt

Description
Execution of orchestration steps with parallel: True and dependencies among them using require does not work as expected.

Setup
I am trying to execute states B1 and B2 on one minion and C on another one, using parallel: true, that is:

A __ B1 __ B2 __ D
  \_ C _________/

Dependencies among steps are expressed with require. The dependencies could be more complicated, and then the results become even more unpredictable, but this example is simple enough to observe the problem.

Salt files can be generated with the following script: https://gist.github.com/gabrielgt/e6c5337e2bd69dfefc61cee2c06cc12c

Steps to Reproduce the behavior

  • Download script https://gist.github.com/gabrielgt/e6c5337e2bd69dfefc61cee2c06cc12c
  • Modify the script to use three different minion nodes: 'mn001', 'mn002' and 'mn003'
  • Run the script ./make_parallel_test_structure_2.sh
  • Run the orchestration: salt-run state.orch delme.service_orch -l debug 2>&1 | egrep 'Running state|Completed state'

Expected behavior
The expected behavior is that B1 and C start at approximately the same time. What actually happens is that C starts at about the same time as B2.

The duration of A, B1, B2 and D is 30 seconds each, and the duration of C is 60 seconds, so the run should last 120 seconds in total, but instead it lasts 150 seconds.

30 __ 30 __ 30 __________30
         \_ 60 _________/
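
As a quick sanity check of the timings (illustrative snippet; the durations are the ones stated above):

A = B1 = B2 = D = 30  # seconds
C = 60

expected = A + max(B1 + B2, C) + D   # C starts alongside B1 -> 120
observed = A + B1 + max(B2, C) + D   # C only starts alongside B2 -> 150
print(expected, observed)            # 120 150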

Result:

$ time sudo salt-run state.orch delme.service_orch -l debug 2>&1 | egrep 'Running state|Completed state'
[INFO    ] Running state [state.orchestrate] at time 18:32:14.612501
[INFO    ] Completed state [state.orchestrate] at time 18:32:14.620257 (duration_in_ms=7.748)
[INFO    ] Running state [orch.serviceA] at time 18:32:17.827567
[INFO    ] Completed state [orch.serviceA] at time 18:32:49.561936 (duration_in_ms=31734.369)
[INFO    ] Running state [state.orchestrate] at time 18:32:49.584701
[INFO    ] Completed state [state.orchestrate] at time 18:32:49.600972 (duration_in_ms=16.271)
[INFO    ] Running state [orch.serviceB1] at time 18:32:53.885257
[INFO    ] Completed state [orch.serviceB1] at time 18:33:25.768963 (duration_in_ms=31883.705)
[INFO    ] Running state [state.orchestrate] at time 18:33:25.789756
[INFO    ] Completed state [state.orchestrate] at time 18:33:25.804843 (duration_in_ms=15.088)
[INFO    ] Running state [state.orchestrate] at time 18:33:25.807449
[INFO    ] Completed state [state.orchestrate] at time 18:33:25.831102 (duration_in_ms=23.653)
[INFO    ] Running state [orch.serviceC] at time 18:33:30.419105
[INFO    ] Running state [orch.serviceB2] at time 18:33:31.251604
[INFO    ] Completed state [orch.serviceB2] at time 18:34:03.667342 (duration_in_ms=32415.737)
[INFO    ] Completed state [orch.serviceC] at time 18:34:32.920972 (duration_in_ms=62501.866)
[INFO    ] Running state [state.orchestrate] at time 18:34:32.943566
[INFO    ] Completed state [state.orchestrate] at time 18:34:32.956901 (duration_in_ms=13.336)
[INFO    ] Running state [orch.serviceD] at time 18:34:36.844001
[INFO    ] Completed state [orch.serviceD] at time 18:35:08.826227 (duration_in_ms=31982.226)

real    2m59.788s

Versions Report

Salt Version:
           Salt: 3003rc1

Additional context
This issue is a continuation of issue #55121, partially solved by issue #58976.

gabrielgt added the Bug (broken, incorrect, or confusing behavior) and needs-triage labels on Apr 2, 2021

@garethgreenaway (Contributor) commented:

@gabrielgt Thanks for the report. I'm still digging into this one, but it looks like (at the moment) the order of the orchestration states matters. Running the provided script with the states in the order you had, I am able to reproduce this issue. Changing the order to look like this (excluding serviceD), it appears to work as expected:

orch.runner.serviceA:
  salt.runner:
      - name: state.orchestrate
      - parallel: True
      - kwarg:
          mods: {{ sls_path }}/serviceA
orch.runner.serviceB1:
  salt.runner:
      - name: state.orchestrate
      - parallel: True
      - kwarg:
          mods: {{ sls_path }}/serviceB1
      - require:
          - salt: orch.runner.serviceA
orch.runner.serviceC:
  salt.runner:
      - name: state.orchestrate
      - parallel: True
      - kwarg:
          mods: {{ sls_path }}/serviceC
      - require:
          - salt: orch.runner.serviceA
orch.runner.serviceB2:
  salt.runner:
      - name: state.orchestrate
      - parallel: True
      - kwarg:
          mods: {{ sls_path }}/serviceB2
      - require:
          - salt: orch.runner.serviceB1

I suspect what is happening is that serviceA runs, then serviceB1 (which requires serviceA); the state system then tries to run serviceB2 (which requires serviceB1, which has already run) and at the same time runs serviceC, which only requires serviceA (which has also already run). There is likely a bug somewhere in the logic that determines when the states should run.

sagetherage added the severity-medium (3rd level, incorrect or bad functionality, confusing and lacks a workaround) label and removed the needs-triage label on May 27, 2021
sagetherage added this to the Approved milestone on May 27, 2021
@Rudd-O commented Dec 16, 2021:

I have the same issue: parallel execution of orch SLSes with salt-run does not parallelize. One thing worth noting in my case is that all my states run with ssh=True (as well as parallel=True), and yet all the states marked parallel execute serially, one by one.

Small sample of tasks that should execute in parallel, but don't:

    Configure test on financial.laptop:
        ----------
        salt:
            |_
              ----------
              parallel:
                  True
            |_
              ----------
              pillar:
                  ----------
                  skip_package_installs:
            |_
              ----------
              sls:
                  test
            |_
              ----------
              ssh:
                  True
            |_
              ----------
              tgt:
                  financial.laptop
            |_
              ----------
              tgt_type:
                  list
            - state
            |_
              ----------
              order:
                  10005
        __sls__:
            orch.test
        __env__:
            base
    Configure test on social.laptop:
        ----------
        salt:
            |_
              ----------
              parallel:
                  True
            |_
              ----------
              pillar:
                  ----------
                  skip_package_installs:
            |_
              ----------
              sls:
                  test
            |_
              ----------
              ssh:
                  True
            |_
              ----------
              tgt:
                  social.laptop
            |_
              ----------
              tgt_type:
                  list
            - state
            |_
              ----------
              order:
                  10003
        __sls__:
            orch.test
        __env__:
            base

garethgreenaway self-assigned this on Mar 1, 2022
@Oloremo (Contributor) commented Jun 27, 2022:

@garethgreenaway Any updates on this?

@uspen commented Jun 28, 2022

@lkubb (Contributor) commented Apr 13, 2024:

I did a small investigation. As demonstrated above, this issue is a result of the rather intuitive (rather than systematic) way predictable ordering works in the state compiler. In the absence of other factors, Salt tries to run states in definition order, one by one. Before a state is executed, all of its requisites are checked; if a required state has not been run yet, it is run at that point. Parallel states return immediately, but since the requisites are checked synchronously (before the current state can be executed in parallel), they are waited upon:

salt/salt/state.py, lines 2982 to 2985 in aad71fd:

while True:
    if self.reconcile_procs(run_dict):
        break
    time.sleep(0.01)

Only after all requisites are met will the current state be run in parallel. This means the definition order of states with parallel: true matters.
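
To make this concrete, here is a tiny standalone model of that control flow (an illustrative sketch, not Salt's actual code or data structures): chunks are walked in definition order, unmet requisites are joined synchronously, and only then is the chunk itself started in a thread. With the definition order A, B1, B2, C this reproduces the reported schedule, where C cannot start before B1 has finished:

import threading
import time

# Illustrative chunks, not Salt's real low-chunk structure.
chunks = [
    {"id": "A",  "require": [],     "duration": 1},
    {"id": "B1", "require": ["A"],  "duration": 1},
    {"id": "B2", "require": ["B1"], "duration": 1},
    {"id": "C",  "require": ["A"],  "duration": 2},
]

start = time.monotonic()
threads = {}

def run(chunk):
    print(f"{time.monotonic() - start:4.1f}s  start {chunk['id']}")
    time.sleep(chunk["duration"])
    print(f"{time.monotonic() - start:4.1f}s  done  {chunk['id']}")

for chunk in chunks:
    # Requisites are checked (and waited upon) synchronously ...
    for req in chunk["require"]:
        threads[req].join()
    # ... and only then is the chunk itself started "in parallel".
    t = threading.Thread(target=run, args=(chunk,))
    threads[chunk["id"]] = t
    t.start()

for t in threads.values():
    t.join()
# C starts at ~2s (together with B2) instead of ~1s (together with B1).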

Without introducing requisite check parallelization via some kind of IPC, this could possibly be improved by checking whether the current chunk with pending dependencies has parallel set and returning a new kind of result (such as pending), which would need to be accounted for in call_chunks by adding the chunk to a new list of outstanding ones and trying to execute them again during each iteration:

salt/salt/state.py, lines 2737 to 2754 in aad71fd:

for low in chunks:
    if "__FAILHARD__" in running:
        running.pop("__FAILHARD__")
        return running
    tag = _gen_tag(low)
    if tag not in running:
        # Check if this low chunk is paused
        action = self.check_pause(low)
        if action == "kill":
            break
        running = self.call_chunk(low, running, chunks)
        if self.check_failhard(low, running):
            return running
self.active = set()
while True:
    if self.reconcile_procs(running):
        break
    time.sleep(0.01)

Parallel states would then no longer be run in linear order, but I assume that is the general expectation, as evidenced by this issue. I'm not sure how or whether this could be simplified with #66135; that is not my forte.
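
For illustration, the same toy model as above, but with the deferral idea applied (again an illustrative sketch, separate from the actual PoC patch below): a chunk whose requisites are not yet met is put on a pending list instead of being waited upon, and the pending list is retried on each iteration and drained afterwards, so C is no longer held up by B2:

import threading
import time

# Illustrative chunks, not Salt's real low-chunk structure.
chunks = [
    {"id": "A",  "require": [],     "duration": 1},
    {"id": "B1", "require": ["A"],  "duration": 1},
    {"id": "B2", "require": ["B1"], "duration": 1},
    {"id": "C",  "require": ["A"],  "duration": 2},
]

start = time.monotonic()
threads = {}

def run(chunk):
    print(f"{time.monotonic() - start:4.1f}s  start {chunk['id']}")
    time.sleep(chunk["duration"])
    print(f"{time.monotonic() - start:4.1f}s  done  {chunk['id']}")

def try_start(chunk):
    # Start the chunk only if all of its requisites have finished.
    for req in chunk["require"]:
        t = threads.get(req)
        if t is None or t.is_alive():
            return False  # requisite still pending or running -> defer
    t = threading.Thread(target=run, args=(chunk,))
    threads[chunk["id"]] = t
    t.start()
    return True

pending = []
for chunk in chunks:
    pending = [c for c in pending if not try_start(c)]
    if not try_start(chunk):
        pending.append(chunk)

while pending:  # drain the chunks still pending after the main loop
    pending = [c for c in pending if not try_start(c)]
    time.sleep(0.01)

for t in threads.values():
    t.join()
# A starts at 0s; B1 and C both start at ~1s; B2 at ~2s; total ~3s.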

Edit: As a quick and horrible PoC, I created the following patch based on the mentioned PR (since it's much easier to work with):

diff --git a/salt/state.py b/salt/state.py
index 9865e60259..2ec3cd91a0 100644
--- a/salt/state.py
+++ b/salt/state.py
@@ -2468,6 +2468,20 @@ class State:
         """
         Iterate over a list of chunks and call them, checking for requires.
         """
+
+        def _call_pending(
+            pending: dict[str, LowChunk], running: dict[str, dict]
+        ) -> tuple[dict[str, LowChunk], dict[str, dict], bool]:
+            still_pending = {}
+            for tag, pend in pending.items():
+                if tag not in running:
+                    running, is_pending = self.call_chunk(pend, running, chunks)
+                    if is_pending:
+                        still_pending[tag] = pend
+                    if self.check_failhard(pend, running):
+                        return still_pending, running, True
+            return still_pending, running, False
+
         if disabled_states is None:
             # Check for any disabled states
             disabled = {}
@@ -2476,7 +2490,11 @@ class State:
         else:
             disabled = disabled_states
         running = {}
+        pending_chunks = {}
         for low in chunks:
+            pending_chunks, running, failhard = _call_pending(pending_chunks, running)
+            if failhard:
+                return running
             if "__FAILHARD__" in running:
                 running.pop("__FAILHARD__")
                 return running
@@ -2486,9 +2504,15 @@ class State:
                 action = self.check_pause(low)
                 if action == "kill":
                     break
-                running = self.call_chunk(low, running, chunks)
+                running, pending = self.call_chunk(low, running, chunks)
+                if pending:
+                    pending_chunks[tag] = low
                 if self.check_failhard(low, running):
                     return running
+        while pending_chunks:
+            pending_chunks, running, failhard = _call_pending(pending_chunks, running)
+            if failhard:
+                return running
         while True:
             if self.reconcile_procs(running):
                 break
@@ -2599,6 +2623,7 @@ class State:
         states.
         """
         reqs = {}
+        pending = False
         for req_type, chunk in self.dependency_dag.get_dependencies(low):
             reqs.setdefault(req_type, []).append(chunk)
         fun_stats = set()
@@ -2617,17 +2642,15 @@ class State:
                 if run_dict_chunk:
                     filtered_run_dict[tag] = run_dict_chunk
             run_dict = filtered_run_dict
-
-            while True:
-                if self.reconcile_procs(run_dict):
-                    break
-                time.sleep(0.01)
+            pending = bool(not self.reconcile_procs(run_dict) and low.get("parallel"))
 
             for chunk in chunks:
                 tag = _gen_tag(chunk)
                 if tag not in run_dict:
                     req_stats.add("unmet")
                     continue
+                if pending:
+                    continue
                 # A state can include a "skip_req" key in the return dict
                 # with a True value to skip triggering onchanges, watch, or
                 # other requisites which would result in a only running on a
@@ -2684,6 +2707,8 @@ class State:
 
         if "unmet" in fun_stats:
             status = "unmet"
+        elif pending:
+            status = "pending"
         elif "fail" in fun_stats:
             status = "fail"
         elif "skip_req" in fun_stats and (fun_stats & {"onchangesmet", "premet"}):
@@ -2765,7 +2790,7 @@ class State:
         running: dict[str, dict],
         chunks: Sequence[LowChunk],
         depth: int = 0,
-    ) -> dict[str, dict]:
+    ) -> tuple[dict[str, dict], bool]:
         """
         Execute the chunk if the requisites did not fail
         """
@@ -2781,8 +2806,13 @@ class State:
 
         status, reqs = self._check_requisites(low, running)
         if status == "unmet":
-            if self._call_unmet_requisites(low, running, chunks, tag, depth):
-                return running
+            running_failhard, pending = self._call_unmet_requisites(
+                low, running, chunks, tag, depth
+            )
+            if running_failhard or pending:
+                return running, pending
+        elif status == "pending":
+            return running, True
         elif status == "met":
             if low.get("__prereq__"):
                 self.pre[tag] = self.call(low, chunks, running)
@@ -2910,7 +2940,7 @@ class State:
                 for key in ("__sls__", "__id__", "name"):
                     running[sub_tag][key] = low.get(key)
 
-        return running
+        return running, False
 
     def _assign_not_run_result_dict(
         self,
@@ -2944,16 +2974,19 @@ class State:
         chunks: Sequence[LowChunk],
         tag: str,
         depth: int,
-    ) -> dict[str, dict]:
+    ) -> tuple[dict[str, dict], bool]:
+        pending = False
         for _, chunk in self.dependency_dag.get_dependencies(low):
             # Check to see if the chunk has been run, only run it if
             # it has not been run already
             ctag = _gen_tag(chunk)
             if ctag not in running:
-                running = self.call_chunk(chunk, running, chunks)
+                running, pending = self.call_chunk(chunk, running, chunks)
+                if pending:
+                    return running, pending
                 if self.check_failhard(chunk, running):
                     running["__FAILHARD__"] = True
-                    return running
+                    return running, pending
         if low.get("__prereq__"):
             status, _ = self._check_requisites(low, running)
             self.pre[tag] = self.call(low, chunks, running)
@@ -2975,11 +3008,11 @@ class State:
                 for key in ("__sls__", "__id__", "name"):
                     running[tag][key] = low.get(key)
             else:
-                running = self.call_chunk(low, running, chunks, depth)
+                running, pending = self.call_chunk(low, running, chunks, depth)
         if self.check_failhard(low, running):
             running["__FAILHARD__"] = True
-            return running
-        return {}
+            return running, pending
+        return {}, pending
 
     def call_beacons(self, chunks: Iterable[LowChunk], running: dict) -> dict:
         """

Running the following SLS:

orch.runner.serviceA:
  cmd.run:
      - name: sleep 10
      - parallel: True

orch.runner.serviceB1:
  cmd.run:
      - name: sleep 10
      - parallel: True
      - require:
          - orch.runner.serviceA

orch.runner.serviceB2:
  cmd.run:
      - name: sleep 10
      - parallel: True
      - require:
          - orch.runner.serviceB1

orch.runner.serviceC:
  cmd.run:
      - name: sleep 10
      - parallel: True
      - require:
          - orch.runner.serviceA

then yields (as expected, apart from the total runtime, but that's a different bug):

local:
----------
          ID: orch.runner.serviceA
    Function: cmd.run
        Name: sleep 10
      Result: True
     Comment: Command "sleep 10" run
     Started: 10:15:19.602806
    Duration: 10016.807 ms
     Changes:
              ----------
              pid:
                  52913
              retcode:
                  0
              stderr:
              stdout:
----------
          ID: orch.runner.serviceB1
    Function: cmd.run
        Name: sleep 10
      Result: True
     Comment: Command "sleep 10" run
     Started: 10:15:30.847311
    Duration: 10013.301 ms
     Changes:
              ----------
              pid:
                  53377
              retcode:
                  0
              stderr:
              stdout:
----------
          ID: orch.runner.serviceC
    Function: cmd.run
        Name: sleep 10
      Result: True
     Comment: Command "sleep 10" run
     Started: 10:15:30.845488
    Duration: 10019.78 ms
     Changes:
              ----------
              pid:
                  53376
              retcode:
                  0
              stderr:
              stdout:
----------
          ID: orch.runner.serviceB2
    Function: cmd.run
        Name: sleep 10
      Result: True
     Comment: Command "sleep 10" run
     Started: 10:15:42.051838
    Duration: 10022.191 ms
     Changes:
              ----------
              pid:
                  53567
              retcode:
                  0
              stderr:
              stdout:

Summary for local
------------
Succeeded: 4 (changed=4)
Failed:    0
------------
Total states run:     4
Total run time:  10.022 s
