Further improvements to pending kernels managment #732

Zsailer · 2022-01-05T00:34:04Z

Two changes in this PR, summarized below. This work is a follow-up to #712.

Make `shutdown_kernel` a pending state

Makes shutdown_kernel also show the kernel in a pending state. Since the KernelManager is still managing a process while it's shutting down, which might take a long time, I think this should show as a pending state too.

Make `KernelManager` only responsible for reporting kernel pending state

Also, after working with pending kernels a bit, I believe it makes the most sense to make the KernelManager responsible for reporting the kernel's pending state, while the layer that sits above the KernelManager, e.g. MultiKernelManager, is responsible for reacting to that state.

This essentially means removing all self.ready checks in the individual KernelManager. For example,

jupyter_client/jupyter_client/manager.py

Lines 455 to 457 in 3082366

    # Shutdown is a no-op for a kernel that had a failed startup
    if self._ready.exception():
        return

should be removed; rather, a MultiKernelManager whose use_pending_kernels attribute is True would determine if shutdown is a passthrough, etc.

Another example is

jupyter_client/jupyter_client/manager.py

Lines 506 to 507 in 3082366

    if not self._ready.done():
        raise RuntimeError("Cannot restart the kernel. " "Kernel has not fully started.")

The MultiKernelManager would be responsible for handling what to do when a kernel restart happens while a kernel is in a pending state.
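
To illustrate the proposed split, here is a minimal sketch, not the code in this PR. `ToyKernelManager` and `ToyMultiKernelManager` are hypothetical stand-ins, and the `use_pending_kernels` flag only mirrors the attribute named above. The single-kernel manager just exposes a `ready` future; the layer above decides what a pending state means for a restart.

```python
import asyncio


class ToyKernelManager:
    """Hypothetical stand-in: reports pending state, never reacts to it."""

    def __init__(self):
        self.ready = None  # created when the first transition begins

    def _begin_transition(self):
        # A fresh, unresolved future marks the kernel as pending.
        self.ready = asyncio.get_running_loop().create_future()

    async def start_kernel(self):
        self._begin_transition()
        await asyncio.sleep(0.01)  # placeholder for launching a process
        self.ready.set_result(None)

    async def shutdown_kernel(self):
        self._begin_transition()
        await asyncio.sleep(0.01)  # placeholder for stopping the process
        self.ready.set_result(None)

    async def restart_kernel(self):
        # No self.ready check here; callers decide whether this is allowed.
        await self.shutdown_kernel()
        await self.start_kernel()


class ToyMultiKernelManager:
    """Hypothetical upper layer: reacts to the state the manager reports."""

    def __init__(self, use_pending_kernels=False):
        self.use_pending_kernels = use_pending_kernels
        self.kernels = {}

    async def restart_kernel(self, kernel_id):
        km = self.kernels[kernel_id]
        if km.ready is not None and not km.ready.done():
            if self.use_pending_kernels:
                raise RuntimeError("Kernel is in a pending state. Cannot restart.")
            await km.ready  # without pending kernels, wait for the transition
        await km.restart_kernel()


async def demo():
    mkm = ToyMultiKernelManager(use_pending_kernels=True)
    km = mkm.kernels["k1"] = ToyKernelManager()
    task = asyncio.ensure_future(km.start_kernel())
    await asyncio.sleep(0)  # let start_kernel begin; the kernel is now pending
    try:
        await mkm.restart_kernel("k1")
        blocked = False
    except RuntimeError:
        blocked = True
    await task
    await mkm.restart_kernel("k1")  # allowed once ready has resolved
    return blocked, km.ready.done()
```

Running `asyncio.run(demo())` exercises both paths: a restart attempted while the kernel is still starting, and a restart after `ready` has resolved.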

blink1073 · 2022-01-05T19:39:10Z

This seems reasonable to me. Zach and I discussed this today and agreed that we should release a version of jupyter_server that supports these changes, get downstream tests here passing, and then release a minor release with these refinements.

kevin-bates

Thanks @Zsailer - this looks good. I just had a question regarding the now method in the tests.

It might be good to also update the help-string of the ready property. Perhaps something like:

"""A future that resolves when the kernel has completed its startup or shutdown"""

kevin-bates · 2022-01-06T20:42:02Z

jupyter_client/tests/test_multikernelmanager.py

+    """Use this function ensure that this awaitable
+    happens before other awaitables defined after it.
+    """
+    (out,) = await asyncio.gather(awaitable)


I guess I don't understand why this method is necessary - especially for a single-item gather. Isn't awaiting the awaitable sufficient to prevent the execution of follow-on awaitables in the same execution path?

Great question, @kevin-bates.

That's what's supposed to happen; however, I'm seeing some weird behavior that I think is coming from the gen_test decorator. Basically, I'm not seeing the awaits happening in the order they are defined.

Weirdly, if I replace the async/await syntax with the old yield syntax, these tests work fine (no `now` needed). Again, this suggests some weird interference happening from Tornado's async testing. I'm not sure how to fix this.

By enforcing a gather call in each await statement, everything happens in the order they are called. Super strange!

Thanks Zach. That is interesting.

Looks like nested coroutines can be a bit difficult in unit tests. It appears that inner coroutines never get scheduled even when the outer coroutine is awaited. Adding ensure_future in multiple places in the tests ensures that these coroutines get scheduled and tests pass again.
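
For reference, this is how plain asyncio behaves outside of Tornado's `gen_test` (which this sketch does not try to reproduce): awaiting a coroutine runs it to completion before the next statement, while `asyncio.ensure_future` wraps a coroutine in a task so it is scheduled right away. The `step` and `main` names are illustrative only.

```python
import asyncio


async def step(name, log):
    log.append(f"{name} start")
    await asyncio.sleep(0)  # yield control to the event loop once
    log.append(f"{name} end")


async def main():
    log = []
    # Sequential awaits: "a" finishes before "b" begins.
    await step("a", log)
    await step("b", log)

    # ensure_future wraps the coroutine in a task, so "c" is scheduled
    # immediately and starts as soon as main yields control.
    task = asyncio.ensure_future(step("c", log))
    await asyncio.sleep(0)
    log.append("main resumed")
    await task
    return log
```

A coroutine object that is created but never awaited or wrapped in a task is never run, which is consistent with the inner coroutines not getting scheduled in the tests.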

Zsailer · 2022-01-10T23:10:38Z

It looks like all tests are (finally) passing except the downstream tests. For that, jupyter-server/jupyter_server#654 is required.

I'll work on getting that merged and then ask for a final review here.

Zsailer · 2022-01-11T01:06:29Z

I've also added some documentation for pending kernels to this PR

jupyter_client/multikernelmanager.py

Zsailer · 2022-01-11T17:35:06Z

This PR evolved a bit. I've added documentation to this PR to define more clearly what a "pending" kernel is.

A pending kernel is a kernel that is currently in the process of "starting" or "shutting down".

The multikernelmanager will block any subsequent actions that attempt to affect a pending kernel—e.g. "interrupt", "restart", "shutdown"—by raising a RuntimeError.

Remember, pending kernels are opt-in, so this feature is not enabled by default.
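
From a caller's point of view, one way to cope with the RuntimeError is to wait for the pending state to clear and then retry. The sketch below assumes a hypothetical `FakeMultiKernelManager` that mimics the behavior described above; none of these names come from jupyter_client, and the retry helper is just one possible policy.

```python
import asyncio


class FakeMultiKernelManager:
    """Hypothetical stand-in that mimics the behavior described above."""

    def __init__(self):
        self._pending = {}  # kernel_id -> future resolved when no longer pending
        self.interrupted = []

    def start_kernel(self, kernel_id):
        # Returns immediately; the kernel stays pending until the future resolves.
        loop = asyncio.get_running_loop()
        fut = self._pending[kernel_id] = loop.create_future()
        loop.call_later(0.01, self._finish, kernel_id)
        return fut

    def _finish(self, kernel_id):
        self._pending.pop(kernel_id).set_result(None)

    def interrupt_kernel(self, kernel_id):
        if kernel_id in self._pending:
            raise RuntimeError("Kernel is in a pending state. Cannot interrupt.")
        self.interrupted.append(kernel_id)


async def interrupt_when_ready(mkm, kernel_id, ready):
    """Try the action; if the kernel is pending, wait for it and retry once."""
    try:
        mkm.interrupt_kernel(kernel_id)
    except RuntimeError:
        await ready
        mkm.interrupt_kernel(kernel_id)


async def demo():
    mkm = FakeMultiKernelManager()
    ready = mkm.start_kernel("k1")
    await interrupt_when_ready(mkm, "k1", ready)
    return mkm.interrupted
```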

Zsailer · 2022-01-13T19:38:10Z

I'm not sure why the nbclient downstream tests are failing. When I run these locally, they run successfully.

Zsailer · 2022-01-13T19:55:40Z

I kicked the downstream tests in #731 (sorry @blink1073 😅 ) and see the exact same errors, so they are unrelated to this PR.

This should be ready to merge. 🚀

davidbrochart · 2022-01-13T20:15:28Z

The nbclient failures should be fixed by jupyter/nbclient#190.
I could release nbclient if you want to be sure.

davidbrochart · 2022-01-13T20:39:53Z

nbclient v0.5.10 is on PyPI with the fix.

Zsailer · 2022-01-13T20:40:57Z

Thanks, @davidbrochart! I really appreciate it!

Zsailer · 2022-01-14T17:53:10Z

Waiting on jupyter-server/jupyter_server#662 to be merged and released to fix the downstream tests.

Zsailer · 2022-01-14T19:23:03Z

All green!

Zsailer · 2022-01-14T19:25:19Z

This was discussed extensively at the last Jupyter Server meeting: jupyter-server/team-compass#15 (comment)

Merging away!

kevin-bates

Just had a couple of suggestions and comments, but this looks good.

kevin-bates · 2022-01-14T20:10:21Z

docs/pending-kernels.rst

+
+*Added in 7.1.0*
+
+In scenarios where an kernel takes a long time to start (e.g. kernels running remotely), it can be advantageous to immediately return the kernel's model and ID from key methods like ``.start_kernel()`` and ``.shutdown_kernel()``. The kernel will continue its task without blocking other managerial actions.


Suggested change

-In scenarios where an kernel takes a long time to start (e.g. kernels running remotely), it can be advantageous to immediately return the kernel's model and ID from key methods like ``.start_kernel()`` and ``.shutdown_kernel()``. The kernel will continue its task without blocking other managerial actions.
+In scenarios where a kernel takes a long time to start (e.g. kernels running remotely), it can be advantageous to immediately return the kernel's model and ID from key methods like ``.start_kernel()`` and ``.shutdown_kernel()``. The kernel will continue its task without blocking other managerial actions.

kevin-bates · 2022-01-14T20:10:27Z

jupyter_client/multikernelmanager.py

+    @property
+    def _starting_kernels(self):
+        """A shim for backwards compatibility."""
+        return self._pending_kernels


Should this be marked as deprecated?
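
If the shim were to be deprecated, one common pattern is to emit a `DeprecationWarning` from the property. This is only a sketch on a hypothetical `Manager` class, and the warning text is made up; it is not what was merged.

```python
import warnings


class Manager:
    """Hypothetical class used only to illustrate a deprecated alias."""

    def __init__(self):
        self._pending_kernels = {}

    @property
    def _starting_kernels(self):
        """A shim for backwards compatibility (deprecated alias)."""
        warnings.warn(
            "_starting_kernels is deprecated; use _pending_kernels instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return self._pending_kernels
```

Python hides `DeprecationWarning` by default outside of `__main__` and test runners, so downstream users would mostly see it in their test suites.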

kevin-bates · 2022-01-14T20:10:32Z

jupyter_client/multikernelmanager.py

+        if self._using_pending_kernels() and kernel_id in self._pending_kernels:
+            raise RuntimeError("Kernel is in a pending state. Cannot shutdown.")
+        # If the kernel is still starting, wait for it to be ready.
+        elif kernel_id in self._starting_kernels:


Is the use of _starting_kernels intentional? Technically, this could also contain kernels that are shutting down now, but only when pending kernels are enabled - so I suspect this was for hinting that, in this case, the kernel can only be starting.

Perhaps we could be more explicit (and save an existence check [nit]) via:

    if kernel_id in self._pending_kernels:
        if self._using_pending_kernels():
            raise RuntimeError("Kernel is in a pending state. Cannot shutdown.")
        else:
            # kernel is still starting, wait for its startup
            kernel = self._pending_kernels[kernel_id]
            try:
                await kernel
            except Exception:
                self.remove_kernel(kernel_id)

kevin-bates · 2022-01-14T20:10:34Z

jupyter_client/multikernelmanager.py

            try:
-                await self._starting_kernels[kernel_id]
+                await kernel
            except Exception:
                self.remove_kernel(kernel_id)


Do we want to update the _pending_kernels list as well here?

kevin-bates · 2022-01-14T20:13:24Z

Hmm - sorry, it looks like this was merged while I was reviewing. The comments aren't earth-shattering so I'll leave it to you to determine if they have any merit.

Thanks for all the work on this @Zsailer - good stuff!

jtpio · 2022-01-17T08:15:02Z

Looks like the 7.1.1 release of jupyter_client affected the JupyterLab tests: jupyterlab/jupyterlab#11886

The check is defined here:

https://github.com/jupyterlab/jupyterlab/blob/50fa0047d63287005a2c3b1d8f5bfedfe56cde7e/packages/apputils/test/sessioncontext.spec.ts#L453-L469

Pinning to 7.1.0 fixes it in jupyterlab/jupyterlab#11887.

Reporting here for reference.

Zsailer requested a review from blink1073 January 5, 2022 00:34

Zsailer added the enhancement label Jan 5, 2022

kevin-bates reviewed Jan 6, 2022

Zsailer changed the title ~~Stopping kernel is a pending kernel~~ Fuether improvements to pending kernels managment Jan 8, 2022

Zsailer changed the title ~~Fuether improvements to pending kernels managment~~ Further improvements to pending kernels managment Jan 8, 2022

kevin-bates mentioned this pull request Jan 10, 2022

An exception occurred when the kernel restarted, causing it to keep restarting #734

Open

Zsailer mentioned this pull request Jan 10, 2022

Add more awaits for pending kernel in unit tests jupyter-server/jupyter_server#654

Merged

blink1073 reviewed Jan 11, 2022

This was referenced Jan 13, 2022

Name initial core team and add documentation about roles jupyter-server/team-compass#14

Merged

Meeting Notes 2022 jupyter-server/team-compass#15

Closed

Zsailer and others added 13 commits January 13, 2022 12:22

simplify pending state logic and require multikernelmanager to take more responsibility

a877478

remove unnecessary whitespace

2b8790a

precommit hooks

8076482

handle shutdown and restart for pending kernels in the multikernelmanager

73ae431

work updates to tests

5f97e4e

ignore mypy error

dfc1a85

fix regression in KM. be sure to raise an exception when a kernel doesn't start properly

54c86a0

ensure shutdown_kernel is backwards compatible

94595af

in shutdown remove kernels that raise exceptions

de8d2c5

shutdown all on pending kernels requires waiting for non-pending state

9bc967a

add docs for pending kernels

2d7626c

remove commented out line

a20be02

prevent tests from failing fast

da3e79d

Zsailer force-pushed the pending-state branch from 2285fcd to da3e79d Compare January 13, 2022 20:22

continue on errors in github workflow

5841560

Remove continue-on-error in github workflow.

a382498

Zsailer mentioned this pull request Jan 14, 2022

More updates to unit tests for pending kernels work jupyter-server/jupyter_server#662

Merged

Zsailer merged commit 4428715 into jupyter:main Jan 14, 2022

kevin-bates reviewed Jan 14, 2022

jtpio mentioned this pull request Jan 17, 2022

js-apputils check is failing on CI jupyterlab/jupyterlab#11886

Closed

Zsailer mentioned this pull request Jan 18, 2022

Fix nbconvert handler run_sync() jupyter-server/jupyter_server#667

Merged

blink1073 mentioned this pull request Feb 3, 2022

CI is Flakey jupyter-server/jupyter_server#677

Closed
