add intitial healthcheck docs #515

Merged
merged 11 commits into from
Sep 16, 2024
63 changes: 63 additions & 0 deletions docs/docs/how-tos/setup-healthcheck.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
---
id: setup-healthcheck
title: Set up healthchecks with Kuberhealthy
description: Set up healthchecks with Kuberhealthy
---

:::warning

This feature is in beta status. It should be used with caution.

:::

# Overview

Nebari integrates [Kuberhealthy](https://kuberhealthy.github.io/kuberhealthy/), an extensible Kubernetes-native framework for continuous synthetic testing, to run internal healthchecks against a Nebari deployment. Kuberhealthy exports the check results as metrics to Prometheus, so they can be viewed in Grafana.

## Enabling

Healthchecks are a beta feature that is still being tested, so they are disabled by default. To enable them, add the following under the `monitoring` section of your `nebari-config.yaml`.

```yaml
monitoring:
  healthchecks:
    enabled: true
```

## Checking status of Healthchecks

All healthchecks are exported as metrics to Prometheus and can be viewed in Grafana.

For example, to see the uptime for the conda-store service, you can run:

```
(1 - (sum(count_over_time(kuberhealthy_check{check="dev/conda-store-http-check", status="0"}[30d])) OR vector(0)) / sum(count_over_time(kuberhealthy_check{check="dev/conda-store-http-check", status="1"}[30d]))) * 100
```

in Grafana, which will show you the following chart.

![Grafana chart showing the uptime for conda store](/img/how-tos/nebari-healthchecks.png)
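
To make the arithmetic behind an uptime figure concrete, here is a small Python sketch. It assumes that `status="0"` samples count failed check runs and `status="1"` samples count successful ones, and it expresses the result as a percentage. The function name and the sample counts are illustrative only; they are not part of Nebari or Kuberhealthy.

```python
def uptime_percent(failed_samples: int, passed_samples: int) -> float:
    """Approximate uptime as a percentage.

    Takes the number of failed samples relative to the number of passed
    samples, subtracts that ratio from 1, and scales the result to 0-100.
    """
    if failed_samples < 0:
        raise ValueError("failed_samples must not be negative")
    if passed_samples <= 0:
        # Without any passing samples the ratio is undefined.
        raise ValueError("passed_samples must be positive")
    return (1 - failed_samples / passed_samples) * 100


if __name__ == "__main__":
    # Hypothetical counts over a 30-day window.
    print(f"{uptime_percent(failed_samples=3, passed_samples=1437):.2f}%")
```

Because the ratio is taken against passing samples rather than all samples, the figure is an approximation that is most accurate when failures are rare.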

To see which other healthchecks are available, use the metric explorer in Grafana. Select the `kuberhealthy_check` metric and the `check` label filter. The values list then shows every check that has metrics available.

![Display of available kuberhealthy metrics in Grafana](/img/how-tos/nebari-healthchecks1.png)
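
The same list can also be fetched outside Grafana. The sketch below uses the Prometheus HTTP API endpoint for label values (`/api/v1/label/<label>/values` with a `match[]` selector) to list the values of the `check` label on `kuberhealthy_check`. The base URL is an assumption (for example, a local port-forward to the Prometheus service), and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Prometheus is reachable at this address, e.g. via a port-forward.
PROMETHEUS_URL = "http://localhost:9090"


def label_values_url(base_url: str, label: str, metric: str) -> str:
    """Build the URL that lists the values of `label` seen on `metric`."""
    params = urlencode({"match[]": metric})
    return f"{base_url.rstrip('/')}/api/v1/label/{label}/values?{params}"


def parse_label_values(response: dict) -> list[str]:
    """Return the sorted label values from a decoded API response."""
    if response.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {response}")
    return sorted(response["data"])


def list_checks(base_url: str = PROMETHEUS_URL) -> list[str]:
    """Fetch the names of all checks that currently have metrics."""
    url = label_values_url(base_url, "check", "kuberhealthy_check")
    with urlopen(url, timeout=10) as resp:
        return parse_label_values(json.load(resp))


if __name__ == "__main__":
    for check in list_checks():
        print(check)
```

Keeping URL construction and response parsing in separate functions makes both easy to verify without a running Prometheus instance.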

:::note

If you previously deployed Nebari without healthchecks, you may need to restart your Prometheus service so that it picks up the Kuberhealthy metrics.

:::

## Summary of available healthchecks

@dcmcand I added a table of available healthchecks to explain what each one does. Can you review and make sure this is correct?

Also, apologies for the unreadable format - yarn formatting added tons of spaces which makes it hard to read in md format.


Below is an explanation of the available healthchecks. This list may not be comprehensive as work on this feature is ongoing.

| <div style={{width:180}}>Check Label</div> | Description |
| ------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| conda-store-http-check | verifies that conda-store is accessible via its REST API |
| jupyterhub-http-check | verifies JupyterHub is running |
| dns-status-internal | verifies internal DNS is accessible |
| daemonset | verifies that a daemonset can be created, fully provisioned, and torn down. This checks the full kubelet functionality of every node in your Kubernetes cluster |
| deployment | verifies that a fresh deployment can run, deploy multiple pods, pass traffic, do a rolling update (without dropping connections), and clean up successfully |
| keycloak-http-check | verifies Keycloak is accessible |
1 change: 1 addition & 0 deletions docs/sidebars.js
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,7 @@ module.exports = {
"how-tos/nebari-extension-system",
"how-tos/telemetry",
"how-tos/setup-monitoring",
"how-tos/setup-healthcheck",
"how-tos/access-logs-loki",
"how-tos/use-gpus",
"how-tos/develop-local-packages",
Binary file added docs/static/img/how-tos/nebari-healthchecks.png
Binary file added docs/static/img/how-tos/nebari-healthchecks1.png
27 changes: 27 additions & 0 deletions package-lock.json


5 changes: 5 additions & 0 deletions package.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
{
  "devDependencies": {
    "prettier": "3.3.3"
  }
}