
Runbook for handling uptime alerts #116

Merged
wejdross merged 3 commits into master from runbook_sla_alert on Sep 27, 2024


wejdross
Member

Summary

Checklist

  • Try to isolate changes into separate PRs (to build a better changelog).
  • Categorize the PR by setting a good title and adding one of the labels
    change, decision, requirement/quality, requirement/functional, or dependency,
    as these show up in the changelog.
  • Link this PR to related issues if applicable.

wejdross requested a review from Kidswiss on September 25, 2024 09:12
kubectl --as cluster-admin get xvshn[TAB here for specific service] | egrep $instanceNamespace_generated_chars # also describe to read what happened
----

.Check logs of our comp-functions
Contributor

I doubt this is helpful to anyone who's never worked with our AppCat.
Also the comp-functions don't necessarily say anything about why a specific service is not working.

kubectl -n syn-crossplane logs deployments/function-appcat-aeb2dbb03cf6 # <--- this number changes regularly
----

For stuck resources, you can add a dummy label to the object and then do a rollout restart of Crossplane's function-appcat and provider-kubernetes.
Contributor

Someone who has never used Crossplane might not know what a "stuck" resource is.

wejdross requested a review from Kidswiss on September 25, 2024 12:06
Contributor

Kidswiss left a comment

Just one more nit, rest LGTM

----
# $instanceNamespace_generated_chars can be obtained like this: `echo vshn-postgresql-my-super-prod-5jfjn | rev | cut -d'-' -f1 | rev` ===> 5jfjn
kubectl --as cluster-admin get objects | egrep $instanceNamespace_generated_chars # here look for False objects and describe them to find out what is wrong
kubectl --as cluster-admin get xvshn[TAB here for specific service] | egrep $instanceNamespace_generated_chars # also describe to read what happened
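The `rev | cut | rev` pipeline in the snippet above keeps only the text after the last `-` of the instance namespace. A minimal sketch of the same extraction with plain shell parameter expansion, assuming the generated suffix is always the last dash-separated segment (as in the `vshn-postgresql-my-super-prod-5jfjn` example). The helper name `instance_suffix` is hypothetical and not part of AppCat.

```shell
#!/usr/bin/env bash
# Sketch: extract the generated suffix of an AppCat instance namespace.
# Assumption: the suffix is the last '-'-separated segment of the namespace.
# Same result as: echo "$ns" | rev | cut -d'-' -f1 | rev

# instance_suffix is a hypothetical helper, not part of AppCat.
instance_suffix() {
  local ns="$1"
  # ${ns##*-} removes the longest prefix ending in '-', leaving the last segment.
  # If there is no '-', the whole string is returned (like the cut pipeline).
  printf '%s\n' "${ns##*-}"
}

instance_suffix vshn-postgresql-my-super-prod-5jfjn   # prints 5jfjn
```

The result can be fed straight into the runbook's filters, e.g. `kubectl --as cluster-admin get objects | grep -E "$(instance_suffix "$ns")"` (`grep -E` is the non-deprecated spelling of `egrep`). Parameter expansion avoids spawning extra processes and does not depend on `rev`, which is not a POSIX utility.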
Contributor

I'd probably go with describe from the beginning and not just mention it in a side note. It shows more information and the events.
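Following the suggestion to go straight to `describe`, a minimal sketch of a filter that picks out the names of rows reporting `False` in any column (for example a SYNCED or READY condition) from `kubectl get` table output, so only those objects need to be described. It assumes the default table layout with `NAME` as the first column; no other column names or positions are assumed. The helper name `false_rows` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the NAME of every row in `kubectl get` table output that has
# at least one field equal to "False" (e.g. a SYNCED or READY column).
# Assumptions: default table output, NAME is the first column, and the header
# row (if it survives an earlier grep) starts with "NAME".
# false_rows is a hypothetical helper, not part of AppCat or kubectl.
false_rows() {
  awk '
    $1 == "NAME" { next }          # skip the header row if present
    {
      for (i = 2; i <= NF; i++) {
        if ($i == "False") {       # any column reporting False
          print $1                 # print the resource name once
          next
        }
      }
    }
  '
}
```

For example: `kubectl --as cluster-admin get objects | grep -E "$instanceNamespace_generated_chars" | false_rows | xargs -r kubectl --as cluster-admin describe objects`. The header is matched by content rather than skipped by line number because the preceding `grep` usually drops it.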

Member Author

I changed it to that and also added a real-life example, so it will be easier for engineers to find their way through our abstractions.

Contributor

It still shows as `kubectl --as cluster-admin get xvshn` for me, not `kubectl --as cluster-admin describe xvshn`.

wejdross merged commit 3fe12dc into master on Sep 27, 2024
1 check passed
wejdross deleted the runbook_sla_alert branch on September 27, 2024 10:49
Labels: None yet
Projects: None yet

2 participants