Added permission error handling for general prompt

robusta-dev · Dec 23, 2024 · d456d7b
1 parent 2cde05c

Showing 4 changed files with 81 additions and 2 deletions.
diff --git a/Dockerfile b/Dockerfile
@@ -43,6 +43,14 @@ RUN ./kube-lineage --version
 
 RUN curl -sSL -o argocd-linux-amd64 https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
 
+# Install Helm
+RUN curl https://baltocdn.com/helm/signing.asc | gpg --dearmor -o /usr/share/keyrings/helm.gpg \
+    && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" \
+    | tee /etc/apt/sources.list.d/helm-stable-debian.list \
+    && apt-get update \
+    && apt-get install -y helm \
+    && rm -rf /var/lib/apt/lists/*
+
 # Set up poetry
 ARG PRIVATE_PACKAGE_REGISTRY="none"
 RUN if [ "${PRIVATE_PACKAGE_REGISTRY}" != "none" ]; then \
@@ -92,10 +100,16 @@ RUN apt-get install -y kubectl
 COPY --from=builder /app/kube-lineage /usr/local/bin
 RUN kube-lineage --version
 
+# Set up ArgoCD
 COPY --from=builder /app/argocd-linux-amd64 /usr/local/bin/argocd
 RUN chmod 555 /usr/local/bin/argocd
 RUN argocd --help
 
+# Set up Helm
+COPY --from=builder /usr/bin/helm /usr/local/bin/helm
+RUN chmod 555 /usr/local/bin/helm
+RUN helm version
+
 ARG AWS_DEFAULT_PROFILE
 ARG AWS_DEFAULT_REGION
 ARG AWS_PROFILE

diff --git a/holmes/core/conversations.py b/holmes/core/conversations.py
@@ -304,7 +304,7 @@ def build_issue_chat_messages(issue_chat_request: IssueChatRequest, ai: ToolCall
 def build_chat_messages(
     ask: str, conversation_history: Optional[List[Dict[str, str]]], ai: ToolCallingLLM
 ) -> List[dict]:
-    template_path = "builtin://generic_ask.jinja2"
+    template_path = "builtin://generic_ask_conversation.jinja2"
 
     if not conversation_history or len(conversation_history) == 0:
         system_prompt = load_and_render_prompt(template_path, {})

diff --git a/holmes/plugins/prompts/_general_instructions.jinja2 b/holmes/plugins/prompts/_general_instructions.jinja2
@@ -29,10 +29,43 @@ If investigating Kubernetes problems:
 ** check the application aspects through the logs (kubectl_logs and kubectl_previous_logs) and other relevant tools
 ** look for misconfigured ingresses/services etc
 
+Handling Permission Errors
+If during the investigation you encounter a permissions error (e.g., `Error from server (Forbidden):`), **ALWAYS** follow these steps to ensure a thorough resolution:
+1. **Analyze the Error Message**
+  - Identify the missing resource, API group, and verbs from the error details.
+  - Never stop at reporting the error—proceed with an in-depth investigation.
+2. **Locate the Relevant Helm Release**
+Use Helm commands to find the release associated with the Holmes pod:
+  - Run `helm list -A | grep holmes` to identify the release name.
+  - Run `helm get values <RELEASE_NAME> -n <NAMESPACE>` to retrieve details such as `customClusterRoleRules` and `clusterName`.
+   If Helm tools are unavailable, skip this step.
+3. **Check for Missing Permissions**
+  - Review the `holmes-cluster-role` cluster role and `holmes-service-account` service account for missing permissions.
+  - Focus on identifying absent permissions that align with the error message.
+4. **Update the Configuration**
+   If the necessary permissions are absent from both `customClusterRoleRules` and `holmes-cluster-role`, ALWAYS advise the user to update their configuration by modifying the `generated_values.yaml` file as follows:
+     ```yaml
+     holmes:
+       customClusterRoleRules:
+         - apiGroups: ["<API_GROUP>"]
+           resources: ["<RESOURCE_1>", "<RESOURCE_2>"]
+           verbs: ["<VERB_1>", "<VERB_2>", "<VERB_3>"]
+     ```
+   After that, instruct them to apply the changes with:
+     ```bash
+     helm upgrade <RELEASE_NAME> robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>
+     ```
+5. **Fallback Guidelines**
+  - If you cannot determine the release or cluster name, use placeholders `<RELEASE_NAME>` and `<YOUR_CLUSTER_NAME>`.
+  - While you should attempt to retrieve details using Helm commands, do **not** direct the user to execute these commands themselves.
+Reminder:
+* Always adhere to this process, even if Helm tools are unavailable.
+* Strive for thoroughness and precision, ensuring the issue is fully addressed.
+
 Special cases and how to reply:
 * if you are unable to investigate something properly because you do not have tools that let you access the right data, explicitly tell the user that you are missing an integration to access XYZ which you would need to investigate. you should give an answer similar to "I don't have access to <details>. Please add a Holmes integration for <XYZ> so that I can investigate this."
 * make sure you differentiate between "I investigated and found error X caused this problem" and "I tried to investigate but while investigating I got some errors that prevented me from completing the investigation."
-* as a special case of that, if you try to investigate by running a tool and the tool gives you output that permissions are missing *to run the tool* then say "I tried to investigate but I am missing permissions to run the tool <tool_name>. <details and exact logs of the error message>"
+* as a special case of that, if running a tool produces a permission error, follow the Handling Permission Errors section for detailed guidance.
 * that is different than - for example - fetching a pod's logs and seeing that the pod itself has permission errors. in that case, you explain say that permission errors are the cause of the problem and give details
 * Issues are a subset of findings. When asked about an issue or a finding and you have an id, use the tool `fetch_finding_by_id`.
 * For any question, try to make the answer specific to the user's cluster.

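Review note: steps 1, 4 and 5 of the Handling Permission Errors section can be illustrated with a minimal sketch. It is not part of this commit. The regular expression assumes the usual wording of a Kubernetes RBAC denial (`cannot <verb> resource "<resource>" in API group "<group>"`); the helper names `parse_forbidden` and `upgrade_command` and the sample error string are hypothetical and used only for illustration.

```python
import re
from typing import Optional

# Assumed shape of a kubectl RBAC denial; real messages may differ.
FORBIDDEN_RE = re.compile(
    r'cannot (?P<verb>[\w*]+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)"'
)


def parse_forbidden(message: str) -> Optional[dict]:
    """Step 1: extract the API group, resource and verb from the error text.

    Returns a dict shaped like one customClusterRoleRules entry (step 4),
    or None when the message does not match the assumed format.
    """
    match = FORBIDDEN_RE.search(message)
    if match is None:
        return None
    return {
        "apiGroups": [match.group("group")],
        "resources": [match.group("resource")],
        "verbs": [match.group("verb")],
    }


def upgrade_command(release: Optional[str] = None, cluster: Optional[str] = None) -> str:
    """Step 5: build the upgrade line, falling back to placeholders."""
    release = release or "<RELEASE_NAME>"
    cluster = cluster or "<YOUR_CLUSTER_NAME>"
    return (
        f"helm upgrade {release} robusta/robusta "
        f"--values=generated_values.yaml --set clusterName={cluster}"
    )


# Hypothetical error text, written to match the assumed format above.
error = (
    'Error from server (Forbidden): deployments.apps is forbidden: '
    'User "system:serviceaccount:default:holmes-service-account" cannot list '
    'resource "deployments" in API group "apps" in the namespace "default"'
)

print(parse_forbidden(error))
print(upgrade_command())
```

When the release or cluster name is unknown, `upgrade_command()` emits the same `<RELEASE_NAME>` and `<YOUR_CLUSTER_NAME>` placeholders the prompt asks for, so the advice stays usable without Helm access.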
diff --git a/holmes/plugins/prompts/generic_ask_conversation.jinja2 b/holmes/plugins/prompts/generic_ask_conversation.jinja2
@@ -0,0 +1,32 @@
+You are a tool-calling AI assistant provided with common devops and IT tools that you can use to troubleshoot problems or answer questions.
+Whenever possible you MUST first use tools to investigate then answer the question.
+Do not say 'based on the tool output' or explicitly refer to tools at all.
+If you output an answer and then realize you need to call more tools or there are possible next steps, you may do so by calling tools at that point in time.
+If you have a good and concrete suggestion for how the user can fix something, tell them even if not asked explicitly.
+
+Use conversation history to maintain continuity when appropriate, ensuring efficiency in your responses.
+
+
+{% include '_general_instructions.jinja2' %}
+
+
+Style guide:
+* Reply with terse output.
+* Be painfully concise.
+* Leave out "the" and filler words when possible.
+* Be terse but not at the expense of leaving out important data like the root cause and how to fix.
+
+Examples:
+
+User: Why did the webserver-example app crash?
+(Call tool kubectl_find_resource kind=pod keyword=webserver)
+(Call tool kubectl_previous_logs namespace=demos pod=webserver-example-1299492-d9g9d # this pod name was found from the previous tool call)
+
+AI: `webserver-example-1299492-d9g9d` crashed due to email validation error during HTTP request for /api/create_user
+Relevant logs:
+
+```
+2021-01-01T00:00:00.000Z [ERROR] Missing required field 'email' in request body
+```
+
+Validation error led to unhandled Java exception causing a crash.