Kubernetes (MCP)¶
Which Kubernetes toolset should I use?
Holmes has three Kubernetes integrations. Most users only need the first:
- Kubernetes (built-in) — default for most users. Read-only access to cluster resources via
kubectl, authenticated with the pod's ServiceAccount in-cluster or your local kubeconfig for CLI. No extra deployment. - Kubernetes (MCP) — use when you need OAuth/OIDC authentication (e.g. AKS with Microsoft Entra ID, or per-user RBAC enforced by your identity provider). Replaces the built-in toolset.
- Kubernetes Remediation (MCP) — add on top of either of the above when you want Holmes to perform write actions (restart, scale, drain, patch, etc.). Complements the read-only toolsets rather than replacing them.
The Kubernetes MCP server gives Holmes access to Kubernetes clusters via the MCP protocol, with support for OAuth/OIDC authentication. It is intended to replace the built-in kubernetes/core and kubernetes/logs toolsets — the Helm examples below disable those to avoid overlap.
Which setup do I need?¶
| Mode | Clusters Holmes can see | Authentication | Best for |
|---|---|---|---|
| Single Cluster | Just the cluster Holmes runs in | Pod's own ServiceAccount | Single-cluster setups, simplest path |
| Multiple Clusters | Many clusters from one Holmes pod | Pre-issued tokens in a mounted kubeconfig | Investigating prod + staging + dev from one place |
| Per-User Auth | One cluster, per-user identity | Each user's own SSO token (Microsoft Entra ID) | Enterprise SSO with per-user RBAC enforced on the API server |
Single Cluster (ServiceAccount)¶
The simplest setup. The MCP server runs in the same cluster it monitors and authenticates with its own ServiceAccount — no tokens to mint, no external IdP to configure.
Step 1: Deploy¶
Add the following to your values.yaml:
# Disable built-in k8s toolsets to avoid overlap
toolsets:
kubernetes/core:
enabled: false
kubernetes/logs:
enabled: false
bash:
enabled: false
mcpAddons:
kubernetes:
enabled: true
serviceAccount:
create: true
name: "k8s-mcp-sa"
createClusterRoleBinding: true
clusterRole: "view"
config:
readOnly: true
Add the following to your generated_values.yaml:
holmes:
# Disable built-in k8s toolsets to avoid overlap
toolsets:
kubernetes/core:
enabled: false
kubernetes/logs:
enabled: false
bash:
enabled: false
mcpAddons:
kubernetes:
enabled: true
serviceAccount:
create: true
name: "k8s-mcp-sa"
createClusterRoleBinding: true
clusterRole: "view"
config:
readOnly: true
Step 2: Verify¶
Multiple Clusters (Mounted Kubeconfig)¶
Use this mode when you want one Holmes pod to investigate multiple Kubernetes clusters from a single place. Holmes still runs inside one "home" cluster, but instead of using its in-pod ServiceAccount it authenticates to every target cluster (including, optionally, its home cluster) using credentials packed into a kubeconfig file you mount as a Secret. Every applicable MCP tool exposes a context argument so the LLM can pick which cluster to query for each step of an investigation.
Step 1: Generate a kubeconfig for Holmes¶
A kubeconfig is a YAML file containing three lists: clusters (API server URL + CA cert), users (credentials), and contexts (a named pairing of one cluster with one user). The k8s-mcp-server uses the context name as the cluster identifier — pick names you'd be comfortable seeing in tool calls (e.g. prod-eu, staging, dev).
For each cluster you want Holmes to access, you need a ServiceAccount with a long-lived token. Cloud auth plugins like aws-iam-authenticator, gke-gcloud-auth-plugin, and kubelogin do not work inside the MCP server pod — you must use static credentials.
Run the following against each target cluster (switch your local kubectl context first). Edit CLUSTER_NAME to a unique short name per cluster before each run.
Don't have a Holmes ServiceAccount on the target cluster yet?
Render and apply one from the Helm chart first — this gives the SA the same read-only role Holmes normally runs with (nodes, metrics, RBAC inspection, Prometheus CRDs, no Secrets):
helm template robusta \
https://robusta-charts.storage.googleapis.com/holmes-0.31.1.tgz \
--show-only templates/holmesgpt-service-account.yaml \
--set createServiceAccount=true \
--set k8sRBAC=false \
--namespace default > sa.yaml
kubectl apply -f sa.yaml
This creates robusta-holmes-service-account in the default namespace plus robusta-holmes-cluster-role and robusta-holmes-cluster-role-binding. Bump the chart version (0.31.1) to whatever is current.
On clusters that already have Robusta installed via Helm: kubectl apply will warn about a missing kubectl.kubernetes.io/last-applied-configuration annotation and "configure" the existing objects. The resources are functionally identical, but you've now created a co-management situation between Helm and kubectl apply. To keep them separate, change the release name in the helm template command (e.g. helm template holmes-mcp …) so it renders holmes-mcp-holmes-* resources alongside Helm's robusta-holmes-* ones. Update SA_NAME below to match.
Now mint a long-lived token for the SA and append a context to ./holmes-kubeconfig:
#!/usr/bin/env bash
set -euo pipefail
CLUSTER_NAME=prod # appears in MCP tool calls
SA_NAME=robusta-holmes-service-account # SA to mint a token for
SA_NAMESPACE=default # namespace of that SA
KUBECONFIG_OUT=./holmes-kubeconfig
TOKEN_SECRET="${SA_NAME}-mcp-token"
# Sanity-check that the SA exists.
if ! kubectl get serviceaccount "$SA_NAME" -n "$SA_NAMESPACE" >/dev/null 2>&1; then
echo "ServiceAccount $SA_NAMESPACE/$SA_NAME not found." >&2
exit 1
fi
# Create a long-lived token Secret bound to the SA (K8s 1.24+).
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
name: ${TOKEN_SECRET}
namespace: ${SA_NAMESPACE}
annotations:
kubernetes.io/service-account.name: ${SA_NAME}
type: kubernetes.io/service-account-token
EOF
sleep 2
TOKEN=$(kubectl get secret "$TOKEN_SECRET" -n "$SA_NAMESPACE" \
-o jsonpath='{.data.token}' | base64 -d)
CA_B64=$(kubectl get secret "$TOKEN_SECRET" -n "$SA_NAMESPACE" \
-o jsonpath='{.data.ca\.crt}')
SERVER=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')
# Write the CA to a temp file rather than process substitution
# (<(...) doesn't work under sh/dash).
CA_FILE=$(mktemp)
trap 'rm -f "$CA_FILE"' EXIT
echo "$CA_B64" | base64 -d > "$CA_FILE"
KUBECONFIG="$KUBECONFIG_OUT" kubectl config set-cluster "$CLUSTER_NAME" \
--server="$SERVER" \
--certificate-authority="$CA_FILE" \
--embed-certs=true
KUBECONFIG="$KUBECONFIG_OUT" kubectl config set-credentials "holmes-$CLUSTER_NAME" \
--token="$TOKEN"
KUBECONFIG="$KUBECONFIG_OUT" kubectl config set-context "$CLUSTER_NAME" \
--cluster="$CLUSTER_NAME" --user="holmes-$CLUSTER_NAME"
# Verify the new context can talk to the API server.
KUBECONFIG="$KUBECONFIG_OUT" kubectl --context="$CLUSTER_NAME" \
get pods -A --request-timeout=10s | head -5
Step 2: Set the default context In the Kubeconfig file¶
# List what's available — pick one name from the output
KUBECONFIG=./holmes-kubeconfig kubectl config get-contexts -o name
# Set it (replace <DEFAULT_CLUSTER> with one of the names above)
KUBECONFIG=./holmes-kubeconfig kubectl config use-context <DEFAULT_CLUSTER>
# Confirm
KUBECONFIG=./holmes-kubeconfig kubectl config current-context
Step 3: Create the kubeconfig Secret¶
kubectl create secret generic k8s-mcp-kubeconfig \
--from-file=kubeconfig=holmes-kubeconfig \
-n YOUR_NAMESPACE
To validate the secret:
kubectl get secret k8s-mcp-kubeconfig -n YOUR_NAMESPACE \
-o jsonpath='{.data.kubeconfig}' | base64 -d \
| grep -E '^current-context: \S' \
&& echo "OK: Secret populated with a default context" \
|| echo "FAIL: Secret is empty or missing current-context — fix before deploying"
Step 4: Deploy¶
Adjust your values.yaml file in the holmes "hub" cluster where you want multi-cluster access:
Add the following to your values.yaml:
# Disable built-in k8s toolsets to avoid overlap
toolsets:
kubernetes/core:
enabled: false
kubernetes/logs:
enabled: false
bash:
enabled: false
mcpAddons:
kubernetes:
enabled: true
llmInstructions: |
This MCP server provides direct access to Kubernetes clusters for advanced cluster operations and troubleshooting. This instance is connected to MULTIPLE Kubernetes clusters via kubeconfig contexts.
## MANDATORY FIRST STEP — read this before any other tool call
Before doing ANYTHING else for a Kubernetes question (resource lookup, log retrieval, event check, status check, "is X running?", "why is X failing?", etc.):
1. Call `configuration_contexts_list` to enumerate every cluster context. Do this FIRST, on every fresh investigation, even if the user named a cluster or you think you already know which cluster applies. No exceptions.
2. Treat the returned list as the complete search space. The resource the user is asking about may live on ANY of these clusters.
3. For every subsequent tool call, pass the explicit `context` argument. Never rely on an implicit default.
## Multi-cluster search procedure (when a resource is not on the first cluster)
If any resource lookup returns "not found" on a given context:
- **Immediately** re-issue the same lookup against every OTHER context returned by `configuration_contexts_list`.
- Do this WITHOUT pausing, WITHOUT asking the user "should I check other clusters?", and WITHOUT explaining what you're about to do. Just do it.
- Only after all contexts have been queried may you conclude that a resource truly does not exist.
- If a tool call against one context fails (auth, network, timeout), say so explicitly and CONTINUE with the remaining contexts. One failure must not short-circuit the search.
## Forbidden behaviors (your answer is incorrect if you do any of these)
- Skipping `configuration_contexts_list` and jumping straight to a resource query.
- Reporting "resource not found" without having queried EVERY context from `configuration_contexts_list`.
- Asking the user "should I check the other clusters?" — the answer is always yes; do it without asking.
- Assuming the first/default cluster is the only one to check.
- Omitting the cluster name from your final answer when reporting findings.
## Required output discipline
Every finding must be labeled with the cluster/context name it came from.
- Correct: "Found `payment-service` deployment on cluster **prod-eu** (3/3 ready). It does NOT exist on **prod-us** or **prod-ap**."
- Incorrect: "Found payment-service deployment, 3/3 ready." (missing cluster attribution)
## When to Use This MCP Server
Use the Kubernetes MCP when investigating:
- Pod failures, crash loops, or scheduling issues
- Resource consumption and node capacity problems
- Deployment rollout issues or scaling problems
- Kubernetes events and cluster-level diagnostics
- Helm release status and management
## Investigation Workflow
1. **List clusters FIRST** — call `configuration_contexts_list` (mandatory; see top of this document). All subsequent tool calls must include an explicit `context`.
2. **List namespaces** on the candidate cluster(s) to identify where the resource of interest could live.
3. **Check events**: look for warnings and errors. If the resource was not on the first cluster, fan out and check events on every other cluster too.
4. **Inspect pods**: get status, logs, resource usage — from the cluster where the resource actually exists.
5. **Examine resources**: get detailed definitions to identify misconfigurations.
6. **Check node health**: review node status and resource consumption on the relevant cluster.
## Important Guidelines
- Always specify BOTH the namespace AND the cluster `context` when querying namespaced resources.
- Check events first — they often reveal the root cause quickly.
- Use pod logs to understand application-level failures.
- Compare resource requests/limits with actual usage via top commands.
- When investigating scheduling issues, check node capacity and taints on the cluster where the pod lives.
serviceAccount:
create: true
name: "k8s-mcp-sa"
createClusterRoleBinding: false # auth comes from kubeconfig tokens
config:
readOnly: true
kubeconfig:
secretName: "k8s-mcp-kubeconfig"
secretKey: "kubeconfig"
# Required — overrides in-cluster auto-detection
extraArgs:
- "--kubeconfig"
- "/etc/kubernetes/kubeconfig"
- "--cluster-provider"
- "kubeconfig"
serverConfig: |
disabled_tools = ["configuration_view"]
Add the following to your generated_values.yaml:
holmes:
# Disable built-in k8s toolsets to avoid overlap
toolsets:
kubernetes/core:
enabled: false
kubernetes/logs:
enabled: false
bash:
enabled: false
mcpAddons:
kubernetes:
enabled: true
llmInstructions: |
This MCP server provides direct access to Kubernetes clusters for advanced cluster operations and troubleshooting. This instance is connected to MULTIPLE Kubernetes clusters via kubeconfig contexts.
## MANDATORY FIRST STEP — read this before any other tool call
Before doing ANYTHING else for a Kubernetes question (resource lookup, log retrieval, event check, status check, "is X running?", "why is X failing?", etc.):
1. Call `configuration_contexts_list` to enumerate every cluster context. Do this FIRST, on every fresh investigation, even if the user named a cluster or you think you already know which cluster applies. No exceptions.
2. Treat the returned list as the complete search space. The resource the user is asking about may live on ANY of these clusters.
3. For every subsequent tool call, pass the explicit `context` argument. Never rely on an implicit default.
## Multi-cluster search procedure (when a resource is not on the first cluster)
If any resource lookup returns "not found" on a given context:
- **Immediately** re-issue the same lookup against every OTHER context returned by `configuration_contexts_list`.
- Do this WITHOUT pausing, WITHOUT asking the user "should I check other clusters?", and WITHOUT explaining what you're about to do. Just do it.
- Only after all contexts have been queried may you conclude that a resource truly does not exist.
- If a tool call against one context fails (auth, network, timeout), say so explicitly and CONTINUE with the remaining contexts. One failure must not short-circuit the search.
## Forbidden behaviors (your answer is incorrect if you do any of these)
- Skipping `configuration_contexts_list` and jumping straight to a resource query.
- Reporting "resource not found" without having queried EVERY context from `configuration_contexts_list`.
- Asking the user "should I check the other clusters?" — the answer is always yes; do it without asking.
- Assuming the first/default cluster is the only one to check.
- Omitting the cluster name from your final answer when reporting findings.
## Required output discipline
Every finding must be labeled with the cluster/context name it came from.
- Correct: "Found `payment-service` deployment on cluster **prod-eu** (3/3 ready). It does NOT exist on **prod-us** or **prod-ap**."
- Incorrect: "Found payment-service deployment, 3/3 ready." (missing cluster attribution)
## When to Use This MCP Server
Use the Kubernetes MCP when investigating:
- Pod failures, crash loops, or scheduling issues
- Resource consumption and node capacity problems
- Deployment rollout issues or scaling problems
- Kubernetes events and cluster-level diagnostics
- Helm release status and management
## Investigation Workflow
1. **List clusters FIRST** — call `configuration_contexts_list` (mandatory; see top of this document). All subsequent tool calls must include an explicit `context`.
2. **List namespaces** on the candidate cluster(s) to identify where the resource of interest could live.
3. **Check events**: look for warnings and errors. If the resource was not on the first cluster, fan out and check events on every other cluster too.
4. **Inspect pods**: get status, logs, resource usage — from the cluster where the resource actually exists.
5. **Examine resources**: get detailed definitions to identify misconfigurations.
6. **Check node health**: review node status and resource consumption on the relevant cluster.
## Important Guidelines
- Always specify BOTH the namespace AND the cluster `context` when querying namespaced resources.
- Check events first — they often reveal the root cause quickly.
- Use pod logs to understand application-level failures.
- Compare resource requests/limits with actual usage via top commands.
- When investigating scheduling issues, check node capacity and taints on the cluster where the pod lives.
serviceAccount:
create: true
name: "k8s-mcp-sa"
createClusterRoleBinding: false # auth comes from kubeconfig tokens
config:
readOnly: true
kubeconfig:
secretName: "k8s-mcp-kubeconfig"
secretKey: "kubeconfig"
extraArgs:
- "--kubeconfig"
- "/etc/kubernetes/kubeconfig"
- "--cluster-provider"
- "kubeconfig"
serverConfig: |
disabled_tools = ["configuration_view"]
The llmInstructions block above helps holmes with multi-cluster awareness.
Step 5: Route chats to the right cluster (Robusta UI)¶
Only needed if you use the Robusta platform with Slack or Teams. Go to Settings → HolmesGPT → Multi-Agent Routing and fill in:
-
Routing Agent — prompt for picking the cluster from chat context. Example:
If the
cluster_nameorclusterfield is available in the chat context, route to that cluster. Otherwise, use thecluster/agent for the question.
Your "Hub" holmes instance now have access to multiple clusters.
Per-User Auth (OAuth or OIDC)¶
Use OAuth/OIDC when cluster access is managed through Microsoft Entra ID (Azure AD) — for example, enterprise environments with centralized SSO.
In this mode the MCP server validates OAuth tokens and passes them through to the Kubernetes API server, so each user's calls hit the API with their own identity. The ServiceAccount ClusterRoleBinding is not needed — permissions come from the OAuth token.
Two pieces of config drive the flow:
- Server-side (
mcpAddons.kubernetes.config.serverConfig) — TOML that the MCP server itself uses to validate incoming bearer tokens. - Holmes-side (
mcpAddons.kubernetes.config.oauth) — tells Holmes which OAuth endpoints to send users to. Without this, Holmes can't drive the browser login flow.
Step 1: Enable Azure AD on your AKS cluster¶
Your AKS cluster must be configured for Azure AD authentication. Follow the Microsoft guide to enable Azure AD integration on AKS.
Step 2: Create an Entra ID App Registration¶
- In the Azure portal, go to Microsoft Entra ID > App Registrations > New Registration
- Enter a name (e.g.,
holmes-k8s-mcp), select Accounts in this organizational directory only, and click Register -
Under Authentication > Platform configurations, add a Web platform with the redirect URI matching your Robusta region:
https://platform.robusta.dev/oauth/callback.htmlhttps://platform.eu.robusta.dev/oauth/callback.htmlhttps://platform.ap.robusta.dev/oauth/callback.html -
Under API Permissions, add the following delegated permissions:
- Azure Kubernetes Service AAD Server (
6dae42f8-4368-4678-94ff-3960e28e3630):user.read - Microsoft Graph:
email,openid,profile
- Azure Kubernetes Service AAD Server (
- Click Grant admin consent for your tenant
- Under Certificates & Secrets, create a new client secret and copy the value
- From the Overview page, note your Application (client) ID and Directory (tenant) ID
Step 3: Store the client secret¶
Create a Kubernetes Secret with the Entra ID client secret you copied in Step 2.6, then expose it on the Holmes pod as MCP_OAUTH_CLIENT_SECRET. The Helm values in Step 4 reference it via {{ env.MCP_OAUTH_CLIENT_SECRET }} so the secret never appears in your values file.
kubectl create secret generic mcp-oauth-credentials \
--from-literal=client-secret='<CLIENT_SECRET>' \
-n YOUR_NAMESPACE \
--dry-run=client -o yaml | kubectl apply -f -
Step 4: Deploy¶
Add the following to your values.yaml (replace <TENANT_ID> and <CLIENT_ID>):
# Inject the OAuth client secret as an env var that the chart reads via Jinja.
additionalEnvVars:
- name: MCP_OAUTH_CLIENT_SECRET
valueFrom:
secretKeyRef:
name: mcp-oauth-credentials
key: client-secret
# Disable built-in k8s toolsets to avoid overlap
toolsets:
kubernetes/core:
enabled: false
kubernetes/logs:
enabled: false
bash:
enabled: false
mcpAddons:
kubernetes:
enabled: true
serviceAccount:
create: true
name: "k8s-mcp-sa"
createClusterRoleBinding: false # No RBAC — OAuth token provides permissions
config:
readOnly: true
# Server-side: how the MCP server validates incoming JWTs.
# The chart bakes this into a Secret mounted at /etc/kubernetes-mcp/config.toml.
serverConfig: |
require_oauth = true
authorization_url = "https://login.microsoftonline.com/<TENANT_ID>/v2.0"
oauth_audience = "6dae42f8-4368-4678-94ff-3960e28e3630"
oauth_scopes = ["6dae42f8-4368-4678-94ff-3960e28e3630/.default", "openid", "profile"]
issuer_url = "https://sts.windows.net/<TENANT_ID>/"
# Holmes-side: how Holmes drives the browser OAuth flow for end users.
oauth:
enabled: true
client_id: "<CLIENT_ID>"
client_secret: "{{ env.MCP_OAUTH_CLIENT_SECRET }}"
Add the following to your generated_values.yaml (replace <TENANT_ID> and <CLIENT_ID>):
holmes:
additionalEnvVars:
- name: MCP_OAUTH_CLIENT_SECRET
valueFrom:
secretKeyRef:
name: mcp-oauth-credentials
key: client-secret
# Disable built-in k8s toolsets to avoid overlap
toolsets:
kubernetes/core:
enabled: false
kubernetes/logs:
enabled: false
bash:
enabled: false
mcpAddons:
kubernetes:
enabled: true
serviceAccount:
create: true
name: "k8s-mcp-sa"
createClusterRoleBinding: false # No RBAC — OAuth token provides permissions
config:
readOnly: true
serverConfig: |
require_oauth = true
authorization_url = "https://login.microsoftonline.com/<TENANT_ID>/v2.0"
oauth_audience = "6dae42f8-4368-4678-94ff-3960e28e3630"
oauth_scopes = ["6dae42f8-4368-4678-94ff-3960e28e3630/.default", "openid", "profile"]
issuer_url = "https://sts.windows.net/<TENANT_ID>/"
oauth:
enabled: true
client_id: "<CLIENT_ID>"
client_secret: "{{ env.MCP_OAUTH_CLIENT_SECRET }}"
Step 5: Verify¶
When you ask Holmes a Kubernetes question for the first time, the Robusta UI will open a Microsoft login window. After signing in, Holmes uses your Azure-issued token for every kubernetes_* call — RBAC is enforced per user on the API server.