Prometheus

Connect HolmesGPT to Prometheus for metrics analysis and query generation.

Prerequisites

A running and accessible Prometheus server
Network connectivity from HolmesGPT to the Prometheus endpoint (see Finding your Prometheus URL)

Configuration

When using the Holmes CLI, add the following to ~/.holmes/config.yaml. Create the file if it doesn't exist:

toolsets:
    prometheus/metrics:
        enabled: true
        subtype: prometheus
        config:
            prometheus_url: http://<your-prometheus-service>:9090

            # Optional:
            #additional_headers:
            #    Authorization: "Basic <base_64_encoded_string>"

When using the standalone Holmes Helm Chart, update your values.yaml:

toolsets:
    prometheus/metrics:
        enabled: true
        subtype: prometheus
        config:
            prometheus_url: http://<your-prometheus-service>:9090

            # Optional:
            #additional_headers:
            #    Authorization: "Basic <base_64_encoded_string>"

Apply the configuration:

helm upgrade holmes holmes/holmes --values=values.yaml

When using the Robusta Helm Chart (which includes HolmesGPT), update your generated_values.yaml:

holmes:
  toolsets:
      prometheus/metrics:
          enabled: true
          subtype: prometheus
          config:
              prometheus_url: http://<your-prometheus-service>:9090

              # Optional:
              #additional_headers:
              #    Authorization: "Basic <base_64_encoded_string>"

Apply the configuration:

helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>

About subtype

The subtype: field tells HolmesGPT which Prometheus variant you're connecting to. For a plain self-hosted Prometheus, use prometheus; variant-specific values (victoriametrics, coralogix, grafana-cloud, aws-managed-prometheus, azure-managed-prometheus, google-managed-prometheus) are shown in the sections below. The field is optional: when you omit it, HolmesGPT infers the variant from the configuration fields. Setting it explicitly is still recommended, because it makes the toolset appear under the correct integration in the UI.

Finding your Prometheus URL

There are several ways to find your Prometheus URL:

Option 1: Simple method (port-forwarding)

# Find Prometheus services
kubectl get svc -A | grep prometheus

# Port forward for testing
kubectl port-forward svc/<your-prometheus-service> 9090:9090 -n <namespace>
# Then access Prometheus at: http://localhost:9090

Option 2: Advanced method (get full cluster DNS URL)

If you want to find the full internal DNS URL for Prometheus, run:

kubectl get svc --all-namespaces -o jsonpath='{range .items[*]}{.metadata.name}{"."}{.metadata.namespace}{".svc.cluster.local:"}{.spec.ports[0].port}{"\n"}{end}' | grep prometheus | grep -Ev 'operat|alertmanager|node|coredns|kubelet|kube-scheduler|etcd|controller' | awk '{print "http://"$1}'

This prints the candidate Prometheus service URLs in your cluster, built from each service's first listed port. Pick the one that matches your deployment.
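Before putting a candidate URL into the HolmesGPT config, it can help to confirm that it answers a PromQL query. The following is a minimal sketch, not part of HolmesGPT: check_prometheus is a hypothetical helper name, and it assumes curl is installed and that the endpoint serves the standard Prometheus HTTP API under /api/v1.

```shell
# Hypothetical helper: run the instant query `up` against a base URL.
# Usage: check_prometheus <base_url> [extra_header]
check_prometheus() {
  base_url="${1%/}"   # drop a trailing slash so the path joins cleanly
  header="${2:-}"     # optional, e.g. 'Authorization: Basic <base64>'
  if [ -n "$header" ]; then
    curl -sS -G --max-time 10 -H "$header" \
      --data-urlencode 'query=up' "$base_url/api/v1/query"
  else
    curl -sS -G --max-time 10 \
      --data-urlencode 'query=up' "$base_url/api/v1/query"
  fi
}
```

For example, with the port-forward from Option 1 running, `check_prometheus http://localhost:9090` should return a JSON document whose status field is "success".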

Multiple Instances

The Prometheus toolset can connect to more than one Prometheus instance. List each one under instances: with a unique name. Any config field set outside instances: becomes a default that every instance inherits, so shared settings only need to be written once.

toolsets:
  prometheus/metrics:
    enabled: true
    config:
      instances:
        - name: prod
          prometheus_url: http://<your-prod-prometheus-service>:9090
        - name: staging
          prometheus_url: http://<your-staging-prometheus-service>:9090

When more than one instance is configured, HolmesGPT automatically adds an instance parameter to every Prometheus tool (so it can pick which instance to query) and a prometheus_metrics_list_instances tool to list the configured instances. With a single instance — including the flat config without instances: — the tools are unchanged and fully backwards compatible.
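To illustrate the inheritance rule, the sketch below writes an example config to a scratch file for review, with one shared setting placed outside instances:. The file path, hostnames, and the 24-hour value are assumptions chosen for illustration; only the field names come from this page.

```shell
# Write an illustrative multi-instance config to a scratch file (not to
# ~/.holmes/config.yaml) so it can be reviewed before use.
config_file="${TMPDIR:-/tmp}/holmes-multi-instance.example.yaml"

cat > "$config_file" <<'EOF'
toolsets:
  prometheus/metrics:
    enabled: true
    config:
      # Shared default: written once outside instances:, inherited by both
      discover_metrics_from_last_hours: 24
      instances:
        - name: prod
          prometheus_url: http://prometheus.prod.example.internal:9090
        - name: staging
          prometheus_url: http://prometheus.staging.example.internal:9090
EOF

echo "Wrote example config to $config_file"
```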

See Multiple Instances for the full behaviour, including global defaults and health reporting.

Specific Providers

Coralogix Prometheus

To use a Coralogix PromQL endpoint with HolmesGPT:

Go to Coralogix Documentation and choose the relevant PromQL endpoint for your region.
In Coralogix, create an API key with permissions to query metrics (Data Flow → API Keys).

Create a Kubernetes secret for the API key and expose it as an environment variable in your Helm values:

holmes:
  additionalEnvVars:
    - name: CORALOGIX_API_KEY
      valueFrom:
        secretKeyRef:
          name: coralogix-api-key
          key: CORALOGIX_API_KEY
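One way to create the secret that these values reference is sketched below. The secret name and key match the secretKeyRef above; create_coralogix_secret is a hypothetical helper name, and the namespace argument is an assumption that should be the namespace where Holmes runs.

```shell
# Hypothetical helper: create the secret referenced by secretKeyRef.
# Usage: create_coralogix_secret <api_key> <namespace>
create_coralogix_secret() {
  kubectl create secret generic coralogix-api-key \
    --from-literal=CORALOGIX_API_KEY="$1" \
    -n "$2"
}
```

Passing the key as an argument leaves it in shell history, so reading it from a variable that is set elsewhere may be preferable.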

Add the following under your toolsets in the Helm chart:

holmes:
  toolsets:
    prometheus/metrics:
      enabled: true
      subtype: coralogix
      config:
        prometheus_url: "https://prom-api.eu2.coralogix.com"  # Use your region's endpoint
        additional_headers:
          token: "{{ env.CORALOGIX_API_KEY }}"
        discover_metrics_from_last_hours: 72  # Look back 72 hours for metrics
        tool_calls_return_data: true

AWS Managed Prometheus (AMP)

To connect HolmesGPT to AWS Managed Prometheus:

holmes:
  toolsets:
    prometheus/metrics:
      enabled: true
      subtype: aws-managed-prometheus
      config:
        prometheus_url: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/
        aws_region: us-east-1
        aws_service_name: aps  # Default value, can be omitted
        # Optional: Specify credentials (otherwise uses default AWS credential chain)
        aws_access_key: "{{ env.AWS_ACCESS_KEY_ID }}"
        aws_secret_access_key: "{{ env.AWS_SECRET_ACCESS_KEY }}"
        # Optional: Assume a role for cross-account access
        assume_role_arn: "arn:aws:iam::123456789012:role/PrometheusReadRole"
        refresh_interval_seconds: 900  # Refresh AWS credentials every 15 minutes (default)

Notes:

The toolset automatically detects AWS configuration when aws_region is present
Uses SigV4 authentication for all requests
Supports IAM roles and cross-account access via assume_role_arn
Credentials refresh automatically based on refresh_interval_seconds
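To check the workspace URL and IAM permissions independently of HolmesGPT, a SigV4-signed query can be sent by hand. The sketch below assumes the third-party awscurl tool is installed and can find AWS credentials in your environment or profile; check_amp is a hypothetical helper name.

```shell
# Hypothetical helper: send a SigV4-signed instant query (`up`) to AMP.
# Usage: check_amp <workspace_url> <region>
check_amp() {
  workspace_url="${1%/}"   # drop the trailing slash before appending the path
  awscurl -X POST --service aps --region "$2" \
    --header 'Content-Type: application/x-www-form-urlencoded' \
    -d 'query=up' \
    "$workspace_url/api/v1/query"
}
```

A call would look like `check_amp "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/<workspace-id>/" us-east-1`, run with the same credentials or role that Holmes is expected to use.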

Google Managed Prometheus

Before configuring Holmes, make sure you have:

Google Managed Prometheus enabled
A Prometheus Frontend endpoint accessible from your cluster (if you don't already have one, you can create it by following the instructions here)

To connect HolmesGPT to Google Cloud Managed Prometheus:

holmes:
  toolsets:
    prometheus/metrics:
      enabled: true
      subtype: google-managed-prometheus
      config:
        # URL of your Prometheus Frontend endpoint. The hostname depends on the namespace where you deployed the frontend.
        prometheus_url: http://frontend.default.svc.cluster.local:9090

Notes:

Authentication is handled automatically via Google Cloud (Workload Identity or the default service account of the deployed frontend)
No additional headers or credentials are required
The Prometheus Frontend endpoint must be accessible from the cluster

Azure Managed Prometheus

Before configuring Holmes, make sure you have:

An Azure Monitor workspace with Managed Prometheus enabled
A service principal (or managed identity) that has access to the workspace

Using a service principal (client secret)

holmes:
  toolsets:
    prometheus/metrics:
      enabled: true
      subtype: azure-managed-prometheus
      config:
        prometheus_url: "https://<your-workspace>.<region>.prometheus.monitor.azure.com:443/"
  additionalEnvVars:
    - name: AZURE_CLIENT_ID
      value: "<your-app-client-id>"
    - name: AZURE_TENANT_ID
      value: "<your-tenant-id>"
    - name: AZURE_CLIENT_SECRET
      value: "<your-client-secret>"

Notes:

prometheus_url must point to the Azure Managed Prometheus workspace endpoint (include the trailing slash).
No extra headers are required; authentication is handled through Azure AD (service principal or managed identity).
SSL certificate verification is enabled by default (verify_ssl: true). Disable it only if you know you need to, for example to accept a custom certificate.

Grafana Cloud (Mimir)

There are two ways to connect HolmesGPT to Grafana Cloud's Prometheus/Mimir endpoint.

Option 1: Direct Prometheus Endpoint (Recommended)

Use Grafana Cloud's direct Prometheus endpoint with Basic authentication. This is the simplest approach.

Find your credentials:

Go to your Grafana Cloud portal → your stack → Prometheus card → Details
Note the remote write endpoint URL — remove the /push suffix to get the query endpoint
Note the Username / Instance ID (a numeric ID)
Generate a Cloud Access Policy token with metrics:read scope

The query endpoint URL format is: https://prometheus-prod-XX-prod-REGION.grafana.net/api/prom
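Stripping the /push suffix can be done with shell parameter expansion. A small sketch, where query_endpoint_from_remote_write is a hypothetical helper name and the sample URL is the placeholder format from this page:

```shell
# Derive the query endpoint from a Grafana Cloud remote write URL by
# stripping the trailing /push.
query_endpoint_from_remote_write() {
  url="${1%/}"                    # tolerate a trailing slash
  printf '%s\n' "${url%/push}"    # remove the /push suffix if present
}

query_endpoint_from_remote_write \
  'https://prometheus-prod-XX-prod-REGION.grafana.net/api/prom/push'
# prints https://prometheus-prod-XX-prod-REGION.grafana.net/api/prom
```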

When using the Holmes CLI, add the following to ~/.holmes/config.yaml. Create the file if it doesn't exist:

toolsets:
  prometheus/metrics:
    enabled: true
    subtype: grafana-cloud
    config:
      prometheus_url: https://prometheus-prod-XX-prod-REGION.grafana.net/api/prom
      additional_headers:
        Authorization: "Basic <base64_encoded_credentials>"

The Basic auth credentials are <instance_id>:<cloud_access_policy_token> base64-encoded.
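The header value can be generated on the command line. This is a minimal sketch that assumes a base64 utility is available; basic_auth_header is a hypothetical helper name, not a HolmesGPT command.

```shell
# Hypothetical helper: build the Authorization header value from an
# instance ID and a Cloud Access Policy token.
# Usage: basic_auth_header <instance_id> <token>
basic_auth_header() {
  # printf avoids echo's portability quirks; tr removes the line wraps
  # that some base64 implementations insert into long output.
  printf 'Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}
```

Calling `basic_auth_header "$INSTANCE_ID" "$TOKEN"` prints a single-line value that can be pasted into the Authorization field above.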

After making changes to your configuration, run:

holmes toolset refresh

When using the Holmes Helm Chart, first create a Kubernetes secret with your credentials:

# Base64-encode your credentials: <instance_id>:<cloud_access_policy_token>
# (tr strips the line wraps that some base64 implementations add to long output)
kubectl create secret generic grafana-cloud-prometheus \
  --from-literal=auth-header="Basic $(printf '%s' 'INSTANCE_ID:CLOUD_ACCESS_POLICY_TOKEN' | base64 | tr -d '\n')" \
  -n holmes

Namespace must match Holmes' deployment

Create the secret in the same namespace where Holmes runs. The -n holmes flag in the Holmes Helm Chart commands and -n default in the Robusta Helm Chart commands match each chart's documented defaults; adjust them if you installed Holmes or Robusta into a different namespace. A secret in the wrong namespace silently resolves to an empty env var, and authentication fails with no clear error.
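A quick way to catch a namespace mismatch is to look the secret up in the namespace Holmes is deployed to. The sketch below assumes kubectl is configured for the target cluster; secret_in_namespace is a hypothetical helper name.

```shell
# Hypothetical helper: confirm a secret exists in a given namespace.
# Usage: secret_in_namespace <secret_name> <namespace>
secret_in_namespace() {
  if kubectl get secret "$1" -n "$2" >/dev/null 2>&1; then
    echo "found: secret $1 in namespace $2"
  else
    echo "missing: secret $1 in namespace $2" >&2
    return 1
  fi
}
```

For the Holmes Helm Chart default this would be `secret_in_namespace grafana-cloud-prometheus holmes`.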

Then add to your Holmes Helm values:

additionalEnvVars:
  - name: GRAFANA_CLOUD_PROM_AUTH
    valueFrom:
      secretKeyRef:
        name: grafana-cloud-prometheus
        key: auth-header

toolsets:
  prometheus/metrics:
    enabled: true
    subtype: grafana-cloud
    config:
      prometheus_url: "https://prometheus-prod-XX-prod-REGION.grafana.net/api/prom"
      additional_headers:
        Authorization: "{{ env.GRAFANA_CLOUD_PROM_AUTH }}"

When using the Robusta Helm Chart, first create a Kubernetes secret with your credentials:

# Base64-encode your credentials: <instance_id>:<cloud_access_policy_token>
# (tr strips the line wraps that some base64 implementations add to long output)
kubectl create secret generic grafana-cloud-prometheus \
  --from-literal=auth-header="Basic $(printf '%s' 'INSTANCE_ID:CLOUD_ACCESS_POLICY_TOKEN' | base64 | tr -d '\n')" \
  -n default

Then add to your Robusta Helm values:

holmes:
  additionalEnvVars:
    - name: GRAFANA_CLOUD_PROM_AUTH
      valueFrom:
        secretKeyRef:
          name: grafana-cloud-prometheus
          key: auth-header
  toolsets:
    prometheus/metrics:
      enabled: true
      subtype: grafana-cloud
      config:
        prometheus_url: "https://prometheus-prod-XX-prod-REGION.grafana.net/api/prom"
        additional_headers:
          Authorization: "{{ env.GRAFANA_CLOUD_PROM_AUTH }}"

Update your Helm values and run a Helm upgrade:

helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>

Option 2: Grafana API Proxy

Use Grafana's datasource proxy to route requests through the Grafana API. This approach uses a Grafana service account token.

Find your credentials:

Navigate to "Administration → Service accounts" in Grafana Cloud
Create a new service account and generate a token (starts with glsa_)
Find your Prometheus datasource UID:

curl -H "Authorization: Bearer YOUR_GLSA_TOKEN" \
     "https://YOUR-INSTANCE.grafana.net/api/datasources" | \
     jq '.[] | select(.type=="prometheus") | {name, uid}'

When using the Holmes CLI, add the following to ~/.holmes/config.yaml. Create the file if it doesn't exist:

toolsets:
  prometheus/metrics:
    enabled: true
    subtype: grafana-cloud
    config:
      prometheus_url: https://YOUR-INSTANCE.grafana.net/api/datasources/proxy/uid/PROMETHEUS_DATASOURCE_UID
      additional_headers:
        Authorization: Bearer YOUR_GLSA_TOKEN

After making changes to your configuration, run:

holmes toolset refresh

When using the Holmes Helm Chart, first create a Kubernetes secret with your service account token:

kubectl create secret generic grafana-cloud-sa-token \
  --from-literal=token=YOUR_GLSA_TOKEN \
  -n holmes

Namespace must match Holmes' deployment

Create the secret in the same namespace where Holmes runs. The -n holmes flag in the Holmes Helm Chart commands and -n default in the Robusta Helm Chart commands match each chart's documented defaults; adjust them if you installed Holmes or Robusta into a different namespace. A secret in the wrong namespace silently resolves to an empty env var, and authentication fails with no clear error.

Then add to your Holmes Helm values:

additionalEnvVars:
  - name: GRAFANA_CLOUD_SA_TOKEN
    valueFrom:
      secretKeyRef:
        name: grafana-cloud-sa-token
        key: token

toolsets:
  prometheus/metrics:
    enabled: true
    subtype: grafana-cloud
    config:
      prometheus_url: "https://YOUR-INSTANCE.grafana.net/api/datasources/proxy/uid/PROMETHEUS_DATASOURCE_UID"
      additional_headers:
        Authorization: "Bearer {{ env.GRAFANA_CLOUD_SA_TOKEN }}"

When using the Robusta Helm Chart, first create a Kubernetes secret with your service account token:

kubectl create secret generic grafana-cloud-sa-token \
  --from-literal=token=YOUR_GLSA_TOKEN \
  -n default

Then add to your Robusta Helm values:

holmes:
  additionalEnvVars:
    - name: GRAFANA_CLOUD_SA_TOKEN
      valueFrom:
        secretKeyRef:
          name: grafana-cloud-sa-token
          key: token
  toolsets:
    prometheus/metrics:
      enabled: true
      subtype: grafana-cloud
      config:
        prometheus_url: "https://YOUR-INSTANCE.grafana.net/api/datasources/proxy/uid/PROMETHEUS_DATASOURCE_UID"
        additional_headers:
          Authorization: "Bearer {{ env.GRAFANA_CLOUD_SA_TOKEN }}"

Update your Helm values and run a Helm upgrade:

helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>

Advanced Configuration

You can further customize the Prometheus toolset with the following options:

toolsets:
  prometheus/metrics:
    enabled: true
    subtype: prometheus
    config:
      prometheus_url: http://prometheus-server.monitoring.svc.cluster.local:9090
      additional_headers:
        Authorization: "Basic <base64_encoded_credentials>"

      # Discovery settings
      discover_metrics_from_last_hours: 1  # Only return metrics with data in last N hours (default: 1)

      # Timeout configuration
      query_timeout_seconds_default: 20  # Default timeout for PromQL queries (default: 20)
      query_timeout_seconds_hard_max: 180  # Maximum allowed timeout for PromQL queries (default: 180)
      metadata_timeout_seconds_default: 20  # Default timeout for metadata/discovery APIs (default: 20)
      metadata_timeout_seconds_hard_max: 60  # Maximum allowed timeout for metadata APIs (default: 60)

      # Other options
      rules_cache_duration_seconds: 1800  # Cache duration for Prometheus rules (default: 30 minutes)
      verify_ssl: true  # Enable SSL verification (default: true)
      tool_calls_return_data: true  # If false, disables returning Prometheus data (default: true)
      additional_labels:  # Additional labels to add to all queries
        cluster: "production"

Configuration options:

subtype is set at the toolset level (sibling of enabled: and config:); the rest of the fields below go inside config:.

Option	Default	Description
`subtype`	(inferred)	Top-level field — picks the Prometheus variant. One of `prometheus`, `victoriametrics`, `coralogix`, `grafana-cloud`, `aws-managed-prometheus`, `azure-managed-prometheus`, `google-managed-prometheus`. Setting it is recommended; if omitted, HolmesGPT infers the variant from configuration fields (`aws_region` → AMP, Azure-specific fields → Azure) for backwards compatibility.
`prometheus_url`	(required for variants other than `prometheus`)	Prometheus server URL (include protocol and port). For `subtype: prometheus` this is optional — if omitted, HolmesGPT auto-detects via the `PROMETHEUS_URL` env var or in-cluster service discovery.
`additional_headers`	per `subtype`	Authentication headers (e.g., `Authorization: Bearer token`). Variant defaults: empty for most; `coralogix` defaults to `{token: "{{ env.CORALOGIX_API_KEY }}"}`; `grafana-cloud` defaults to `{Authorization: "Basic {{ env.GRAFANA_CLOUD_AUTH }}"}`.
`discover_metrics_from_last_hours`	`1` (`72` for `coralogix`)	Only discover metrics with data in last N hours
`query_timeout_seconds_default`	`20`	Default PromQL query timeout
`query_timeout_seconds_hard_max`	`180`	Maximum query timeout
`metadata_timeout_seconds_default`	`20`	Default metadata/discovery API timeout
`metadata_timeout_seconds_hard_max`	`60`	Maximum metadata API timeout
`rules_cache_duration_seconds`	`1800`	Cache duration for rules (set to `null` to disable)
`verify_ssl`	`true` (`false` for `aws-managed-prometheus`)	Enable SSL certificate verification
`tool_calls_return_data`	`true`	Return Prometheus data (disable if hitting token limits)
`additional_labels`	`{}`	Labels to add to all queries

Variant-specific fields

These fields are only valid for specific subtype values:

Field	`subtype`	Required	Description
`aws_region`	`aws-managed-prometheus`	Yes	AWS region (e.g. `us-east-1`)
`aws_access_key` / `aws_secret_access_key`	`aws-managed-prometheus`	No	Falls back to default AWS credential chain when omitted
`aws_service_name`	`aws-managed-prometheus`	No (default `aps`)	AWS service name for SigV4
`assume_role_arn`	`aws-managed-prometheus`	No	IAM role to assume for cross-account access
`refresh_interval_seconds`	`aws-managed-prometheus`, `azure-managed-prometheus`	No (default `900`)	How often to refresh credentials — AWS STS creds for AMP, Azure AD bearer token for Azure
`azure_client_id` / `azure_client_secret` / `azure_tenant_id`	`azure-managed-prometheus`	UI-required	Azure AD service principal. CLI/Helm: omit to use managed identity (`azure_use_managed_id: true`) or `AZURE_*` env vars.
`azure_use_managed_id`	`azure-managed-prometheus`	No (default `false`)	Set `true` to use Azure managed identity instead of a service principal
`azure_resource` / `azure_metadata_endpoint` / `azure_token_endpoint`	`azure-managed-prometheus`	No	Azure AD endpoints — sensible defaults are applied automatically

Capabilities

Tool Name	Description
list_prometheus_rules	List all defined Prometheus rules with descriptions and annotations
get_metric_names	Get list of metric names (fastest discovery method) - requires match filter
get_label_values	Get all values for a specific label (e.g., pod names, namespaces)
get_all_labels	Get list of all label names available in Prometheus
get_series	Get time series matching a selector (returns full label sets)
get_metric_metadata	Get metadata (type, description, unit) for metrics
execute_prometheus_instant_query	Execute an instant PromQL query (single point in time)
execute_prometheus_range_query	Execute a range PromQL query for time series data with graph generation
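When a range query tool call returns something unexpected, running a comparable query by hand can help separate a Prometheus problem from a HolmesGPT one. The sketch below uses the standard Prometheus HTTP API endpoint /api/v1/query_range; that the tool calls exactly this endpoint is an assumption, and range_query, the one-hour window, and the 60-second step are illustrative choices.

```shell
# Hypothetical helper: range query over the last hour at a 60s step.
# Usage: range_query <base_url> <promql>
range_query() {
  base_url="${1%/}"          # drop a trailing slash so the path joins cleanly
  end="$(date -u +%s)"       # now, as a Unix timestamp
  start=$((end - 3600))      # one hour earlier
  curl -sS -G --max-time 30 \
    --data-urlencode "query=$2" \
    --data-urlencode "start=$start" \
    --data-urlencode "end=$end" \
    --data-urlencode "step=60" \
    "$base_url/api/v1/query_range"
}
```

For example, `range_query http://localhost:9090 'up'` returns the series as JSON; curl handles the URL encoding of the PromQL expression.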