
Alert Destinations

Alert destinations allow health checks to send notifications when checks fail or encounter errors. Destinations are configured in health check resources and support multiple notification platforms.

Overview

Destinations are used when a HealthCheck or ScheduledHealthCheck has mode: alert. When a check fails or returns an error, the Holmes Operator sends notifications to all configured destinations.

Key Concepts:

  • Destinations are defined per-check in the destinations field
  • Multiple destinations can be configured for a single check
  • Global credentials can be set via environment variables (recommended)
  • Per-check credentials can override global settings

Supported Destinations

Holmes Operator currently supports two alert destination types:

  • Slack - Send formatted messages to Slack channels
  • PagerDuty - Create incidents in PagerDuty
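
As a minimal sketch of how several destinations combine on one check, the resource below lists both supported types. The check name and query are hypothetical, and the example assumes credentials are supplied through the global environment variables described in the sections that follow.

```yaml
apiVersion: holmesgpt.dev/v1alpha1
kind: HealthCheck
metadata:
  name: checkout-check  # hypothetical name, for illustration only
spec:
  query: "Is the checkout service healthy?"
  mode: alert  # destinations are only used in alert mode
  destinations:
    - type: slack
      config:
        channel: "#production-alerts"
    - type: pagerduty
      config: {}  # assumes a global PAGERDUTY_INTEGRATION_KEY is set
```

When this check fails or errors, both destinations are notified.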

Slack Destination

Send rich, formatted notifications to Slack channels when health checks fail.

Prerequisites

  • Slack workspace access
  • Slack Bot Token (starts with xoxb-)

Configuration

Option 1: Global Slack Token (Recommended)

Configure the Slack token globally for all checks. Set the variable on Holmes itself, not on the Operator:

# values.yaml
additionalEnvVars:
  - name: SLACK_TOKEN
    value: "xoxb-your-slack-bot-token"
  # Or from a secret (recommended for production):
  # - name: SLACK_TOKEN
  #   valueFrom:
  #     secretKeyRef:
  #       name: holmes-secrets
  #       key: slack-token
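
The commented secretKeyRef variant assumes that a Secret named holmes-secrets with a slack-token key already exists in the namespace where Holmes runs. A minimal sketch of such a Secret, using the standard Kubernetes Secret resource, might look like this; the namespace is an assumption and should match your installation.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: holmes-secrets
  namespace: holmes  # assumption: replace with the namespace Holmes is installed in
type: Opaque
stringData:
  # stringData accepts the plain value; Kubernetes stores it base64-encoded
  slack-token: "xoxb-your-slack-bot-token"
```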

With a global token, checks only need to specify the channel:

apiVersion: holmesgpt.dev/v1alpha1
kind: HealthCheck
metadata:
  name: production-check
spec:
  query: "Are all production pods healthy?"
  mode: alert
  destinations:
    - type: slack
      config:
        channel: "#production-alerts"

Option 2: Per-Check Token

Specify the token directly in the check configuration:

apiVersion: holmesgpt.dev/v1alpha1
kind: HealthCheck
metadata:
  name: production-check
spec:
  query: "Are all production pods healthy?"
  mode: alert
  destinations:
    - type: slack
      config:
        token: "xoxb-your-slack-bot-token"  # Not recommended
        channel: "#production-alerts"

Security Best Practice

Always use global environment variables loaded from Kubernetes secrets rather than hardcoding tokens in check resources.

Configuration Fields

channel (string, required)

Slack channel to post messages to. Must include the # prefix.

  • Example: "#alerts", "#production-alerts"

token (string, optional)

Slack Bot Token. If not provided, uses the global SLACK_TOKEN environment variable.

  • Format: xoxb-*
  • Source: Slack App OAuth Token

Message Format

Example Slack message:

[Image: example Slack alert generated by Holmes Operator]

PagerDuty Destination

Create incidents in PagerDuty when health checks fail, with full context and analysis.

Prerequisites

  • PagerDuty account
  • PagerDuty Events API v2 Integration Key
  • Service configured in PagerDuty

Getting an Integration Key

  1. In PagerDuty, navigate to Services > Service Directory
  2. Select your service (or create a new one)
  3. Go to the Integrations tab
  4. Click Add Integration
  5. Select Events API V2
  6. Copy the Integration Key (the Events API v2 refers to this value as the routing key)

Configuration

Option 1: Global Integration Key (Recommended)

Configure the integration key globally for all checks:

# values.yaml
additionalEnvVars:
  - name: PAGERDUTY_INTEGRATION_KEY
    value: "your-integration-key"
  # Or from a secret (recommended):
  # - name: PAGERDUTY_INTEGRATION_KEY
  #   valueFrom:
  #     secretKeyRef:
  #       name: holmes-secrets
  #       key: pagerduty-key

Checks can then omit the integration key:

apiVersion: holmesgpt.dev/v1alpha1
kind: HealthCheck
metadata:
  name: critical-service-check
spec:
  query: "Is the payment service healthy and processing transactions?"
  mode: alert
  destinations:
    - type: pagerduty
      config: {}  # Uses global integration key

Option 2: Per-Check Integration Key

Specify the key directly in the check:

apiVersion: holmesgpt.dev/v1alpha1
kind: HealthCheck
metadata:
  name: critical-service-check
spec:
  query: "Is the payment service healthy?"
  mode: alert
  destinations:
    - type: pagerduty
      config:
        integration_key: "your-integration-key"

Configuration Fields

integration_key (string, optional)

PagerDuty Events API v2 integration key for the target service.

  • If not provided, uses the global PAGERDUTY_INTEGRATION_KEY environment variable

api_url (string, optional)

PagerDuty Events API endpoint. Rarely needs to be changed.

  • Default: https://events.pagerduty.com/v2/enqueue
  • Only override for custom PagerDuty instances
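
A minimal sketch of a per-check endpoint override follows. The URL shown is PagerDuty's EU service region endpoint and is used here as an assumption for illustration; confirm the correct endpoint for your account before using it.

```yaml
apiVersion: holmesgpt.dev/v1alpha1
kind: HealthCheck
metadata:
  name: critical-service-check
spec:
  query: "Is the payment service healthy?"
  mode: alert
  destinations:
    - type: pagerduty
      config:
        # integration_key omitted: falls back to PAGERDUTY_INTEGRATION_KEY
        api_url: "https://events.eu.pagerduty.com/v2/enqueue"  # assumed example endpoint
```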

Incident Details

PagerDuty incidents created by Holmes include:

Basic Information:

  • Summary: Holmes Check Failed: <check-name>
  • Severity: error (or critical, warning, info based on check tags)
  • Source: holmes
  • Component: Check source type
  • Group: health-checks

Custom Details:

  • holmes_analysis: Full LLM reasoning and determination
  • source_type: Where the check ran (e.g., kubernetes)
  • check_details: Raw check data and context
  • tools_used: List of data sources queried by the AI

Deduplication:

  • Incidents are deduplicated using the check ID
  • Multiple failures of the same check update the existing incident

Links:

  • If the check has an associated URL, it is included as a link in the incident
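
To make the field mapping concrete, the sketch below shows roughly what a PagerDuty Events API v2 trigger event carrying these details could look like, rendered as YAML for readability (the API itself accepts JSON). The top-level layout (routing_key, event_action, dedup_key, payload, links) follows the Events API v2 event format. The concrete values, the link text, and the assumption that Holmes places each field exactly this way are illustrative and not taken from the Holmes source.

```yaml
routing_key: "your-integration-key"
event_action: trigger
dedup_key: "<check-id>"  # repeated failures of the same check update one incident
payload:
  summary: "Holmes Check Failed: <check-name>"
  severity: error        # or critical, warning, info based on check tags
  source: holmes
  component: kubernetes  # check source type
  group: health-checks
  custom_details:
    holmes_analysis: "<full LLM reasoning and determination>"
    source_type: kubernetes
    check_details: "<raw check data and context>"
    tools_used: "<list of data sources queried>"
links:
  - href: "<check-url>"  # only present if the check has an associated URL
    text: "View check"   # hypothetical link text
```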

Notification Status

To see whether notifications were delivered, inspect the status.notifications field of the check:

# View notification status for a check
kubectl get hc <name> -o jsonpath='{.status.notifications}'

# Example output:
# [
#   {
#     "type": "slack",
#     "channel": "#alerts",
#     "status": "sent"
#   },
#   {
#     "type": "pagerduty",
#     "status": "sent"
#   }
# ]

Possible notification statuses:

  • sent - Notification delivered successfully
  • failed - Notification delivery failed (check operator logs)
  • skipped - Notification not sent (e.g., check passed)

Next Steps