Skip to content

Datadog

Connect HolmesGPT to Datadog for comprehensive observability including logs, metrics, traces, and more.

Quick Start

1. Get Your API Keys and Site URL

You'll need two keys and your site URL from your Datadog account:

  • API Key: Found under Organization Settings > API Keys (disable 'Remote Config' when creating)
  • Application Key: Found under Organization Settings > Application Keys
  • Site URL: Your Datadog site endpoint

2. Configure HolmesGPT

Set environment variables:

export DD_API_KEY="your-datadog-api-key"
export DD_APP_KEY="your-datadog-app-key"

Add to your config file:

toolsets:
  # Enable all Datadog toolsets
  datadog/logs:
    enabled: true
    config:
      dd_api_key: "{{ env.DD_API_KEY }}"
      dd_app_key: "{{ env.DD_APP_KEY }}"
      site_api_url: https://app.datadoghq.com  # Change for EU/other regions

  datadog/metrics:
    enabled: true
    config:
      dd_api_key: "{{ env.DD_API_KEY }}"
      dd_app_key: "{{ env.DD_APP_KEY }}"
      site_api_url: https://app.datadoghq.com

  datadog/traces:
    enabled: true
    config:
      dd_api_key: "{{ env.DD_API_KEY }}"
      dd_app_key: "{{ env.DD_APP_KEY }}"
      site_api_url: https://app.datadoghq.com

  datadog/general:
    enabled: true
    config:
      dd_api_key: "{{ env.DD_API_KEY }}"
      dd_app_key: "{{ env.DD_APP_KEY }}"
      site_api_url: https://app.datadoghq.com

First, create a Kubernetes secret with your API keys:

kubectl create secret generic holmes-datadog-secrets \
  --from-literal=dd-api-key=your-datadog-api-key \
  --from-literal=dd-app-key=your-datadog-app-key

Then add to your Holmes Helm values:

# Load API keys from secret
additionalEnvVars:
  - name: DD_API_KEY
    valueFrom:
      secretKeyRef:
        name: holmes-datadog-secrets
        key: dd-api-key
  - name: DD_APP_KEY
    valueFrom:
      secretKeyRef:
        name: holmes-datadog-secrets
        key: dd-app-key

toolsets:
  # Enable all Datadog toolsets
  datadog/logs:
    enabled: true
    config:
      dd_api_key: "{{ env.DD_API_KEY }}"
      dd_app_key: "{{ env.DD_APP_KEY }}"
      site_api_url: https://app.datadoghq.com  # Change for EU/other regions

  datadog/metrics:
    enabled: true
    config:
      dd_api_key: "{{ env.DD_API_KEY }}"
      dd_app_key: "{{ env.DD_APP_KEY }}"
      site_api_url: https://app.datadoghq.com

  datadog/traces:
    enabled: true
    config:
      dd_api_key: "{{ env.DD_API_KEY }}"
      dd_app_key: "{{ env.DD_APP_KEY }}"
      site_api_url: https://app.datadoghq.com

  datadog/general:
    enabled: true
    config:
      dd_api_key: "{{ env.DD_API_KEY }}"
      dd_app_key: "{{ env.DD_APP_KEY }}"
      site_api_url: https://app.datadoghq.com

First, create a Kubernetes secret with your API keys:

kubectl create secret generic holmes-datadog-secrets \
  --from-literal=dd-api-key=your-datadog-api-key \
  --from-literal=dd-app-key=your-datadog-app-key

Then add to your Robusta Helm values:

holmes:
  # Load API keys from secret
  additionalEnvVars:
    - name: DD_API_KEY
      valueFrom:
        secretKeyRef:
          name: holmes-datadog-secrets
          key: dd-api-key
    - name: DD_APP_KEY
      valueFrom:
        secretKeyRef:
          name: holmes-datadog-secrets
          key: dd-app-key

  toolsets:
    # Enable all Datadog toolsets
    datadog/logs:
      enabled: true
      config:
        dd_api_key: "{{ env.DD_API_KEY }}"
        dd_app_key: "{{ env.DD_APP_KEY }}"
        site_api_url: https://app.datadoghq.com  # Change for EU/other regions

    datadog/metrics:
      enabled: true
      config:
        dd_api_key: "{{ env.DD_API_KEY }}"
        dd_app_key: "{{ env.DD_APP_KEY }}"
        site_api_url: https://app.datadoghq.com

    datadog/traces:
      enabled: true
      config:
        dd_api_key: "{{ env.DD_API_KEY }}"
        dd_app_key: "{{ env.DD_APP_KEY }}"
        site_api_url: https://app.datadoghq.com

    datadog/general:
      enabled: true
      config:
        dd_api_key: "{{ env.DD_API_KEY }}"
        dd_app_key: "{{ env.DD_APP_KEY }}"
        site_api_url: https://app.datadoghq.com

3. Test It Works

# Test logs
holmes ask "show me recent logs from Datadog"

# Test metrics
holmes ask "list available Datadog metrics"

# Test general API
holmes ask "list Datadog monitors"

That's it! You're now connected to Datadog with all toolsets enabled.

Available Toolsets

HolmesGPT provides four specialized Datadog toolsets:

Toolset Purpose Common Use Cases
datadog/logs Query application logs Debugging errors, tracking deployments, historical analysis
datadog/metrics Access performance metrics CPU/memory monitoring, custom metrics, SLI tracking
datadog/traces Analyze distributed traces Latency issues, service dependencies, bottlenecks
datadog/general Access other Datadog APIs Monitors, dashboards, SLOs, incidents, synthetics

Toolset Details

Datadog Logs

Query and analyze logs from Datadog, including historical data from terminated pods.

Configuration

Add the following to ~/.holmes/config.yaml. Create the file if it doesn't exist:

toolsets:
  datadog/logs:
    enabled: true
    config:
      dd_api_key: "{{ env.DD_API_KEY }}"
      dd_app_key: "{{ env.DD_APP_KEY }}"
      site_api_url: https://api.datadoghq.com
      request_timeout: 60  # Timeout in seconds (default: 60)

      # Optional: Log search configuration
      indexes: ["*"]  # Log indexes to search (default: ["*"])
      compact_logs: True # Reduces log metadata and tags to save LLM context space.
      storage_tiers: ["indexes"]  # Options: indexes, online-archives, flex
      default_limit: 150  # Max logs to retrieve in a query.

When using the standalone Holmes Helm Chart, update your values.yaml:

toolsets:
  datadog/logs:
    enabled: true
    config:
      dd_api_key: "{{ env.DD_API_KEY }}"
      dd_app_key: "{{ env.DD_APP_KEY }}"
      site_api_url: https://api.datadoghq.com
      request_timeout: 60  # Timeout in seconds (default: 60)

      # Optional: Log search configuration
      indexes: ["*"]  # Log indexes to search (default: ["*"])
      compact_logs: True # Reduces log metadata and tags to save LLM context space.
      storage_tiers: ["indexes"]  # Options: indexes, online-archives, flex
      default_limit: 150  # Max logs to retrieve in a query.

Apply the configuration:

helm upgrade holmes holmes/holmes --values=values.yaml

When using the Robusta Helm Chart (which includes HolmesGPT), update your generated_values.yaml:

holmes:
  toolsets:
    datadog/logs:
      enabled: true
      config:
        dd_api_key: "{{ env.DD_API_KEY }}"
        dd_app_key: "{{ env.DD_APP_KEY }}"
        site_api_url: https://api.datadoghq.com
        request_timeout: 60  # Timeout in seconds (default: 60)

        # Optional: Log search configuration
        indexes: ["*"]  # Log indexes to search (default: ["*"])
        compact_logs: True # Reduces log metadata and tags to save LLM context space.
        storage_tiers: ["indexes"]  # Options: indexes, online-archives, flex
        default_limit: 150  # Max logs to retrieve in a query.

Apply the configuration:

helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>

Capabilities

Tool Description
fetch_datadog_logs Retrieve logs with time range and search query

Example Usage

# Get logs for a specific pod
holmes ask "show me logs for pod payment-service in namespace production"

# Search for errors in the last hour
holmes ask "find all error logs in the last hour"

# Historical logs from deleted pods
holmes ask "show me logs from the crashed pod that was running yesterday"

Datadog Metrics

Access and analyze metrics from your infrastructure and applications.

Configuration

Add the following to ~/.holmes/config.yaml. Create the file if it doesn't exist:

toolsets:
  datadog/metrics:
    enabled: true
    config:
      dd_api_key: "{{ env.DD_API_KEY }}"
      dd_app_key: "{{ env.DD_APP_KEY }}"
      site_api_url: https://api.datadoghq.com
      request_timeout: 60  # Timeout in seconds (default: 60)

      # Optional
      default_limit: 1000  # Max data points to retrieve (default: 1000)

When using the standalone Holmes Helm Chart, update your values.yaml:

toolsets:
  datadog/metrics:
    enabled: true
    config:
      dd_api_key: "{{ env.DD_API_KEY }}"
      dd_app_key: "{{ env.DD_APP_KEY }}"
      site_api_url: https://api.datadoghq.com
      request_timeout: 60  # Timeout in seconds (default: 60)

      # Optional
      default_limit: 1000  # Max data points to retrieve (default: 1000)

Apply the configuration:

helm upgrade holmes holmes/holmes --values=values.yaml

When using the Robusta Helm Chart (which includes HolmesGPT), update your generated_values.yaml:

holmes:
  toolsets:
    datadog/metrics:
      enabled: true
      config:
        dd_api_key: "{{ env.DD_API_KEY }}"
        dd_app_key: "{{ env.DD_APP_KEY }}"
        site_api_url: https://api.datadoghq.com
        request_timeout: 60  # Timeout in seconds (default: 60)

        # Optional
        default_limit: 1000  # Max data points to retrieve (default: 1000)

Apply the configuration:

helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>

Capabilities

Tool Description
list_active_datadog_metrics List metrics that have reported data in the last 24 hours
query_datadog_metrics Query specific metrics with aggregation and filtering
get_datadog_metric_metadata Get metadata about available metrics
list_datadog_metric_tags List available tags and aggregations for a specific metric

Example Usage

# List available metrics
holmes ask "what metrics are available for my application?"

# Query CPU usage
holmes ask "show me CPU usage for the payment service over the last 6 hours"

# Custom application metrics
holmes ask "analyze the payment_processing_time metric for anomalies"

Datadog Traces

Analyze distributed traces to identify performance bottlenecks and latency issues.

Configuration

Add the following to ~/.holmes/config.yaml. Create the file if it doesn't exist:

toolsets:
  datadog/traces:
    enabled: true
    config:
      dd_api_key: "{{ env.DD_API_KEY }}"
      dd_app_key: "{{ env.DD_APP_KEY }}"
      site_api_url: https://api.datadoghq.com
      request_timeout: 60  # Timeout in seconds (default: 60)

When using the standalone Holmes Helm Chart, update your values.yaml:

toolsets:
  datadog/traces:
    enabled: true
    config:
      dd_api_key: "{{ env.DD_API_KEY }}"
      dd_app_key: "{{ env.DD_APP_KEY }}"
      site_api_url: https://api.datadoghq.com
      request_timeout: 60  # Timeout in seconds (default: 60)

Apply the configuration:

helm upgrade holmes holmes/holmes --values=values.yaml

When using the Robusta Helm Chart (which includes HolmesGPT), update your generated_values.yaml:

holmes:
  toolsets:
    datadog/traces:
      enabled: true
      config:
        dd_api_key: "{{ env.DD_API_KEY }}"
        dd_app_key: "{{ env.DD_APP_KEY }}"
        site_api_url: https://api.datadoghq.com
        request_timeout: 60  # Timeout in seconds (default: 60)

Apply the configuration:

helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>

Capabilities

Tool Description
fetch_datadog_spans Search for spans using span syntax with wildcards and filters
aggregate_datadog_spans Aggregate spans into buckets and compute metrics and timeseries

Example Usage

# Find slow requests
holmes ask "find traces where the checkout service took longer than 5 seconds"

# Analyze specific trace
holmes ask "analyze trace ID abc123 for performance issues"

# Service dependencies
holmes ask "show me traces involving both payment and inventory services"

Datadog General

Access general-purpose Datadog API endpoints for read-only operations including monitors, dashboards, SLOs, incidents, synthetics, and more.

Configuration

Add the following to ~/.holmes/config.yaml. Create the file if it doesn't exist:

toolsets:
  datadog/general:
    enabled: true
    config:
      dd_api_key: "{{ env.DD_API_KEY }}"
      dd_app_key: "{{ env.DD_APP_KEY }}"
      site_api_url: https://api.datadoghq.com
      request_timeout: 60  # Timeout in seconds (default: 60)

      # Optional
      max_response_size: 10485760  # Max response size in bytes (default: 10MB)
      allow_custom_endpoints: false  # Allow non-whitelisted endpoints (default: false)

When using the standalone Holmes Helm Chart, update your values.yaml:

toolsets:
  datadog/general:
    enabled: true
    config:
      dd_api_key: "{{ env.DD_API_KEY }}"
      dd_app_key: "{{ env.DD_APP_KEY }}"
      site_api_url: https://api.datadoghq.com
      request_timeout: 60  # Timeout in seconds (default: 60)

      # Optional
      max_response_size: 10485760  # Max response size in bytes (default: 10MB)
      allow_custom_endpoints: false  # Allow non-whitelisted endpoints (default: false)

Apply the configuration:

helm upgrade holmes holmes/holmes --values=values.yaml

When using the Robusta Helm Chart (which includes HolmesGPT), update your generated_values.yaml:

holmes:
  toolsets:
    datadog/general:
      enabled: true
      config:
        dd_api_key: "{{ env.DD_API_KEY }}"
        dd_app_key: "{{ env.DD_APP_KEY }}"
        site_api_url: https://api.datadoghq.com
        request_timeout: 60  # Timeout in seconds (default: 60)

        # Optional
        max_response_size: 10485760  # Max response size in bytes (default: 10MB)
        allow_custom_endpoints: false  # Allow non-whitelisted endpoints (default: false)

Apply the configuration:

helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>

Capabilities

Tool Description
datadog_api_get Perform GET requests to whitelisted Datadog API endpoints
datadog_api_post_search Perform POST search operations on whitelisted endpoints
list_datadog_api_resources List available API resource categories and endpoints

Supported API Endpoints

The general toolset provides access to the following read-only API categories:

  • Monitors: List, search, and get monitor details
  • Dashboards: Access dashboard configurations and lists
  • SLOs: Query Service Level Objectives and their history
  • Events: Search and retrieve events
  • Incidents: Access incident details and timelines
  • Synthetics: Retrieve synthetic test results and configurations
  • Security Monitoring: Access security rules and signals
  • Service Map: Query APM services and dependencies
  • Hosts: List and get host information
  • Usage & Cost: Access usage metrics and cost estimates
  • Organizations & Teams: Query organizational structure

Example Usage

# List all monitors
holmes ask "show me all Datadog monitors"

# Get dashboard details
holmes ask "retrieve my application dashboard from Datadog"

# Check SLO status
holmes ask "what's the current status of our API availability SLO?"

# Search incidents
holmes ask "find recent incidents in Datadog"

# Get synthetic test results
holmes ask "show me the latest synthetic test results for our homepage"