Scheduled Health Checks

ScheduledHealthCheck resources provide recurring health check execution based on cron schedules. They automatically create HealthCheck resources at scheduled intervals, making them ideal for continuous monitoring.

What is a ScheduledHealthCheck?

A ScheduledHealthCheck is a Kubernetes Custom Resource that:

  • Creates HealthCheck resources on a cron schedule
  • Tracks execution history for recent runs
  • Maintains status of active (running) checks
  • Can be enabled/disabled without deletion
  • Follows the Kubernetes CronJob pattern
  • Records last execution time and results

Cost Management

Each scheduled execution makes at least one LLM API call. A schedule running every 5 minutes makes at least 288 API calls per day (12 per hour × 24 hours). Start with infrequent schedules (hourly or daily) and monitor costs before increasing frequency.
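
As a rough sketch of that arithmetic, the following shell function estimates the minimum number of daily API calls for a schedule that fires every N minutes. The function name calls_per_day is hypothetical and not part of any tool; it assumes exactly one LLM call per execution, which is a lower bound.

```shell
# Hypothetical helper: lower-bound estimate of LLM API calls per day
# for a schedule that fires every N minutes (one call per execution).
calls_per_day() {
  local interval_minutes=$1
  # A day has 1440 minutes; integer division drops any partial interval.
  echo $(( 1440 / interval_minutes ))
}
```

For example, calls_per_day 5 prints 288, calls_per_day 15 prints 96, and calls_per_day 60 prints 24.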

Creating a Scheduled Check

The simplest ScheduledHealthCheck requires a cron schedule and a query:

apiVersion: holmesgpt.dev/v1alpha1
kind: ScheduledHealthCheck
metadata:
  name: hourly-pod-check
  namespace: default
spec:
  schedule: "0 * * * *"  # Every hour at :00
  query: "Are all pods in namespace 'default' healthy and running?"

Apply this check:

# Create the scheduled check
kubectl apply -f scheduled-check.yaml

# View status (short name: shc)
kubectl get shc

# Get detailed information
kubectl describe shc hourly-pod-check

Scheduled Check with Alerts

Send notifications when checks fail:

apiVersion: holmesgpt.dev/v1alpha1
kind: ScheduledHealthCheck
metadata:
  name: production-monitor
  namespace: production
spec:
  schedule: "*/15 * * * *"  # Every 15 minutes
  query: "Are all 'critical' labeled pods in 'production' namespace healthy?"
  timeout: 60
  mode: alert
  destinations:
    - type: slack
      config:
        channel: "#production-alerts"

Cron Schedule Syntax

Cron expressions use five fields:

┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6) (Sunday to Saturday)
│ │ │ │ │
│ │ │ │ │
* * * * *
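
To make the field order concrete, here is a minimal sketch that splits an expression into its five fields and labels each one. The function name describe_cron is hypothetical; it only checks the field count and does not validate the values inside each field.

```shell
# Hypothetical helper: label the five fields of a cron expression.
# Returns non-zero when the expression does not have exactly five fields.
describe_cron() {
  # Disable globbing so "*" is not expanded to file names while splitting.
  set -f
  set -- $1
  set +f
  if [ "$#" -ne 5 ]; then
    echo "expected 5 fields, got $#" >&2
    return 1
  fi
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
}
```

Calling describe_cron "*/15 * * * *" prints minute=*/15 hour=* day-of-month=* month=* day-of-week=*.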

Testing Schedules

Use crontab.guru to validate and understand cron expressions.

Spec Fields Reference

Required Fields

schedule (string, required)

Cron expression defining when to create health checks.

  • Must be valid cron syntax
  • Uses UTC timezone
  • Example: "*/15 * * * *" (every 15 minutes)

query (string, required)

Natural language question about system health.

  • Min length: 1 character
  • Max length: 5000 characters
  • Example: "Are all pods with label 'app=api' ready?"

Optional Fields

enabled (boolean, optional)

Whether the schedule is active.

  • Default: true
  • Set to false to disable without deleting the resource
  • Existing HealthCheck resources are not affected

timeout (integer, optional)

Maximum execution time per check in seconds.

  • Default: 30 seconds
  • Minimum: 1 second
  • Maximum: 300 seconds (5 minutes)
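
A minimal sketch of checking a value against these limits before applying it. The function name set_shc_timeout is hypothetical, and the patch call assumes the timeout lives at spec.timeout, as in the manifests on this page.

```shell
# Hypothetical helper: validate a timeout against the documented
# 1-300 second range, then patch spec.timeout on the named schedule.
set_shc_timeout() {
  local name=$1
  local seconds=$2
  case $seconds in
    ''|*[!0-9]*) echo "timeout must be a whole number" >&2; return 1 ;;
  esac
  if [ "$seconds" -lt 1 ] || [ "$seconds" -gt 300 ]; then
    echo "timeout must be between 1 and 300 seconds" >&2
    return 1
  fi
  kubectl patch shc "$name" --type='merge' \
    -p "{\"spec\":{\"timeout\":$seconds}}"
}
```

With this in place, set_shc_timeout hourly-pod-check 60 issues the patch, while a value such as 0 or 301 is rejected locally before kubectl is called.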

mode (string, optional)

Execution mode for alert delivery:

  • monitor (default): Results stored but no alerts sent
  • alert: Sends notifications to destinations on failure

model (string, optional)

Overrides the default LLM model for every check created by this schedule.

  • Example: model: "anthropic/claude-sonnet-4-5-20250929"
  • See AI Providers for options

destinations (array, optional)

Alert destinations (only used with mode: alert).

Example:

destinations:
  - type: slack
    config:
      channel: "#alerts"

Status Fields

Execution Tracking

lastScheduleTime (timestamp)

ISO 8601 timestamp of the most recent scheduled execution.

lastSuccessfulTime (timestamp)

ISO 8601 timestamp of the most recent successful (pass) execution.

lastResult (string)

Result of the most recent execution:

  • pass: Check passed
  • fail: Check failed
  • error: Execution error
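
One way to use these values in a script or CI step is to map them to exit codes. The sketch below is an assumption about how you might do that, not a feature of the operator; the function name shc_exit_code and the choice of exit codes are hypothetical.

```shell
# Hypothetical helper: turn status.lastResult into an exit code
# (0 = pass, 1 = fail, 2 = error, unknown value, or no result yet).
shc_exit_code() {
  local result
  result=$(kubectl get shc "$1" -o jsonpath='{.status.lastResult}')
  case $result in
    pass) return 0 ;;
    fail) return 1 ;;
    *)    echo "unexpected result: ${result:-<none>}" >&2; return 2 ;;
  esac
}
```

For example, shc_exit_code hourly-pod-check || echo "last check did not pass" prints the message unless the most recent result is pass.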

message (string)

Brief message from the most recent execution.

Active Checks

active (array)

List of currently running HealthCheck resources created by this schedule:

active:
  - name: hourly-pod-check-20240101-120000-abc123
    namespace: default
    uid: 12345-67890
    startTime: "2024-01-01T12:00:00Z"

Execution History

history (array)

Recent execution records (limited to maxHistoryItems from operator config, default 10):

history:
  - executionTime: "2024-01-01T12:00:00Z"
    result: pass
    duration: 2.5
    checkName: hourly-pod-check-20240101-120000-abc123
    message: "All pods healthy"
  - executionTime: "2024-01-01T11:00:00Z"
    result: pass
    duration: 3.1
    checkName: hourly-pod-check-20240101-110000-def456
    message: "All pods healthy"
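
The history entries can be tallied from the command line. This is a sketch, assuming the result field holds the pass, fail, and error values described for lastResult; the function name shc_history_summary is hypothetical.

```shell
# Hypothetical helper: count pass/fail/error results in a schedule's history.
shc_history_summary() {
  # The jsonpath range prints one result value per line.
  kubectl get shc "$1" \
    -o jsonpath='{range .status.history[*]}{.result}{"\n"}{end}' |
    awk '
      { count[$1]++ }
      END {
        printf "pass=%d fail=%d error=%d\n", count["pass"], count["fail"], count["error"]
      }'
}
```

For the two-entry history shown above, shc_history_summary hourly-pod-check would print pass=2 fail=0 error=0.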

Conditions

Standard Kubernetes conditions:

conditions:
  - type: ScheduleRegistered
    status: "True"
    lastTransitionTime: "2024-01-01T10:00:00Z"
    reason: ScheduleActive
    message: "Schedule successfully registered"

Managing Schedules

Viewing Schedules

List all scheduled checks:

# Using full name
kubectl get scheduledhealthchecks -n default

# Using short name
kubectl get shc -n default

# All namespaces
kubectl get shc --all-namespaces

View detailed status:

# Full details including history
kubectl describe shc hourly-pod-check

# Get as YAML
kubectl get shc hourly-pod-check -o yaml

Enabling and Disabling

Temporarily disable a schedule:

kubectl patch shc hourly-pod-check --type='merge' -p '{"spec":{"enabled":false}}'

Re-enable a schedule:

kubectl patch shc hourly-pod-check --type='merge' -p '{"spec":{"enabled":true}}'

Note

Disabling a schedule stops future executions but does not affect currently running checks. Existing HealthCheck resources remain.
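
To pause or resume every schedule in a namespace at once, for example during a maintenance window, the same patch can be applied in a loop. This is a sketch only; the function name set_all_shc_enabled is hypothetical.

```shell
# Hypothetical helper: set spec.enabled on every ScheduledHealthCheck
# in a namespace.
set_all_shc_enabled() {
  local namespace=$1
  local enabled=$2   # "true" or "false"
  case $enabled in
    true|false) ;;
    *) echo "second argument must be true or false" >&2; return 1 ;;
  esac
  # "-o name" prints one TYPE/NAME reference per line.
  kubectl get shc -n "$namespace" -o name |
    while read -r resource; do
      kubectl patch "$resource" -n "$namespace" --type='merge' \
        -p "{\"spec\":{\"enabled\":$enabled}}"
    done
}
```

Run set_all_shc_enabled production false before maintenance and set_all_shc_enabled production true afterwards.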

Updating Schedule

Change the cron schedule:

kubectl patch shc hourly-pod-check --type='merge' -p '{"spec":{"schedule":"0 */2 * * *"}}'

This updates the schedule to run every 2 hours instead of hourly.

Viewing Execution History

Check recent executions:

# View history field
kubectl get shc hourly-pod-check -o jsonpath='{.status.history}' | jq

# View last result
kubectl get shc hourly-pod-check -o jsonpath='{.status.lastResult}'

# View last schedule time
kubectl get shc hourly-pod-check -o jsonpath='{.status.lastScheduleTime}'

Next Steps