Scheduled Health Checks¶
ScheduledHealthCheck resources provide recurring health check execution based on cron schedules. They automatically create HealthCheck resources at scheduled intervals, making them ideal for continuous monitoring.
What is a ScheduledHealthCheck?¶
A ScheduledHealthCheck is a Kubernetes Custom Resource that:
- Creates HealthCheck resources on a cron schedule
- Tracks execution history for recent runs
- Maintains status of active (running) checks
- Can be enabled/disabled without deletion
- Follows the Kubernetes CronJob pattern
- Records last execution time and results
Cost Management
Each scheduled execution creates at least one LLM API call. A schedule running every 5 minutes = 288 API calls per day. Start with infrequent schedules (hourly or daily) and monitor costs before increasing frequency.
Creating a Scheduled Check¶
The simplest ScheduledHealthCheck requires a cron schedule and a query:
apiVersion: holmesgpt.dev/v1alpha1
kind: ScheduledHealthCheck
metadata:
name: hourly-pod-check
namespace: default
spec:
schedule: "0 * * * *" # Every hour at :00
query: "Are all pods in namespace 'default' healthy and running?"
Apply this check:
# Create the scheduled check
kubectl apply -f scheduled-check.yaml
# View status (short name: shc)
kubectl get shc
# Get detailed information
kubectl describe shc hourly-pod-check
Scheduled Check with Alerts¶
Send notifications when checks fail:
apiVersion: holmesgpt.dev/v1alpha1
kind: ScheduledHealthCheck
metadata:
name: production-monitor
namespace: production
spec:
schedule: "*/15 * * * *" # Every 15 minutes
query: "Are all 'critical' labeled pods in 'production' namespace healthy?"
timeout: 60
mode: alert
destinations:
- type: slack
config:
channel: "#production-alerts"
Cron Schedule Syntax¶
Cron expressions use five fields:
┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6) (Sunday to Saturday)
│ │ │ │ │
│ │ │ │ │
* * * * *
Testing Schedules
Use crontab.guru to validate and understand cron expressions.
Spec Fields Reference¶
Required Fields¶
schedule (string, required)
Cron expression defining when to create health checks.
- Must be valid cron syntax
- Uses UTC timezone
- Example:
"*/15 * * * *"(every 15 minutes)
query (string, required)
Natural language question about system health.
- Min length: 1 character
- Max length: 5000 characters
- Example:
"Are all pods with label 'app=api' ready?"
Optional Fields¶
enabled (boolean, optional)
Whether the schedule is active.
- Default:
true - Set to
falseto disable without deleting the resource - Existing HealthCheck resources are not affected
timeout (integer, optional)
Maximum execution time per check in seconds.
- Default: 30 seconds
- Minimum: 1 second
- Maximum: 300 seconds (5 minutes)
mode (string, optional)
Execution mode for alert delivery:
monitor(default): Results stored but no alerts sentalert: Sends notifications to destinations on failure
model (string, optional)
Override default LLM model for all scheduled checks.
- Example:
model: "anthropic/claude-sonnet-4-5-20250929" - See AI Providers for options
destinations (array, optional)
Alert destinations (only used with mode: alert).
Example:
Status Fields¶
Execution Tracking¶
lastScheduleTime (timestamp)
ISO 8601 timestamp of the most recent scheduled execution.
lastSuccessfulTime (timestamp)
ISO 8601 timestamp of the most recent successful (pass) execution.
lastResult (string)
Result of the most recent execution:
pass: Check passedfail: Check failederror: Execution error
message (string)
Brief message from the most recent execution.
Active Checks¶
active (array)
List of currently running HealthCheck resources created by this schedule:
active:
- name: hourly-pod-check-20240101-120000-abc123
namespace: default
uid: 12345-67890
startTime: "2024-01-01T12:00:00Z"
Execution History¶
history (array)
Recent execution records (limited to maxHistoryItems from operator config, default 10):
history:
- executionTime: "2024-01-01T12:00:00Z"
result: pass
duration: 2.5
checkName: hourly-pod-check-20240101-120000-abc123
message: "All pods healthy"
- executionTime: "2024-01-01T11:00:00Z"
result: pass
duration: 3.1
checkName: hourly-pod-check-20240101-110000-def456
message: "All pods healthy"
Conditions¶
Standard Kubernetes conditions:
conditions:
- type: ScheduleRegistered
status: "True"
lastTransitionTime: "2024-01-01T10:00:00Z"
reason: ScheduleActive
message: "Schedule successfully registered"
Managing Schedules¶
Viewing Schedules¶
List all scheduled checks:
# Using full name
kubectl get scheduledhealthchecks -n default
# Using short name
kubectl get shc -n default
# All namespaces
kubectl get shc --all-namespaces
View detailed status:
# Full details including history
kubectl describe shc hourly-pod-check
# Get as YAML
kubectl get shc hourly-pod-check -o yaml
Enabling and Disabling¶
Temporarily disable a schedule:
Re-enable a schedule:
Note
Disabling a schedule stops future executions but does not affect currently running checks. Existing HealthCheck resources remain.
Updating Schedule¶
Change the cron schedule:
This updates the schedule to run every 2 hours instead of hourly.
Viewing Execution History¶
Check recent executions:
# View history field
kubectl get shc hourly-pod-check -o jsonpath='{.status.history}' | jq
# View last result
kubectl get shc hourly-pod-check -o jsonpath='{.status.lastResult}'
# View last schedule time
kubectl get shc hourly-pod-check -o jsonpath='{.status.lastScheduleTime}'
Next Steps¶
- Health Checks - Learn more about the underlying HealthCheck resources
- Alert Destinations - Configure Slack and PagerDuty notifications
- Configuration - Configure schedule history limits and cleanup policies