Skills¶
Breaking Change — Holmes 0.25.0+
Skills replace the previous runbook system. If you are upgrading from Holmes 0.24.x or older, you must migrate your runbooks to the new SKILL.md format. See Migrating from Runbooks below.
Skills are step-by-step troubleshooting guides that Holmes follows when investigating issues. When a user asks a question or an alert fires, Holmes automatically matches relevant skills from its catalog and fetches them using the fetch_skill tool. It then follows the skill instructions step-by-step, calling tools to gather data and reporting results for each step.
Skills work with all Holmes interfaces — the CLI (ask and investigate commands), the HTTP server, and the Python SDK.
How It Works¶
- Holmes receives a question or alert
- Holmes compares the issue against skill descriptions in the catalog
- If a skill matches, Holmes fetches it with the fetch_skill tool
- Holmes follows the skill steps, calling tools to gather data at each step
- Holmes reports findings with a checklist showing completed and skipped steps
Built-in Skills¶
Holmes ships with built-in skills at holmes/plugins/skills/builtin/. These are available automatically — no configuration needed.
Custom Skills¶
You can add your own skills by creating SKILL.md files and pointing Holmes to them.
Skill Format¶
Each skill is a directory containing a SKILL.md file with YAML frontmatter and a markdown body:
my-skills/
├── dns-troubleshooting/
│   └── SKILL.md
├── postgres-performance/
│   └── SKILL.md
└── redis-connection-issues/
    └── SKILL.md
dns-troubleshooting/SKILL.md:
---
name: dns-troubleshooting
description: Troubleshooting DNS resolution failures in Kubernetes clusters
---
# DNS Troubleshooting
## Goal
Diagnose and resolve DNS resolution issues in the cluster.
Follow the workflow steps sequentially.
## Workflow
1. **Check CoreDNS pods**
   * Verify pods in kube-system with label k8s-app=kube-dns are running
   * Check for restarts or resource pressure
2. **Test DNS resolution**
   * Resolve kubernetes.default.svc.cluster.local from an affected pod
   * Resolve an external domain like google.com
3. **Check for NetworkPolicies blocking DNS**
   * List NetworkPolicies in the affected namespace
   * Verify UDP port 53 egress to kube-system is allowed
## Synthesize Findings
Correlate the outputs from each step to identify the root cause.
## Recommended Remediation Steps
* **CoreDNS down**: Check resource limits and node capacity
* **NetworkPolicy blocking**: Add an egress rule allowing DNS traffic
* **ConfigMap wrong**: Fix the Corefile and restart CoreDNS
Frontmatter Fields¶
- name (optional): Lowercase with hyphens. Defaults to the parent directory name.
- description (required): Used by the LLM to match the skill to user questions, so make it descriptive.
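The frontmatter rules above can be sketched as a small loader. This is an illustrative sketch only, not Holmes's own parser: `parse_skill` is a hypothetical helper, and it reads simple `key: value` lines instead of full YAML so that it runs with the standard library alone.

```python
from pathlib import Path


def parse_skill(skill_md: Path) -> dict:
    """Return the frontmatter fields of a SKILL.md file (hypothetical helper)."""
    text = skill_md.read_text(encoding="utf-8")
    if not text.startswith("---"):
        raise ValueError(f"{skill_md}: missing frontmatter")
    # Frontmatter is the text between the first two '---' delimiters.
    parts = text.split("---", 2)
    if len(parts) < 3:
        raise ValueError(f"{skill_md}: frontmatter is not closed with '---'")
    fields = {}
    # Simplification: flat 'key: value' lines only, not full YAML.
    for line in parts[1].strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    if not fields.get("description"):
        raise ValueError(f"{skill_md}: missing required 'description' field")
    # 'name' is optional and falls back to the parent directory name.
    fields.setdefault("name", skill_md.resolve().parent.name)
    return fields
```

Calling `parse_skill(Path("my-skills/dns-troubleshooting/SKILL.md"))` on a file that omits `name` would yield `dns-troubleshooting` as the name.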
Writing a Skill¶
The key sections in a skill's markdown body are:
- Goal: What the skill addresses
- Workflow: Sequential diagnostic steps Holmes will execute using its tools
- Synthesize Findings: How to interpret combined results
- Recommended Remediation Steps: Solutions based on findings
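As an authoring aid, a skill body can be checked for these four sections before it is published. The sketch below is a hypothetical lint, not something Holmes is documented to enforce. It assumes the sections are written as level-2 markdown headings, as in the DNS example above.

```python
import re

# Section titles taken from the list above.
EXPECTED_SECTIONS = [
    "Goal",
    "Workflow",
    "Synthesize Findings",
    "Recommended Remediation Steps",
]


def missing_sections(body: str) -> list[str]:
    """Return the expected section titles that have no '## <title>' heading in body."""
    headings = {
        match.group(1).strip()
        for match in re.finditer(r"^##\s+(.+)$", body, flags=re.MULTILINE)
    }
    return [title for title in EXPECTED_SECTIONS if title not in headings]
```

An empty list means all four sections are present. Any titles returned are the ones still to be written.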
Configuring Custom Skill Paths¶
Add skill directory paths to ~/.holmes/config.yaml under the custom_skill_paths key.
For Kubernetes deployments, mount your skill directories into the Holmes container and reference them in your Helm values.
Holmes scans each directory (up to 2 levels deep) for SKILL.md files. Multiple paths are merged — skills from all paths are combined with built-in skills.
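The scanning and merging behaviour can be approximated as follows. This is a sketch under stated assumptions, not Holmes's actual loader. `discover_skills` is a hypothetical name, "2 levels deep" is read as "SKILL.md sits at most two directories below the configured path", and the rule for duplicate names (first path wins) is a guess.

```python
from pathlib import Path

MAX_DEPTH = 2  # assumption: at most two directories between a configured path and SKILL.md


def discover_skills(paths: list[str]) -> dict[str, Path]:
    """Map each skill's directory name to its SKILL.md across all configured paths."""
    skills: dict[str, Path] = {}
    for root in map(Path, paths):
        if not root.is_dir():
            continue  # skip paths that do not exist
        for skill_md in sorted(root.rglob("SKILL.md")):
            # Number of directories between the configured path and the file.
            depth = len(skill_md.relative_to(root).parts) - 1
            if depth <= MAX_DEPTH:
                # Assumption: on a name clash, the first configured path wins.
                skills.setdefault(skill_md.resolve().parent.name, skill_md)
    return skills
```

Running `discover_skills(["my-skills"])` against the directory layout shown earlier would return three entries, one per skill directory.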
Common Use Cases¶
Migrating from Runbooks¶
If you are upgrading from Holmes 0.24.x or older, your existing runbooks need to be converted to the SKILL.md format.
For each runbook in your catalog:
- Create a directory named after the runbook (lowercase, hyphens).
- Create a SKILL.md file inside it, with the description from your old catalog.json entry as frontmatter and the original markdown content as the body.
- Replace custom_runbook_catalogs in your config with custom_skill_paths.
The catalog.json file is no longer needed — Holmes discovers skills automatically by scanning for SKILL.md files.
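The per-runbook conversion can be scripted. The sketch below is a hypothetical helper, not an official migration tool. It takes the runbook title, description, and markdown body as plain arguments because the layout of catalog.json is not described here, so reading those values from your catalog is left to you.

```python
import json
import re
from pathlib import Path


def migrate_runbook(title: str, description: str, body: str, dest: Path) -> Path:
    """Write <dest>/<slug>/SKILL.md for one runbook and return the file's path."""
    # Directory names are lowercase with hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    skill_dir = dest / slug
    skill_dir.mkdir(parents=True, exist_ok=True)
    # json.dumps yields a double-quoted string, which is also a valid YAML scalar,
    # so descriptions containing ':' or '#' do not break the frontmatter.
    frontmatter = f"---\nname: {slug}\ndescription: {json.dumps(description)}\n---\n"
    skill_md = skill_dir / "SKILL.md"
    skill_md.write_text(frontmatter + body, encoding="utf-8")
    return skill_md
```

After converting every runbook into one destination directory, list that directory under custom_skill_paths.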
Troubleshooting¶
# Check Holmes logs for skill loading errors
# Look for "Failed to parse" or "missing required 'description' field"
holmes ask "test question" -v
# Verify your SKILL.md has valid YAML frontmatter
python3 -c "
import yaml
with open('my-skill/SKILL.md') as f:
    content = f.read()
# Frontmatter is the text between the first two '---' delimiters
parts = content.split('---', 2)
print(yaml.safe_load(parts[1]))
"