Scheduling
The peel-side scheduler runs modules at configured intervals or cron times, enabling periodic automation without external orchestration. Schedules can be defined statically in peel.yaml or dynamically through the settings pipeline.
Source: pkg/schedule/
Configuration
Schedules are defined as named entries under the schedule key. Each entry specifies a module to run and a timing strategy (interval or cron).
Static Configuration (peel.yaml)
Add schedule entries directly to the peel's configuration file:
schedule:
  cleanup_tmp:
    module: cmd.run
    args:
      command: "find /tmp -mtime +7 -delete"
    interval: 1h
    splay: 5m
    run_on_start: true
  highstate:
    module: state.highstate
    cron: "0 */4 * * *"
    splay: 15m
    return_job: true
Dynamic Configuration (Settings)
Schedules can also be pushed through the settings pipeline, enabling centralized management and per-peel targeting via the top file:
schedule:
  disk_check:
    module: cmd.run
    args:
      command: "df -h"
    interval: 30m
    return_job: true
Merge Precedence
When the same schedule name exists in both peel.yaml and settings, the settings-sourced entry takes precedence. This allows operators to override static defaults without modifying the peel configuration file.
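The precedence rule can be sketched as a two-pass map merge. This is an illustration only: the `Entry` type and `mergeSchedules` function are hypothetical names, and the sketch assumes a settings entry replaces the static entry as a whole rather than being merged field by field.

```go
package main

import "fmt"

// Entry is a hypothetical, trimmed-down schedule entry used only for this sketch.
type Entry struct {
	Module   string
	Interval string
}

// mergeSchedules returns a new map in which settings-sourced entries
// replace static entries that share the same name.
func mergeSchedules(static, settings map[string]Entry) map[string]Entry {
	merged := make(map[string]Entry, len(static)+len(settings))
	for name, e := range static {
		merged[name] = e
	}
	for name, e := range settings {
		merged[name] = e // settings win on name collisions
	}
	return merged
}

func main() {
	static := map[string]Entry{
		"disk_check":  {Module: "cmd.run", Interval: "1h"},
		"cleanup_tmp": {Module: "cmd.run", Interval: "1h"},
	}
	settings := map[string]Entry{
		"disk_check": {Module: "cmd.run", Interval: "30m"},
	}
	merged := mergeSchedules(static, settings)
	// disk_check now carries the settings interval; cleanup_tmp is untouched.
	fmt.Println(merged["disk_check"].Interval, merged["cleanup_tmp"].Interval)
}
```

Entries that exist only in peel.yaml pass through unchanged, which is why static defaults can stay in place while settings override individual names.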
Entry Fields
| Field | Type | Default | Description |
|---|---|---|---|
| module | string | (required) | Module to execute (e.g., cmd.run, state.highstate) |
| args | map | nil | Arguments passed to the module |
| interval | duration | (none) | Repeat interval (e.g., 1h, 30m, 5s). Mutually exclusive with cron. |
| cron | string | (none) | 5-field cron expression (e.g., 0 */4 * * *). Mutually exclusive with interval. |
| splay | duration | 0 | Random delay in [0, splay) added per fire. Prevents thundering herd. |
| maxrunning | int | 1 | Maximum concurrent executions of this entry. New fires are skipped if the limit is reached. |
| run_on_start | bool | false | Fire immediately when the peel starts, before the first interval/cron tick. |
| return_job | bool | false | Report the execution as a synthetic job, making it visible in zester job list. See Result Reporting. |
| enabled | bool | true | Set to false to disable the entry without removing it. |
Interval vs Cron
Each entry must specify exactly one of interval or cron. Specifying both (or neither) is a configuration error.
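A minimal validation sketch for this rule follows. The `Entry` struct and `validate` function are hypothetical and are not taken from pkg/schedule/. The positive-duration check and the 5-field count are extra assumptions beyond what this page states.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// Entry is a hypothetical, trimmed-down schedule entry used only for this sketch.
type Entry struct {
	Module   string
	Interval string
	Cron     string
}

// validate enforces "exactly one of interval or cron".
func validate(e Entry) error {
	if e.Module == "" {
		return errors.New("module is required")
	}
	hasInterval, hasCron := e.Interval != "", e.Cron != ""
	if hasInterval == hasCron {
		// Both set or both empty: either way the entry is invalid.
		return errors.New("exactly one of interval or cron must be set")
	}
	if hasInterval {
		d, err := time.ParseDuration(e.Interval)
		if err != nil {
			return fmt.Errorf("invalid interval %q: %w", e.Interval, err)
		}
		if d <= 0 { // assumption: non-positive intervals are rejected
			return fmt.Errorf("interval %q must be positive", e.Interval)
		}
	}
	if hasCron && len(strings.Fields(e.Cron)) != 5 {
		return fmt.Errorf("cron %q must have 5 fields", e.Cron)
	}
	return nil
}

func main() {
	fmt.Println(validate(Entry{Module: "cmd.run", Interval: "1h"}))
	fmt.Println(validate(Entry{Module: "cmd.run", Interval: "1h", Cron: "0 */4 * * *"}))
	fmt.Println(validate(Entry{Module: "cmd.run"}))
}
```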
Timing
Interval
The interval field accepts Go duration strings (1h, 30m, 5s, 1h30m). The scheduler fires the module repeatedly at the given interval, measured from the end of the previous execution.
Cron
The cron field accepts standard 5-field cron expressions:
┌───────────── minute (0-59)
│ ┌───────────── hour (0-23)
│ │ ┌───────────── day of month (1-31)
│ │ │ ┌───────────── month (1-12)
│ │ │ │ ┌───────────── day of week (0-6, Sunday=0)
│ │ │ │ │
* * * * *
Splay (Thundering Herd Prevention)
When splay is set, each fire is delayed by a random duration in [0, splay). This spreads execution across the fleet and prevents all peels from hitting shared resources simultaneously.
highstate:
  module: state.highstate
  cron: "0 */4 * * *"
  splay: 15m # fires between :00 and :15 past the hour
Splay for Large Fleets
For fleet-wide operations like state.highstate, always set a splay proportional to fleet size. A 15-minute splay with 100 peels averages one execution every 9 seconds rather than 100 simultaneous runs.
Result Reporting
By default, scheduled executions are silent -- they run on the peel without creating trackable jobs. This keeps the job system clean for operator-initiated work.
Set return_job: true to report each execution as a synthetic job. This makes scheduled runs visible in zester job list and enables:
- Result tracking and history
- Failure alerting via monitoring
- Audit trail for compliance
disk_check:
  module: cmd.run
  args:
    command: "df -h | awk '$5+0 > 90 {print $0}'"
  interval: 30m
  return_job: true # results visible in zester job list
How Results Flow
Peels have no write access to the jobs or job-returns KV buckets -- scheduled results travel through JetStream instead:
- Peel publishes -- when the entry completes, the peel generates a KSUID job ID and publishes a ScheduledResult (module, args, success, return data, duration, timestamp) on the peel-scoped subject zester.job.<jid>.schedule.<peel-id>.
- Stream captures -- the job-events JetStream stream captures the message durably (7-day retention).
- Master persists -- all masters share a single durable consumer named schedule-results. The master that receives a result creates the synthetic job record with an idempotent KV Create (so redeliveries and multi-master races are safe) and writes the per-peel return to the job-returns bucket.
The synthetic job targets exactly the reporting peel and carries the metadata source: schedule and schedule: <entry-name>; its status is complete or failed depending on the execution result.
Results survive master downtime
Because results are buffered in the job-events stream, a scheduled run that fires while no master is running is not lost -- the shared durable consumer processes it as soon as a master comes back (within the 7-day retention window).
Identity comes from the subject, not the payload
A peel's NATS permissions only allow publishing with its own ID as the trailing subject token (zester.job.*.schedule.<peel-id>). The master takes the reporting peel's identity from the subject, so one compromised peel cannot forge or overwrite another peel's job records or returns.
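As a sketch of how identity can be derived from the subject, the functions below build and parse the zester.job.<jid>.schedule.<peel-id> layout. `scheduleSubject` and `parseScheduleSubject` are hypothetical helpers, and the sketch assumes neither the job ID nor the peel ID contains a dot, since NATS subject tokens are dot-separated.

```go
package main

import (
	"fmt"
	"strings"
)

// scheduleSubject builds the peel-scoped subject for a scheduled result.
func scheduleSubject(jid, peelID string) string {
	return "zester.job." + jid + ".schedule." + peelID
}

// parseScheduleSubject extracts the job ID and the reporting peel's ID
// from the subject alone, ignoring anything the payload might claim.
func parseScheduleSubject(subject string) (jid, peelID string, err error) {
	parts := strings.Split(subject, ".")
	if len(parts) != 5 || parts[0] != "zester" || parts[1] != "job" || parts[3] != "schedule" {
		return "", "", fmt.Errorf("unexpected subject %q", subject)
	}
	if parts[2] == "" || parts[4] == "" {
		return "", "", fmt.Errorf("empty token in subject %q", subject)
	}
	return parts[2], parts[4], nil
}

func main() {
	subject := scheduleSubject("JID123", "web-01") // placeholder IDs
	jid, peelID, err := parseScheduleSubject(subject)
	fmt.Println(jid, peelID, err)
}
```

Because the broker only lets a peel publish with its own ID in the last token, a value parsed this way cannot be spoofed by the payload.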
Dynamic Reload
Settings-sourced schedules are hot-reloaded when the settings pipeline pushes updates. The scheduler:
- Compares the new schedule map against the running entries
- Stops removed or changed entries
- Starts new or changed entries
- Leaves unchanged entries running (no restart, no timer reset)
Static entries from peel.yaml are loaded once at startup and are not reloaded.
Examples
Periodic Highstate
Apply the full state tree every 4 hours with splay to avoid fleet-wide thundering herd:
schedule:
  highstate:
    module: state.highstate
    cron: "0 */4 * * *"
    splay: 15m
    return_job: true
Cleanup Script
Remove old temp files hourly, starting immediately on boot:
schedule:
  cleanup_tmp:
    module: cmd.run
    args:
      command: "find /tmp -mtime +7 -delete"
    interval: 1h
    splay: 5m
    run_on_start: true
Health Check
Run a lightweight health check every 5 minutes with job tracking for alerting:
schedule:
  health_check:
    module: cmd.run
    args:
      command: "/usr/local/bin/health-check.sh"
    interval: 5m
    return_job: true
    maxrunning: 1
Disk Monitoring (via Settings)
Push a disk check schedule to all peels through the settings pipeline:
schedule:
  disk_check:
    module: cmd.run
    args:
      command: "df -h | awk '$5+0 > 90 {print $0}'"
    interval: 30m
    return_job: true
Disabled Entry
Keep a schedule definition for reference without running it:
schedule:
  expensive_audit:
    module: cmd.run
    args:
      command: "/opt/audit/full-scan.sh"
    cron: "0 2 * * 0"
    enabled: false
Troubleshooting
Schedule Not Firing
- Check that the entry has enabled: true (or omitted, which defaults to true)
- Verify that exactly one of interval or cron is set
- Check peel logs for schedule registration messages at startup
Executions Piling Up
If a scheduled module takes longer than the interval, subsequent fires are skipped when maxrunning (default 1) is reached. Increase maxrunning only if concurrent execution is safe, or increase the interval.
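The skip-when-busy behaviour can be sketched with a buffered channel used as a non-blocking semaphore. `limiter`, `tryAcquire`, and `release` are hypothetical names; the actual mechanism in the scheduler may differ.

```go
package main

import "fmt"

// limiter caps concurrent executions of one entry.
type limiter struct{ slots chan struct{} }

// newLimiter creates a limiter; values below 1 fall back to the default of 1.
func newLimiter(maxRunning int) *limiter {
	if maxRunning < 1 {
		maxRunning = 1
	}
	return &limiter{slots: make(chan struct{}, maxRunning)}
}

// tryAcquire reports whether a fire may run. It never blocks:
// when all slots are taken, the fire is skipped rather than queued.
func (l *limiter) tryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot once an execution finishes.
func (l *limiter) release() { <-l.slots }

func main() {
	l := newLimiter(1)
	fmt.Println(l.tryAcquire()) // first fire starts
	fmt.Println(l.tryAcquire()) // previous run still active, fire skipped
	l.release()
	fmt.Println(l.tryAcquire()) // slot is free again
}
```

Skipping instead of queueing is what keeps slow modules from accumulating a backlog of pending runs.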
Splay Seems Too Large
Splay adds a random delay in [0, splay) on every fire. If your interval is 30m and splay is 20m, executions will occur between 30 and 50 minutes apart. Keep splay small relative to the interval.
Settings Schedule Not Updating
- Verify the settings pipeline is delivering the schedule key (check with zester '<peel>' settings.get schedule)
- Check peel logs for settings reload events
- Ensure the top file targets the correct peels