Overview

Jobs are the execution unit in Zester. Every action dispatched from the master to one or more peels — whether applying states, running ad-hoc commands, or collecting data — is wrapped in a job. Jobs provide tracking, timeout handling, status aggregation, and persistent result storage.


How Jobs Work

  1. The CLI creates a Job with a unique KSUID-based identifier (JID), a function to execute, arguments, and a list of target peels.
  2. The CLI dispatches the job to the master via zester.dispatch (request/reply). In a multi-master deployment, the NATS queue group zester.masters delivers the request to exactly one master.
  3. The receiving master sets the job's Owner field to its own instance ID, stores the job in the NATS KV jobs bucket, and publishes an ExecRequest to each target peel via zester.cmd.<peel-id>.
  4. Each peel executes the function and publishes its result (return) to zester.job.<jid>.return.<peel-id>.
  5. A Watcher goroutine on the owning master tracks returns. When all targets have returned — or the timeout expires — the watcher finalizes the job with an aggregated status.
  6. The final job state and returns are persisted in NATS KV for later retrieval.
  7. If the owning master fails, surviving masters detect the missing heartbeat and recover orphaned jobs by creating new watchers. See High Availability for details.
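
The watcher behavior in step 5 can be sketched without NATS as a loop over a channel of returns. This is a minimal illustration, not Zester's implementation: the Return type, its fields, the watch function, and the rule that maps results to complete, partial, failed, or timeout are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// Return is a hypothetical per-peel result; the field names are assumptions.
type Return struct {
	PeelID  string
	Success bool
}

// watch collects returns until every target has reported or the timeout
// expires, then derives an aggregated status. The status rule used here
// is an assumption for illustration only.
func watch(targets []string, returns <-chan Return, timeout time.Duration) string {
	pending := make(map[string]bool, len(targets))
	for _, t := range targets {
		pending[t] = true
	}
	total, failed := len(pending), 0

	timer := time.NewTimer(timeout)
	defer timer.Stop()

	for len(pending) > 0 {
		select {
		case r, ok := <-returns:
			if !ok {
				returns = nil // closed channel: wait for the timer instead of spinning
				continue
			}
			if !pending[r.PeelID] {
				continue // duplicate return, or a peel that was never targeted
			}
			delete(pending, r.PeelID)
			if !r.Success {
				failed++
			}
		case <-timer.C:
			return "timeout"
		}
	}

	switch {
	case failed == 0:
		return "complete"
	case failed == total:
		return "failed"
	default:
		return "partial"
	}
}

func main() {
	returns := make(chan Return, 2)
	returns <- Return{PeelID: "web-1", Success: true}
	returns <- Return{PeelID: "web-2", Success: false}
	fmt.Println(watch([]string{"web-1", "web-2"}, returns, time.Second)) // prints "partial"
}
```

In a real deployment the returns channel would be fed by a subscription on the job's return subjects, and a recovered watcher would first need to load the returns already persisted in KV.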

Key Concepts

| Concept | Description |
| --- | --- |
| JID | Job ID: a KSUID (K-Sortable Unique Identifier) that is time-ordered and globally unique |
| Function | The operation to execute (e.g., state.apply, cmd.run) |
| Targets | The list of peel IDs that should execute the job |
| Owner | The ID of the master instance that dispatched the job and is watching it (for multi-master HA) |
| Epoch | KV revision from the ownership CAS; serves as a fencing token to prevent duplicate execution |
| Return | The execution result from a single peel |
| Ack | Optional acknowledgment subject reserved by the protocol (currently not emitted by the peel runtime) |
| Watcher | A master-side goroutine that tracks job progress and finalizes status |
| Status | The lifecycle state of a job: pending, claimed, running, complete, partial, timeout, failed, canceled |
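
The fencing role of Epoch can be illustrated with the generic fencing-token pattern: a receiver remembers the highest token it has accepted for a job and refuses anything that is not newer. The sketch below shows only that general pattern. The ExecRequest fields, the Fence type, and the acceptance rule are assumptions, and how Zester's peels actually treat a higher epoch for a job they have already executed is not specified here.

```go
package main

import "fmt"

// ExecRequest is named in the text; these two fields are assumptions.
type ExecRequest struct {
	JID   string
	Epoch uint64
}

// Fence remembers the highest epoch accepted per job.
// It is not safe for concurrent use; a real peel would need a lock.
type Fence struct {
	seen map[string]uint64
}

func NewFence() *Fence {
	return &Fence{seen: make(map[string]uint64)}
}

// Accept reports whether a request carries a newer epoch than any
// previously accepted for the same JID. Stale or repeated epochs are
// rejected, which is how a fencing token blocks a superseded owner.
func (f *Fence) Accept(req ExecRequest) bool {
	if last, ok := f.seen[req.JID]; ok && req.Epoch <= last {
		return false
	}
	f.seen[req.JID] = req.Epoch
	return true
}

func main() {
	f := NewFence()
	for _, epoch := range []uint64{5, 5, 7, 6} {
		req := ExecRequest{JID: "JID123", Epoch: epoch} // placeholder JID
		fmt.Println(epoch, f.Accept(req))
	}
	// prints: 5 true, 5 false, 7 true, 6 false (one pair per line)
}
```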

NATS Subjects

Jobs use a structured NATS subject hierarchy:

| Subject | Direction | Purpose |
| --- | --- | --- |
| zester.dispatch | CLI -> Master | Submit a job for dispatch (request/reply, queue group: zester.masters) |
| zester.cmd.<peel-id> | Master -> Peel | Deliver ExecRequest to a target peel |
| zester.job.<jid>.dispatch | Master -> JetStream | Job dispatched event (logged) |
| zester.job.<jid>.ack.<peel-id> | Peel -> Master | Peel acknowledges an accepted dispatch (published after fencing/dedup, before execution) |
| zester.job.<jid>.return.<peel-id> | Peel -> Master/CLI | Peel publishes execution result |
| zester.job.<jid>.status | Master -> JetStream | Aggregated job status (finalization) |
| zester.job.<jid>.cancel | CLI -> Master -> Peels | Cancellation signal (stops peel execution) |
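
Because the hierarchy is regular, subjects can be built with small helpers, and a single wildcard subscription can cover every return for a job. The sketch below assumes nothing beyond the subject layout in the table. The helper names are hypothetical, and matches is a simplified local reimplementation of NATS wildcard rules (`*` for exactly one token, `>` for one or more trailing tokens) included only so the example runs without a server.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical helpers that follow the subject layout in the table above.
func cmdSubject(peelID string) string { return "zester.cmd." + peelID }

func returnSubject(jid, peelID string) string {
	return fmt.Sprintf("zester.job.%s.return.%s", jid, peelID)
}

// returnWildcard covers the returns of every peel for one job.
func returnWildcard(jid string) string {
	return "zester.job." + jid + ".return.*"
}

// matches applies NATS-style wildcard rules token by token.
// It is an illustration; real matching is done by the NATS server.
func matches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			// ">" must be the last token and needs at least one token to match.
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) || (tok != "*" && tok != s[i]) {
			return false
		}
	}
	return len(p) == len(s)
}

func main() {
	jid := "JID123" // placeholder, not a real KSUID
	fmt.Println(cmdSubject("web-1"))
	fmt.Println(matches(returnWildcard(jid), returnSubject(jid, "web-1")))  // true
	fmt.Println(matches(returnWildcard(jid), "zester.job."+jid+".status")) // false
	fmt.Println(matches("zester.job."+jid+".>", "zester.job."+jid+".status")) // true
}
```

One consequence of the single-token rule: if peel IDs could contain dots, `return.*` would not match them and `return.>` would be needed instead.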

KV Storage

Jobs and returns are persisted in NATS JetStream KV buckets:

| Bucket | Key Format | TTL | History | Content |
| --- | --- | --- | --- | --- |
| jobs | <jid> | 7 days | 10 revisions | Full job spec and current status |
| job-returns | <jid>.<peel-id> | 7 days | 1 revision | Incremental per-peel returns (the sole store for return payloads) |

The jobs bucket keeps 10 revisions of history per key, allowing you to trace how a job's status evolved over time (pending -> running -> complete). Both buckets have a 7-day TTL, after which entries are automatically purged.
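
The job-returns key format can be handled with two small helpers. This sketch assumes the JID is in the standard KSUID text form, which is base62 and therefore contains no dots, so the first dot in a key always separates the JID from the peel ID. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// returnKey builds a job-returns key in the <jid>.<peel-id> format.
func returnKey(jid, peelID string) string {
	return jid + "." + peelID
}

// splitReturnKey splits a job-returns key at the first dot.
// Assumption: the JID contains no dots (true for base62 KSUID strings),
// so anything after the first dot belongs to the peel ID.
func splitReturnKey(key string) (jid, peelID string, ok bool) {
	jid, peelID, ok = strings.Cut(key, ".")
	if !ok || jid == "" || peelID == "" {
		return "", "", false
	}
	return jid, peelID, true
}

func main() {
	key := returnKey("JID123", "web-1") // placeholder JID, not a real KSUID
	fmt.Println(key)                    // prints "JID123.web-1"

	jid, peel, ok := splitReturnKey(key)
	fmt.Println(jid, peel, ok) // prints "JID123 web-1 true"
}
```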
