zester

NATS JetStream Deep Dive

NATS JetStream is the unified messaging, persistence, and KV layer that underpins every Zester operation. This document covers the rationale for choosing NATS, the subject hierarchy, KV bucket design, JetStream streams, and deployment topologies from single-node to multi-region superclusters.

Why NATS

Zester evaluated several messaging technologies before selecting NATS JetStream as its backbone. The decision was driven by the need for a single technology that could serve as both the real-time message bus and the persistent data store, eliminating the operational complexity of running separate messaging and database systems.

Alternatives Considered

CriteriaNATS JetStreamZeroMQRedis (Streams)Custom Protocol
PersistenceNative (JetStream KV + streams)None (pure messaging)Redis Streams + persistenceMust build from scratch
Clustering / HABuilt-in RAFT consensusManual (broker patterns)Redis Cluster or SentinelMust build from scratch
Auth modelnkey + JWT with subject-level ACLsNone built-inACL per commandMust build from scratch
DeploymentStandalone server or clusterC library with bindingsSeparate processN/A
Pub/Sub patternsNative with wildcardsExtensive (PUB/SUB, PUSH/PULL, etc.)PUB/SUB on channelsMust build from scratch
KV storeNative JetStream KVNoneNative (core feature)Must build from scratch
Connection modelTCP with TLS, auto-reconnectVarious socket typesTCP with optional TLSMust build from scratch
LanguageGo (native)C (with Go bindings)C (with Go bindings)Go
Operational overheadLow (standalone server)Low (no broker, but no persistence)Medium (separate process, replication)Low (custom)

Why Not ZeroMQ

ZeroMQ (used by SaltStack) is a powerful messaging library but lacks persistence, clustering, and authentication. Salt compensates by layering its own PKI, job cache (returners), and failover logic on top. Zester eliminates this complexity by using NATS, which provides all of these features natively.

Why Not Redis

Redis Streams provide persistence and pub/sub, but running a separate Redis cluster adds operational overhead and introduces a second failure domain. Redis ACLs operate at the command level, not the subject/topic level, making fine-grained per-peel authorization harder to implement.

Why NATS Wins

  1. Single dependency -- NATS serves as message bus, KV store, and event stream. No external database needed.
  2. Standalone server -- NATS runs as its own process with dedicated resources, simplifying operations and enabling independent scaling.
  3. Native Go -- Written in Go, with first-class Go client libraries. No CGo, no FFI.
  4. Subject-level authorization -- JWT permissions control exactly which subjects each peel can publish to or subscribe from.
  5. Built-in clustering -- RAFT-based consensus for high availability without external coordination (no ZooKeeper, no etcd).
  6. Leaf nodes -- Edge deployment pattern where remote sites connect through a leaf node, enabling WAN-optimized topologies.

Subject Hierarchy

Zester organizes all NATS communication under a structured subject hierarchy rooted at zester.. Every subject follows a consistent pattern that maps to the operation type and target identity.

Source of truth

The definitive subject definitions are in pkg/bus/subjects.go. The table below is derived directly from that source.

Subject Prefix Tree

Subject Reference Table

Subject PatternDirectionPurposeExample
zester.cmd.<peel-id>Master -> PeelDirect command dispatchzester.cmd.web-01
zester.cmd.*Master (subscribe)Wildcard for all commandsMonitoring
zester.event.<peel-id>.send.<tag...>Peel -> events streamCustom peel events (event.send module); tag tokens are dotted (myco.deploy.finished = tag myco/deploy/finished)zester.event.web-01.send.myco.deploy.finished
zester.event.<peel-id>.beacon.<name>Peel -> events streamBeacon events (v1: the service beacon)zester.event.web-01.beacon.service
zester.event._master.<tag...>Master -> events streamMaster-synthesized events (enrollment pending, reactor-derived reaction.<tag> chains)zester.event._master.enroll.pending.enr-1
zester.event._admin.send.<tag...>CLI -> events streamOperator events (zester event send)zester.event._admin.send.maintenance.start
zester.event.>events stream / admin (subscribe)Captured durably by the events stream feeding the reactor; zester event watch subscribes liveEvent aggregation
zester.fact.<peel-id>Peel -> MasterFact data publicationzester.fact.web-01
zester.basket.<peel-id>.<function>Peel -> PeelCross-peel data sharingzester.basket.web-01.network.ip_addrs
zester.dispatchCLI -> MasterJob dispatch request (request/reply, queue group: zester.masters)zester.dispatch
zester.target.resolveCLI/Peel -> MasterTarget expression resolution (request/reply, queue group: zester-target-resolvers)zester.target.resolve
zester.admin.enroll.approveCLI -> MasterEnrollment approval (request/reply, queue group: zester-masters-admin)zester.admin.enroll.approve
zester.admin.enroll.rejectCLI -> MasterEnrollment rejection (request/reply, queue group: zester-masters-admin)zester.admin.enroll.reject
zester.admin.enroll.revokeCLI -> MasterEnrollment revocation (request/reply, queue group: zester-masters-admin)zester.admin.enroll.revoke
zester.job.<jid>.dispatchMaster -> JetStreamJob dispatched event (audit)zester.job.abc123.dispatch
zester.job.<jid>.ack.<peel-id>Peel -> MasterDispatch acknowledgment: published when a peel accepts a job (before execution). Acked peels are exempt from the master's ack-window silent-target re-dispatchzester.job.abc123.ack.web-01
zester.job.<jid>.return.<peel-id>Peel -> Master/CLIJob execution resultzester.job.abc123.return.web-01
zester.job.<jid>.statusMaster -> JetStreamAggregated job statuszester.job.abc123.status
zester.job.<jid>.cancelCLI -> Master -> PeelsJob cancellation (stops peel execution)zester.job.abc123.cancel
zester.job.<jid>.schedule.<peel-id>Peel -> JetStreamScheduler return_job result (peel identity enforced by NATS permissions)zester.job.abc123.schedule.web-01
zester.job.<jid>.>Any (subscribe)All events for a jobJob monitoring
zester.reactor.testCLI -> MasterReactor rule dry-run (request/reply, queue group: zester-reactor-testers)zester reactor test '_master/enroll/pending/*'
zester.update.cmd.<id>Master -> WatchdogNode-local update commands (request/reply)zester.update.cmd.web-01
zester.update.rollout.startCLI -> MasterStart rollout request (queue group: zester.masters)zester.update.rollout.start
zester.update.rollout.abortCLI -> MasterAbort rollout request (queue group: zester.masters)zester.update.rollout.abort

Queue Groups

Zester uses NATS queue groups to distribute job dispatch across multiple master instances in an HA deployment. When masters subscribe to zester.dispatch with queue group zester.masters, NATS delivers each incoming request to exactly one member of the group using round-robin delivery.

nc.QueueSubscribe("zester.dispatch", "zester.masters", handler)
Queue GroupSubjectPurpose
zester.masterszester.dispatchDistribute job dispatch across active masters
zester.masterszester.update.rollout.startEnsure a single master instance handles each rollout start request
zester.masterszester.update.rollout.abortEnsure a single master instance handles each rollout abort request
zester-target-resolverszester.target.resolveEvery master joins; exactly one answers each target-resolution request from the CLI or a peel's basket() query. Callers fall back to a facts-KV scan when no master responds
zester-masters-adminzester.admin.enroll.approve / .reject / .revokeExactly one master applies each enrollment admin request (enroll.AdminRequest -> enroll.AdminResponse)
zester-reactor-testerszester.reactor.testEvery reactor-enabled master joins; exactly one answers each zester reactor test dry-run against its live rule snapshot

Queue groups provide automatic load balancing and failover with no external coordination. When a master disconnects, NATS removes it from the group and redistributes to surviving members. When a new master starts, it automatically joins by subscribing with the same group name.

See also

For full details on multi-master dispatch and queue group behavior, see the High Availability documentation.

Wildcard Patterns

NATS supports two wildcard tokens that Zester uses extensively:

TokenMeaningExample
*Matches exactly one subject tokenzester.cmd.* matches zester.cmd.web-01 but not zester.cmd.web-01.restart
>Matches one or more tokens at the tailzester.event.> matches zester.event.web-01 and zester.event.web-01.beacon.disk

Helper Functions

The pkg/bus/subjects.go file provides type-safe helper functions for constructing subjects, preventing string formatting errors:

FunctionReturnsExample Output
CmdSubject("web-01")stringzester.cmd.web-01
EventSubject("web-01")stringzester.event.web-01
BeaconSubject("web-01", "disk")stringzester.event.web-01.beacon.disk
FactSubject("web-01")stringzester.fact.web-01
BasketSubject("web-01", "network.ip_addrs")stringzester.basket.web-01.network.ip_addrs
JobSubject("abc123")stringzester.job.abc123
JobDispatchSubject("abc123")stringzester.job.abc123.dispatch
JobAckSubject("abc123", "web-01")stringzester.job.abc123.ack.web-01
JobReturnSubject("abc123", "web-01")stringzester.job.abc123.return.web-01
JobStatusSubject("abc123")stringzester.job.abc123.status
JobCancelSubject("abc123")stringzester.job.abc123.cancel
JobScheduleSubject("abc123", "web-01")stringzester.job.abc123.schedule.web-01
JobScheduleWildcard()stringzester.job.*.schedule.*
JobAllSubject("abc123")stringzester.job.abc123.>
PeelEventSendSubject("web-01", "myco.deploy.finished")stringzester.event.web-01.send.myco.deploy.finished
MasterEventSubject("enroll.pending.enr-1")stringzester.event._master.enroll.pending.enr-1
AdminEventSendSubject("myco.deploy.finished")stringzester.event._admin.send.myco.deploy.finished
EventSubjectAll()stringzester.event.>
UpdateCmdSubject("web-01")stringzester.update.cmd.web-01
UpdateCmdSubjectAll()stringzester.update.cmd.*

Subject Permissions (JWT)

Peel user JWTs (auth.PeelUserJWTOptions in pkg/auth/jwt.go) are least-privilege: a peel can only publish on its own peel-scoped subjects (zester.event.<peel-id>.>, zester.fact.<peel-id>, zester.job.*.ack.<peel-id>, zester.job.*.return.<peel-id>, zester.job.*.schedule.<peel-id>) plus the JetStream API subjects for the KV buckets it needs. Three grants support the peel-side services:

GrantPurpose
zester.target.resolve (publish)Request/reply to the master-side target-resolution service used by basket() queries; replies ride the _INBOX.> grant
$JS.API.STREAM.INFO.KV_peel-heartbeat (publish)Bucket handle for the peel-heartbeat KV bucket
$KV.peel-heartbeat.<peel-id> (publish)Put the peel's own liveness beat (scoped to its own key, mirroring the facts grant)

Missing grants degrade, never break

A peel whose credentials lack these grants stays fully functional: heartbeat puts fail (logged at Debug and retried on every 10s tick — the peel simply looks offline in presence views) and basket target resolution falls back to facts-KV scans. Degraded, not broken.

The admin subjects (zester.admin.enroll.*) are served by the masters with full-permission master credentials; operator NATS credentials must be allowed to publish on them for zester enroll approve|reject|revoke to work. Peel JWTs deliberately have no grant for zester.admin.enroll.*, and no $KV.jobs / $KV.job-returns grants — a compromised peel cannot administer enrollments or tamper with other peels' job records.

KV Buckets

Zester uses NATS JetStream KV buckets as its primary persistence layer. All buckets are created during master startup via InitializeStorageOpts() in pkg/bus/kv.go (the InitializeStorage() convenience wrapper delegates to it).

Bucket Reference

BucketKey PatternDescriptionTTLHistoryTier
facts<peel-id>System facts collected from peels (OS, CPU, memory, disk, network)None5critical
settings-files<relative-path>, _manifest, _revisionSanitized .zy settings files with __ZESTER_SECRET:*__ placeholders, plus the file-set manifest and revision signalNone3critical
secrets<peel-id> and _master_curve_pubPer-peel encrypted secret map plus master curve public keyNone3critical
basket<peel-id>.<function>Data shared between peels (e.g., IP addresses, hostnames)None1critical
jobs<jid> and active.<jid>Job specifications, status, owner master ID, and epoch (fencing token), plus the active-jobs index (ActiveJobEntry {owner, updated})7 days10critical
job-returns<jid>.<peel-id>Per-peel incremental execution results7 days1critical
master-heartbeat<master-instance-id>Master health heartbeats (timestamp, active jobs)15 seconds1ephemeral
peel-heartbeat<peel-id>Peel liveness heartbeats (facts.Heartbeat {ts, version, protocol}, written every 10s)30 seconds1ephemeral
leasespublisher, facts-secretsAdvisory leader leases for single-publisher work (bus.LeaderLease, 5s renew)15 seconds1ephemeral
enrollments<enrollment-id> or peel.<peel-id>Peel enrollment records and stateNone10critical
enroll-challengeschl-<ksuid>Short-lived enrollment challenge nonces (memory storage)5 minutes1ephemeral
state-files<relative-path>, _manifest, _revisionRaw .zy / .star state files for peel-side caching, plus the file-set manifest and revision signalNone3critical
update-manifests<component>.<goos>.<goarch>.<version>Published update manifestsNone5critical
update-status<component>.<id>Watchdog status heartbeats60 seconds1ephemeral
update-rollouts<rollout-id>Rollout state and progressNone10critical
reactor-filesreactor/<relative-path>, _manifest, _revisionReactor rule files (top.zy + reaction files), master-only — peel JWTs carry no grants for itNone3critical

Replica Tiering

Replica counts scale automatically with the NATS cluster: bus.EffectiveReplicas(explicit, clusterSize) returns an explicit operator-set count when > 0, otherwise min(3, max(1, clusterSize)) — capped at 3, the JetStream RAFT sweet spot (R5 buys little for KV workloads). InitializeStorageOpts(ctx, js, StorageOptions{Replicas, ClusterSize}) applies this to every bucket, stream, and Object Store.

Every asset — critical and ephemeral alike — gets the effective replica count; heartbeat and lease buckets must survive a single node loss for liveness detection to keep working. The DurabilityTier (TierCritical / TierEphemeral on BucketConfig.Tier) controls only warning behavior:

  • A critical asset whose effective replica count is below min(3, ClusterSize) (an operator explicitly forced a low count in a cluster) logs a loud warning per asset — a single-node disk failure then permanently loses that data.
  • A failed replica upgrade on an existing asset (older servers, insufficient cluster resources) never aborts startup: the asset is re-created at 1 replica with a warning that includes the manual migration hint (nats stream edit KV_<bucket> --replicas=N).

Bucket Lifecycle

KV Operations

All KV operations use MessagePack serialization transparently:

OperationFunctionDescription
WriteKVPut(ctx, kv, key, v)Encode value with MessagePack, store in bucket
ReadKVGet(ctx, kv, key, v)Fetch from bucket, decode MessagePack into struct
Create bucketCreateBucket(ctx, js, cfg)Create or return existing KV bucket
Delete bucketDeleteBucket(ctx, js, name)Remove a KV bucket

The bus.KV Interface Boundary

Domain packages never touch nats.go KV types directly. The bus.JetStreamAPI interface's KV methods return bus.KV — a bus-owned narrow interface (Get/Put/Create/Update/Delete/Keys/ListKeys/ListKeysFiltered/Watch/WatchAll/WatchFiltered) with bus-owned companion types: KVEntry (Key/Value/Revision/Operation), KeyWatcher (nil-sentinel end-of-replay; the channel closes on consumer loss), KeyLister, and the op constants KVOpPut/KVOpDelete/KVOpPurge.

At the production boundary, Client.JetStream() returns *bus.JS, an adapter over the raw jetstream.JetStream that implements JetStreamAPI, ConsumerAPI, and ObjectStoreAPI (Unwrap() exposes the raw context; bus.NewJS wraps a self-built one). The raw jetstream.KeyValue is adapted exactly once, in pkg/bus/kv_jetstream.go. The error sentinels bus.ErrKeyNotFound, bus.ErrKeyExists, and bus.ErrNoKeysFound are aliases of the jetstream sentinels, so errors.Is works with either form. Tests use bustest.NewFakeJS() / bustest.NewFakeKV() against the same interfaces.

Object Store Buckets

Zester currently defines one default Object Store bucket:

Object StoreKey PatternDescriptionTTLReplicas
update-binaries<component>/<goos>/<goarch>/<version>Binary artifacts used by self-update rollouts30 days1

JetStream Streams

In addition to KV buckets (which are themselves built on streams), Zester defines explicit JetStream streams for event logging.

job-events Stream

PropertyValue
Namejob-events
Subjectszester.job.>
Max age7 days
RetentionLimits policy
StorageFile
ReplicasEffectiveReplicas (scales with cluster size, capped at 3)
DescriptionFull job event log for replay and audit

The job-events stream captures messages published under zester.job.> (for example dispatch, ack, return, status, cancel, and scheduler schedule results). This provides:

  • Audit trail -- Complete history of who ran what, when, and what happened.
  • Replay capability -- Consumers can replay the stream from any point to reconstruct job state. The master relies on this in two places: the shared durable schedule-results consumer persists scheduler return_job results, and job reclaim replays zester.job.<jid>.return.> via an ephemeral pull consumer so returns published while no watcher was running are recovered.
  • Debugging -- Inspect the exact sequence of events for any job.

events Stream

PropertyValue
Nameevents
Subjectszester.event.>
Max age7 days
Max bytes / msgs1 GiB / 1,000,000
Duplicate (MsgID) window2 minutes
RetentionLimits policy
StorageFile
ReplicasEffectiveReplicas (scales with cluster size, capped at 3)
DescriptionReactor event log

The events stream captures every publish under zester.event.> — peel event.send events, beacons, operator events, and master-synthesized events — durably. The masters' shared durable consumer reactor (DeliverNew, explicit ack, AckWait 60s, MaxDeliver 5) delivers each event to exactly one master fleet-wide and replays events published while all masters were down. The size caps bound flood damage from a compromised peel; the duplicate window collapses redelivered reactor-derived events published with deterministic message IDs. See Reactor architecture.

Deployment Topologies

Zester supports multiple deployment topologies, scaling from development to multi-region production.

Single Node (Development / Small Scale)

The simplest deployment: one NATS server with the master connecting as a client.

Characteristics:

  • NATS runs as a separate process (or container) alongside the master
  • No high availability -- single NATS failure = total outage
  • Peels continue running last applied state during outage
  • Suitable for development, testing, and small deployments (< 100 peels)
  • JetStream data persisted to local disk (survives process restarts)

3-Node NATS Cluster (Production)

For production high availability, deploy a 3-node NATS cluster with RAFT consensus.

Characteristics:

  • Tolerates 1 node failure (quorum = 2 of 3)
  • Committed writes are never lost (RAFT guarantee)
  • KV bucket and stream replicas = 3
  • Election on leader failure is sub-second
  • Master connects to cluster as an external client
  • Peels receive connection URLs for all 3 nodes; NATS client handles failover

Quorum requirements

With 3 nodes, losing 2 nodes means the cluster cannot accept writes. JetStream becomes read-only for committed data. Plan for monitoring and rapid node recovery.

5-Node NATS Cluster (High Availability)

For environments requiring tolerance of 2 simultaneous failures:

  • Quorum = 3 of 5; tolerates 2 failures
  • Higher write latency (must replicate to 3 nodes)
  • Recommended for critical infrastructure with SLA requirements

Superclusters (Multi-Region)

For multi-region deployments, NATS superclusters use gateways to connect independent clusters:

Characteristics:

  • Each region has an independent RAFT cluster
  • Gateways route interest-based messages between regions
  • JetStream domains isolate per-region data (JetStreamDomain in ServerConfig)
  • Cross-region peels connect through their local cluster
  • Provides geographic redundancy and data locality

Leaf Nodes (Edge / WAN)

For edge deployments or sites behind restrictive firewalls, NATS leaf nodes provide a WAN-optimized connection:

Characteristics:

  • Leaf nodes multiplex all local peel connections into a single WAN connection to the hub cluster
  • Reduces WAN bandwidth and connection count
  • Peels connect to the local leaf node over LAN (low latency)
  • Leaf nodes can operate in a degraded mode during WAN outages (local messaging continues)
  • Subject mapping can remap subjects at the leaf boundary for namespace isolation

Network Partition Behavior

Understanding NATS RAFT behavior during network partitions is critical for production operations.

Partition Scenarios (3-Node Cluster)

One follower dies: Leader continues. Quorum maintained (2/3). Writes succeed. Dead node catches up on rejoin via log replay.

Leader dies: Remaining 2 nodes detect missing heartbeats. New leader elected (sub-second). Writes pause during election. No committed data lost.

Network split (1 vs 2): Minority partition (old leader) cannot commit writes. Majority partition elects new leader. Old leader steps down when it discovers higher term. Uncommitted entries on old leader are discarded.

Guarantee

Committed writes are never lost. A write is committed only after replication to a majority. During a partition, the minority becomes read-only for committed data and cannot accept new writes.

Peel Behavior During Partitions

ScenarioPeel Impact
Master node down, NATS cluster healthyPeels remain connected to NATS. New commands wait for master recovery.
NATS minority partition, peel on minority sidePeel reconnects to majority partition nodes. Buffered messages replayed.
Complete NATS outagePeel reconnects with exponential backoff (unlimited retries). Continues running last applied state.
WAN link down (leaf node)Local leaf messaging continues. WAN-dependent operations queue until reconnection.

On this page