This guide covers day-to-day operations for managing peel enrollment in Zester: setting up the enrollment system, approving and rejecting requests, monitoring enrollment state, revoking enrolled peels, and troubleshooting common issues.
For architectural details, see docs/enrollment-architecture.md. For the HTTP API reference, see docs/enrollment-api.md.
Master is running with the enrollment HTTP server enabled in configuration.
TLS certificates are in place for the enrollment HTTP API (TLS 1.3 is enforced).
Account key bundle is available on all master instances (required to sign user JWTs for approved peels).
Admin access is configured for either:
REST API tokens (api.tokens) for HTTPS admin operations, or
CLI with NATS credentials — admin state transitions (approve, reject, revoke) are sent via request/reply to a running master; the operator's credentials must allow publishing on zester.admin.enroll.*. The --direct-kv break-glass path additionally needs write access to the enrollments bucket.
NATS JetStream is running -- the enrollment system uses two KV buckets (enrollments and enroll-challenges) that are auto-created on master startup via bus.InitializeStorage().
Only tls:// NATS URLs are accepted (bus.ValidateTLSNATSURLs). If the NATS server certificate is not signed by a publicly trusted CA, point nats_ca (or --nats-ca) at the CA certificate; when unset, the CA is resolved from the NATS_CA_FILE environment variable, then /data/auth/nats-ca.crt if present, then the system trust store (bus.NATSClientTLS).
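The CA lookup order described above can be sketched as a small function. This is an illustration only, not the code in bus.NATSClientTLS: resolveNATSCA and fileExists are hypothetical names, and an empty return value is assumed to mean "use the system trust store".

```go
package main

import (
	"fmt"
	"os"
)

// defaultNATSCAPath is the on-disk fallback named in this guide.
const defaultNATSCAPath = "/data/auth/nats-ca.crt"

// resolveNATSCA returns the CA file to trust for NATS, or "" to mean
// "fall back to the system trust store". Hypothetical helper.
func resolveNATSCA(configured string, getenv func(string) string, exists func(string) bool) string {
	if configured != "" {
		return configured // nats_ca / --nats-ca takes priority
	}
	if p := getenv("NATS_CA_FILE"); p != "" {
		return p // environment variable is checked next
	}
	if exists(defaultNATSCAPath) {
		return defaultNATSCAPath // then the default file, if present
	}
	return ""
}

// fileExists reports whether path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	ca := resolveNATSCA("", os.Getenv, fileExists)
	if ca == "" {
		fmt.Println("using system trust store")
		return
	}
	fmt.Println("using CA file:", ca)
}
```

Passing the environment and file-existence lookups as parameters keeps the ordering easy to check without touching the real filesystem.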
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enroll.addr | string | ":8443" | Address and port for the enrollment HTTP listener |
| enroll.tls_cert | string | /data/auth/enroll.crt | Path to TLS server certificate |
| enroll.tls_key | string | /data/auth/enroll.key | Path to TLS server private key |
The account seed for signing user JWTs is loaded from <auth_dir>/account.seed (default auth dir: /data/auth).
TLS is mandatory
The enrollment server cannot start without a valid TLS certificate and key. NewServer() rejects startup if either is missing (see pkg/enroll/server.go). TLS 1.3 minimum is enforced. Default certificate paths are /data/auth/enroll.crt and /data/auth/enroll.key.
If you do not have a PKI, generate certificates using your standard tooling (for example OpenSSL, Vault PKI, or cert-manager) and place them at the configured paths:
This produces:
/etc/zester/tls/
  ca.crt       # CA certificate (distribute to peels)
  server.crt   # Server certificate
  server.key   # Server private key (mode 0600)
Production certificates
Use your organization's PKI (e.g., HashiCorp Vault, cert-manager) for production. The built-in CA is intended for bootstrap and testing.
master_urls: Ordered list of master enrollment API URLs (HTTPS), tried in order with automatic failover. Takes precedence over master_url when non-empty. On the CLI, pass --master-urls as a comma-separated list; --master-urls "" clears a YAML-configured list.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| master_url | string | (optional) | Single master enrollment API URL (HTTPS); shorthand for a one-entry master_urls list. At least one of master_url / master_urls is required for enrollment |
| enroll_ca | string | (optional) | CA certificate to verify the master's enrollment TLS certificate |
| nats_url | string | tls://nats:4222 | NATS server URL (used after enrollment; only tls:// accepted) |
| nats_ca | string | (optional) | CA certificate for the NATS server TLS certificate (falls back to NATS_CA_FILE, then /data/auth/nats-ca.crt, then the system trust store) |
For multi-master deployments, master_urls is the recommended pattern: list every master's enrollment API URL and the peel handles failover itself. The enrollment client sends each request to its current URL and rotates to the next one in the list on connection failures (dial errors, timeouts) and on 5xx responses; a successful response pins the current URL, so the client never rotates needlessly. 4xx responses never trigger rotation -- they indicate a request problem, not a master problem.
Rotation applies to every enrollment operation: nonce request, enrollment submission, status polling/SSE, and credential download. Because enrollment records and challenge nonces are shared across masters via NATS KV, a flow that starts against one master completes cleanly against another -- for example, a peel that submitted its enrollment to master-1 can download credentials from master-2 if master-1 goes down while waiting for approval.
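The rotation rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the client in pkg/enroll/client.go: the rotator type, shouldRotate, and the one-pass-over-the-list retry policy are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// rotator tracks which master URL the client is currently pinned to.
type rotator struct {
	urls    []string
	current int
}

// shouldRotate reports whether a result points at a master problem:
// a transport error or a 5xx status. 4xx responses never rotate.
func shouldRotate(status int, err error) bool {
	return err != nil || status >= 500
}

// do tries each URL at most once, starting from the pinned one.
func (r *rotator) do(call func(url string) (int, error)) (int, error) {
	if len(r.urls) == 0 {
		return 0, errors.New("no master URLs configured")
	}
	var status int
	var err error
	for range r.urls {
		status, err = call(r.urls[r.current])
		if !shouldRotate(status, err) {
			return status, nil // success or 4xx: stay pinned to this URL
		}
		r.current = (r.current + 1) % len(r.urls)
	}
	if err == nil {
		err = fmt.Errorf("all masters failed, last status %d", status)
	}
	return status, err
}

func main() {
	r := &rotator{urls: []string{
		"https://master-1.example.com:8443",
		"https://master-2.example.com:8443",
	}}
	// Simulate master-1 being unreachable.
	status, err := r.do(func(url string) (int, error) {
		if url == r.urls[0] {
			return 0, errors.New("dial tcp: connection refused")
		}
		return 200, nil
	})
	fmt.Println(status, err, "pinned to", r.urls[r.current])
}
```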
The auth directory is fixed at /data/auth on the peel (authDir constant in cmd/zester-peel/main.go); it holds the .creds and .seed files. The peel automatically detects whether it needs enrollment (see pkg/enroll/persist.go):
1. If <auth_dir>/<peel-id>.creds exists (HasCredentials()), skip enrollment and connect to NATS directly.
2. If no .creds file exists but master_urls (or master_url) is set, start the enrollment flow:
   a. Load or generate an nkey seed from <auth_dir>/<peel-id>.seed (LoadOrGenerateKey()).
   b. Run the enrollment client (Client.Enroll()).
   c. Write credentials to <auth_dir>/<peel-id>.creds (mode 0600).
3. If neither .creds nor a master URL is available, exit with an error.
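The startup decision above reduces to a three-way branch. The sketch below is illustrative only; decideStartup and the action type are hypothetical names, not the functions in pkg/enroll/persist.go.

```go
package main

import (
	"errors"
	"fmt"
)

type action int

const (
	connectDirectly action = iota // a .creds file exists
	runEnrollment                 // no .creds, but a master URL is set
)

// decideStartup mirrors the detection order described in this guide.
func decideStartup(hasCreds bool, masterURLs []string, masterURL string) (action, error) {
	if hasCreds {
		return connectDirectly, nil
	}
	if len(masterURLs) > 0 || masterURL != "" {
		return runEnrollment, nil
	}
	// The returned action is meaningless when err is non-nil.
	return connectDirectly, errors.New("no credentials and no master URL configured")
}

func main() {
	act, err := decideStartup(false, []string{"https://master-1.example.com:8443"}, "")
	fmt.Println(act == runEnrollment, err)
}
```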
The simplest way to enroll a peel is to start it without a .creds file:
# On the new peel machine:
zester-peel --id web-03 --nats-url tls://nats:4222 --nats-ca /data/auth/nats-ca.crt \
  --master-urls https://master-1.example.com:8443,https://master-2.example.com:8443
With a single master, --master-url https://master.example.com:8443 works the same way. The --nats-ca flag is only needed when the NATS server certificate is not covered by the default CA resolution (NATS_CA_FILE env var, /data/auth/nats-ca.crt, or the system trust store).
The peel will execute the full enrollment flow automatically (implemented in Client.Enroll() in pkg/enroll/client.go):
Load or generate a new Ed25519 nkey seed (saved to <auth_dir>/web-03.seed).
Derive the X25519 curve public key for settings encryption.
Request a challenge nonce from the master (GET /api/v1/enroll/nonce).
Sign challenge_bytes || curve_public_key with the Ed25519 private key.
Submit the enrollment request with the signature (POST /api/v1/enroll).
Poll for approval with exponential backoff (10s base, 5m max).
Upon approval, sign the enrollment ID and download credentials (GET /api/v1/enroll/{id}/creds).
Decode the base64 JWT, write .creds file, connect to NATS.
The peel saves its nkey seed to <auth_dir>/<peel-id>.seed on first key generation. If the peel restarts before enrollment completes (e.g., approval is still pending), it reloads the existing seed rather than generating a new key, ensuring the key presented to the master remains consistent.
After enrollment is complete and the .creds file is written, the .seed file can optionally be removed -- the .creds file contains everything needed for NATS authentication. However, keep the .seed if the peel should re-enroll with the same key; without it, the next enrollment generates a new key.
The enroll command group connects to NATS using the operator's credentials from the CLI configuration (/etc/zester/master.yaml or ~/.zester/config.yaml). Each subcommand opens a short-lived NATS connection, performs its operation, and disconnects.
The transport differs by operation class:
State transitions (approve, reject, revoke) send a MessagePack enroll.AdminRequest ({id, operator, reason}) over request/reply on zester.admin.enroll.approve / .reject / .revoke (5-second timeout). The masters answer on the zester-masters-admin NATS queue group — exactly one master applies each request to the enrollments KV bucket and replies with an enroll.AdminResponse (the updated record, or an error). The answering master Info-logs every admin operation with action, enrollment ID, operator, and peel ID for the audit trail.
Read-only operations (list, show) read the enrollments KV bucket directly.
If no master is running, state transitions fail with no master is answering enrollment admin requests — pass --direct-kv (a persistent flag on the enroll command group) to write the enrollment KV directly with the CLI's own credentials, as a break-glass path. See cmd/zester/cmd/enroll.go (runEnrollAdmin / runEnrollAdminKV) and the master-side handler in internal/masterd/admin.go.
No additional flags are needed beyond the standard CLI configuration (--config for config file path).
Reject and revoke use the same bearer-token authentication as approve and accept an optional JSON body carrying a reason, recorded on the enrollment record:
Responses: 200 with the updated enrollment, 400 on invalid JSON, 401 on a missing/invalid token, 404 for an unknown enrollment, 409 on an invalid state transition. The authenticated API token's username is recorded as the decider.
By default, zester enroll list shows pending enrollments. Use the --state flag to filter by other states.
# List pending enrollments (default)
zester enroll list
# List all enrollments regardless of state
zester enroll list --state all
# List only active enrollments
zester enroll list --state active
ID                          PEEL ID  HOSTNAME              STATE    CREATED
enr-2JFK0003ABCD1234567890  web-03   web-03.prod.internal  pending  2026-02-10 15:00:12
enr-2JFK0006EFGH9876543210  app-01   app-01.prod.internal  pending  2026-02-10 15:02:33
The list reads enrollment records directly from the enrollments KV bucket via Store.List() and displays EnrollmentSummary records (defined in pkg/enroll/enrollment.go) with ID, peel ID, hostname, state, and creation time. See cmd/zester/cmd/enroll_list.go.
See cmd/zester/cmd/enroll_show.go. The show command reads the full enrollment record directly from the enrollments KV bucket via Store.Get(), displaying all fields including public key, hostname, metadata, and timestamps.
The CLI automatically uses the current OS username (via os/user.Current()) as the operator identity for the audit trail — it is carried in AdminRequest.Operator and recorded on the enrollment record as DecidedBy. See cmd/zester/cmd/enroll_approve.go.
The request is answered by a running master (queue group zester-masters-admin), which transitions the enrollment record from pending to approved via CAS update on the NATS KV bucket (see Store.Approve() in pkg/enroll/store.go) and Info-logs the operation. The peel (if polling) will detect the approval and proceed to download credentials. With --direct-kv, the CLI applies the same CAS transition itself.
Note: Credentials are not generated at approval time. They are generated on-demand when the peel calls GET /api/v1/enroll/{id}/creds with a valid Nkey signature. This ensures the credential download is authenticated.
The --reason flag is optional but recommended for audit purposes; it is recorded on the enrollment record as RejectReason. Like approve, the transition is applied by a running master via request/reply, with the current OS username recorded as the decider. See cmd/zester/cmd/enroll_reject.go.
The peel will receive a rejected state when it next polls. No credentials are issued. A rejected peel can re-enroll (creating a new enrollment record with a new ID).
Revoking an enrollment invalidates a peel's NATS credentials, effectively removing it from the fleet. Only enrollments in approved, issued, or active state can be revoked.
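The state precondition for revocation can be expressed as a simple guard. This is a sketch only: the State type and canRevoke are hypothetical, and the state names are taken from this guide rather than from pkg/enroll/enrollment.go.

```go
package main

import "fmt"

// State is an enrollment state as named in this guide.
type State string

const (
	Pending  State = "pending"
	Approved State = "approved"
	Issued   State = "issued"
	Active   State = "active"
	Rejected State = "rejected"
	Revoked  State = "revoked"
)

// canRevoke reports whether an enrollment in state s may be revoked.
func canRevoke(s State) bool {
	switch s {
	case Approved, Issued, Active:
		return true
	}
	return false
}

func main() {
	for _, s := range []State{Pending, Approved, Issued, Active, Rejected, Revoked} {
		fmt.Printf("%-8s revocable=%v\n", s, canRevoke(s))
	}
}
```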
The --reason flag is optional but recommended; it is recorded on the enrollment record. Like approve and reject, the transition is applied by a running master via request/reply, with the current OS username recorded as the decider. See cmd/zester/cmd/enroll_revoke.go.
The enrollment record state transitions to revoked via CAS update (Store.Revoke() in pkg/enroll/store.go), applied by the master answering the admin request.
The peel's user JWT is added to the NATS account revocation list.
The peel's existing NATS connection will be closed when NATS detects the revoked JWT.
The peel cannot reconnect with the old credentials.
A revoked peel must generate new credentials and submit a new enrollment request:
# On the revoked peel, delete old credentials and restart
rm /data/auth/web-03.creds
rm /data/auth/web-03.seed   # force new key generation
systemctl restart zester-peel
The peel will detect the missing .creds file and start a new enrollment flow with a fresh nkey.
Note: Prometheus metrics for the enrollment system are not yet implemented. Monitor enrollment activity via structured log output and the zester enroll list CLI command.
Enrollment events are logged with structured fields by the HTTP handler (pkg/enroll/handler.go), the store (pkg/enroll/store.go), and the master's admin service (internal/masterd/admin.go). Key log events:
Every admin state transition arriving over NATS request/reply is Info-logged by the answering master as enrollment admin operation with the action, enrollment ID, operator, and peel ID — this is the audit line for CLI-initiated approve/reject/revoke.
Note: The remoteIP() helper in handler.go extracts the client IP directly from the TCP connection's RemoteAddr, deliberately ignoring X-Forwarded-For to prevent IP spoofing that could bypass per-IP rate limiting.
You can inspect the enrollment buckets directly using NATS tooling:
# List all enrollment records (includes peel index entries)
nats kv ls enrollments
# Get a specific enrollment record (raw MessagePack)
nats kv get enrollments enr-2JFKABCD1234
# Look up an enrollment by peel ID via the peel index
nats kv get enrollments peel.web-03
# List active challenge nonces (5-minute TTL, auto-expire)
nats kv ls enroll-challenges
# Watch for enrollment state changes in real-time
nats kv watch enrollments
The enrollments bucket uses a dual-key scheme:
Primary key: <enrollment-id> (e.g., enr-2JFK...) stores the full Record in MessagePack.
Index key: peel.<peel-id> (e.g., peel.web-03) maps a peel ID to its enrollment ID for O(1) lookup.
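The two-step lookup that this key scheme enables can be sketched with a plain map standing in for the KV bucket. The map, the string record values (the real records are MessagePack), and the names indexKey and lookupByPeelID are all assumptions for illustration.

```go
package main

import "fmt"

// indexKey builds the index key for a peel ID.
func indexKey(peelID string) string {
	return "peel." + peelID
}

// lookupByPeelID resolves peel ID -> enrollment ID -> record in two gets.
func lookupByPeelID(kv map[string]string, peelID string) (string, bool) {
	enrollmentID, ok := kv[indexKey(peelID)]
	if !ok {
		return "", false
	}
	record, ok := kv[enrollmentID]
	return record, ok
}

func main() {
	kv := map[string]string{
		"enr-2JFK0003ABCD1234567890": "record for web-03 (placeholder)",
		"peel.web-03":                "enr-2JFK0003ABCD1234567890",
	}
	fmt.Println(lookupByPeelID(kv, "web-03"))
	fmt.Println(lookupByPeelID(kv, "db-09"))
}
```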
The HTTPS listener uses per-IP token bucket rate limiting with stale entry eviction (implemented as middleware in pkg/enroll/handler.go, applied per route class in pkg/enroll/server.go). When the bucket map exceeds 5000 entries, entries older than 5 minutes are evicted to prevent unbounded memory growth.
There are two separate budgets:
| Route class | Bucket capacity | Refill rate |
| --- | --- | --- |
| /api/v1/enroll and subpaths (nonce, enroll, status, stream, creds) | 10 tokens | 1 token per 10 seconds |
| All other routes on the listener (REST API, including /api/v1/enrollments* and docs) | 120 tokens | 20 tokens per second |
Each request costs 1 token. The strict budget protects the unauthenticated enrollment endpoints; the REST API routes get a far higher budget so dispatch-and-poll clients are not locked out. Note that /api/v1/enrollments (the REST admin route) is not an enrollment path — only /api/v1/enroll and its subpaths use the strict limiter (see isEnrollPath() in pkg/enroll/server.go).
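The distinction between /api/v1/enroll and /api/v1/enrollments matters because a plain prefix match would treat both the same. The sketch below shows one way to separate them; usesStrictLimiter is a hypothetical name and this is not the isEnrollPath() implementation.

```go
package main

import (
	"fmt"
	"strings"
)

const enrollPrefix = "/api/v1/enroll"

// usesStrictLimiter matches the enroll path itself and its subpaths,
// but not sibling routes that merely share the prefix text.
func usesStrictLimiter(path string) bool {
	return path == enrollPrefix || strings.HasPrefix(path, enrollPrefix+"/")
}

func main() {
	for _, p := range []string{
		"/api/v1/enroll",
		"/api/v1/enroll/nonce",
		"/api/v1/enrollments",
	} {
		fmt.Printf("%-24s strict=%v\n", p, usesStrictLimiter(p))
	}
}
```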
When a rate limit is exceeded, the server returns 429 Too Many Requests with a Retry-After header. The peel client handles this by applying exponential backoff.
CLI admin operations (approve, reject, revoke, list) are not subject to HTTP rate limiting because they are performed over NATS (request/reply to a master, or direct KV reads), not the HTTP API.
Ensure the peel's enroll_ca points to the correct CA certificate. If using system CAs, leave enroll_ca empty (the client falls back to the system CA pool; see pkg/enroll/client.go).
In multi-master deployments, verify the peel lists all masters in master_urls. The client rotates to the next URL automatically on connection failures and 5xx responses (logged as enroll: rotating to next master URL), so a single unreachable master should not block enrollment -- if it does, check whether only one URL is configured.
Symptom: Peel logs show challenge verification failed during enrollment.
Checks:
Challenge nonces expire after 5 minutes (ChallengeTTL in pkg/enroll/challenge.go). If the peel takes too long to sign and submit, the challenge expires.
Challenges are single-use (enforced via CAS in ChallengeStore.Consume()). If the peel retries the same challenge ID, it will be rejected.
Check for clock skew between peel and master. While challenge expiry is checked on the master side, significant clock differences can cause confusing log messages.
The peel client self-heals: when the master rejects a submission with 401 challenge verification failed (unknown, expired, or already-consumed challenge), the client transparently requests a fresh nonce and resubmits, bounded to 3 total attempts (logged as challenge unknown or expired, re-requesting nonce). This also covers challenges wiped by a NATS restart -- the enroll-challenges bucket is memory-backed -- so no operator action is needed for transient challenge failures. Only a persistently rejecting server surfaces an error.
Symptom: Enrollment shows approved but peel logs show authentication failed when downloading credentials.
Checks:
The credential download requires an Authorization: Nkey <pub>:<sig> header where the peel signs the enrollment ID with its nkey. If the peel's seed changed between enrollment submission and credential download, the signature will be invalid.
Check if the .seed file was accidentally deleted or regenerated:
ls -la /data/auth/<peel-id>.seed
Credentials are single-use. Once downloaded, the enrollment transitions to issued and subsequent download attempts return 403 Forbidden (the enrollment is no longer in approved state; see handleCreds in pkg/enroll/handler.go). If the peel crashed after downloading but before writing the .creds file, the credentials are lost and the peel must be re-enrolled:
zester enroll revoke <enrollment-id>
# Then delete seed on peel and restart
Symptom: Enrollment shows issued but peel fails to connect to NATS with authentication errors.
Checks:
Verify the .creds file was written successfully:
ls -la /data/auth/<peel-id>.creds
Validate the credentials by testing a NATS connection with the .creds file:
nats server check connection --creds /data/auth/<peel-id>.creds
Check that the NATS server's account JWT resolver has the correct account JWT.
Ensure the master's account seed matches the NATS server's trusted account. The CredentialIssuer uses the account key bundle to sign user JWTs (see pkg/enroll/credential.go).
Check if the JWT has expired. Default expiry is 6 months (DefaultJWTExpiry in pkg/enroll/credential.go).
Symptom: Peel logs show rate limit exceeded or HTTP 429 responses.
Checks:
The enrollment-endpoint rate limiter allows a burst of 10 requests per IP with a refill of 1 token per 10 seconds. If many peels share a NAT gateway, they will exhaust the shared bucket quickly. (REST API routes on the same listener have a separate, much higher budget: 120 burst, 20 requests per second.)
Check the enrollment.ratelimit.exceeded log entries on the master for the affected source IP.
The rate limiter is currently per-master-instance (in-memory). Each master keeps its own bucket per source IP, so requests load-balanced across multiple masters draw from separate buckets.
Symptom: Enrollment request rejected with peel already has an active enrollment.
Checks:
Each peel ID can have at most one active enrollment (in pending, approved, issued, or active state). The handler checks this via Store.FindByPeelID() (see handleEnroll in pkg/enroll/handler.go).
If the peel was previously enrolled in one of these states and needs re-enrollment, the old enrollment must first be revoked (or rejected, if still pending):
zester enroll show <enrollment-id>     # check current state
zester enroll revoke <enrollment-id>   # or reject if pending
Re-enrollment after rejected or revoked state automatically creates a new record.
Symptom: zester enroll approve|reject|revoke fails with no master is answering enrollment admin requests; retry with --direct-kv.
Checks:
Admin state transitions require a running master — they are request/reply operations answered by the masters' zester-masters-admin queue group. Verify at least one master is up and connected to NATS (look for enrollment admin service started in the master log).
If no master can be started (e.g., bootstrap or recovery scenarios), use the break-glass path:
zester enroll approve <enrollment-id> --direct-kv
This writes the enrollment KV directly with the CLI's own credentials, which must have write access to the enrollments bucket.
Symptom: Enrollment records disappear after master restart.
This should not happen because enrollment state is stored in NATS JetStream KV, not in master memory. If it does:
Verify NATS JetStream data directory is intact.
Check that the NATS server was not restarted with JetStream storage wiped.
Inspect the KV buckets:
nats kv ls enrollments
nats kv ls enroll-challenges
Challenge nonces are ephemeral (5-minute TTL, memory storage in the enroll-challenges bucket) -- their loss after a NATS restart is expected. Peels mid-enrollment receive 401 challenge verification failed on their next submission and transparently re-request a fresh nonce; no operator action is needed.
# Approve all pending enrollments (use with caution)
zester enroll list --state pending | tail -n +2 | awk '{print $1}' | while read id; do
  zester enroll approve "$id"
done
Verify before bulk approval
The scripted loop above approves every pending enrollment. Use this only when you are confident all pending requests are legitimate (e.g., in a controlled provisioning pipeline where peels are launched by trusted automation).
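One way to narrow a bulk approval is to filter the list output before approving anything. The sketch below parses table output in the column layout shown earlier in this guide and keeps only pending enrollments whose hostname ends with an expected suffix. pendingIDs is a hypothetical helper, the suffix policy is an illustrative assumption, and the parsing assumes peel IDs and hostnames contain no whitespace.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// pendingIDs returns the IDs of pending enrollments whose hostname
// ends with suffix, given `zester enroll list` table output.
func pendingIDs(listOutput, suffix string) []string {
	var ids []string
	sc := bufio.NewScanner(strings.NewReader(listOutput))
	for sc.Scan() {
		// Columns: ID, PEEL ID, HOSTNAME, STATE, CREATED (date and time).
		f := strings.Fields(sc.Text())
		if len(f) < 4 || f[0] == "ID" {
			continue // skip blank lines and the header row
		}
		hostname, state := f[2], f[3]
		if state == "pending" && strings.HasSuffix(hostname, suffix) {
			ids = append(ids, f[0])
		}
	}
	return ids
}

func main() {
	out := `ID                          PEEL ID  HOSTNAME              STATE    CREATED
enr-2JFK0003ABCD1234567890  web-03   web-03.prod.internal  pending  2026-02-10 15:00:12
enr-2JFK0006EFGH9876543210  app-01   app-01.lab.example    pending  2026-02-10 15:02:33`
	for _, id := range pendingIDs(out, ".prod.internal") {
		fmt.Println("would approve:", id)
	}
}
```

Printing the candidate IDs first, and approving only after a human or pipeline check, keeps the bulk path from approving requests nobody expected.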
If master logs show enrollment.credential.delivery.contention:
This means two concurrent requests tried to download credentials for the same enrollment. The CAS-first pattern in handleCreds (see pkg/enroll/handler.go) ensures only one succeeds -- the enrollment transitions to issued BEFORE the JWT is returned. The second request fails with 409 Conflict.
This is a safety mechanism, not a bug. It prevents credential duplication. The peel client should retry the entire enrollment flow if it receives a 409 on credential download.