# YakPanel Agent Protocol v1 ## Goals - Secure control channel between panel and managed servers. - Reliable command execution with resumable progress. - Capability-aware execution and strict auditability. ## Transport and Session - Primary transport: bidirectional gRPC stream over mTLS. - Fallback transport: WebSocket over mTLS. - Agent uses outbound connection only. - Server requires certificate pinning and protocol version negotiation. ## Version Negotiation - Agent sends `protocol_versions` during hello. - Gateway picks highest compatible version and responds with `selected_version`. - If no compatible version exists, gateway returns `ERR_UNSUPPORTED_VERSION`. ## Message Envelope (common fields) - `message_id` (uuid): unique per message. - `timestamp` (RFC3339 UTC): replay protection. - `tenant_id` (uuid): owning tenant. - `server_id` (uuid): managed server. - `agent_id` (string): persistent agent identity. - `trace_id` (string): distributed trace correlation. - `signature` (base64): detached signature over canonical payload. ## Enrollment Flow 1. Agent starts with one-time bootstrap token. 2. Agent calls `EnrollRequest` with host fingerprint and capabilities. 3. Gateway validates token and issues short-lived enrollment certificate. 4. Agent re-connects with cert; gateway persists `agent_id` and fingerprints. 5. Enrollment token is revoked after successful bind. ## Heartbeat Flow - Interval: every 10 seconds. - Payload includes: - `agent_version` - `capability_hash` - `system_load` (cpu, mem, disk) - `service_states` (nginx/apache/ols/mysql/redis/docker) - Gateway updates presence and emits server status event. - Missing 3 consecutive heartbeats marks agent as degraded; 6 marks offline. ## Command Flow ### CommandRequest - `cmd_id` (uuid) - `job_id` (uuid) - `type` (enum): `SITE_CREATE`, `SITE_DELETE`, `SSL_ISSUE`, `SSL_APPLY`, `SERVICE_RELOAD`, `DOCKER_DEPLOY`, etc. - `args` (json map) - `timeout_sec` (int) - `idempotency_key` (string) - `required_capabilities` (string array) ### Agent behavior - Validate signature and timestamp window. - Validate required capability set. - Return ACK immediately with acceptance or reject reason. - Emit progress events at step boundaries. - Emit final result with exit code, summaries, and artifact references. ### Retry and Exactly-once strategy - Gateway retries only if no terminal result and timeout exceeded. - Agent deduplicates by `idempotency_key` and returns last known terminal state for duplicates. - Workflow layer (Laravel) provides exactly-once user-visible semantics. ## Event Types - `AGENT_HELLO` - `AGENT_HEARTBEAT` - `COMMAND_ACK` - `COMMAND_PROGRESS` - `COMMAND_LOG` - `COMMAND_RESULT` - `COMMAND_ERROR` - `AGENT_STATE_CHANGED` ## Error Model - Error codes: - `ERR_UNAUTHORIZED` - `ERR_UNSUPPORTED_VERSION` - `ERR_INVALID_SIGNATURE` - `ERR_REPLAY_DETECTED` - `ERR_CAPABILITY_MISSING` - `ERR_INVALID_ARGS` - `ERR_EXECUTION_FAILED` - `ERR_TIMEOUT` - Every error includes `code`, `message`, `retryable`, `details`. ## Security Controls - mTLS with rotating short-lived certs. - Nonce + timestamp replay protection. - Detached command signatures from panel signing key. - Agent command allowlist by capability. - Immutable append-only event log in control plane. ## Minimal Protobuf Sketch ```proto syntax = "proto3"; package yakpanel.agent.v1; message Envelope { string message_id = 1; string timestamp = 2; string tenant_id = 3; string server_id = 4; string agent_id = 5; string trace_id = 6; bytes signature = 7; } message CommandRequest { Envelope envelope = 1; string cmd_id = 2; string job_id = 3; string type = 4; string args_json = 5; int32 timeout_sec = 6; string idempotency_key = 7; repeated string required_capabilities = 8; } ```