Files
yakpanel-core/architecture/2026/03-agent-protocol-v1.md

124 lines
3.7 KiB
Markdown
Raw Normal View History

2026-04-07 20:29:49 +05:30
# YakPanel Agent Protocol v1
## Goals
- Secure control channel between panel and managed servers.
- Reliable command execution with resumable progress.
- Capability-aware execution and strict auditability.
## Transport and Session
- Primary transport: bidirectional gRPC stream over mTLS.
- Fallback transport: WebSocket over mTLS.
- Agent uses outbound connection only.
- Server requires certificate pinning and protocol version negotiation.
## Version Negotiation
- Agent sends `protocol_versions` during hello.
- Gateway picks highest compatible version and responds with `selected_version`.
- If no compatible version exists, gateway returns `ERR_UNSUPPORTED_VERSION`.
## Message Envelope (common fields)
- `message_id` (uuid): unique per message.
- `timestamp` (RFC3339 UTC): replay protection.
- `tenant_id` (uuid): owning tenant.
- `server_id` (uuid): managed server.
- `agent_id` (string): persistent agent identity.
- `trace_id` (string): distributed trace correlation.
- `signature` (base64): detached signature over canonical payload.
## Enrollment Flow
1. Agent starts with one-time bootstrap token.
2. Agent calls `EnrollRequest` with host fingerprint and capabilities.
3. Gateway validates token and issues short-lived enrollment certificate.
4. Agent re-connects with cert; gateway persists `agent_id` and fingerprints.
5. Enrollment token is revoked after successful bind.
## Heartbeat Flow
- Interval: every 10 seconds.
- Payload includes:
- `agent_version`
- `capability_hash`
- `system_load` (cpu, mem, disk)
- `service_states` (nginx/apache/ols/mysql/redis/docker)
- Gateway updates presence and emits server status event.
- Missing 3 consecutive heartbeats marks agent as degraded; 6 marks offline.
## Command Flow
### CommandRequest
- `cmd_id` (uuid)
- `job_id` (uuid)
- `type` (enum): `SITE_CREATE`, `SITE_DELETE`, `SSL_ISSUE`, `SSL_APPLY`, `SERVICE_RELOAD`, `DOCKER_DEPLOY`, etc.
- `args` (json map)
- `timeout_sec` (int)
- `idempotency_key` (string)
- `required_capabilities` (string array)
### Agent behavior
- Validate signature and timestamp window.
- Validate required capability set.
- Return ACK immediately with acceptance or reject reason.
- Emit progress events at step boundaries.
- Emit final result with exit code, summaries, and artifact references.
### Retry and Exactly-once strategy
- Gateway retries only if no terminal result and timeout exceeded.
- Agent deduplicates by `idempotency_key` and returns last known terminal state for duplicates.
- Workflow layer (Laravel) provides exactly-once user-visible semantics.
## Event Types
- `AGENT_HELLO`
- `AGENT_HEARTBEAT`
- `COMMAND_ACK`
- `COMMAND_PROGRESS`
- `COMMAND_LOG`
- `COMMAND_RESULT`
- `COMMAND_ERROR`
- `AGENT_STATE_CHANGED`
## Error Model
- Error codes:
- `ERR_UNAUTHORIZED`
- `ERR_UNSUPPORTED_VERSION`
- `ERR_INVALID_SIGNATURE`
- `ERR_REPLAY_DETECTED`
- `ERR_CAPABILITY_MISSING`
- `ERR_INVALID_ARGS`
- `ERR_EXECUTION_FAILED`
- `ERR_TIMEOUT`
- Every error includes `code`, `message`, `retryable`, `details`.
## Security Controls
- mTLS with rotating short-lived certs.
- Nonce + timestamp replay protection.
- Detached command signatures from panel signing key.
- Agent command allowlist by capability.
- Immutable append-only event log in control plane.
## Minimal Protobuf Sketch
```proto
syntax = "proto3";
package yakpanel.agent.v1;
message Envelope {
string message_id = 1;
string timestamp = 2;
string tenant_id = 3;
string server_id = 4;
string agent_id = 5;
string trace_id = 6;
bytes signature = 7;
}
message CommandRequest {
Envelope envelope = 1;
string cmd_id = 2;
string job_id = 3;
string type = 4;
string args_json = 5;
int32 timeout_sec = 6;
string idempotency_key = 7;
repeated string required_capabilities = 8;
}
```