Session Lifecycle

State Machine

CREATED --> STARTING --> PROVISIONING --> RUNNING --> STOPPING --> STOPPED --> ARCHIVED
                                            |                       |
                                          FAILED <------------------+
                                            |
                                          ARCHIVED

Restarts:
  STOPPED --> STARTING    (restart)
  FAILED  --> STARTING    (restart)

Restore:
  ARCHIVED --> STOPPED    (restore, then can restart)

Transition Rules

From	To	Trigger
`CREATED`	`STARTING`	`start_session()`
`STOPPED`	`STARTING`	`start_session()` (restart)
`FAILED`	`STARTING`	`start_session()` (restart)
`STARTING`	`PROVISIONING`	Pod manager returns `PodStartResult`
`STARTING`	`FAILED`	Contributor pipeline or pod manager raises
`PROVISIONING`	`RUNNING`	Background readiness poll succeeds
`PROVISIONING`	`FAILED`	Readiness timeout or backend error
`RUNNING`	`STOPPING`	`stop_session()`
`PROVISIONING`	`STOPPING`	`stop_session()` (cancel provisioning)
`STOPPING`	`STOPPED`	Pod manager confirms stop, contributors cleaned up
`STOPPING`	`FAILED`	Pod stop or cleanup raises
`STOPPED`	`ARCHIVED`	`archive_session()`
`FAILED`	`ARCHIVED`	`archive_session()`
`CREATED`	`ARCHIVED`	`archive_session()`
`ARCHIVED`	`STOPPED`	`restore_session()`

The can_start() method gates startability: only CREATED, STOPPED, and FAILED sessions can start. The can_stop() method gates stoppability: only RUNNING and PROVISIONING sessions can stop.
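
The transition table above can be encoded as a lookup from each status to its allowed targets. The following is a minimal sketch, not the project's implementation: the enum values, the `ALLOWED` mapping, and `check_transition()` are illustrative names, and `can_start()` / `can_stop()` are written here as free functions rather than methods.

```python
from enum import Enum


class SessionStatus(Enum):
    CREATED = "created"
    STARTING = "starting"
    PROVISIONING = "provisioning"
    RUNNING = "running"
    STOPPING = "stopping"
    STOPPED = "stopped"
    FAILED = "failed"
    ARCHIVED = "archived"


S = SessionStatus

# Allowed transitions, transcribed row by row from the table above.
ALLOWED: dict[SessionStatus, frozenset[SessionStatus]] = {
    S.CREATED: frozenset({S.STARTING, S.ARCHIVED}),
    S.STARTING: frozenset({S.PROVISIONING, S.FAILED}),
    S.PROVISIONING: frozenset({S.RUNNING, S.FAILED, S.STOPPING}),
    S.RUNNING: frozenset({S.STOPPING}),
    S.STOPPING: frozenset({S.STOPPED, S.FAILED}),
    S.STOPPED: frozenset({S.STARTING, S.ARCHIVED}),
    S.FAILED: frozenset({S.STARTING, S.ARCHIVED}),
    S.ARCHIVED: frozenset({S.STOPPED}),
}


def can_start(status: SessionStatus) -> bool:
    """A session can start only if STARTING is a legal next state."""
    return S.STARTING in ALLOWED[status]


def can_stop(status: SessionStatus) -> bool:
    """A session can stop only if STOPPING is a legal next state."""
    return S.STOPPING in ALLOWED[status]


def check_transition(current: SessionStatus, target: SessionStatus) -> None:
    """Raise ValueError for any transition the table does not list."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
```

Deriving both gates from the same mapping keeps them from drifting out of sync with the table.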

Creation Flow

Receive request. The API receives a SessionCreate request. Required field: name. Optional fields: model, source (git or local_mount), template_name, preset_id.
Resolve template defaults. If template_name is set and a TemplateProvider is configured, the template's first repo and model fill in any unset fields. Templates are config-driven (YAML/CRD), not DB-stored.
Validate repository. If the source is git with a non-empty repo URL, and repo validation is enabled, the git registry finds the matching GitProvider and calls validate_repo(). Raises RepoValidationError if the repo doesn't exist or isn't accessible.
Create session record. A Session object is created with status CREATED, a generated UUID, and the resolved source/model. If a Principal is present, owner_id and tenant_id are set from the JWT claims.
Persist and broadcast. The session is persisted via SessionRepository.create(). An SSE event is published to connected clients.

At this point the session exists but has no running infrastructure. The caller (API route) immediately calls start_session().

Start Flow (Contributor Pipeline)

The start flow is the core orchestration logic. It takes a persisted session and turns it into a running pod group.

Step 1: State Transition

The session transitions from its current status (CREATED, STOPPED, or FAILED) to STARTING. The change is persisted and broadcast.

Step 2: Build SessionContext

A read-only SessionContext is assembled from the start request:

SessionContext(
    principal=principal,           # Authenticated identity (from JWT)
    template_name=template_name,   # Optional workspace template
    profile_name=profile_name,     # Optional forge profile (deprecated)
    terminal_restricted=...,       # Whether terminal is restricted
    credential_names=(...),        # Credentials to mount
    integration_ids=(...),         # Integration connections to enable
    integration_connections=(...), # Pre-fetched connection objects
    resource_config={...},         # CPU/memory/GPU config
)

If no integration IDs are specified, all enabled integrations for the user are auto-included.

Step 3: Run Contributors

Contributors run sequentially. Each receives the Session and SessionContext and returns a SessionContribution:

from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class SessionContribution:
    values: dict[str, Any]               # Helm chart values
    pod_spec: "PodSpecAdditions | None"  # Pod spec fragments (PodSpecAdditions is defined elsewhere)

The contributor order matters. Later contributors can depend on values set by earlier ones. The configured order:

CoreSessionContributor -- Session identity (ID, name, model), ingress host, terminal restriction flag. No port dependency.
TemplateContributor -- Resolves the workspace template. Adds repos, setup scripts, workspace layout, model, system prompt, MCP servers, env vars, and workload config from the template.
GitContributor -- Resolves the git clone URL using the GitProvider registry. Adds authenticated clone URL, branch, and git credentials to the spec.
IntegrationContributor -- Iterates enabled integration connections. For MCP-type integrations, adds MCP server configs with credential-mapped environment variables. For non-MCP integrations, adds env vars directly.
StorageContributor -- Provisions the workspace PVC (per-session) and home PVC (per-user) via StoragePort. Adds volume and mount specs. Respects admin setting for home directory enablement.
GatewayContributor -- Fetches gateway config from GatewayPort. Adds gateway name, namespace, and JWT/auth config for Skuld's HTTPRoute template.
ResourceContributor -- Translates user-friendly resource config (e.g., {"cpu": "4", "memory": "8Gi", "gpu": "1"}) to K8s-native primitives via ResourceProvider. Adds requests, limits, node selectors, tolerations, and runtime class.
IsolationContributor -- Adds namespace, security context, and network policy for tenant isolation.
SecretInjectionContributor -- Calls SecretInjectionPort.pod_spec_additions() to get CSI driver volumes, mounts, labels, and annotations. Volundr never sees secret values here.
SecretsContributor -- Adds K8s secret references as environment variables from env_secret_refs config.

Additionally, a LocalMountContributor is auto-wired for local development scenarios. It adds host path mounts with configurable allowed prefixes and root mount restrictions.
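
The sequential pipeline can be sketched as below. This is a minimal illustration, not the actual interface: the `Contributor` protocol, the `contribute()` method name, the dict-shaped session and context, and the two toy contributors are all assumptions made for the example. An exception raised by any contributor simply propagates, which corresponds to the STARTING to FAILED row in the transition table.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass(frozen=True)
class SessionContribution:
    # Simplified: the pod_spec field is omitted in this sketch.
    values: dict[str, Any] = field(default_factory=dict)


class Contributor(Protocol):  # hypothetical interface
    def contribute(self, session: dict, context: dict) -> SessionContribution: ...


class ToyCoreContributor:
    """Stand-in that adds session identity values."""

    def contribute(self, session: dict, context: dict) -> SessionContribution:
        return SessionContribution({"session": {"id": session["id"], "name": session["name"]}})


class ToyIsolationContributor:
    """Stand-in that derives a namespace from the tenant."""

    def contribute(self, session: dict, context: dict) -> SessionContribution:
        return SessionContribution({"namespace": f"tenant-{context['tenant_id']}"})


def run_contributors(
    contributors: list[Contributor], session: dict, context: dict
) -> list[SessionContribution]:
    # Sequential on purpose: list order is the configured order.
    return [c.contribute(session, context) for c in contributors]
```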

Step 4: Merge Contributions

All SessionContribution objects are deep-merged into a single SessionSpec:

values dicts are recursively merged (later values override earlier ones for the same key).
PodSpecAdditions are concatenated: volumes, volume_mounts, and env are appended; labels and annotations are merged; service_account uses the last non-None value.
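
The two merge rules above can be sketched like this. It is an illustration only: `deep_merge()` and `merge_pod_specs()` are hypothetical names, and pod spec additions are modeled as plain dicts rather than PodSpecAdditions objects.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge two dicts; on a key conflict the later value wins."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def merge_pod_specs(specs: list[dict[str, Any]]) -> dict[str, Any]:
    """Append list fields, merge mapping fields, keep the last non-None service account."""
    out: dict[str, Any] = {
        "volumes": [], "volume_mounts": [], "env": [],
        "labels": {}, "annotations": {}, "service_account": None,
    }
    for spec in specs:
        for key in ("volumes", "volume_mounts", "env"):
            out[key].extend(spec.get(key, []))
        for key in ("labels", "annotations"):
            out[key].update(spec.get(key, {}))
        if spec.get("service_account") is not None:
            out["service_account"] = spec["service_account"]
    return out
```

Because later values override earlier ones, placing a contributor later in the order gives it the final say on any key it shares with an earlier contributor.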

Step 5: Start Pods

PodManager.start(session, spec) submits the merged spec to the backend:

Flux -- creates a HelmRelease for the Skuld chart
Direct K8s -- applies pod manifests directly via the Kubernetes API
Docker -- runs containers via Docker (local development)

Returns a PodStartResult with chat_endpoint, code_endpoint, and pod_name.
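
One way to picture the backend abstraction is as a small port with interchangeable implementations. The sketch below is an assumption-laden illustration: only the method names and the three PodStartResult fields come from the text; the async signatures, argument types, and the `InMemoryPodManager` toy backend are invented for the example.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass(frozen=True)
class PodStartResult:
    chat_endpoint: str
    code_endpoint: str
    pod_name: str


class PodManager(Protocol):
    """Assumed port shape; only the method names come from the text."""

    async def start(self, session: dict[str, Any], spec: dict[str, Any]) -> PodStartResult: ...
    async def stop(self, session: dict[str, Any]) -> None: ...
    async def wait_for_ready(self, session: dict[str, Any], timeout: float) -> None: ...


class InMemoryPodManager:
    """Toy backend for tests; it is not one of the three real backends."""

    def __init__(self) -> None:
        self.pods: dict[str, dict[str, Any]] = {}

    async def start(self, session: dict[str, Any], spec: dict[str, Any]) -> PodStartResult:
        pod_name = f"session-{session['id']}"
        self.pods[pod_name] = spec
        base = f"https://{pod_name}.example.test"  # placeholder host
        return PodStartResult(f"{base}/chat", f"{base}/code", pod_name)

    async def stop(self, session: dict[str, Any]) -> None:
        # Tolerate pods that are already gone.
        self.pods.pop(f"session-{session['id']}", None)

    async def wait_for_ready(self, session: dict[str, Any], timeout: float) -> None:
        if f"session-{session['id']}" not in self.pods:
            raise RuntimeError("pod not found")


manager = InMemoryPodManager()
result = asyncio.run(manager.start({"id": "abc"}, {"image": "volundr/skuld"}))
```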

Step 6: Transition to PROVISIONING

The session is updated with the endpoints, the pod name, and status PROVISIONING. The update is persisted and broadcast.

Step 7: Background Readiness Poll

An asyncio.Task polls for readiness:

Waits for an initial delay (default 5 seconds, configurable via provisioning.initial_delay_seconds).
Calls PodManager.wait_for_ready(session, timeout), which blocks until the backend reports ready or the timeout expires (default 300 seconds, configurable via provisioning.timeout_seconds).
Re-fetches the session to verify it's still in PROVISIONING (could have been stopped/deleted concurrently).
On success: transitions to RUNNING.
On timeout or error: transitions to FAILED with error detail.

On application restart, reconcile_provisioning_sessions() re-launches polling for any sessions stuck in PROVISIONING.
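
The poll steps above can be sketched with asyncio as follows. This is a simplified illustration under assumptions: the session store is a plain dict, `wait_for_ready` is any awaitable-returning callable, and the timeout is enforced here with `asyncio.wait_for` rather than inside the pod manager. The key point is the re-fetch before writing the outcome, so a concurrent stop or delete is never overwritten.

```python
import asyncio
from typing import Awaitable, Callable


async def poll_readiness(
    session_id: str,
    store: dict[str, dict],
    wait_for_ready: Callable[[str], Awaitable[None]],
    initial_delay: float = 5.0,
    timeout: float = 300.0,
) -> None:
    await asyncio.sleep(initial_delay)
    try:
        await asyncio.wait_for(wait_for_ready(session_id), timeout)
        outcome, error = "RUNNING", None
    except asyncio.TimeoutError:
        outcome, error = "FAILED", f"not ready after {timeout}s"
    except Exception as exc:  # backend error; task cancellation is not caught here
        outcome, error = "FAILED", str(exc)

    # Re-fetch: the session may have been stopped or deleted while we waited.
    session = store.get(session_id)
    if session is None or session["status"] != "PROVISIONING":
        return
    session["status"] = outcome
    session["error"] = error
```

In a running service this coroutine would be launched with `asyncio.create_task(...)` and the task handle kept so that the stop flow can cancel it.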

Pod Deployment

A running session pod group contains:

Container	Image	Purpose
Skuld broker	`volundr/skuld`	WebSocket bridge between browser and AI CLI
code-server	`linuxserver/code-server`	VS Code in the browser
ttyd terminal	`tsl0922/ttyd`	Web terminal
Envoy sidecar (optional)	`envoyproxy/envoy`	JWT validation for incoming traffic

All containers share:

Workspace PVC mounted at /volundr/sessions/<session_id>/workspace -- per-session, created by StorageContributor
Home PVC mounted at /volundr/home/<user_id> -- per-user, persistent across sessions

The Skuld container runs the Claude Code CLI (or Codex CLI) as a child process.

Stop Flow

Transition to STOPPING. Persisted and broadcast.
Cancel provisioning task. If a readiness poll is running, it's cancelled.
Stop pods. PodManager.stop() tells the backend to tear down the pod group. If the pods are already gone, this is logged and the flow continues.
Run contributor cleanup in reverse order. Each contributor's cleanup() method is called. A failure is logged but doesn't prevent the remaining contributors from running. StorageContributor.cleanup() archives the workspace PVC (a soft delete: the PVC is relabeled, not destroyed).
Transition to STOPPED. Endpoints are cleared. Persisted and broadcast.
Auto-create chronicle (triggered by Skuld's shutdown report or by the API on session stop).
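
The reverse-order cleanup step can be sketched as below. The function name `run_cleanup()` and the synchronous `cleanup(session_id)` signature are assumptions for illustration; the behavior shown is the one described above, where each failure is logged and the remaining contributors still run.

```python
import logging

logger = logging.getLogger("session.stop")


def run_cleanup(contributors: list, session_id: str) -> list[str]:
    """Call cleanup() in reverse order; return the class names of contributors that failed."""
    failed: list[str] = []
    for contributor in reversed(contributors):
        name = type(contributor).__name__
        try:
            contributor.cleanup(session_id)
        except Exception:
            # Log with traceback and keep going so later cleanups still run.
            logger.exception("cleanup failed for %s (session %s)", name, session_id)
            failed.append(name)
    return failed
```

Reversing the order means resources are released in the opposite order from which they were contributed, which is the usual way to unwind dependent setup steps.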

Chronicle Creation

A chronicle records what a session did (summary, key changes, unfinished work, and usage) so that a later reforge can continue from where the session left off.

From Broker Report

When Skuld shuts down, it sends a POST to the Volundr API with:

{
  "session_id": "...",
  "summary": "Implemented the user authentication flow...",
  "key_changes": ["Added JWT validation middleware", "Created login page"],
  "unfinished_work": "Error handling for expired tokens",
  "duration_seconds": 3600
}

ChronicleService.create_or_update_from_broker() is idempotent with respect to chronicle creation, so a repeated report never creates a second draft:

If a DRAFT chronicle already exists for this session, it's enriched with the broker data.
Otherwise, a new chronicle is created from the session's current state (project, repo, branch, model, config snapshot, token usage) and enriched with the broker data.
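
The create-or-enrich behavior can be sketched as an upsert. This is an illustration, not the service's code: the list-backed store, the dict-shaped chronicle, and the exact set of copied fields are assumptions.

```python
REPORT_FIELDS = ("summary", "key_changes", "unfinished_work", "duration_seconds")


def create_or_update_from_broker(chronicles: list[dict], session: dict, report: dict) -> dict:
    """Enrich the session's DRAFT chronicle, creating it first if none exists."""
    draft = next(
        (
            c for c in chronicles
            if c["session_id"] == report["session_id"] and c["status"] == "DRAFT"
        ),
        None,
    )
    if draft is None:
        # Seed the new chronicle from the session's current state.
        draft = {
            "session_id": report["session_id"],
            "status": "DRAFT",
            "repo": session.get("repo"),
            "branch": session.get("branch"),
            "model": session.get("model"),
        }
        chronicles.append(draft)
    for key in REPORT_FIELDS:
        if key in report:
            draft[key] = report[key]
    return draft
```

Sending the same report twice leaves a single draft with the same contents, which matters if Skuld retries its shutdown POST.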

Chronicle Data

Chronicle(
    session_id=...,
    project="volundr",          # Derived from repo URL
    repo="github.com/org/repo",
    branch="feature/auth",
    model="claude-sonnet-4-20250514",
    config_snapshot={...},      # Session config at creation time
    summary="...",              # From Skuld broker report
    key_changes=[...],          # From Skuld broker report
    unfinished_work="...",      # From Skuld broker report
    token_usage=45000,
    cost=Decimal("0.45"),
    duration_seconds=3600,
    parent_chronicle_id=...,    # Set during reforge
)

Timeline

Timeline events are collected throughout the session lifetime and stored via TimelineRepository:

Session events -- start, stop
Message events -- user/assistant messages with token counts
File events -- created, modified, deleted with insertions/deletions
Git events -- commits with hashes
Terminal events -- commands with exit codes
Error events

The timeline is aggregated into:

FileSummary -- deduplicated file changes (new/modified/deleted, total insertions/deletions)
CommitSummary -- commit hash, message, wall-clock time
token_burn -- token usage bucketed into 5-minute intervals
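
The token_burn aggregation can be sketched as bucketing by 5-minute windows. The input shape, a sequence of (timestamp, token count) pairs taken from message events, and the output shape are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

BUCKET_SECONDS = 300  # 5-minute intervals


def token_burn(events: list[tuple[datetime, int]]) -> list[tuple[datetime, int]]:
    """Sum token counts per 5-minute bucket; return (bucket_start, total) sorted by time."""
    totals: Counter[int] = Counter()
    for timestamp, tokens in events:
        epoch = int(timestamp.timestamp())
        totals[epoch - epoch % BUCKET_SECONDS] += tokens  # floor to bucket start
    return [
        (datetime.fromtimestamp(start, tz=timezone.utc), total)
        for start, total in sorted(totals.items())
    ]
```

Buckets with no events are simply absent from the result; a chart that needs a continuous axis would have to fill those gaps with zeros.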

Reforge

Reforge creates a new session from an existing chronicle, preserving the development context chain.

Fetch the chronicle by ID.
Extract config_snapshot (name, model, repo, branch).
Create a new session with name = "{original_name} (reforged)" and the same git source config.
The new session's chronicle will have parent_chronicle_id set, linking it to the parent chronicle.

The get_chain() method walks parent_chronicle_id links to build the full reforge chain from oldest ancestor to current.
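
Walking the parent links can be sketched as follows. The dict-backed store keyed by chronicle ID is an assumption, and the cycle guard is a defensive addition that the text does not mention.

```python
def get_chain(chronicles: dict[str, dict], chronicle_id: str) -> list[dict]:
    """Follow parent_chronicle_id links and return the chain oldest-first."""
    chain: list[dict] = []
    seen: set[str] = set()
    current: str | None = chronicle_id
    while current is not None and current not in seen:
        seen.add(current)  # guard against accidental cycles
        chronicle = chronicles[current]
        chain.append(chronicle)
        current = chronicle.get("parent_chronicle_id")
    chain.reverse()  # collected newest-first, returned oldest-first
    return chain
```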

Archive and Restore

Archive: Stops the session if running, then sets status to ARCHIVED with a timestamp. Archived sessions are excluded from default list queries.

Restore: Moves an ARCHIVED session back to STOPPED. From there it can be restarted normally.

Bulk archive: archive_stopped_sessions() archives all sessions in STOPPED status.

Delete

Deletion always removes the session record and treats infrastructure teardown as best-effort:

Cancel any active provisioning task.
If running or provisioning, attempt to stop pods. Pod stop failures are logged but don't prevent deletion.
Run contributor cleanup in reverse order.
Delete the session record.
Broadcast deletion event.
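
The delete steps above can be sketched as a single best-effort function. Everything here is illustrative: the dict-backed store, the callable parameters, and the `session_deleted` event name are hypothetical, and tolerating cleanup failures during delete is an assumption carried over from the stop flow.

```python
import logging
from typing import Callable

logger = logging.getLogger("session.delete")


def delete_session(
    sessions: dict[str, dict],
    session_id: str,
    stop_pods: Callable[[str], None],
    cleanups: list[Callable[[str], None]],
    broadcast: Callable[[dict], None],
) -> None:
    session = sessions[session_id]

    task = session.get("provisioning_task")
    if task is not None:
        task.cancel()  # cancel any active readiness poll

    if session["status"] in ("RUNNING", "PROVISIONING"):
        try:
            stop_pods(session_id)
        except Exception:
            logger.exception("pod stop failed; deleting %s anyway", session_id)

    for cleanup in reversed(cleanups):
        try:
            cleanup(session_id)
        except Exception:
            logger.exception("cleanup failed during delete of %s", session_id)

    del sessions[session_id]
    broadcast({"type": "session_deleted", "session_id": session_id})
```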