
Monitoring and Status

Thilke agent operations use a layered status model:

  1. systemd service health for each local service;
  2. HTTP health endpoints for the daemon, Codex gateway, graph context, and Multica bridge;
  3. iii runtime checks that the source-controlled config is loaded, telemetry is opted out, and no in-memory (non-production) warnings are present;
  4. durable run mirror inspection under /var/lib/thilke-agent/runs;
  5. optional smoke execution through the Multica-to-iii-to-agentd path.
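The layer ordering above can be modeled as a list of probes where every layer that runs must pass. This is a minimal sketch, not the implementation of scripts/thilke-agent-status.py: the `Layer` type, the `evaluate` function, and the rule that the optional smoke layer is skipped unless explicitly requested are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Layer:
    name: str
    check: Callable[[], bool]  # hypothetical probe; returns True when healthy
    optional: bool = False     # e.g. smoke execution, only run on request


def evaluate(layers: list[Layer], include_optional: bool = False) -> dict:
    """Run layers in order; every layer that actually runs must pass for ok=True."""
    results: dict[str, bool] = {}
    for layer in layers:
        if layer.optional and not include_optional:
            continue  # skipped layers do not affect the aggregate
        try:
            results[layer.name] = bool(layer.check())
        except Exception:
            # A probe that raises counts as unhealthy instead of aborting the report.
            results[layer.name] = False
    return {"ok": all(results.values()), "layers": results}


# Usage: the same layer list serves routine checks and deploy validation.
layers = [
    Layer("systemd", lambda: True),
    Layer("http", lambda: True),
    Layer("iii", lambda: True),
    Layer("runs", lambda: True),
    Layer("smoke", lambda: False, optional=True),
]
routine = evaluate(layers)                         # smoke skipped
deploy = evaluate(layers, include_optional=True)   # failing smoke fails the aggregate
```

Keeping the smoke layer opt-in mirrors how `--smoke` is a separate flag: a routine status check stays cheap, while deploy validation exercises the full path.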

Canonical status command

Run from the agent repository on the host:

cd /home/thilke/src/agent
scripts/thilke-agent-status.py --pretty

The script emits JSON and exits non-zero if any required service or endpoint is unhealthy.
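A caller such as a cron job or deploy hook can rely on both the exit code and the JSON body. The sketch below assumes the script still prints its JSON report when it exits non-zero; the `run_status` helper and its 120-second timeout are hypothetical.

```python
import json
import subprocess

# Command and working directory taken from the section above.
STATUS_CMD = ("scripts/thilke-agent-status.py", "--pretty")
AGENT_REPO = "/home/thilke/src/agent"


def run_status(cmd=STATUS_CMD, cwd=AGENT_REPO, timeout=120):
    """Return (healthy, report); healthy needs exit code 0 AND report["ok"] is True."""
    proc = subprocess.run(
        list(cmd), cwd=cwd, capture_output=True, text=True, timeout=timeout
    )
    try:
        # Assumption: the JSON report is printed even when the exit code is non-zero.
        report = json.loads(proc.stdout)
    except json.JSONDecodeError:
        report = None
    healthy = (
        proc.returncode == 0
        and isinstance(report, dict)
        and report.get("ok") is True
    )
    return healthy, report
```

Requiring both signals to agree guards against a script crash that exits 0 without printing a report, or a report that says `ok: false` under an unexpected exit code.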

Important fields:

  • ok: aggregate pass/fail.
  • services: systemd is-active results.
  • http: service health endpoint responses.
  • iii.telemetry_disabled: verifies iii anonymous telemetry is disabled.
  • iii.in_memory_warning_present: must be false for the production-like deployment.
  • iii.config_file_active: verifies iii loaded /etc/iii/iii-config.yaml.
  • storage.runs_count: count of durable run mirrors.
  • latest_runs: recent run mirror metadata.
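The fields above can be turned into a human-readable list of failures, which is handy in a deploy log. The shape of `services` (unit name mapped to the `systemctl is-active` state string) is an assumption, and `failing_checks` is a hypothetical helper; only the meanings of the `ok` and `iii.*` fields come from the list above.

```python
def failing_checks(report):
    """List human-readable problems found in a status report dict."""
    problems = []
    # Assumed shape: {"thilke-agentd": "active", "thilke-iii": "failed", ...}
    for unit, state in report.get("services", {}).items():
        if state != "active":
            problems.append(f"service {unit} is {state}")
    iii = report.get("iii", {})
    # Missing fields are treated as failures rather than silently passing.
    if iii.get("telemetry_disabled") is not True:
        problems.append("iii telemetry is not confirmed disabled")
    if iii.get("in_memory_warning_present") is not False:
        problems.append("iii in-memory warning present or not reported")
    if iii.get("config_file_active") is not True:
        problems.append("iii did not confirm loading /etc/iii/iii-config.yaml")
    # Fall back to the aggregate flag for failures in sections not inspected here.
    if report.get("ok") is not True and not problems:
        problems.append("aggregate ok is false; inspect the http and storage sections")
    return problems
```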

Smoke mode

Use smoke mode when validating a deploy or debugging the Multica event path:

scripts/thilke-agent-status.py --smoke --pretty

Smoke mode calls scripts/smoke/iii-multica-process-next.sh. It creates a durable plan-issue run through:

Multica bridge -> iii -> thilke-iii-worker -> thilke-agentd plan-issue

It is write-safe: it performs planning only and does not mutate repositories.

Expected smoke marker:

III_MULTICA_PROCESS_NEXT_OK <run-id>
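When capturing the output of scripts/smoke/iii-multica-process-next.sh in a deploy script, the run id can be extracted from the marker to locate the matching run mirror. This sketch assumes the marker appears alone on its own line and that the run id contains no whitespace; `smoke_run_id` is a hypothetical helper.

```python
import re
from typing import Optional

# Marker line documented above: III_MULTICA_PROCESS_NEXT_OK <run-id>
# [ \t] (not \s) keeps the match from spilling onto the next line.
MARKER = re.compile(r"^III_MULTICA_PROCESS_NEXT_OK[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def smoke_run_id(output: str) -> Optional[str]:
    """Return the run id from the last marker line, or None if no marker was printed."""
    matches = MARKER.findall(output)
    return matches[-1] if matches else None
```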

Direct service checks

systemctl is-active \
  thilke-agentd \
  thilke-codex-gateway \
  thilke-pathway-context \
  thilke-multica-bridge \
  thilke-iii \
  thilke-iii-worker
curl -fsS http://127.0.0.1:7817/healthz
curl -fsS http://127.0.0.1:8092/healthz
curl -fsS http://127.0.0.1:8090/healthz
curl -fsS http://127.0.0.1:8093/healthz
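One caveat with the multi-unit `systemctl is-active` form above: systemctl exits 0 if at least one of the listed units is active, so the exit status alone does not prove all six are up. The sketch below reads the per-unit state lines instead, and checks an HTTP endpoint roughly the way `curl -fsS` does (failing on connection errors and HTTP error statuses). The helper names are hypothetical; the unit list is copied from the command above.

```python
import http.client
import subprocess
import urllib.request

# Unit names copied from the systemctl command above.
UNITS = [
    "thilke-agentd",
    "thilke-codex-gateway",
    "thilke-pathway-context",
    "thilke-multica-bridge",
    "thilke-iii",
    "thilke-iii-worker",
]


def parse_is_active(units, stdout):
    """Pair each unit with the state line systemctl printed for it, in order."""
    states = [line.strip() for line in stdout.splitlines()]
    return {
        unit: states[i] if i < len(states) else "unknown"
        for i, unit in enumerate(units)
    }


def unit_states(units=UNITS):
    # Exit status is 0 if ANY unit is active, so ignore it and read each state line.
    proc = subprocess.run(
        ["systemctl", "is-active", *units], capture_output=True, text=True
    )
    return parse_is_active(units, proc.stdout)


def http_healthy(url, timeout=5.0):
    """Roughly like `curl -fsS`: False on connection errors and HTTP error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # urllib.error.URLError/HTTPError and socket timeouts are OSError subclasses.
        return False
```

For example, `{u: s for u, s in unit_states().items() if s != "active"}` lists only the units that need attention.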

Log inspection

docker logs --tail 120 thilke-iii
journalctl -u thilke-iii-worker -n 120 --no-pager
journalctl -u thilke-agentd -n 120 --no-pager
journalctl -u thilke-multica-bridge -n 120 --no-pager
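For longer incidents, journalctl's JSON output (`-o json`, one object per line) is easier to filter than text. The sketch below keeps only entries at `err` severity or worse; `PRIORITY`, `MESSAGE`, and `_SYSTEMD_UNIT` are standard journal fields, while the helper names and treating a missing priority as `info` are assumptions. It covers the systemd units only; the iii container's logs still come from `docker logs`.

```python
import json
import subprocess


def error_entries(lines, max_priority=3):
    """Yield (unit, message) for journal entries at or above err severity."""
    for line in lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        # PRIORITY is a string "0".."7"; lower is more severe (3 = err).
        if int(entry.get("PRIORITY", 6)) <= max_priority:
            message = entry.get("MESSAGE") or ""
            if isinstance(message, list):
                # journalctl encodes non-UTF-8 messages as an array of byte values.
                message = bytes(message).decode("utf-8", "replace")
            yield entry.get("_SYSTEMD_UNIT", "?"), message


def unit_errors(unit, count=120):
    proc = subprocess.run(
        ["journalctl", "-u", unit, "-n", str(count), "-o", "json", "--no-pager"],
        capture_output=True,
        text=True,
    )
    return list(error_entries(proc.stdout.splitlines()))
```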

Next monitoring work

  • Export the JSON status through a local authenticated endpoint.
  • Add a scheduled status mirror into Multica for display-only dashboards.
  • Add Prometheus counters for run duration, status, model tier, approval latency, smoke results, and path (iii versus direct fallback).
  • Add iii trace and metric retrieval to the status script once the deployed engine API shape is stable.
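As a possible starting point for the Prometheus item, run counts could be rendered in the Prometheus text exposition format. This is a sketch under stated assumptions: the metric name `thilke_agent_runs_total`, the `status` and `path` labels, and the input shape are hypothetical. It renders a snapshot from a list of runs; because Prometheus counters must never decrease, that is only valid if run mirrors are never pruned, and a long-lived exporter (for example one built on the official `prometheus_client` library) that increments counters as runs finish would be more robust.

```python
from collections import Counter


def _escape(value):
    # Label values must escape backslash, double quote, and newline.
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_run_counters(runs):
    """Render finished-run counts in the Prometheus text exposition format.

    `runs` is an iterable of dicts with hypothetical "status" and "path" keys,
    e.g. {"status": "succeeded", "path": "iii"}.
    """
    counts = Counter((run["status"], run["path"]) for run in runs)
    lines = [
        "# HELP thilke_agent_runs_total Finished agent runs by status and execution path.",
        "# TYPE thilke_agent_runs_total counter",
    ]
    for (status, path), n in sorted(counts.items()):
        lines.append(
            f'thilke_agent_runs_total{{status="{_escape(status)}",path="{_escape(path)}"}} {n}'
        )
    # The exposition format expects a newline after the last line.
    return "\n".join(lines) + "\n"
```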