Monitoring and Status
Thilke agent operations use a layered status model:
- systemd service health for each local service;
- HTTP health endpoints for the daemon, Codex gateway, graph context, and Multica bridge;
- iii runtime checks for source-controlled config, telemetry opt-out, and absence of in-memory production warnings;
- durable run mirror inspection under /var/lib/thilke-agent/runs;
- optional smoke execution through the Multica-to-iii-to-agentd path.
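For the run mirror layer, a quick manual inspection can be sketched in Python. This is a minimal sketch, not the status script's own logic: it assumes each run mirror is a direct child entry of /var/lib/thilke-agent/runs (the on-disk layout is not described here), and the helper name latest_run_entries is hypothetical.

```python
from pathlib import Path
from typing import List

# Directory named in the status model above.
RUNS_DIR = Path("/var/lib/thilke-agent/runs")


def latest_run_entries(runs_dir: Path = RUNS_DIR, limit: int = 5) -> List[str]:
    """Return the names of the most recently modified entries, newest first.

    Assumption: one direct child entry per run mirror.
    """
    if not runs_dir.is_dir():
        return []
    entries = sorted(
        runs_dir.iterdir(),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return [p.name for p in entries[:limit]]
```

Calling latest_run_entries() on the host lists up to five entry names; an empty list means the directory is missing or has no entries.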
Canonical status command
Run from the agent repository on the host:
cd /home/thilke/src/agent
scripts/thilke-agent-status.py --pretty
The script emits JSON and exits non-zero if any required service or endpoint is unhealthy.
Important fields:
- ok: aggregate pass/fail.
- services: systemd is-active results.
- http: service health endpoint responses.
- iii.telemetry_disabled: verifies iii anonymous telemetry is disabled.
- iii.in_memory_warning_present: must be false for the production-like deployment.
- iii.config_file_active: verifies iii loaded /etc/iii/iii-config.yaml.
- storage.runs_count: count of durable run mirrors.
- latest_runs: recent run mirror metadata.
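The fields above can be checked programmatically when the status output feeds another tool. The following is a minimal sketch under stated assumptions: the dotted names (for example iii.telemetry_disabled) are taken to mean nested JSON objects, the script is assumed to write its JSON to stdout even when it exits non-zero, and the function names status_problems and run_status_script are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict, List, Tuple


def status_problems(status: Dict[str, Any]) -> List[str]:
    """Return a human-readable list of failed checks (empty when all pass)."""
    problems = []
    iii = status.get("iii", {})  # assumption: dotted fields are nested objects
    if status.get("ok") is not True:
        problems.append("ok is not true")
    if iii.get("telemetry_disabled") is not True:
        problems.append("iii.telemetry_disabled is not true")
    if iii.get("in_memory_warning_present") is not False:
        problems.append("iii.in_memory_warning_present is not false")
    if iii.get("config_file_active") is not True:
        problems.append("iii.config_file_active is not true")
    return problems


def run_status_script(
    script: str = "scripts/thilke-agent-status.py",
) -> Tuple[int, Dict[str, Any]]:
    """Run the status script and return (exit code, parsed JSON).

    check=True is deliberately not used: a non-zero exit is an expected
    result when a service or endpoint is unhealthy.
    """
    proc = subprocess.run([script, "--pretty"], capture_output=True, text=True)
    return proc.returncode, json.loads(proc.stdout)
```

Run from the agent repository, status_problems(run_status_script()[1]) yields an empty list on a healthy host. The exit code remains the authoritative signal; the field checks only explain which part failed.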
Smoke mode
Use smoke mode when validating a deploy or debugging the Multica event path:
scripts/thilke-agent-status.py --smoke --pretty
Smoke mode calls scripts/smoke/iii-multica-process-next.sh. It creates a durable plan-issue run through:
Multica bridge -> iii -> thilke-iii-worker -> thilke-agentd plan-issue
Smoke mode is write-safe: it performs planning only and does not mutate repositories.
Expected smoke marker:
III_MULTICA_PROCESS_NEXT_OK <run-id>
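When smoke output is captured by a deploy script, the marker can be extracted with a small parser. This is a sketch only: it assumes the marker appears on a line of its own in the captured text and that the run id is a single token without whitespace. Neither detail is specified here, and the function name find_smoke_run_id is hypothetical.

```python
import re
from typing import Optional

# Marker text from the expected smoke output; the run id format is an assumption.
SMOKE_MARKER = re.compile(
    r"^III_MULTICA_PROCESS_NEXT_OK[ \t]+(\S+)[ \t]*$",
    re.MULTILINE,
)


def find_smoke_run_id(output: str) -> Optional[str]:
    """Return the run id from the first smoke marker line, or None if absent."""
    match = SMOKE_MARKER.search(output)
    return match.group(1) if match else None
```

A None result means the marker was not found, which should be treated as a failed smoke run.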
Direct service checks
systemctl is-active \
thilke-agentd \
thilke-codex-gateway \
thilke-pathway-context \
thilke-multica-bridge \
thilke-iii \
thilke-iii-worker
curl -fsS http://127.0.0.1:7817/healthz
curl -fsS http://127.0.0.1:8092/healthz
curl -fsS http://127.0.0.1:8090/healthz
curl -fsS http://127.0.0.1:8093/healthz
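The same endpoint checks can be run from Python where curl is unavailable or a structured result is needed. This is a minimal sketch using only the standard library; the function names and the 3-second timeout are assumptions, and the URLs are copied from the curl commands above without mapping ports to services.

```python
import http.client
import urllib.request
from typing import Dict, Iterable

# URLs copied from the curl commands above.
HEALTH_URLS = [
    "http://127.0.0.1:7817/healthz",
    "http://127.0.0.1:8092/healthz",
    "http://127.0.0.1:8090/healthz",
    "http://127.0.0.1:8093/healthz",
]


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True only if the endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError (4xx/5xx), refused connections, and
        # timeouts are all OSError subclasses.
        return False


def check_all(urls: Iterable[str] = HEALTH_URLS) -> Dict[str, bool]:
    """Check every URL and return a mapping of URL to health result."""
    return {url: is_healthy(url) for url in urls}
```

check_all() returns one boolean per URL, so a caller can report exactly which endpoint failed instead of stopping at the first error.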
Log inspection
docker logs --tail 120 thilke-iii
journalctl -u thilke-iii-worker -n 120 --no-pager
journalctl -u thilke-agentd -n 120 --no-pager
journalctl -u thilke-multica-bridge -n 120 --no-pager
Next monitoring work
- Export the JSON status through a local authenticated endpoint.
- Add a scheduled status mirror into Multica for display-only dashboards.
- Add Prometheus counters for run duration, status, model tier, approval latency, smoke results, and path (iii versus direct fallback).
- Add iii trace and metric retrieval to the status script once the deployed engine API shape is stable.