Stream-JSON schema contract
clauditor invokes the Claude CLI with --output-format stream-json --verbose
and parses its NDJSON output line-by-line. This document is the authoritative
reference for which fields clauditor reads, which it tolerates missing, and
how it handles malformed lines. The parser lives in
src/clauditor/_harnesses/_claude_code.py::ClaudeCodeHarness.invoke
(reached from src/clauditor/runner.py::SkillRunner._invoke via the
Harness protocol introduced in #148).
This schema reflects the Anthropic CLI streaming format as verified live
against claude 2.1.x. If a future CLI version changes the shape, update
this document and .claude/rules/stream-json-schema.md in the same commit.
See also: Codex stream-JSON schema — sibling
reference for the OpenAI Codex CLI's NDJSON format consumed by
CodexHarness (#149).
Transport
- Each line of
claude's stdout is one JSON object (NDJSON / JSON Lines). - The parser reads lines until EOF, then calls
proc.wait()to collect the exit code. - Blank lines are silently skipped.
- Lines that fail
json.loadsare logged to stderr (prefixclauditor.runner: skipping malformed stream-json line:) and skipped — they never abort the run. - Values that parse as JSON but are not objects (scalars, arrays) are defensively ignored.
Message shapes clauditor consumes
Every message is a JSON object with a top-level type field. clauditor
dispatches on type and only reads the fields documented below. Any
unknown type is stored in raw_messages / stream_events for debugging
but otherwise ignored.
type: "system"
Init / hook / misc events from the CLI.
{"type":"system","subtype":"init","session_id":"abc123","cwd":"/tmp/work","apiKeySource":"ANTHROPIC_API_KEY"}
Read by clauditor: the type: "system" / subtype: "init" message
is parsed for its apiKeySource field (when present).
subtype(string) — tolerated-if-missing. Only"init"messages triggerapiKeySourceextraction; other subtypes (e.g."hook") are appended toraw_messages/stream_eventsbut otherwise ignored.apiKeySource(string) — tolerated-if-missing / non-string (SkillResult.api_key_sourcestaysNonein that case). When a string is present on the FIRSTsystem/initmessage, clauditor stores the value onSkillResult.api_key_sourceand emits one stderr info line of the formclauditor.runner: apiKeySource=<value>. Example values the Claude CLI emits today:"ANTHROPIC_API_KEY","claude.ai","none". The value is a label (identifying which auth path was used), not a secret — it does not contain the API key itself, so printing it to stderr and persisting it onSkillResultis safe. Older CLI builds may omit this field; in that caseapi_key_sourcestaysNoneand no stderr line is emitted (absence is the signal, per DEC-012 ofplans/super/64-runner-auth-timeout.md). Subsequentsystem/initmessages are ignored — first init wins, per DEC-015. When the caller threads asubjectthroughcall_anthropic(grader call sites — L2 extraction, L3 grading, L3 blind compare side1/side2, triggers judge, suggest proposer, propose-eval) the stderr line gains a(<subject>)suffix so operators can attribute multi-subprocess runs (e.g.grade --transport cli) to specific internal LLM calls (issue #107).
All system/* messages (every subtype) are appended to
raw_messages and stream_events for downstream tooling
(transcripts, debug dumps) but do not affect SkillResult.output
or token counts.
type: "assistant"
Assistant turn with one or more content blocks.
{"type":"assistant","message":{"role":"assistant","content":[{"type":"text","text":"Here are 5 venues near you..."},{"type":"tool_use","id":"toolu_01","name":"WebSearch","input":{"query":"parks"}}]}}
Required fields for text capture:
- message (object) — tolerated-if-missing (treated as {}).
- message.content (list) — tolerated-if-missing / non-list
(message is skipped for text capture).
- Each block in message.content must be a dict with type == "text" to
contribute to SkillResult.output. The block's text field is read
with .get("text", ""), so a text block missing its text field
contributes an empty string rather than raising.
Tolerated block types: tool_use, tool_result, thinking, and any
other block whose type is not "text" are skipped for the purposes of
SkillResult.output. They are still present in raw_messages for
downstream tooling (transcripts, debug dumps).
Text chunks from every assistant message are joined with \n to form
SkillResult.output.
tool_use block — Task(run_in_background=true) detection. A
tool_use block with name == "Task" and
input.run_in_background is True (strict is True check, not a
truthy-match) is counted as a background-task launch for the
non-completion heuristic described below. Other Task calls
(foreground, or run_in_background absent / falsy) are not counted.
See _count_background_task_launches in
src/clauditor/runner.py.
type: "result"
The terminal line of a run. Carries aggregate token usage and, on failure, a user-facing error string.
{"type":"result","subtype":"success","is_error":false,"usage":{"input_tokens":1423,"output_tokens":512}}
Required fields: none are hard-required — every field is read
defensively.
- usage (object) — tolerated-if-missing (token counts stay at 0).
- usage.input_tokens (int) — tolerated-if-missing / None / non-numeric
(falls back to 0 via a try/except (TypeError, ValueError) wrapper).
- usage.output_tokens (int) — same defensive treatment.
- is_error (bool) — tolerated-if-missing (treated as False). When
strictly True (Python is True check — the string "true", int
1, and other truthy non-bool values do NOT trigger the error
branch), clauditor classifies the result as a failure and surfaces
a user-facing error string via SkillResult.error /
SkillResult.error_category. This strict check preserves back-
compat with older CLI builds that may omit the field on success.
- result (string) — present on result messages only. When
is_error: true, this is the human-readable error text, often the
verbatim Anthropic API error including status codes (e.g. "API
Error: Request rejected (429) · Rate limit exceeded for your
organization"). Absent on success. Clauditor classifies the text
by keyword (case-insensitive — the payload is lowercased before
matching, so "Rate Limit" and "rate limit" classify identically
and the ANTHROPIC_API_KEY hint matches regardless of casing):
"429" / "rate limit" / "rate-limit" →
error_category = "rate_limit"; "401" / "403" /
"unauthorized" / "authentication" / "auth error" /
"ANTHROPIC_API_KEY" → error_category = "auth"; otherwise
error_category = "api". The rate-limit match runs before the
auth match so a string containing both is classified as
rate_limit. Strings longer than 4 KB are truncated in
SkillResult.error with a " ... (truncated)" suffix; the full
string remains in stream_events for forensics.
Seeing a result message flips an internal saw_result flag. If the
stream ends without any result message, clauditor emits:
clauditor.runner: stream-json ended without a 'result' message; token usage unavailable
to stderr and still returns a SkillResult with input_tokens = 0 and
output_tokens = 0. Missing token data is a warning, not a fatal error.
Failure example — 429 rate limit. When the underlying API returns
a 429 rate-limit, Claude CLI emits a terminal result message with
is_error: true and the user-facing text in result. Clauditor
surfaces this through SkillResult.error + SkillResult.error_category
== "rate_limit".
{"type":"system","subtype":"init","session_id":"abc123","cwd":"/tmp/work"}
{"type":"assistant","message":{"id":"msg_01","role":"assistant","content":[],"stop_reason":null}}
{"type":"result","subtype":"error_max_turns","is_error":true,"result":"API Error: Request rejected (429). Your organization has exceeded the rate limit.","usage":{"input_tokens":1423,"output_tokens":0}}
Heuristic classifications (advisory)
Two detectors run after the stream ends and a result message was
seen. Both are gated on allow_hang_heuristic=True (the default;
per-skill opt-out via EvalSpec.allow_hang_heuristic) AND on the
run not already carrying a stream-json is_error: true
classification. Each detector appends a prefixed warning to
SkillResult.warnings and sets SkillResult.error_category
without setting SkillResult.error — output and exit_code stay
as reported by the CLI. Both prefixes are load-bearing: they
down-classify succeeded_cleanly to False.
Interactive-hang (error_category = "interactive", warning prefix
interactive-hang:): a single-turn run whose final assistant message
has stop_reason == "end_turn" and either ends with ? or contains
an AskUserQuestion tool_use block. Catches skills that asked the
user for input and stalled. See _detect_interactive_hang.
Background-task non-completion (error_category = "background-task",
warning prefix background-task:): runs only when interactive-hang
did NOT fire. Triggers when (a) at least one assistant tool_use
block has name == "Task" and input.run_in_background is True,
AND (b) either the concatenated final text matches the regex
\b(waiting on|still waiting|continuing|in progress|in the background)\b
(case-insensitive) OR the result message's num_turns is less
than launches + 2. claude -p does not poll background tasks,
so a skill that launches them and exits terminates with a valid
result message but truncated output; this detector catches that
silent failure. See _detect_background_task_noncompletion
(GitHub #97).
Precedence is strict: stream-json is_error: true wins over both
heuristics; interactive-hang wins over background-task. A run that
matches both heuristics is classified as interactive-hang only
(background-task does not also fire).
Error handling summary
| Condition | Behavior |
|---|---|
json.JSONDecodeError on a line |
Log to stderr, skip the line, keep reading |
| Line parses as non-dict JSON | Silently skip |
assistant message without message.content list |
Skip text capture for that message |
Text block missing text field |
Contributes empty string |
result message with missing/broken usage |
Token counts default to 0 |
result message with is_error absent |
Treat as success (back-compat with older CLI versions) |
result message with non-bool is_error (e.g. "true", 1) |
Treat as absent — strict is True check only |
result message with is_error: true and no result string |
SkillResult.error = "API error (no detail)", error_category = "api" |
system/init message with non-string apiKeySource |
Field ignored; SkillResult.api_key_source stays None; no stderr line emitted |
Multiple system/init messages with apiKeySource |
First wins; later init messages are ignored (DEC-015) |
result message with is_error: true and result > 4 KB |
Truncate at 4 KB with " ... (truncated)" suffix on SkillResult.error; classify from the prefix; full string retained in stream_events |
No result message before EOF |
Warn to stderr, return SkillResult with zero tokens |
| Subprocess times out | Kill child, return SkillResult(exit_code=-1, error="timeout") with whatever text was captured so far |
claude binary not found |
Return SkillResult(exit_code=-1, error="Claude CLI not found: …") |
Every exit path is wrapped in try/finally so that
SkillResult.duration_seconds is populated for success, timeout, missing
binary, and any other error path.
Canonical parser
src/clauditor/_harnesses/_claude_code.py::ClaudeCodeHarness.invoke is the
single source of truth for all parsing logic (called from
src/clauditor/runner.py::SkillRunner._invoke). If you need to extend the
schema (new message type, new field), update that method and this document
and .claude/rules/stream-json-schema.md in the same commit.