LLM draft pipeline — operations guide

Operational reference for users of signalforge.draft and signalforge.llm. Companion to docs/safety-ops.md, docs/manifest-loader-ops.md, and docs/warehouse-adapter-ops.md, and to the design record in plans/super/5-llm-draft-pipeline.md.

Overview

Prerequisite: the model must declare its columns. Column-level tests are drafted from model.columns, which dbt populates from your schema .yml files. A model with no schema .yml yields zero columns, and the drafter can then only produce model-level variants. See the "Column metadata is the prerequisite for column-level tests" section of docs/manifest-loader-ops.md for how to generate schema files (and how to use dbt docs generate to pick up column types).

The draft pipeline turns one dbt model into one CandidateSchema — the typed value the prune layer (#6) consumes — by issuing one LLM call. It sits after the safety layer (which produces the LLMRequest) and before prune / grade / diff render (#6 / #7 / #8). Two subpackages share the work (DEC-001):

signalforge.llm — the centralized, provider-neutral LLM seam. One function, call_llm, owns the retry loop, backoff math, prompt-cache pre-send checks, and the LLMResult value object; a pluggable LLMProvider strategy (resolved from a process-level registry) owns the vendor-specific request build, response extraction, and exception classification (issue #135). The default provider is anthropic; its SDK noise (type-stub gaps, lazy exception-class import) stays confined to signalforge.llm._anthropic_client. No other module imports the anthropic SDK.
signalforge.draft — the orchestration layer on top of that seam. Owns the prompt builder, the JSON + anchor-contract parser, the fail-closed response-audit JSONL writer, and the draft_schema / draft_from_request entry points.

The seam split keeps the SDK noise (type-stub gaps, retry plumbing, exception-class lazy-import) confined to one subpackage; the rest of the layer stays SDK-agnostic and pyright-clean.

Public API surface

`signalforge.draft.__all__`

Name	Kind	Description
`draft_schema`	function	`draft_schema(model, adapter, policy, manifest, *, config) -> DraftOutcome`. End-to-end entry point: builds the safety-layer `LLMRequest`, renders the prompt, calls the LLM, parses, audits.
`draft_from_request`	function	`draft_from_request(request, model, manifest, *, config, audit_path) -> DraftOutcome`. Same pipeline minus the safety-layer step; takes a pre-built `LLMRequest`.
`DraftOutcome`	model	Frozen Pydantic value object: `(candidate, request, result)`. The thing every downstream stage receives.
`CandidateSchema`	model	The parsed schema the LLM produced: `(name, description, columns, tests, …)`. Frozen, `extra="ignore"` for read-back tolerance.
`CandidateColumn`	model	One column on a `CandidateSchema`: `(name, description, rationale, tests, meta)`.
`CandidateTest`	type	Discriminated union over `not_null` / `unique` / `accepted_values` / `relationships` / `custom_sql` test variants. Discriminator: `type`.
`CandidateTestCustomSQL`	model	The fifth variant (issue #116): a custom singular SQL business-rule test. Carries `sql` (a failing-rows SELECT), optional `column`, optional `rationale`. See § Custom business-rule tests.
`DraftConfig`	model	Config-shaped (`extra="forbid"`) Pydantic model mirroring the `llm:` block of `signalforge.yml`.
`load_draft_config`	function	`load_draft_config(project_dir, path=None) -> DraftConfig`. Mirrors `load_safety_config`.
`LLMResponseEvent`	model	One JSONL audit record per LLM response. Fields are documented in §4.
`DraftError`	exception	Base class for every failure surface in this layer.
`LLMOutputError`	exception	Base for parse-time failures (JSON / validation / anchor contract). Carries the bad-JSON envelope.

`signalforge.llm.__all__`

Name	Kind	Description
`call_llm`	function	The single provider-neutral LLM seam. Owns retry policy + cache pre-send check; selects an `LLMProvider` strategy by name (default `"anthropic"`). Returns `LLMResult`.
`LLMResult`	model	Frozen result shape: `text_blocks`, `response_text`, token counts (input/output/cache_creation/cache_read), `model`, `prompt_version`, `raw_message`.
`LLMError`	exception	Base class for everything in `signalforge.llm.errors`.
`LLMHelperError`	exception	Umbrella for SDK-call failures. Subclasses cover the retry-taxonomy branches.
`LLMAuthError`	exception	401 / 403 from the Anthropic API. No retry.
`LLMRateLimitError`	exception	429 retry budget exhausted. Carries `attempts`.
`LLMServerError`	exception	5xx retry budget exhausted.
`LLMConnectionError`	exception	Connection / transport retry budget exhausted.
`LLMCacheTooLargeError`	exception	Pre-send: cached block exceeds the SignalForge cap (8000 input tokens).

LLMResponseFormatError is exported from signalforge.llm.errors but not from the top-level __all__; reach for it via from signalforge.llm.errors import LLMResponseFormatError.
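
The retry taxonomy in the table above (auth failures never retried; rate-limit, server, and connection failures retried until a budget runs out) can be sketched as follows. This is a minimal illustration, not the signalforge.llm implementation: the exception classes are hypothetical stand-ins, and the attempt budget, base delay, and jittered exponential backoff are assumptions, since this guide does not state the actual backoff math.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class AuthError(Exception):
    """Hypothetical stand-in for a 401/403 failure: never retried."""


class RetryableError(Exception):
    """Hypothetical stand-in for 429 / 5xx / transport failures."""


class RetryBudgetExhausted(Exception):
    """Raised once the budget is spent; carries the attempt count."""

    def __init__(self, attempts: int, last: Exception) -> None:
        super().__init__(f"gave up after {attempts} attempts: {last!r}")
        self.attempts = attempts
        self.last = last


def call_with_retry(
    send: Callable[[], T],
    *,
    max_attempts: int = 4,    # assumed budget, not taken from the guide
    base_delay: float = 0.5,  # assumed base delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send() until it succeeds, fails fatally, or the budget runs out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except AuthError:
            raise  # retrying cannot fix bad credentials
        except RetryableError as exc:
            if attempt == max_attempts:
                raise RetryBudgetExhausted(attempt, exc) from exc
            # Exponential backoff with full jitter (an assumed policy).
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the loop testable without real delays, which is one reason to keep the retry policy in a single seam rather than spread across callers.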

`DraftOutcome` shape

class DraftOutcome(BaseModel):
    candidate: CandidateSchema
    request: LLMRequest
    result: LLMResult

Three fields, each load-bearing for a different downstream stage:

candidate — the parsed CandidateSchema ready for the prune step (#6). The prune layer iterates candidate.columns[*].tests and candidate.tests to decide what to run against the warehouse.
request — the safety-layer LLMRequest that was sent to the LLM. The audit log ties the request to a durable receipt; keeping the typed object on the outcome lets prune cross-check columns_sent / redactions without re-running the safety layer.
result — the typed LLMResult with token usage, prompt_version, and raw_message. The grader (#7) reads result.prompt_version for incident-response queries; the diff renderer (#8) uses result.cache_creation_input_tokens / result.cache_read_input_tokens to surface cache economics in the per-run summary.

DraftOutcome is frozen=True, extra="ignore": downstream stages hold an outcome without worrying about post-construction mutation, and forward-compat field additions don't break older readers.

Response audit

Consumer guide. For cross-stage joins, jq / pandas worked examples, the forward-compat policy, and the redaction surface, see docs/audits.md. This section is the draft-layer production contract.

Path: audit_path.with_name("llm_responses.jsonl") — adjacent to the safety layer's audit.jsonl. Both audit streams share a parent directory so the privacy boundary is uniform (DEC-006).

One JSONL record per successful LLM call. The writer mirrors signalforge.safety.audit exactly: serialise → size-check (BEFORE any file open) → mkdir -p parent at 0o700 → os.open with flags O_APPEND | O_CREAT and mode 0o600 → single os.write → os.fsync → close.

LLMResponseEvent fields:

Field	Type	Meaning
`timestamp`	ISO 8601 datetime	UTC timestamp of the response.
`model_unique_id`	string	dbt unique_id of the drafted model.
`prompt_version`	16 hex chars	Deterministic blake2b digest of the prompt template content. See §9.
`response_text_hash`	16 hex chars	blake2b digest of the LLM's raw response text. Reviewers correlate to a captured response by re-hashing the cleartext.
`parsed_schema_hash`	16 hex chars	blake2b digest of the canonicalised parsed `CandidateSchema` (sorted keys via `json.dumps`).
`sent_sql_hash`	16 hex chars	blake2b digest of the model SQL placed in the `<MODEL_SQL>` envelope. Detects prompt drift between runs.
`cache_creation_input_tokens`	integer	Tokens charged at 1.25× input pricing for cache writes.
`cache_read_input_tokens`	integer	Tokens charged at 0.1× input pricing for cache reads.
`input_tokens`	integer	Total input tokens billed.
`output_tokens`	integer	Total output tokens billed.
`model`	string	The Anthropic model id used (e.g. `claude-sonnet-4-6`).
`signalforge_version`	PEP-440 version	The package version that produced the record. Read from `signalforge.__version__` at write time.
`audit_schema_version`	integer	Audit shape version. Currently `2`. Bumped `1 → 2` by #184 to carry the new `parser_reshaped` field. Stays typed `int` (not `Literal`) so v1 records still round-trip. v0.2 readers gate on this.
`parser_reshaped`	array of objects	One `ReshapeRecord` per parser re-attach (#184); empty in the no-reshape happy path (the v1-compatible default). Each record: `{original_column, target_scope: "model", test_type, reason}`. See § Parser re-attach for mis-scoped model-only variants.

Storing hashes (not cleartext) keeps individual records under the POSIX-atomic-append cap (_RESPONSE_AUDIT_RECORD_LIMIT_BYTES = 4000) and avoids re-emitting whatever PII the LLM may have echoed back from the prompt.
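
A 16-hex-character blake2b digest corresponds to an 8-byte digest size. The sketch below shows one way to reproduce hashes of that shape when correlating a captured response with an audit record. The function names are hypothetical, and the UTF-8 encoding and compact JSON separators are assumptions; the guide only states blake2b, 16 hex chars, and sorted keys via `json.dumps`.

```python
import hashlib
import json


def short_digest(text: str) -> str:
    """Return 16 hex chars: blake2b with an 8-byte digest over UTF-8 bytes."""
    return hashlib.blake2b(text.encode("utf-8"), digest_size=8).hexdigest()


def schema_digest(parsed_schema: dict) -> str:
    """Hash a canonical JSON form so key order cannot change the digest."""
    canonical = json.dumps(parsed_schema, sort_keys=True, separators=(",", ":"))
    return short_digest(canonical)
```

If a digest produced this way does not match a recorded `response_text_hash`, check the encoding and canonicalisation assumptions against the package source before concluding the response differs.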

Fail-closed semantics (DEC-011). write_response_event catches no exceptions internally. Oversize records propagate as LLMResponseAuditRecordTooLargeError (raised BEFORE any file is opened, so an oversize record leaves no on-disk artefact); every other I/O or encoding failure propagates and is wrapped by draft_from_request as LLMResponseAuditWriteError. Either way the caller gets an exception, never a DraftOutcome: an unaudited LLM response is, by definition, output leaving without a receipt, which is exactly the failure mode this layer exists to prevent. The propagation IS the defence.
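
The write sequence and its fail-closed behaviour can be sketched with the standard library alone. This is an illustration under stated assumptions, not the signalforge writer: `append_record` and `RecordTooLarge` are hypothetical names, the 4000-byte cap is the figure quoted in this section, and `O_WRONLY` is added because `os.open` opens read-only by default.

```python
import json
import os
from pathlib import Path

RECORD_LIMIT_BYTES = 4000  # the cap quoted in this section


class RecordTooLarge(Exception):
    """Hypothetical stand-in for the oversize-record error."""


def append_record(path: Path, record: dict) -> None:
    """Append one JSONL record; any failure propagates to the caller."""
    # 1. Serialise first, so the size is known before touching the disk.
    line = (json.dumps(record, sort_keys=True) + "\n").encode("utf-8")
    # 2. Size-check BEFORE any directory or file is created.
    if len(line) > RECORD_LIMIT_BYTES:
        raise RecordTooLarge(f"{len(line)} bytes > {RECORD_LIMIT_BYTES}")
    # 3. Create the parent directory, private to the owner.
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    # 4. Open append-only, creating the file owner-read/write if missing.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        # 5. One write call for the whole record, then flush to disk.
        if os.write(fd, line) != len(line):
            raise OSError("short write on audit record")
        os.fsync(fd)
    finally:
        os.close(fd)
```

Nothing here catches and swallows an error, so a caller that only returns its result after `append_record` succeeds cannot hand back an unaudited response.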

Incident response. sent_sql_hash and parsed_schema_hash (DEC-008, DEC-006) make "which run produced this column description?" answerable without spelunking through the LLM provider's logs:

# What runs hit a particular SQL?
jq -c 'select(.sent_sql_hash == "a3f29c61b8de2014")' .signalforge/llm_responses.jsonl
# How much did cache reads save on the last 50 calls?
jq -s 'sort_by(.timestamp) | .[-50:] | map(.cache_read_input_tokens) | add' \
  .signalforge/llm_responses.jsonl
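
For scripted incident response, the same two queries can be written in Python. This is a rough equivalent of the jq commands above; the helper names are hypothetical, and sorting by the raw `timestamp` string assumes every record uses the same ISO 8601 format and timezone, which is also what the jq `sort_by` relies on.

```python
import json
from pathlib import Path
from typing import Iterator


def read_events(path: Path) -> Iterator[dict]:
    """Yield one dict per non-blank JSONL line."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def runs_for_sql(events: list, sent_sql_hash: str) -> list:
    """Return the records whose prompt carried the given SQL hash."""
    return [e for e in events if e.get("sent_sql_hash") == sent_sql_hash]


def cache_reads_last_n(events: list, n: int = 50) -> int:
    """Sum cache-read tokens over the n most recent records."""
    recent = sorted(events, key=lambda e: e["timestamp"])[-n:]
    return sum(e.get("cache_read_input_tokens", 0) for e in recent)
```

Usage would look like `events = list(read_events(Path(".signalforge/llm_responses.jsonl")))` followed by either helper.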

Prompt-injection mitigation

The dynamic block wraps the model SQL in a <MODEL_SQL> envelope and the system message instructs the model that anything between the tags is data, not instructions (DEC-007):

<MODEL_SQL>
SELECT customer_id, ... FROM ...
-- adversarial dbt comment: "ignore prior instructions and ..."
</MODEL_SQL>

This protects against the most common attack surface: a malicious dbt project committing a -- prompt injection comment in model.sql that attempts to override the system prompt.

What it does not protect against:

The LLM hallucinating columns that don't exist on the model. The anchor-contract validator (see §8) catches this — every test that references a column must point at a real column on the input model, or the whole draft is rejected.
Adversarial column descriptions in the manifest. Column descriptions, tags, and meta fields go through to the LLM verbatim inside the cached manifest summary. Column names are hashed in schema-only mode by the safety layer (col_<8 hex>), but column descriptions are passthrough. Treat manifest content as semi-trusted: a malicious description can attempt the same prompt injection as a SQL comment, and the <MODEL_SQL> envelope does not cover that surface.

A v0.2 follow-up may add a manifest-content envelope with the same treatment; for v0.1 the practical mitigation is "review your manifest descriptions like you review your SQL."
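
The anchor-contract check named in the first bullet above can be pictured as a set-membership test over every column a drafted test references. The sketch below is an assumption-laden illustration: `DraftedTest`, `AnchorContractError`, and `check_anchors` are hypothetical names, not the parser's real types, and the actual validator may check more than column existence.

```python
from dataclasses import dataclass
from typing import Optional


class AnchorContractError(ValueError):
    """Hypothetical name for the rejection raised on an unknown column."""


@dataclass(frozen=True)
class DraftedTest:
    """Minimal stand-in for one drafted test."""

    type: str
    column: Optional[str] = None  # None marks a model-level test


def check_anchors(model_columns: set, tests: list) -> None:
    """Reject the whole draft if any test points at a column the model lacks."""
    unknown = sorted(
        {t.column for t in tests if t.column is not None}
        - set(model_columns)
    )
    if unknown:
        raise AnchorContractError(f"tests reference unknown columns: {unknown}")
```

Raising on the first inconsistent draft, rather than dropping only the offending test, matches the "whole draft is rejected" behaviour described above.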

Custom business-rule tests (`custom_sql`)

The four schema-test types (not_null, unique, accepted_values, relationships) cover the structural invariants dbt's generic catalogue can express. They cannot express a business rule — "a refund's amount never exceeds the original order," "every shipped order has a ship date," "discount percent stays between 0 and 100." The fifth test variant, custom_sql (issue #116), is the escape hatch: a free-form singular SQL test the drafter authors per dbt's singular-test convention.

What a `custom_sql` test is

CandidateTestCustomSQL carries:

sql — the complete failing-rows SELECT. Per dbt's singular-test contract, the query returns the rows that violate the rule: a non-empty result means the test failed. (Zero rows = pass.) This mirrors how the four built-in variants compile to an inner failing-rows SELECT — the prune adapter wraps every test in SELECT COUNT(*) AS failures FROM (<sql>) AS t.
column — optional. A non-empty string scopes the rule to one column (and the test renders as test.column.<col>.custom_sql in the diff / grade artifact ids); null (the default) marks a model-level business-rule assertion.
rationale — optional one-line "why," surfaced in the diff.
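
The failing-rows contract and the `SELECT COUNT(*) AS failures FROM (<sql>) AS t` wrapper can be tried end to end against an in-memory SQLite database. SQLite stands in for the warehouse here, and `count_failures`, the `orders` table, and its rows are made up for the illustration; the real prune adapter is not shown in this guide.

```python
import sqlite3


def count_failures(conn: sqlite3.Connection, failing_rows_sql: str) -> int:
    """Wrap a failing-rows SELECT and return how many rows violate the rule."""
    # The inner query must not end with a semicolon, or the wrapper breaks.
    wrapped = f"SELECT COUNT(*) AS failures FROM ({failing_rows_sql}) AS t"
    return conn.execute(wrapped).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, discount_pct REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, 100.0), (3, 140.0)],
)

# Rule: "discount percent stays between 0 and 100".
# The test selects the rows that BREAK the rule.
rule_sql = "SELECT id FROM orders WHERE discount_pct < 0 OR discount_pct > 100"
failures = count_failures(conn, rule_sql)  # order 3 violates the rule
```

A result of zero means the test passes; any positive count means the rule is violated by that many rows.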

Like the four built-ins, custom_sql can be excluded via DraftConfig.exclude_tests — it is a member of VALID_TEST_TYPES (US-021). When "custom_sql" is in exclude_tests, _render_system_prompt omits its JSON-shape illustration and drops it from the ### SCOPE line, so a cooperative LLM never proposes one; if the LLM defies that, the parser's anchor-contract check rejects the custom_sql candidate (defence in depth — prompt + parser). Otherwise the system prompt appends the custom_sql JSON-shape illustration after the (possibly filtered) four standard entries.

Authoring rules via `meta.signalforge.business_rules`

You steer the drafter toward specific rules by declaring them in your dbt model's meta. The drafter reads meta.signalforge.business_rules at both the column level (columns[*].meta.signalforge) and the model level (config.meta.signalforge). The value accepts two shapes:

A single natural-language string:

models:
  - name: dim_customers
    config:
      meta:
        signalforge:
          business_rules: "lifetime_value must never be negative"
    columns:
      - name: discount_pct
        meta:
          signalforge:
            business_rules: "discount_pct stays between 0 and 100 inclusive"

A list of rules:

config:
  meta:
    signalforge:
      business_rules:
        - "total_amount equals the sum of line-item amounts"
        - "every order with status='shipped' has a non-null ship_date"

When any rules are present, the drafter renders a ## BUSINESS RULES section into the prompt's data block (model-level rules first, then per-column rules, columns sorted by name for byte-stable prompts; duplicate rule strings are de-duplicated) and instructs the LLM to draft one custom_sql test per stated rule, translating each natural-language rule into a failing-rows SELECT.

Business-rule reading is best-effort, never fail-loud. A business_rules value that isn't a str or list (a number, a dict, None) yields no rules — the drafter still runs, and the inferred-fallback path below covers the gap. Whitespace-only strings collapse to nothing, so an empty meta value emits no section.
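
The best-effort reading described above amounts to a small normalisation step. The sketch below is one plausible reading, not the drafter's code: `normalise_business_rules` is a hypothetical name, and skipping non-string list items and stripping surrounding whitespace are assumptions the guide does not spell out.

```python
def normalise_business_rules(value: object) -> list:
    """Turn a meta.signalforge.business_rules value into a clean rule list."""
    if isinstance(value, str):
        candidates = [value]
    elif isinstance(value, list):
        candidates = value
    else:
        return []  # numbers, dicts, None: no rules, but no error either
    rules = []
    for item in candidates:
        if not isinstance(item, str):
            continue  # assumption: non-string list items are skipped
        text = item.strip()
        # Whitespace-only entries collapse to nothing; duplicates are dropped.
        if text and text not in rules:
            rules.append(text)
    return rules
```

Because every malformed shape maps to an empty list, a typo in `meta` degrades to "no BUSINESS RULES section" rather than a failed draft.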

Numbered envelope shape (`<BUSINESS_RULE id="N">…</BUSINESS_RULE>`)

As of #163, each rule renders inside a numbered envelope rather than a bare bullet:

## BUSINESS RULES

Operator-supplied business rules for this model. Draft one custom_sql
test per rule below, using the rule ID as a reference:

<BUSINESS_RULE id="1">
  (model) total_amount must never be negative
</BUSINESS_RULE>
<BUSINESS_RULE id="2">
  (column discount_pct) discount_pct stays between 0 and 100 inclusive
</BUSINESS_RULE>

IDs start at 1; bodies are indented 2 spaces and carry the existing (model) / (column X) scope prefix. The envelope gives the LLM unambiguous reference targets and parallels the existing <MODEL_SQL> fence around the model's raw SQL.
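
The numbered envelope can be sketched as a small renderer: model-level rules first, then columns sorted by name, duplicates dropped, IDs starting at 1. All names below are hypothetical, de-duplicating on the scoped body (rather than the bare rule text) is an assumption, and the sketch also refuses any body containing the literal closing tag, since that would end the fence early.

```python
CLOSING_TAG = "</BUSINESS_RULE>"


class EnvelopeBreach(ValueError):
    """Hypothetical stand-in for the envelope-breach error."""

    def __init__(self, rule_index: int) -> None:
        super().__init__(f"rule {rule_index} contains {CLOSING_TAG}")
        self.rule_index = rule_index


def render_business_rules(model_rules: list, column_rules: dict) -> str:
    """Render rules as numbered <BUSINESS_RULE> envelopes, in stable order."""
    bodies = [f"(model) {rule}" for rule in model_rules]
    for column in sorted(column_rules):  # sorted for byte-stable prompts
        bodies += [f"(column {column}) {rule}" for rule in column_rules[column]]

    # Drop duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(bodies))

    # Scan every rule before rendering anything (plain substring match).
    for index, body in enumerate(unique, start=1):
        if CLOSING_TAG in body:
            raise EnvelopeBreach(index)

    lines = []
    for index, body in enumerate(unique, start=1):
        lines += [f'<BUSINESS_RULE id="{index}">', f"  {body}", CLOSING_TAG]
    return "\n".join(lines)
```

Calling it with the two rules from the example above reproduces the two envelopes shown there, with the model-level rule as id 1 and the `discount_pct` rule as id 2.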

Envelope-breach guard. A rule body containing the literal </BUSINESS_RULE> substring would terminate the fence early and let downstream content escape the data block. Before rendering, the drafter scans every rule for that exact substring (boring substring match — no whitespace / case normalisation, mirrors the </MODEL_SQL> precedent) and raises PromptEnvelopeBreachError(envelope="BUSINESS_RULE", rule_index=N) if found. The opening tag <BUSINESS_RULE> alone is fine (only the closing tag breaks the fence), as is any truncated fragment like `
