Enforce hard character caps on source and assembly inputs by mattgodbolt-molty · Pull Request #20 · compiler-explorer/explain

mattgodbolt-molty · 2026-06-15T16:47:41Z

(I'm Molty, an AI assistant acting on behalf of @mattgodbolt)

Problem

MAX_CODE_LENGTH (10k chars) and MAX_ASM_LENGTH (20k chars) existed as constants in explain.py but were never actually applied to the data passed to Claude — they were dead code. The only guard was a 300-line limit on assembly, which doesn't bound line length.

This meant pathological inputs — e.g. a small number of very long assembly lines — could reach Claude with 100k+ input tokens, inflating TTFT, cost, and critically the wall-clock response time that triggers API Gateway 503s (the gateway has a hard 30s integration timeout).
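To make the gap concrete, here is a minimal sketch of why a line-count guard alone does not bound input size. The constant name `MAX_ASSEMBLY_LINES` and the helper `within_line_limit` are hypothetical, chosen for illustration; only the 300-line figure comes from the description above.

```python
# Hypothetical illustration: a line-count guard says nothing about total size.
MAX_ASSEMBLY_LINES = 300  # the 300-line limit described above (name assumed)


def within_line_limit(asm_lines: list[str]) -> bool:
    """Return True when the input passes a purely line-based check."""
    return len(asm_lines) <= MAX_ASSEMBLY_LINES


# Five lines of 80,000 characters each: far under the line limit.
pathological = ["x" * 80_000 for _ in range(5)]
total_chars = sum(len(line) for line in pathological)

print(within_line_limit(pathological), total_chars)  # prints: True 400000
```

The input passes the line check while carrying 400,000 characters, which is the shape of input this PR is meant to stop.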

Fix

Move the constants to prompt.py (where input preparation lives) and actually enforce them in prepare_structured_data():

- Source code: hard-truncated to MAX_CODE_LENGTH chars with a visible "... (N characters truncated) ..." marker
- Assembly: char-capped after line-based selection via a new cap_assembly_chars() helper, which appends an omission marker item and sets truncated=True

Neither change affects the existing 300-line assembly selection logic — this is a second layer of protection for the character dimension.
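The two caps could look roughly like the sketch below. This is not the code in the PR: the item shape (a dict with a `"text"` key), the `cap_source` helper name, the wording of the assembly marker, and returning the `truncated` flag as a tuple element are all assumptions made for illustration. The constant values and the source marker wording are taken from the description above.

```python
MAX_CODE_LENGTH = 10_000  # values stated in the PR description
MAX_ASM_LENGTH = 20_000


def cap_source(code: str, limit: int = MAX_CODE_LENGTH) -> str:
    """Hard-truncate source and append a visible marker with the dropped count."""
    if len(code) <= limit:
        return code
    dropped = len(code) - limit
    return f"{code[:limit]}\n... ({dropped} characters truncated) ..."


def cap_assembly_chars(
    items: list[dict], limit: int = MAX_ASM_LENGTH
) -> tuple[list[dict], bool]:
    """Keep whole items until the character budget is spent.

    Returns (items, truncated). Assumes each item is a dict with a "text"
    key; the marker item appended on overflow has a hypothetical shape.
    """
    kept: list[dict] = []
    used = 0
    for index, item in enumerate(items):
        size = len(item.get("text", ""))
        if used + size > limit:
            omitted = len(items) - index
            kept.append({"text": f"... ({omitted} lines omitted: character limit reached) ..."})
            return kept, True
        kept.append(item)
        used += size
    return kept, False


# Usage: four 9,000-character lines pass any line limit but exceed the char cap.
kept, truncated = cap_assembly_chars([{"text": "x" * 9_000} for _ in range(4)])
print(len(kept), truncated)  # prints: 3 True
```

In this sketch whole items are kept or dropped rather than split mid-line, so a single line longer than the limit is replaced entirely by the marker. Whether the real helper splits long lines instead is not stated in the description.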

Testing

Three new tests in TestPrepareStructuredData:

- Oversized source is capped with a marker
- A few very long asm lines (under the line limit) are still char-capped
- Normal inputs pass through untouched
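For readers without the diff open, the three cases might be exercised along these lines. The `prepare_structured_data` defined here is a stand-in with an assumed signature and return shape (`sourceCode`, `assembly`, `truncated`), not the real function in prompt.py, and the test method names are illustrative.

```python
import unittest

MAX_CODE_LENGTH = 10_000  # values from the PR description
MAX_ASM_LENGTH = 20_000


def prepare_structured_data(source: str, asm: list[dict]) -> dict:
    """Stand-in with an assumed signature and return shape, not the real function."""
    truncated = False
    if len(source) > MAX_CODE_LENGTH:
        dropped = len(source) - MAX_CODE_LENGTH
        source = f"{source[:MAX_CODE_LENGTH]}\n... ({dropped} characters truncated) ..."
    kept, used = [], 0
    for item in asm:
        used += len(item["text"])
        if used > MAX_ASM_LENGTH:
            # Hypothetical marker item; the real wording is not given in the PR.
            kept.append({"text": "... (assembly omitted: character limit reached) ..."})
            truncated = True
            break
        kept.append(item)
    return {"sourceCode": source, "assembly": kept, "truncated": truncated}


class TestPrepareStructuredData(unittest.TestCase):
    def test_oversized_source_is_capped_with_marker(self):
        data = prepare_structured_data("x" * (MAX_CODE_LENGTH + 123), [])
        self.assertTrue(data["sourceCode"].startswith("x" * MAX_CODE_LENGTH))
        self.assertIn("(123 characters truncated)", data["sourceCode"])

    def test_few_long_asm_lines_are_char_capped(self):
        asm = [{"text": "y" * 15_000} for _ in range(3)]  # 3 lines, 45k chars
        data = prepare_structured_data("int main() {}", asm)
        self.assertTrue(data["truncated"])
        real_chars = sum(len(i["text"]) for i in data["assembly"][:-1])
        self.assertLessEqual(real_chars, MAX_ASM_LENGTH)

    def test_normal_inputs_pass_through(self):
        asm = [{"text": "ret"}]
        data = prepare_structured_data("int main() {}", asm)
        self.assertEqual(data["sourceCode"], "int main() {}")
        self.assertEqual(data["assembly"], asm)
        self.assertFalse(data["truncated"])
```

The second case is the important one: three lines is far below the 300-line limit, so only the character cap can catch it.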

🤖 Generated by LLM (Claude, via OpenClaw)

MAX_CODE_LENGTH (10k chars) and MAX_ASM_LENGTH (20k chars) existed as constants in explain.py but were never enforced; they were dead code. Without enforcement, pathological inputs (e.g. a 108k-token asm blob with a handful of very long lines that slip under the 300-line selection limit) could reach Claude with no size guard, inflating TTFT, cost, and crucially the wall-clock response time that causes API Gateway 503s.

Move the constants to prompt.py (where input preparation lives) and enforce them in prepare_structured_data():

- Source code is hard-truncated to MAX_CODE_LENGTH with a visible marker
- Assembly is char-capped after line-based selection via cap_assembly_chars(), which appends an omission marker and sets truncated=True

Neither change affects the line-based assembly selection logic.

Adds three tests covering the new truncation paths and the pass-through case.

🤖 Generated by LLM (Claude, via OpenClaw)

mattgodbolt-molty wants to merge 1 commit into main from fix/enforce-input-char-caps
