Enforce hard character caps on source and assembly inputs #20
Open
mattgodbolt-molty wants to merge 1 commit into
MAX_CODE_LENGTH (10k chars) and MAX_ASM_LENGTH (20k chars) existed as constants in explain.py but were never enforced; they were dead code. Without enforcement, pathological inputs (e.g. a 108k-token asm blob with a handful of very long lines that slip under the 300-line selection limit) could reach Claude with no size guard, inflating TTFT, cost, and, crucially, the wall-clock response time that causes API Gateway 503s.

Move the constants to prompt.py (where input preparation lives) and enforce them in prepare_structured_data():

- Source code is hard-truncated to MAX_CODE_LENGTH with a visible marker
- Assembly is char-capped after line-based selection via cap_assembly_chars(), which appends an omission marker and sets truncated=True

Neither change affects the line-based assembly selection logic.

Adds three tests covering the new truncation paths and the pass-through case.

🤖 Generated by LLM (Claude, via OpenClaw)
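The source-code cap is essentially a prefix cut plus a visible marker. The sketch below is illustrative only: truncate_source() is a hypothetical helper name, MAX_CODE_LENGTH is assumed to be exactly 10_000, and it assumes the marker is appended after the kept prefix rather than counted against the cap. The marker text follows the "... (N characters truncated) ..." form quoted in the PR body.

```python
# Assumed value: the PR describes MAX_CODE_LENGTH as "10k chars".
MAX_CODE_LENGTH = 10_000


def truncate_source(code: str, max_length: int = MAX_CODE_LENGTH) -> str:
    """Hypothetical helper: hard-truncate source to max_length chars.

    Assumption: the marker is appended after the kept prefix and does not
    count against the cap.
    """
    if len(code) <= max_length:
        return code  # pass-through: inputs within the cap are untouched
    omitted = len(code) - max_length
    # Keep the first max_length chars and make the cut visible to the model.
    return code[:max_length] + f"\n... ({omitted} characters truncated) ..."


print(truncate_source("abcdef", max_length=3))
# prints:
# abc
# ... (3 characters truncated) ...
```

Keeping the marker in the prompt lets the model know the source is incomplete instead of silently explaining a partial file as if it were whole.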
(I'm Molty, an AI assistant acting on behalf of @mattgodbolt)
Problem
MAX_CODE_LENGTH (10k chars) and MAX_ASM_LENGTH (20k chars) existed as constants in explain.py but were never actually applied to the data passed to Claude; they were dead code. The only guard was a 300-line limit on assembly, which doesn't bound line length.

This meant pathological inputs, e.g. a small number of very long assembly lines, could reach Claude with 100k+ input tokens, inflating TTFT, cost, and, critically, the wall-clock response time that triggers API Gateway 503s (the gateway has a hard 30s integration timeout).
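To see why a line-count limit alone doesn't bound input size, consider a toy case. This is a sketch with made-up line lengths: select_by_lines() is a hypothetical stand-in for the real (more involved) 300-line selection, and MAX_ASM_LENGTH is assumed to be exactly 20_000.

```python
MAX_SELECTED_LINES = 300  # the line limit mentioned in the PR
MAX_ASM_LENGTH = 20_000   # assumed value of the "20k chars" cap


def select_by_lines(asm_lines: list[str], max_lines: int = MAX_SELECTED_LINES) -> list[str]:
    """Hypothetical stand-in for line-based selection: keep at most max_lines lines."""
    return asm_lines[:max_lines]


# Five very long lines (e.g. a large .ascii data directive) pass a 300-line check easily.
pathological = ['\t.ascii "' + "A" * 40_000 + '"' for _ in range(5)]
selected = select_by_lines(pathological)
total_chars = sum(len(line) for line in selected)

print(len(selected), total_chars, total_chars > MAX_ASM_LENGTH)
# prints: 5 200050 True
```

Every line survives the line limit, yet the selection is ten times the character cap, which is exactly the gap a second, character-based check closes.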
Fix
Move the constants to prompt.py (where input preparation lives) and actually enforce them in prepare_structured_data():

- Source code is hard-truncated to MAX_CODE_LENGTH chars with a visible "... (N characters truncated) ..." marker
- Assembly is char-capped after line-based selection via the cap_assembly_chars() helper, which appends an omission marker item and sets truncated=True

Neither change affects the existing 300-line assembly selection logic; this is a second layer of protection for the character dimension.
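A character cap applied after line selection could look roughly like the sketch below. It is not the PR's implementation: it assumes each assembly item is a dict with a "text" field, that the helper returns the capped list together with a truncated flag, that MAX_ASM_LENGTH is exactly 20_000, and the omission-marker wording is hypothetical.

```python
from typing import Any

MAX_ASM_LENGTH = 20_000  # assumed value of the "20k chars" cap


def cap_assembly_chars(
    asm_items: list[dict[str, Any]], max_chars: int = MAX_ASM_LENGTH
) -> tuple[list[dict[str, Any]], bool]:
    """Sketch: keep whole assembly items until the character budget runs out.

    Assumptions: items are dicts with a "text" field, and the caller uses the
    returned flag to set truncated=True on the structured data.
    """
    kept: list[dict[str, Any]] = []
    used = 0
    for item in asm_items:
        length = len(item.get("text", ""))
        if used + length > max_chars:
            omitted = len(asm_items) - len(kept)
            # Hypothetical marker item so the model can see lines are missing.
            kept.append({"text": f"... ({omitted} more lines omitted) ..."})
            return kept, True
        kept.append(item)
        used += length
    return kept, False


items = [{"text": "x" * 15_000}, {"text": "y" * 15_000}, {"text": "ret"}]
capped, truncated = cap_assembly_chars(items)
print(len(capped), truncated, capped[-1]["text"])
# prints: 2 True ... (2 more lines omitted) ...
```

This sketch drops whole items rather than cutting a line mid-instruction, so the model never sees half an instruction; the trade-off is that a single item longer than the whole budget is replaced entirely by the marker.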
Testing
Three new tests in TestPrepareStructuredData cover the new truncation paths and the pass-through case.

🤖 Generated by LLM (Claude, via OpenClaw)