avoid quadratic multiline-string context scan in is_line_short_enough#5188
Open
sahvx655-wq wants to merge 1 commit into
Open
avoid quadratic multiline-string context scan in is_line_short_enough#5188sahvx655-wq wants to merge 1 commit into
sahvx655-wq wants to merge 1 commit into
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
While profiling Black on a generated config module I noticed formatting hung for several seconds on a single dict literal whose values were all triple-quoted strings. A flame graph put ~80% of the wall time in
is_line_short_enough, specifically thewhile str(ctx) in line_strloop that walks the multiline string up through its enclosing AST nodes. Each step renders the whole ancestor subtree to a string and substring-searches the line, so when the topmost ancestor is the entire collection that render is O(N), and becauseis_line_short_enoughruns once per candidate split line during line generation the whole pass is O(N²). A dict with 1600 triple-quoted values took ~2.2s; the cost grew quadratically with the number of members.The walk only needs the ancestors that are wholly present on the current line, and since a line is a contiguous run of leaves an ancestor qualifies exactly when its first and last leaves both belong to the line. Checking two leaf ids against a set built once is O(depth) per step and bails out immediately for ancestors that aren't fully on the line, so the render-and-search disappears and the pass becomes linear (the 1600-value case drops to ~0.23s). Keeping the check here rather than at the call sites means every caller of
is_line_short_enoughbenefits without having to special-case multiline strings. Output is unchanged: byte-identical across the test suite, the Black and stdlib sources in stable/preview, and several thousand fuzzed multiline-string structures.Checklist - did you ...
--previewstyle, following the stability policy?CHANGES.mdif necessary?