spike(perf): pipeline reference-returning calls via client-allocated object refs#5168
Draft
mrgrain wants to merge 1 commit into
…object refs

Proof-of-concept (prototyped by Kiro, an AI agent) for overlapping the synchronous host<->kernel round-trips that dominate large CDK synths.

The protocol is a strict request/response ping-pong: the host blocks on every create/invoke/get. Profiling showed each side idle ~half the time waiting for the other, with serialization (JSON) at ~2% -- so the lever is overlapping round-trips, not the wire format.

Mechanism (host-language agnostic; implemented here for Python):
- Calls whose declared return is a reference type (class/interface) are fired without waiting: the host mints the object id, sends it with the request, and uses a synthetic handle immediately.
- The kernel honors a client-supplied `objid` on create, and on invoke/sinvoke/get/sget aliases the client id to whatever object the call actually produced (fresh or pre-existing) -- so identity on the kernel side is correct regardless.
- The host drains acks lazily and only blocks at true sync points: value returns (e.g. reading a token string), callbacks, end of synth. A pending cap bounds outstanding requests to avoid pipe-buffer deadlock.

The Python client determines ref-vs-value by reading the generated binding's return annotation at runtime (a POC shortcut; production should thread the return fqn through pacmak codegen instead).

Measured on a ~2,200-resource synth: ~17-20% faster wall time, with byte-identical CloudFormation output. create-only pipelining was ~0% (interleaved synchronous invokes are barriers); pipelining invokes too is what unlocks the win. The remaining gap to the theoretical ceiling is the kernel's serial work, not the protocol.

Kernel changes are production-quality (244 tests + 96 snapshots pass). The Python client changes are POC-quality with known gaps (see PR).
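The ref-vs-value decision described in the commit message can be illustrated with the standard library alone. The following is a minimal sketch, not the PR's code: `returns_reference`, `_VALUE_TYPES`, and the `Bucket`/`Grant` classes are hypothetical names, and the rule "a plain class annotation means a reference type" is a simplifying assumption that ignores enums, structs, and other cases a real binding would have to handle.

```python
import inspect
import typing

# Types treated as crossing the wire by value in this sketch (an assumption).
_VALUE_TYPES = (str, int, float, bool, bytes, list, dict, type(None), typing.Any)


def returns_reference(func) -> bool:
    """Return True if `func` is annotated as returning a plain class."""
    ret = typing.get_type_hints(func).get("return")
    if ret is None:
        return False  # no annotation: be conservative and wait for the reply
    if typing.get_origin(ret) is not None:
        return False  # Optional[...], list[...], etc.: treat as a value
    return inspect.isclass(ret) and ret not in _VALUE_TYPES


# Hypothetical stand-ins for generated bindings.
class Grant:
    pass


class Bucket:
    def arn(self) -> str: ...
    def grant_read(self) -> Grant: ...
    def delete(self) -> None: ...
```

Under these assumptions, `returns_reference(Bucket.grant_read)` is true, so that call could be fired without waiting, while `Bucket.arn` and `Bucket.delete` remain sync points. Anything the check cannot classify falls back to the blocking path, which keeps the shortcut safe at the cost of some lost overlap.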
The title of this Pull Request does not conform with [Conventional Commits] guidelines. It will need to be adjusted before the PR can be merged.
What
Overlaps the synchronous host↔kernel round-trips that dominate large CDK synths, by pipelining reference-returning calls via client-allocated object ids. Instead of blocking on every `create`/`invoke`/`get`, the host mints the result's object id, fires the request, and uses a synthetic handle immediately; the kernel aliases that id to whatever the call produces. The host only blocks at true sync points (value returns, callbacks, end of synth).

On a ~2,200-resource synth this is ~17–20% faster wall time with byte-identical CloudFormation output. (JSON serialization was measured at ~2%; the win is from overlap, not the wire format. See #5166.)
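The bookkeeping behind this mechanism can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the PR's client: `FakeKernel`, `PipelinedClient`, the request field names, and the `client-N` id format are all hypothetical, and the kernel here runs in-process, so the sketch shows id minting, aliasing, the pending cap, and lazy draining rather than real overlap across a pipe.

```python
import itertools
from collections import deque


class FakeKernel:
    """In-memory stand-in for the kernel; replies are queued in FIFO order."""

    def __init__(self):
        self.objects = {}       # object id (or alias) -> object
        self.replies = deque()  # replies the host has not read yet

    def send(self, req):
        api = req["api"]
        if api == "create":
            # Honor the client-supplied id instead of allocating one.
            self.objects[req["objid"]] = {"fqn": req["fqn"]}
            self.replies.append({"ok": True})
        elif api == "invoke":
            # Toy method that returns its receiver, i.e. a pre-existing
            # object. Alias the client id to whatever was produced.
            self.objects[req["objid"]] = self.objects[req["objref"]]
            self.replies.append({"ok": True})
        elif api == "get":
            obj = self.objects[req["objref"]]
            self.replies.append({"value": obj[req["property"]]})

    def recv(self):
        return self.replies.popleft()


class PipelinedClient:
    def __init__(self, kernel, max_pending=64):
        self.kernel = kernel
        self.max_pending = max_pending  # cap on unacknowledged requests
        self.pending = 0
        self._ids = itertools.count(1)

    def _drain(self, down_to=0):
        # Read acks until at most `down_to` requests remain outstanding.
        while self.pending > down_to:
            self.kernel.recv()
            self.pending -= 1

    def _fire(self, req):
        # Reference-returning call: mint the id, send, do not wait.
        if self.pending >= self.max_pending:
            self._drain(down_to=self.max_pending - 1)
        objid = f"client-{next(self._ids)}"
        self.kernel.send({**req, "objid": objid})
        self.pending += 1
        return objid  # synthetic handle, usable immediately

    def create(self, fqn):
        return self._fire({"api": "create", "fqn": fqn})

    def invoke_ref(self, objref, method):
        return self._fire({"api": "invoke", "objref": objref, "method": method})

    def get_value(self, objref, prop):
        # Value-returning call: a true sync point, so drain first.
        self._drain()
        self.kernel.send({"api": "get", "objref": objref, "property": prop})
        return self.kernel.recv()["value"]


kernel = FakeKernel()
client = PipelinedClient(kernel, max_pending=2)
bucket = client.create("demo.Bucket")
same = client.invoke_ref(bucket, "itself")
topic = client.create("demo.Topic")   # cap reached: one ack is drained first
fqn = client.get_value(same, "fqn")   # sync point: drains, then waits
```

After this runs, `fqn` is `"demo.Bucket"` and `kernel.objects[same] is kernel.objects[bucket]` holds, which is the identity guarantee the aliasing is meant to preserve. The cap matters in a real client because both sides write to bounded pipes; an unbounded run of unread acks could block the kernel while the host is itself blocked on a write.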
Outcome
Loading and the kernel hot path aside, the protocol round-trip is the last big synth cost. This POC demonstrates it's reclaimable: create-only pipelining was ~0% (interleaved synchronous invokes are barriers), but pipelining invokes/sinvokes/sget as well unlocks the win. The remaining gap to the theoretical ~2× ceiling is the kernel's serial work, not the protocol.
What's in this PR
`CreateRequest.objid` is honored instead of allocating; `invoke`/`sinvoke`/`get`/`sget` accept an `objid` and `ObjectTable.aliasObject` binds it to the produced object (fresh or pre-existing). Fully backward-compatible: inert when `objid` is absent.
Known gaps (POC shortcuts; full list in #5167)
`get_type_hints`; production should thread the return fqn through pacmak codegen instead.
Testing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.