
spike(perf): pipeline reference-returning calls via client-allocated object refs#5168

Draft
mrgrain wants to merge 1 commit into main from mrgrain/spike/client-allocated-objids

mrgrain (Collaborator) commented Jun 16, 2026

Draft / proof-of-concept — not for merge. Prototyped by Kiro (an AI agent), working with @mrgrain. Tracking issue: #5167. Analysis: #5166.

What

Overlaps the synchronous host↔kernel round-trips that dominate large CDK synths by pipelining reference-returning calls via client-allocated object ids. Instead of blocking on every create/invoke/get, the host mints the result's object id, fires the request, and uses a synthetic handle immediately; the kernel aliases that id to whatever object the call produces. The host blocks only at true sync points (value returns, callbacks, end of synth).
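A minimal in-process sketch of the idea, assuming a toy kernel and request shapes only loosely modeled on jsii's (`ToyKernel`, `PipelinedClient`, the `client@N` id format and the request dicts are all hypothetical). The host mints an id for each reference-returning call and passes it on immediately; only the value-returning `get` blocks, and it reads every response that piled up since the last sync point.

```python
import itertools
from collections import deque


class ToyKernel:
    """Hypothetical stand-in for the kernel process; handles requests strictly in order."""

    def __init__(self):
        self.objects = {}

    def handle(self, req):
        if req["api"] == "create":
            scope = req.get("scope")
            # In-order processing means a handle minted earlier has already been created.
            if scope is not None and scope not in self.objects:
                return {"error": f"unknown scope {scope}"}
            self.objects[req["objid"]] = dict(req["props"])  # honor the client-supplied id
            return {"ok": req["objid"]}
        if req["api"] == "get":
            obj = self.objects.get(req["objref"])
            if obj is None:
                return {"error": f"unknown object {req['objref']}"}
            return {"ok": obj[req["property"]]}
        return {"error": f"unsupported api {req['api']}"}


class PipelinedClient:
    """Hypothetical host side: reference-returning calls never wait for their response."""

    def __init__(self, kernel):
        self._kernel = kernel
        self._unread = deque()          # responses produced but not yet read by the host
        self._ids = itertools.count(1)
        self.blocking_waits = 0

    def _send(self, req):
        # The toy kernel answers at once; the answer sits unread, like bytes in a pipe.
        self._unread.append(self._kernel.handle(req))

    def create(self, props, scope=None):
        objid = f"client@{next(self._ids)}"   # the host mints the result's object id
        self._send({"api": "create", "objid": objid, "props": props, "scope": scope})
        return objid                           # synthetic handle, usable right away

    def get(self, objref, prop):
        self._send({"api": "get", "objref": objref, "property": prop})
        self.blocking_waits += 1               # a value return is a true sync point
        responses = [self._unread.popleft() for _ in range(len(self._unread))]
        for resp in responses:                 # deferred errors surface here
            if "error" in resp:
                raise RuntimeError(resp["error"])
        return responses[-1]["ok"]


kernel = ToyKernel()
client = PipelinedClient(kernel)
stack = client.create({"name": "MyStack"})
bucket = client.create({"name": "Logs"}, scope=stack)  # handle used before any ack is read
print(client.get(bucket, "name"), client.blocking_waits)  # Logs 1
```

Three calls cost one blocking wait here instead of three. In the real system the kernel is a separate process, so its work on the queued requests overlaps with the host's own work instead of being answered inline as in this toy.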

On a ~2,200-resource synth, this is ~17–20% faster in wall time, with byte-identical CloudFormation output. (JSON serialization measured at only ~2% in profiling, so the win comes from overlap, not the wire format; see #5166.)

Outcome

Loading and the kernel hot path aside, the protocol round-trip is the last big synth cost. This POC demonstrates that it is reclaimable: create-only pipelining gained ~0% (the interleaved synchronous invokes act as barriers), but pipelining invokes/sinvokes/sget as well unlocks the win. The remaining gap to the theoretical ~2× ceiling is the kernel's serial work, not the protocol.
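A toy timing model (not a profile of the real synth) makes the barrier argument concrete. It assumes every call costs the host and the kernel one time unit each, that the kernel handles requests in order, and a made-up mix of one create followed by nine invokes; `synth_time` is a hypothetical helper. Pipelining only the creates barely helps because each synchronous invoke makes the host wait for the kernel to finish everything queued, while pipelining every call reaches the 2× bound for this equal-cost case.

```python
def synth_time(must_wait, host_cost=1.0, kernel_cost=1.0):
    """Toy timing model. `must_wait[i]` is True if the host blocks on call i's
    response (a barrier) and False if the call is pipelined."""
    host_clock = 0.0    # when the host has finished preparing its latest request
    kernel_free = 0.0   # when the kernel will have finished everything sent so far
    for wait in must_wait:
        host_clock += host_cost                  # host prepares and sends the request
        start = max(host_clock, kernel_free)     # kernel handles requests in order
        kernel_free = start + kernel_cost
        if wait:
            host_clock = kernel_free             # block until the response arrives
    return max(host_clock, kernel_free)          # end of synth is a final sync point


# One create followed by nine invokes, repeated: a made-up mix, not a measured trace.
kinds = (["create"] + ["invoke"] * 9) * 100
baseline = synth_time([True] * len(kinds))                  # strict ping-pong
create_only = synth_time([k == "invoke" for k in kinds])    # invokes stay barriers
everything = synth_time([False] * len(kinds))               # only end of synth waits
print(f"create-only: {baseline / create_only:.2f}x, all calls: {baseline / everything:.2f}x")
# create-only: 1.05x, all calls: 2.00x
```

With unequal costs the all-calls speedup is (host + kernel) / max(host, kernel), which falls below 2× as soon as one side is busier; in this model that busier side is the remaining serial work.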

What's in this PR

  • Kernel (production-quality and tested; 244 tests + 96 snapshots pass): CreateRequest.objid is honored instead of the kernel allocating its own id; invoke/sinvoke/get/sget accept an objid, and ObjectTable.aliasObject binds it to the produced object (fresh or pre-existing). Fully backward-compatible: inert when objid is absent.
  • Python client (POC-quality): fire-and-forget for reference-returning calls, lazy ack draining with a pending cap, and a runtime read of the binding's return annotation to decide ref-vs-value.
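The kernel's aliasing rule can be sketched like this, written in Python to match the other examples even though the jsii kernel itself is TypeScript. `ObjectTable`, `register` and `alias_object` only echo the PR's `ObjectTable.aliasObject`; their signatures and the `kernel@N` id format are assumptions. A fresh object takes the client-minted id as its id; a pre-existing object keeps its original id and the client id becomes a second name for the same instance, which is why kernel-side identity stays correct.

```python
class ObjectTable:
    """Hypothetical kernel-side object table supporting client-minted ids."""

    def __init__(self):
        self._by_id = {}    # object id -> instance (also keeps instances alive)
        self._id_of = {}    # id(instance) -> the id the object was first registered under
        self._counter = 10000

    def register(self, obj, objid=None):
        """Register a new object, honoring a client-supplied id when one is given."""
        if objid is None:
            self._counter += 1
            objid = f"kernel@{self._counter}"
        self._claim(objid, obj)
        self._id_of[id(obj)] = objid
        return objid

    def alias_object(self, client_id, obj):
        """Bind a client-minted id to whatever object a call produced."""
        if id(obj) not in self._id_of:
            return self.register(obj, client_id)   # fresh object: the client id is its id
        self._claim(client_id, obj)                # pre-existing: add a second name
        return self._id_of[id(obj)]

    def find(self, objid):
        return self._by_id[objid]

    def _claim(self, objid, obj):
        if objid in self._by_id:
            raise ValueError(f"object id {objid!r} is already bound")
        self._by_id[objid] = obj


table = ObjectTable()
stack = object()
table.register(stack, "client@1")           # a create that honors the client's objid
table.alias_object("client@2", stack)       # e.g. a get that returns the same stack
print(table.find("client@1") is table.find("client@2"))  # True
```

The collision check in `_claim` is where the "id-space discipline" gap would bite: if client-minted and kernel-allocated ids could overlap, a valid request would be rejected or, without the check, silently rebind an id.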

Known gaps (POC shortcuts — full list in #5167)

  • Return type is read via runtime frame-walk + get_type_hints; production should thread the return fqn through pacmak codegen instead.
  • Errors are deferred to the next sync point and currently ignored on the drained path (need request tagging + stack mapping).
  • Callbacks mid-pipeline are unsupported (asserts; the test app has none).
  • Object identity: repeated reads mint distinct host proxies (kernel-side identity preserved; host-side needs a story).
  • Python-only; needs id-space discipline, handshake capability negotiation, backpressure tuning, and the other language runtimes.
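One possible shape for the request tagging and stack mapping listed above, assuming the kernel answers requests in order so responses can be matched FIFO. `TaggedPipeline` and `DeferredCallError` are hypothetical names. The call site is captured when the request is fired, because by the time its error is drained the originating frame is gone.

```python
import traceback
from collections import deque


class DeferredCallError(RuntimeError):
    """An error from a fire-and-forget request, reported at a later sync point."""


class TaggedPipeline:
    """Hypothetical: remember what each un-acked request was and where it came from."""

    def __init__(self):
        self._pending = deque()   # (tag, captured stack) in send order

    def fire(self, tag):
        # Drop the last frame (this method) so the captured stack ends at the caller.
        self._pending.append((tag, traceback.extract_stack()[:-1]))

    def drain(self, responses):
        """Match responses to pending requests in order; raise on the first error."""
        for response in responses:
            tag, stack = self._pending.popleft()
            if "error" in response:
                origin = "".join(traceback.format_list(stack[-1:])).rstrip()
                raise DeferredCallError(
                    f"{tag} failed: {response['error']}\nrequest sent from:\n{origin}"
                )

    @property
    def pending(self):
        return len(self._pending)
```

The raised error names both the failing request and the line that fired it, rather than the unrelated sync point where it happened to be drained.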

Testing

  • Kernel: full suite green (244 + 96 snapshots).
  • End-to-end: ~2,200-resource CDK Python synth produces byte-identical templates vs baseline; ~17–20% faster (best-of-5).
  • The Python client changes are POC-quality and not yet covered by the python-runtime test suite.
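A small helper for the byte-identical check, assuming the baseline and pipelined synths were written to two separate output directories (for example two copies of cdk.out; the directory names are up to you) and that templates match `*.template.json`. `differing_templates` is a hypothetical name. Comparing raw bytes rather than parsed JSON is deliberate: it also catches key-order and whitespace changes.

```python
from pathlib import Path


def differing_templates(baseline_dir, candidate_dir, pattern="*.template.json"):
    """Relative paths of templates that differ or exist in only one directory."""
    base_root, cand_root = Path(baseline_dir), Path(candidate_dir)
    base = {p.relative_to(base_root) for p in base_root.rglob(pattern)}
    cand = {p.relative_to(cand_root) for p in cand_root.rglob(pattern)}
    differing = base ^ cand                          # present on one side only
    for rel in base & cand:
        # Byte-for-byte comparison, so formatting and key order count too.
        if (base_root / rel).read_bytes() != (cand_root / rel).read_bytes():
            differing.add(rel)
    return sorted(str(p) for p in differing)
```

An empty list from `differing_templates("cdk.out.baseline", "cdk.out.pipelined")` means every template matched exactly.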

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…object refs

Proof-of-concept (prototyped by Kiro, an AI agent) for overlapping the
synchronous host<->kernel round-trips that dominate large CDK synths.

The protocol is a strict request/response ping-pong: the host blocks on
every create/invoke/get. Profiling showed each side idle ~half the time
waiting for the other, with serialization (JSON) at ~2% -- so the lever
is overlapping round-trips, not the wire format.

Mechanism (host-language agnostic; implemented here for Python):
- Calls whose declared return is a reference type (class/interface) are
  fired without waiting: the host mints the object id, sends it with the
  request, and uses a synthetic handle immediately.
- The kernel honors a client-supplied `objid` on create, and on
  invoke/sinvoke/get/sget aliases the client id to whatever object the
  call actually produced (fresh or pre-existing) -- so identity on the
  kernel side is correct regardless.
- The host drains acks lazily and only blocks at true sync points: value
  returns (e.g. reading a token string), callbacks, end of synth. A
  pending cap bounds outstanding requests to avoid pipe-buffer deadlock.
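The pending cap can be sketched as follows, assuming a transport exposed as two blocking callables, `send` and `recv`, and a kernel that answers in order (`BoundedPipeline` and both callables are hypothetical). The deadlock it guards against: if the host keeps writing without reading, the kernel's response pipe fills (typically 64 KiB on Linux), the kernel blocks on its write and stops reading requests, and the host then blocks on its own write. Draining the oldest ack once `max_pending` is reached keeps both sides moving.

```python
class BoundedPipeline:
    """Hypothetical lazy-ack client with a cap on un-acknowledged requests."""

    def __init__(self, send, recv, max_pending=64):
        self._send, self._recv = send, recv   # blocking transport callables (assumed)
        self._max_pending = max_pending
        self.pending = 0                      # fired requests whose ack is still unread

    def fire(self, request):
        """Reference-returning call: send now, read the ack later."""
        if self.pending >= self._max_pending:
            self._drain(self.pending - self._max_pending + 1)  # free one slot
        self._send(request)
        self.pending += 1

    def call(self, request):
        """Value-returning call: a sync point that drains every outstanding ack."""
        self._send(request)
        self._drain(self.pending)
        return self._check(self._recv())

    def _drain(self, count):
        for _ in range(count):
            response = self._recv()
            self.pending -= 1
            self._check(response)

    @staticmethod
    def _check(response):
        if "error" in response:
            raise RuntimeError(response["error"])
        return response.get("ok")
```

`call` sends before draining so the kernel can start on the value request while the host reads acks, which means up to `max_pending + 1` requests can be briefly outstanding.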

The Python client determines ref-vs-value by reading the generated
binding's return annotation at runtime (a POC shortcut; production should
thread the return fqn through pacmak codegen instead).
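A rough sketch of the ref-vs-value decision, assuming the generated binding method is already in hand (the frame-walk that finds it is left out). `returns_reference` and `VALUE_TYPES` are hypothetical, and the classification rules are illustrative rather than the client's actual ones. Anything the helper cannot prove is a plain class falls back to a blocking call, which is always safe; that includes `Optional[...]`, whose result might be `None`.

```python
import datetime
import enum
import typing

# Assumed set of types that come back by value; any other class counts as a reference.
VALUE_TYPES = (str, int, float, bool, type(None), enum.Enum, datetime.datetime)


def returns_reference(func) -> bool:
    """True if it looks safe to pipeline `func`: its declared return is a plain class."""
    hint = typing.get_type_hints(func).get("return", typing.Any)
    if hint is typing.Any or typing.get_origin(hint) is not None:
        # Missing annotation, Optional[...] (might be None), List[...], Mapping[...], ...
        return False
    return isinstance(hint, type) and not issubclass(hint, VALUE_TYPES)


class Bucket:  # stand-in for a generated binding class
    pass


def add_bucket(name: str) -> Bucket: ...
def bucket_name() -> str: ...
def find_bucket() -> typing.Optional[Bucket]: ...


print(returns_reference(add_bucket), returns_reference(bucket_name),
      returns_reference(find_bucket))  # True False False
```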

Measured on a ~2,200-resource synth: ~17-20% faster wall time, with
byte-identical CloudFormation output. Create-only pipelining gained ~0%
(interleaved synchronous invokes are barriers); pipelining invokes too is
what unlocks the win. The remaining gap to the theoretical ceiling is the
kernel's serial work, not the protocol.

Kernel changes are production-quality (244 tests + 96 snapshots pass).
The Python client changes are POC-quality with known gaps (see PR).
mergify bot added the contribution/core label ("This is a PR that came from AWS.") on Jun 16, 2026
mergify bot (Contributor) commented Jun 16, 2026

The title of this Pull Request does not conform with [Conventional Commits] guidelines. It will need to be adjusted before the PR can be merged.
[Conventional Commits]: https://www.conventionalcommits.org
