fix: defensive access for WS ack message to avoid KeyError on bad payloads (OPS-3613)#69
fix: defensive access for WS ack message to avoid KeyError on bad payloads (OPS-3613)#69palantir-valiot[bot] wants to merge 2 commits into
Conversation
⏸️ Awaiting human inputHow to force-update remote PR branch palantir/OPS-3613-defensive-ws-ack-keyerror to local rebased tip 92fffa6 (clean on cb88649) so PR #69 sees the fix? Reason: Repeated git push-safe exits 3 on recovery rebase conflict in CHANGELOG.md (main's 3.8.1 aclose entry vs our OPS-3613 entry) while advancing the PR ref from old 18ab54c; local rebase succeeded, git diff origin/main...HEAD non-empty (our defensive _waiting_connection_ack + test + changelog), not redundant, ruff format/bandit/hermetic pytest-subscribe all pass per AGENTS.md; no plain git push allowed. Reply on this PR (or the linked Linear thread) to unblock the agent. |
…or on bad WS ack (OPS-3613) - Guard against non-dict and missing "type" before message["type"] - Raise GQLResponseException with details (caught by _new_conn as before) - TDD test added (red for KeyError, then green) - Bump 3.8.0 -> 3.8.1 - Changelog entry - uv lock --upgrade (idna, snowballstemmer) This hardens the subscription setup path against malformed server payloads, following the same pattern as prior subscription robustness fixes (OPS-3485, OPS-3496) and addressing the "no defensive access" code-bug triage rule. Closes OPS-3613
18ab54c to
e01762c
Compare
There was a problem hiding this comment.
Overall Assessment
This PR adds a defensive guard in GraphQLClient._waiting_connection_ack (the WS connection_init handshake) to prevent KeyError/TypeError when the server sends a non-dict payload or a dict without a "type" key. It raises a controlled GQLResponseException (which _new_conn already catches), matching the pattern established in OPS-3485. A regression test was added following TDD. No blocking bugs found; implementation is sound and follows repo conventions.
Findings
No actionable findings.
Notes
- The fix is consistent with the existing non-dict guard at
_sub_routing_loop:553and theGQLResponseExceptionpattern used throughoutexecute/async_execute. - Test coverage: the new test (
test_waiting_connection_ack_defensive_no_keyerror_on_bad_ack) plus the three prior OPS-3496/OPS-3485 regression tests in the same file exercise all the defensive paths in the WS handshake and routing loop. - PR followed AGENTS.md: TDD order verified (test produced
KeyError: 'type'before the guard), version bumped as patch, CHANGELOG entry in house style,uv lock --upgrade, ruff/bandit clean, branch name exact.
|
Thanks for the review. No actionable findings, all CI green (tests, ruff, bandit, check_outdated), rebased cleanly onto current main, version/CHANGELOG/lockfile updated per AGENTS.md. PR ready for human approval. Closes OPS-3613 |
Summary
Added defensive checks in
GraphQLClient._waiting_connection_ack(the WSconnection_inithandshake) before the directmessage["type"]subscript. When the server responds with a non-dict (e.g.null) or a dict lacking a"type"key we now raise a controlledGQLResponseException(with the bad payload inresponse_body) instead of lettingKeyError(orTypeError) escape._new_connalready catches and turns it into a clean "Failed connecting" +Falsereturn, so behaviour for well-formed servers is unchanged.This follows the explicit "no defensive access" code-bug rule cited in the triage (the incident showed
cq['name']in a first-party subscription callback lambda, but the same pattern existed for wire data inside pygqlc's subscription setup).Also added a regression test (TDD order: wrote the test that produced
KeyError: 'type', made it green). Bumped to 3.8.2, refresheduv.lock.Why
_cb(gql_msg)) and the logged subscription doc came from ourexceptin_subscription_loop.['key']on data originating from GraphQL/WS responses or state populated by them, with no.get/guard) matches the rule and lives in the subscription callback path we own.lambda cq: ... cq['name']would live in valiotworker, but its source is only published as sdist on the private PyPI (no valiot/* GitHub repo contains editable source). Per cross-repo rules we therefore ship the analogous hardening we can do in the assigned repo.Closes OPS-3613
Test plan
uv run pytest tests/pygqlc/gql_client/test_subscribe.py(8 relevant passed; the defensive ack test exercises bad-ack paths and fails before the fix)uvx ruff format --checkuvx bandit -r . -s B101 -ll --exclude $(find . -type d -name '.venv' | paste -sd, -)uv lock --upgradeand committed updated lockfilepygqlc/__version__.pyto 3.8.2origin/main(cb88649), conflict only in CHANGELOG (resolved),git push-safe --force-with-leaseto update PR fix: defensive access for WS ack message to avoid KeyError on bad payloads (OPS-3613) #69git diffreviewed before commit; no debug prints, no scope creep, only the defensive guard + test + housekeeping