Force-close exec WebSockets so the pod-side subprocess actually exits#375
Draft
jeanschmidt wants to merge 1 commit into
Draft
Force-close exec WebSockets so the pod-side subprocess actually exits#375jeanschmidt wants to merge 1 commit into
jeanschmidt wants to merge 1 commit into
Conversation
- Add `safeTerminateWs(ws)` helper in `heartbeat.ts` that wraps `ws.terminate()` in a try/catch and logs failures via `core.debug`. - Extend `HeartbeatWebSocket` interface with `terminate()` so the heartbeat and exec paths share one contract. - Replace `socket.close()` with `safeTerminateWs(socket)` in both `execPodStep` cleanup paths (status callback + exec error handler) in `packages/k8s/src/k8s/index.ts`. - Replace `ws.close()` with `safeTerminateWs(ws)` in the heartbeat pong-timeout path; update the heartbeat test to assert `terminate` (not `close`). - Add `tests/safe-terminate-ws-test.ts` covering all three cleanup paths plus the defensive try/catch. Notes: `ws.close()` performs a graceful WebSocket close handshake that the kubelet streaming server does not forward to the remote `kubectl exec` subprocess (kubernetes-client/javascript#2532). The pod-side `sh` keeps running until the pod itself dies, which OSDC observed as orphaned `tar` processes lingering until the kubelet OOM-killed the pod. `terminate()` skips the handshake and drops the underlying TCP connection, which the kubelet does propagate as a SIGHUP/SIGPIPE to the child. The heartbeat path matters even when callers also clean up: if the heartbeat fires `ws.close()` first, the socket transitions to `CLOSING` and the `execPodStep` catch-block guard skips its own cleanup, so the subprocess is orphaned anyway. Terminating in the heartbeat closes that gap.
Contributor
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR updates Kubernetes exec WebSocket cleanup to force-terminate connections (instead of graceful close) so the remote kubectl exec subprocess reliably exits, and adds/updates tests to cover the new behavior.
Changes:
- Added
safeTerminateWs()helper and updatedWebSocketHeartbeatto use termination on heartbeat timeout. - Updated
execPodStepcleanup/error paths to callsafeTerminateWs()instead ofsocket.close(). - Added new Jest coverage for terminate-based cleanup paths and updated existing heartbeat tests accordingly.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| packages/k8s/src/k8s/heartbeat.ts | Adds terminate() to the WebSocket interface, introduces safeTerminateWs(), and uses it for pong timeout cleanup. |
| packages/k8s/src/k8s/index.ts | Switches exec cleanup paths from close() to safeTerminateWs() to force-close exec streams. |
| packages/k8s/tests/heartbeat-test.ts | Updates the heartbeat test WebSocket mock + expectations from close() to terminate(). |
| packages/k8s/tests/safe-terminate-ws-test.ts | Adds new tests verifying terminate() is used across execPodStep success/failure/error cleanup paths. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
3
to
10
| export interface HeartbeatWebSocket { | ||
| readyState: number | ||
| ping(): void | ||
| close(): void | ||
| terminate(): void | ||
| on(event: string, listener: (...args: any[]) => void): this | ||
| once(event: string, listener: (...args: any[]) => void): this | ||
| } |
Comment on lines
+15
to
+26
| export function safeTerminateWs( | ||
| ws: HeartbeatWebSocket | null | undefined | ||
| ): void { | ||
| if (!ws) { | ||
| return | ||
| } | ||
| try { | ||
| ws.terminate() | ||
| } catch (err) { | ||
| core.debug(`[safeTerminateWs] terminate() threw, ignoring: ${err}`) | ||
| } | ||
| } |
Comment on lines
+23
to
+25
| } catch (err) { | ||
| core.debug(`[safeTerminateWs] terminate() threw, ignoring: ${err}`) | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Force-close exec WebSockets so the pod-side subprocess actually exits
Impact: k8s hook users running
kubectl exec-style steps — fixes orphaned subprocesses (e.g.tar,sh) that linger inside the runner pod after a step finishes.Risk: low
What
Replace
ws.close()with a smallsafeTerminateWs()helper (calls the underlyingws.terminate()inside a try/catch) at the three cleanup sites inexecPodStepandWebSocketHeartbeatwhere the WebSocket is being torn down because the step is done or the connection is already declared stale.Why
ws.close()initiates a graceful WebSocket close handshake. The kubelet streaming server does not forward that handshake to the remotekubectl execsubprocess — see kubernetes-client/javascript#2532. The pod-side process keeps running until the pod itself dies.In production this shows up as orphaned
tarprocesses left over fromcp-style steps, eventually getting OOM-killed by the kubelet.terminate()drops the TCP connection without a handshake, which the kubelet does propagate, so the child exits promptly.How
heartbeat.tsnext to theHeartbeatWebSocketinterface (avoids a circular import intoindex.ts) and addsterminate(): voidto that interface.execPodStepstatus-callback cleanup (success/failure)execPodStep.catch()error-handler cleanupWebSocketHeartbeatpong-timeout (heartbeat already declared the connection stale)ws.close()first, the socket transitions toCLOSINGand the downstream.catch()cleanup guard (readyState === 1 || readyState === 0) becomes false, so its own terminate is skipped and the subprocess is orphaned anyway.try/catcharoundterminate()so cleanup never throws on an already-dead socket.