Skip to content

Force-close exec WebSockets so the pod-side subprocess actually exits#375

Draft
jeanschmidt wants to merge 1 commit into
actions:mainfrom
jeanschmidt:jeanschmidt/upstream-pr02-safe-terminate-ws
Draft

Force-close exec WebSockets so the pod-side subprocess actually exits#375
jeanschmidt wants to merge 1 commit into
actions:mainfrom
jeanschmidt:jeanschmidt/upstream-pr02-safe-terminate-ws

Conversation

@jeanschmidt

@jeanschmidt jeanschmidt commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Force-close exec WebSockets so the pod-side subprocess actually exits

Impact: k8s hook users running kubectl exec-style steps — fixes orphaned subprocesses (e.g. tar, sh) that linger inside the runner pod after a step finishes.
Risk: low

What

Replace ws.close() with a small safeTerminateWs() helper (calls the underlying ws.terminate() inside a try/catch) at the three cleanup sites in execPodStep and WebSocketHeartbeat where the WebSocket is being torn down because the step is done or the connection is already declared stale.

Why

ws.close() initiates a graceful WebSocket close handshake. The kubelet streaming server does not forward that handshake to the remote kubectl exec subprocess — see kubernetes-client/javascript#2532. The pod-side process keeps running until the pod itself dies.

In production this shows up as orphaned tar processes left over from cp-style steps, eventually getting OOM-killed by the kubelet. terminate() drops the TCP connection without a handshake, which the kubelet does propagate, so the child exits promptly.

How

  • Helper lives in heartbeat.ts next to the HeartbeatWebSocket interface (avoids a circular import into index.ts) and adds terminate(): void to that interface.
  • Applied at three sites, all of which are points where the WebSocket has either finished its job or is known-bad:
    1. execPodStep status-callback cleanup (success/failure)
    2. execPodStep .catch() error-handler cleanup
    3. WebSocketHeartbeat pong-timeout (heartbeat already declared the connection stale)
  • The heartbeat site matters for completeness: if heartbeat fires ws.close() first, the socket transitions to CLOSING and the downstream .catch() cleanup guard (readyState === 1 || readyState === 0) becomes false, so its own terminate is skipped and the subprocess is orphaned anyway.
  • try/catch around terminate() so cleanup never throws on an already-dead socket.
  • No new config knob — upstream's current behavior here is provably broken.

- Add `safeTerminateWs(ws)` helper in `heartbeat.ts` that wraps `ws.terminate()` in a try/catch and logs failures via `core.debug`.
- Extend `HeartbeatWebSocket` interface with `terminate()` so the heartbeat and exec paths share one contract.
- Replace `socket.close()` with `safeTerminateWs(socket)` in both `execPodStep` cleanup paths (status callback + exec error handler) in `packages/k8s/src/k8s/index.ts`.
- Replace `ws.close()` with `safeTerminateWs(ws)` in the heartbeat pong-timeout path; update the heartbeat test to assert `terminate` (not `close`).
- Add `tests/safe-terminate-ws-test.ts` covering all three cleanup paths plus the defensive try/catch.

Notes:
`ws.close()` performs a graceful WebSocket close handshake that the kubelet
streaming server does not forward to the remote `kubectl exec` subprocess
(kubernetes-client/javascript#2532). The pod-side `sh` keeps running until the
pod itself dies, which OSDC observed as orphaned `tar` processes lingering
until the kubelet OOM-killed the pod. `terminate()` skips the handshake and
drops the underlying TCP connection, which the kubelet does propagate as a
SIGHUP/SIGPIPE to the child.

The heartbeat path matters even when callers also clean up: if the heartbeat
fires `ws.close()` first, the socket transitions to `CLOSING` and the
`execPodStep` catch-block guard skips its own cleanup, so the subprocess is
orphaned anyway. Terminating in the heartbeat closes that gap.
Copilot AI review requested due to automatic review settings June 11, 2026 15:34
@jeanschmidt jeanschmidt marked this pull request as draft June 11, 2026 15:35

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR updates Kubernetes exec WebSocket cleanup to force-terminate connections (instead of graceful close) so the remote kubectl exec subprocess reliably exits, and adds/updates tests to cover the new behavior.

Changes:

  • Added safeTerminateWs() helper and updated WebSocketHeartbeat to use termination on heartbeat timeout.
  • Updated execPodStep cleanup/error paths to call safeTerminateWs() instead of socket.close().
  • Added new Jest coverage for terminate-based cleanup paths and updated existing heartbeat tests accordingly.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
packages/k8s/src/k8s/heartbeat.ts Adds terminate() to the WebSocket interface, introduces safeTerminateWs(), and uses it for pong timeout cleanup.
packages/k8s/src/k8s/index.ts Switches exec cleanup paths from close() to safeTerminateWs() to force-close exec streams.
packages/k8s/tests/heartbeat-test.ts Updates the heartbeat test WebSocket mock + expectations from close() to terminate().
packages/k8s/tests/safe-terminate-ws-test.ts Adds new tests verifying terminate() is used across execPodStep success/failure/error cleanup paths.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 3 to 10
export interface HeartbeatWebSocket {
readyState: number
ping(): void
close(): void
terminate(): void
on(event: string, listener: (...args: any[]) => void): this
once(event: string, listener: (...args: any[]) => void): this
}
Comment on lines +15 to +26
export function safeTerminateWs(
ws: HeartbeatWebSocket | null | undefined
): void {
if (!ws) {
return
}
try {
ws.terminate()
} catch (err) {
core.debug(`[safeTerminateWs] terminate() threw, ignoring: ${err}`)
}
}
Comment on lines +23 to +25
} catch (err) {
core.debug(`[safeTerminateWs] terminate() threw, ignoring: ${err}`)
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants