fix(deepgram): recover Flux STT WebSocket closes #1813
🦋 Changeset detected. Latest commit: 5493712. The changes in this PR will be included in the next version bump. This PR includes changesets to release 35 packages.
```typescript
if (hasEnded) {
  this.#audioDurationCollector.flush();
  break;
}
```
📝 Info: Behavioral change: FLUSH_SENTINEL with empty AudioByteStream no longer terminates the stream
The old sendTask had a subtle behavior: if FLUSH_SENTINEL arrived while the AudioByteStream buffer was empty, the frame loop that resets hasEnded never ran, so hasEnded stayed true, the loop broke early, and CloseStream was sent. The new code at lines 355-358 unconditionally continues after a flush, whether or not any frames were produced. This is the correct behavior: flush() should flush buffered data, not terminate the stream. The old behavior was arguably a latent bug in which calling stream.flush() with no buffered audio would unexpectedly close the connection. This change is intentional and correct.
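A simplified model of the send loop may make the difference concrete. It is a sketch, not the plugin's sendTask: FLUSH_SENTINEL, ByteBuffer (standing in for AudioByteStream), the 4-byte frame size, and the string messages are all hypothetical. A flush drains whatever is buffered and keeps reading input; only the end of input drains the buffer and sends CloseStream.

```typescript
// Simplified model of a send loop. FLUSH_SENTINEL, ByteBuffer, and FRAME_BYTES
// are hypothetical stand-ins, not the plugin's actual definitions.
const FLUSH_SENTINEL = Symbol('flush');
type Input = Uint8Array | typeof FLUSH_SENTINEL;

const FRAME_BYTES = 4;

/** Buffers bytes and emits fixed-size frames (a stand-in for AudioByteStream). */
class ByteBuffer {
  private buf: number[] = [];

  write(data: Uint8Array): Uint8Array[] {
    this.buf.push(...Array.from(data));
    const frames: Uint8Array[] = [];
    while (this.buf.length >= FRAME_BYTES) {
      frames.push(Uint8Array.from(this.buf.splice(0, FRAME_BYTES)));
    }
    return frames;
  }

  /** Emits any partial frame; returns [] when nothing is buffered. */
  flush(): Uint8Array[] {
    if (this.buf.length === 0) return [];
    const rest = Uint8Array.from(this.buf);
    this.buf = [];
    return [rest];
  }
}

/** Returns the messages the loop would send for a given input sequence. */
function sendLoop(inputs: Input[]): string[] {
  const sent: string[] = [];
  const bstream = new ByteBuffer();
  for (const item of inputs) {
    if (item === FLUSH_SENTINEL) {
      // Flush whatever is buffered (possibly nothing) and keep reading input.
      // An empty flush must not end the stream.
      for (const frame of bstream.flush()) sent.push(`audio:${frame.length}`);
      continue;
    }
    for (const frame of bstream.write(item)) sent.push(`audio:${frame.length}`);
  }
  // Only exhausting the input ends the stream: drain, then close.
  for (const frame of bstream.flush()) sent.push(`audio:${frame.length}`);
  sent.push('CloseStream');
  return sent;
}
```

For example, sendLoop([FLUSH_SENTINEL, new Uint8Array(6)]) returns ['audio:4', 'audio:2', 'CloseStream']: the leading empty flush emits nothing and does not close the stream.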
```diff
@@ -398,13 +403,29 @@ class SpeechStreamv2 extends stt.SpeechStream {
 }
```
🚩 Deepgram API errors (type=Error) are silently swallowed by the message handler's try/catch
Pre-existing issue: #processStreamEvent throws at plugins/deepgram/src/stt_v2.ts:472 when Deepgram sends an error-type message, but the try/catch at plugins/deepgram/src/stt_v2.ts:398-403 catches that throw and logs it as 'Failed to parse Deepgram message'. The error is effectively swallowed: the stream continues as if nothing happened, and the new retry mechanism will not trigger for Deepgram API errors (only for unexpected WebSocket closes). This PR does not introduce the issue but does interact with it: unexpected closes now trigger retries, while Deepgram error messages still do not.
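To illustrate the swallowing mechanism, here is a minimal sketch assuming a handler shaped like the one described. ProviderError, DeepgramEvent, and the handler names are hypothetical, and mapping provider errors onto APIConnectionError or APIStatusError is exactly the decision deferred to a follow-up. When one try/catch wraps both parsing and processing, the thrown error is logged as a parse failure; guarding only JSON.parse lets it propagate.

```typescript
// Hypothetical sketch: ProviderError and the handler names are illustrative,
// not the plugin's code. The real error type to raise is still undecided.
class ProviderError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ProviderError';
  }
}

type DeepgramEvent = { type: string; [key: string]: unknown };

/** Modeled on #processStreamEvent: throws on an error-type message. */
function processStreamEvent(event: DeepgramEvent): void {
  if (event.type === 'Error') {
    throw new ProviderError(`Deepgram error event: ${JSON.stringify(event)}`);
  }
  // ... transcript events would be handled here
}

/** One try/catch covers parsing AND processing, so provider errors are logged and dropped. */
function onMessageSwallowing(raw: string, log: string[]): void {
  try {
    processStreamEvent(JSON.parse(raw) as DeepgramEvent);
  } catch (e) {
    log.push(`Failed to parse Deepgram message: ${(e as Error).message}`);
  }
}

/** Guards only JSON parsing; returns undefined (and logs) on malformed input. */
function parseEvent(raw: string, log: string[]): DeepgramEvent | undefined {
  try {
    return JSON.parse(raw) as DeepgramEvent;
  } catch (e) {
    log.push(`Failed to parse Deepgram message: ${(e as Error).message}`);
    return undefined;
  }
}

/** Narrower shape: a ProviderError thrown while processing reaches the caller. */
function onMessageNarrow(raw: string, log: string[]): void {
  const event = parseEvent(raw, log);
  if (event === undefined) return;
  processStreamEvent(event);
}
```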
Thanks, this is a real pre-existing issue, but I am leaving it out of this PR to keep the fix scoped to unexpected WebSocket close recovery. Handling Deepgram type: "Error" messages needs a separate decision about how provider error payloads should map onto APIConnectionError vs APIStatusError and which cases should be retryable. I would prefer to handle that in a follow-up so this PR does not broaden retry semantics beyond the observed unrecovered close path.
After an unexpected reconnect the fresh Deepgram socket restarts audio_window at 0, but startTimeOffset was left unchanged, so post-reconnect transcripts were timestamped near the start of the session. Downstream consumers that gate on absolute timing (e.g. drop finals that land before the current answer's audio) would discard them, so long sessions could still stall after the first reconnect. Track the audio already streamed (#sentAudioSec) and snapshot it as a per-connection base at each connect, offsetting that connection's window times by it so transcript timestamps stay monotonic across reconnects. Adds a regression test asserting the second turn's final timestamp does not reset below the first across an unexpected reconnect.
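The per-connection offset can be sketched as follows. TimestampOffsetter and its methods are hypothetical names; only the idea (a running total of sent audio, snapshotted as a base when each connection opens and added to that connection's window times) comes from the commit message.

```typescript
// Hypothetical sketch of monotonic timestamps across reconnects.
// Names are illustrative; only the technique comes from the commit message.
class TimestampOffsetter {
  /** Seconds of audio streamed across all connections so far. */
  private sentAudioSec = 0;
  /** Value of sentAudioSec when the current connection opened. */
  private connectionBaseSec = 0;

  /** Call when a socket opens: a fresh socket restarts its window at 0. */
  onConnect(): void {
    this.connectionBaseSec = this.sentAudioSec;
  }

  /** Call for every chunk actually written to the socket. */
  onAudioSent(samples: number, sampleRate: number): void {
    this.sentAudioSec += samples / sampleRate;
  }

  /** Maps a provider window time (relative to this connection) to session time. */
  toSessionTime(windowSec: number): number {
    return this.connectionBaseSec + windowSec;
  }
}
```

With 10 s of audio streamed before an unexpected reconnect, a final at window time 1.0 s on the new socket maps to 11.0 s instead of 1.0 s, so it no longer lands before the previous turn's 9.5 s final.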
Pushed two follow-up commits for the Flux reconnect path:
The reason is that reconnecting the WebSocket is necessary but not quite sufficient for Flux. A fresh Deepgram Flux socket restarts its provider-side …
The implementation now tracks consumed audio seconds and snapshots that value when each socket opens. Transcript timestamps use …
I also ran a local live harness against real Deepgram Flux: it streamed the repo speech fixture through …
This intentionally does not change the base retry lifetime cap (…).
Summary
Deepgram Flux streaming STT now recovers from unexpected WebSocket closes instead of permanently ending the speech stream. Previously SpeechStreamv2.#recvTask() resolved on every WebSocket close event, so a provider-side close such as code 1005 looked like normal stream completion: run() broke out, no retryable error escaped, and the base SpeechStream.mainTask never reconnected.
The fix matches the Python Flux implementation in livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt_v2.py, which treats local/session shutdown as expected and raises on unexpected Deepgram closes so the base retry path reconnects.
What Changed
… APIConnectionError, letting SpeechStream.mainTask reconnect according to the existing connOptions.
… maxRetry: 0.
Retry Semantics Caveat
This uses the existing base stream retry policy.
maxRetry defaults to 3, and the counter is lifetime-scoped for a stream, not reset after a healthy connection. For very long sessions with multiple provider-side closes, maintainers may want to decide whether to keep the Python-like base retry behavior, reset the retry count after a healthy interval, or expose a dedicated configuration.
Testing
- pnpm test plugins/deepgram/src/stt_v2.test.ts passed (3 local tests; 1 live Deepgram test skipped without DEEPGRAM_API_KEY).
- pnpm --filter @livekit/agents-plugin-deepgram lint passed with existing warnings.
- pnpm --filter @livekit/agents-plugin-deepgram... build passed.
- pnpm lint passed with existing warnings.
- pnpm build passed.
- pnpm test agents/src/utils.test.ts passed.
- pnpm test currently fails for reasons outside this change: provider/API-key-dependent tests (CEREBRAS_API_KEY, Google API key, OpenAI drive-thru examples), a HuggingFace download timeout in plugins/livekit/src/hf_utils.test.ts, and existing FakeLLM unhandled errors in examples/src/testing/survey_agent.test.ts.
Post-Deploy Monitoring & Validation
- … "Deepgram WebSocket closed unexpectedly", "failed to recognize speech, retrying", "deepgram.SpeechStreamv2", "stt_error", and "agent response timed out".
- … "failed to recognize speech after 4 attempts", recurring unrecoverable stt_error, or continued turn-detection stalls after a Deepgram close.
- … STTv2 with turnDetection: 'stt'.
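The close classification, the lifetime-scoped retry counter, and the "after 4 attempts" signal fit together roughly as follows. This is a hypothetical model, not the base SpeechStream.mainTask: APIConnectionError is a local stand-in class, runOnce simulates one socket lifetime, and the retry interval between attempts is omitted.

```typescript
// Hypothetical model of the retry path. APIConnectionError is a local stand-in
// for the @livekit/agents class, and runOnce simulates one socket lifetime.
class APIConnectionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'APIConnectionError';
  }
}

interface CloseInfo {
  code: number;
  /** True when local/session shutdown initiated the close. */
  expected: boolean;
}

/** Models #recvTask(): expected closes resolve, unexpected ones reject. */
async function recvTask(runOnce: () => Promise<CloseInfo>): Promise<void> {
  const info = await runOnce();
  if (!info.expected) {
    throw new APIConnectionError(`Deepgram WebSocket closed unexpectedly (code ${info.code})`);
  }
}

/**
 * Models a base-class retry loop with a lifetime-scoped counter: it is never
 * reset after a healthy connection, so maxRetry = 3 allows 4 attempts in total.
 */
async function mainTask(
  runOnce: () => Promise<CloseInfo>,
  maxRetry: number,
  log: string[],
): Promise<void> {
  let numRetries = 0;
  for (;;) {
    try {
      await recvTask(runOnce);
      return; // expected close: the stream ended normally
    } catch (e) {
      if (!(e instanceof APIConnectionError)) throw e;
      if (numRetries >= maxRetry) {
        log.push(`failed to recognize speech after ${numRetries + 1} attempts`);
        throw e;
      }
      numRetries++;
      log.push('failed to recognize speech, retrying');
    }
  }
}
```

With maxRetry: 0, the first unexpected close surfaces the APIConnectionError immediately. Resetting after a healthy interval, one of the options in the retry caveat, would amount to setting numRetries back to 0 once a connection has stayed up long enough; that variant is not shown.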