fix(memory): resolve claude on PATH for backfill + surface per-session failures (#276)#277
Conversation
…n failures `hivemind memory backfill` staged 0/N with a bare "N failed" and no way to see why (#276). Root cause (most likely): the stager resolved the claude binary via findAgentBin's hard-coded candidate list, which has no PATH walk and falls back to ~/.claude/local/claude. When claude lives elsewhere (e.g. an nvm/npm-global bin), the fallback doesn't exist, every spawn ENOENTs, and each session is recorded as `claude-failed`. The live wiki-worker already uses the PATH-aware resolveCliBin; align backfill with it and reuse buildClaudeInvocation so the Windows `.cmd` shim path is handled too. Surfacing: stageSession already computed a precise `reason` (claude-failed / no-summary / empty-summary / jsonl-missing / mkdir-failed) but it was dropped at the StageFn boundary. Carry it through, tally per reason, record per-session detail, always print the breakdown, and add --verbose to list each failing session. A bare "N failed" is now self-diagnosable. Tests + coverage: extract pure planClaudeSpawn / summarizeExtract / renderFailures / releaseBackfillLock units; cover the Windows stdin spawn and the real defaultEmbed enabled/disabled branches via isolated mocks. backfill-memory.ts and stage-memory.ts both ≥90% on every metric; raise their branch thresholds 85→90.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds ChangesBackfill verbose failure reporting and Windows spawn fix
Sequence Diagram(s)sequenceDiagram
participant CLI as runBackfillMemory
participant runExtract as runExtract(plan, cwd, opts)
participant executeBackfill as executeBackfill worker
participant stageSession as stageSession / stage()
participant summarizeExtract as summarizeExtract()
participant renderFailures as renderFailures()
CLI->>runExtract: plan, cwd, opts (verbose)
runExtract->>executeBackfill: iterate sessions
loop per session
executeBackfill->>stageSession: stage(session)
alt ok: false
stageSession-->>executeBackfill: { ok:false, reason }
executeBackfill->>executeBackfill: recordFailure(reason)
else throws
stageSession-->>executeBackfill: throws Error
executeBackfill->>executeBackfill: recordFailure("threw: <msg>")
end
end
executeBackfill-->>runExtract: ExtractSummary (failureReasons, failures)
runExtract->>summarizeExtract: result, opts.verbose
summarizeExtract->>renderFailures: result, verbose
renderFailures-->>summarizeExtract: failure lines
summarizeExtract-->>runExtract: { lines, exitCode }
runExtract-->>CLI: exitCode
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Coverage ReportScope: files changed in this PR. Enforced threshold: 90% per metric (per file via
File Coverage — 3 files changed
Generated for commit 1f0bf9d. |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/claude-code/backfill-memory.test.ts (1)
297-299: ⚡ Quick winPrefer exact output assertions over substring checks in these tests.
Several checks here rely on generic
toContain(...)matches. Please assert exact expected lines/messages (or exact rendered arrays) so regressions in wording/structure are caught deterministically.As per coding guidelines, "tests/**: Prefer asserting on specific values (paths, messages) over generic substrings."
Also applies to: 321-323, 328-329, 360-360
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/claude-code/backfill-memory.test.ts` around lines 297 - 299, The test assertions in the backfill-memory.test.ts file use generic toContain() substring matching (at lines 297-299, 321-323, 328-329, and 360), which won't catch regressions in output wording or structure. Replace these generic substring checks with exact assertions that verify specific complete lines or the exact rendered output format. For example, instead of checking if the joined output contains a substring like "--verbose", assert that a specific expected line appears in the output array, or verify the complete output matches an expected format exactly.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/commands/backfill-memory.ts`:
- Around line 339-343: The try-finally block currently only wraps the runExtract
call, but the backfill lock is acquired before this block in the planBackfill
and renderPlan operations. If either of those operations throws an error, the
lock will not be released. Restructure the code by moving the beginning of the
try block to encompass the planBackfill and renderPlan calls as well, so that
the releaseBackfillLock call in the finally block will properly execute and
release the lock regardless of whether the failure occurs during planning,
rendering, or execution.
In `@tests/claude-code/stage-memory-shell-spawn.test.ts`:
- Line 90: Replace the generic substring assertion toContain("from stdin") with
an exact content assertion using toBe() in the readFileSync expectation on line
90. Since the file content is deterministic, assert the complete expected file
content rather than just checking if it contains a substring. Capture the full
expected markdown content that should be written to the claude_code-s1.md file
and compare it exactly.
---
Nitpick comments:
In `@tests/claude-code/backfill-memory.test.ts`:
- Around line 297-299: The test assertions in the backfill-memory.test.ts file
use generic toContain() substring matching (at lines 297-299, 321-323, 328-329,
and 360), which won't catch regressions in output wording or structure. Replace
these generic substring checks with exact assertions that verify specific
complete lines or the exact rendered output format. For example, instead of
checking if the joined output contains a substring like "--verbose", assert that
a specific expected line appears in the output array, or verify the complete
output matches an expected format exactly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: cee94838-18f8-4bf8-9b52-e1775d34406a
📒 Files selected for processing (9)
src/cli/index.tssrc/commands/backfill-memory.tssrc/skillify/stage-memory.tstests/claude-code/backfill-memory.test.tstests/claude-code/stage-memory-embed.test.tstests/claude-code/stage-memory-shell-spawn.test.tstests/claude-code/stage-memory.test.tstests/cli/cli-index.test.tsvitest.config.ts
…rings CodeRabbit review on #277: - Widen runBackfillMemory's try so the finally always releases the lock — a planBackfill/renderPlan throw must not strand a worker-owned lock. Safe: releaseBackfillLock no-ops unless this process owns the lock, so the dry-run early return under it is unaffected. - Add docstrings to the exported backfill/stage-memory functions and interfaces to satisfy the docstring-coverage pre-merge check.
CodeRabbit review on #277: replace toContain with an exact toBe match on the deterministic summary content.
Fixes #276 —
hivemind memory backfillstages0/N (0 embedded, N failed)with no surfaced reason.What was wrong
1. The failure reason was swallowed (the explicit ask)
stageSessionalready computes a precise reason for every outcome —claude-failed,no-summary,empty-summary,jsonl-missing,mkdir-failed— but it was dropped at theStageFnboundary (defaultStageFnreturned only{ok, embedded}) andexecuteBackfilljust didsummary.failed++. So the CLI could only print a bareN failed.Now the reason is carried through, tallied per-reason, recorded per-session. The report always prints the breakdown, and
--verbose/-vlists each failing session. A thrown stager is captured asthrew: <message>.2. Root cause: claude binary not found
The stager resolved claude via
findAgentBin("claude_code")— a hard-coded candidate-list resolver with no PATH walk, falling back to~/.claude/local/claude. When claude lives off that list (e.g. an nvm/npm-global bin — the reporter's macOS+nvm setup), the fallback doesn't exist → everyspawnENOENTs → every session isclaude-failed. The live wiki-worker uses the PATH-awareresolveCliBin("claude"); this aligns backfill with it and reusesbuildClaudeInvocationfor the Windows.cmdpath. (stage-memory.tsis in the CLI bundle, not the ClawHub-scanned openclaw worker, so the PATH-free constraint behindfindAgentBindoesn't apply.)Reproduced in isolation
Simulated the reporter's environment (claude on PATH but off every
findAgentBincandidate, isolatedHOMEso the fallback is absent) and drove the realstageSessionthrough both resolvers:The exact symptom (
0/N, 0 embedded, N failed) reproduced, the swallowed reason shown to beclaude-failed, and the fix flipping it to staged. (Repro manipulatesHOME/PATH/which— kept as throwaway verification, not a Windows-fragile committed test.)Caveat: this reproduces the claude-off-the-candidate-list case. If a user's claude IS at a candidate path, their failure is a different reason (e.g.
no-summary) — which the surfacing change now makes visible instead of swallowing.Tests & coverage
Pure units extracted (
planClaudeSpawn,summarizeExtract,renderFailures,releaseBackfillLock); isolated mock files cover the Windows stdin spawn path and the realdefaultEmbedenabled/disabled branches without spawning claude or an embed daemon.commands/backfill-memory.tsskillify/stage-memory.tsBoth >=90% on every metric; branch thresholds raised 85->90. Typecheck clean; targeted suites green (58 tests); full suite shows only pre-existing build-artifact failures (unchanged from baseline).
🤖 Generated with Claude Code
Summary by CodeRabbit
--verbosefor memory backfill with richer, per-session failure diagnostics.--window-days,--project-only) in the usage message.