Skip to content

fix(memory): resolve claude on PATH for backfill + surface per-session failures (#276)#277

Merged
khustup2 merged 3 commits into
mainfrom
fix/backfill-surface-errors
Jun 18, 2026
Merged

fix(memory): resolve claude on PATH for backfill + surface per-session failures (#276)#277
khustup2 merged 3 commits into
mainfrom
fix/backfill-surface-errors

Conversation

@khustup2

@khustup2 khustup2 commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Fixes #276hivemind memory backfill stages 0/N (0 embedded, N failed) with no surfaced reason.

What was wrong

1. The failure reason was swallowed (the explicit ask)

stageSession already computes a precise reason for every outcome — claude-failed, no-summary, empty-summary, jsonl-missing, mkdir-failed — but it was dropped at the StageFn boundary (defaultStageFn returned only {ok, embedded}) and executeBackfill just did summary.failed++. So the CLI could only print a bare N failed.

Now the reason is carried through, tallied per-reason, recorded per-session. The report always prints the breakdown, and --verbose/-v lists each failing session. A thrown stager is captured as threw: <message>.

2. Root cause: claude binary not found

The stager resolved claude via findAgentBin("claude_code") — a hard-coded candidate-list resolver with no PATH walk, falling back to ~/.claude/local/claude. When claude lives off that list (e.g. an nvm/npm-global bin — the reporter's macOS+nvm setup), the fallback doesn't exist → every spawn ENOENTs → every session is claude-failed. The live wiki-worker uses the PATH-aware resolveCliBin("claude"); this aligns backfill with it and reuses buildClaudeInvocation for the Windows .cmd path. (stage-memory.ts is in the CLI bundle, not the ClawHub-scanned openclaw worker, so the PATH-free constraint behind findAgentBin doesn't apply.)

Reproduced in isolation

Simulated the reporter's environment (claude on PATH but off every findAgentBin candidate, isolated HOME so the fallback is absent) and drove the real stageSession through both resolvers:

OLD findAgentBin   -> <home>/.claude/local/claude   exists: false
                      stageSession -> { ok:false, reason:"claude-failed" }   <- the reporter's "N failed"
NEW resolveCliBin  -> <home>/nvm-bin/claude          exists: true
                      stageSession -> { ok:true }                            <- staged

The exact symptom (0/N, 0 embedded, N failed) reproduced, the swallowed reason shown to be claude-failed, and the fix flipping it to staged. (Repro manipulates HOME/PATH/which — kept as throwaway verification, not a Windows-fragile committed test.)

Caveat: this reproduces the claude-off-the-candidate-list case. If a user's claude IS at a candidate path, their failure is a different reason (e.g. no-summary) — which the surfacing change now makes visible instead of swallowing.

Tests & coverage

Pure units extracted (planClaudeSpawn, summarizeExtract, renderFailures, releaseBackfillLock); isolated mock files cover the Windows stdin spawn path and the real defaultEmbed enabled/disabled branches without spawning claude or an embed daemon.

File Stmts Branch Funcs Lines
commands/backfill-memory.ts 99.3 96.6 94.7 99.2
skillify/stage-memory.ts 100 94.7 100 100

Both >=90% on every metric; branch thresholds raised 85->90. Typecheck clean; targeted suites green (58 tests); full suite shows only pre-existing build-artifact failures (unchanged from baseline).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added --verbose for memory backfill with richer, per-session failure diagnostics.
    • Memory backfill now supports additional optional backfill flags (--window-days, --project-only) in the usage message.
  • Bug Fixes
    • Improved Claude executable resolution to better prevent missing-binary failures.
    • Enhanced Windows-specific Claude invocation handling and safer lock-file cleanup.
  • Tests
    • Expanded backfill failure-reporting, exit-code, and lock-ownership coverage.
    • Added embedding and Windows stdin/shell staging test coverage.
    • Tightened coverage thresholds for memory backfill and staging.

…n failures

`hivemind memory backfill` staged 0/N with a bare "N failed" and no way to
see why (#276).

Root cause (most likely): the stager resolved the claude binary via
findAgentBin's hard-coded candidate list, which has no PATH walk and falls
back to ~/.claude/local/claude. When claude lives elsewhere (e.g. an
nvm/npm-global bin), the fallback doesn't exist, every spawn ENOENTs, and
each session is recorded as `claude-failed`. The live wiki-worker already
uses the PATH-aware resolveCliBin; align backfill with it and reuse
buildClaudeInvocation so the Windows `.cmd` shim path is handled too.

Surfacing: stageSession already computed a precise `reason`
(claude-failed / no-summary / empty-summary / jsonl-missing / mkdir-failed)
but it was dropped at the StageFn boundary. Carry it through, tally per
reason, record per-session detail, always print the breakdown, and add
--verbose to list each failing session. A bare "N failed" is now
self-diagnosable.

Tests + coverage: extract pure planClaudeSpawn / summarizeExtract /
renderFailures / releaseBackfillLock units; cover the Windows stdin spawn
and the real defaultEmbed enabled/disabled branches via isolated mocks.
backfill-memory.ts and stage-memory.ts both ≥90% on every metric; raise
their branch thresholds 85→90.
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: dc6387b9-f039-403e-97c6-5009c8ae0342

📥 Commits

Reviewing files that changed from the base of the PR and between 28224f8 and c0f8adb.

📒 Files selected for processing (1)
  • tests/claude-code/stage-memory-shell-spawn.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/claude-code/stage-memory-shell-spawn.test.ts

📝 Walkthrough

Walkthrough

Adds --verbose flag and structured failure reporting (BackfillFailure, failureReasons, renderFailures, summarizeExtract) to the memory backfill pipeline. Fixes Claude binary resolution to use the PATH-aware resolveCliBin("claude") and introduces planClaudeSpawn for Windows .cmd stdin spawning. Test coverage and CLI help text are updated accordingly.

Changes

Backfill verbose failure reporting and Windows spawn fix

Layer / File(s) Summary
Verbose option and CLI help
src/cli/index.ts, src/commands/backfill-memory.ts
BackfillOptions adds verbose boolean flag parsed from --verbose/-v; CLI help text documents new backfill flags in the usage message.
Failure model and data contracts
src/commands/backfill-memory.ts
BackfillFailure interface introduced; ExtractSummary augmented with failureReasons (reason → count map) and failures (per-session entries); StageFn result extended with optional reason field.
Failure recording and reason propagation
src/commands/backfill-memory.ts
Default stage function forwards failure reason from stageSession into StageFn result; recordFailure helper increments counts and appends entries; worker loop captures reason on ok:false and normalizes thrown errors to "threw: ..." messages.
Lock generalization, runExtract export, and reporting helpers
src/commands/backfill-memory.ts
releaseBackfillLock accepts optional lockPath parameter; runBackfillMemory delegates to exported runExtract(plan, cwd, opts) under try/finally lock-release guard; renderFailures generates reason tallies and verbose lines; summarizeExtract assembles headline and diagnostics with exit code; runExtract prints summarized lines and returns exit code.
ClaudeSpawnPlan, planClaudeSpawn, and resolveClaudeBin
src/skillify/stage-memory.ts
ClaudeSpawnPlan and planClaudeSpawn translate buildClaudeInvocation output into concrete spawn() parameters with conditional stdin piping for Windows .cmd shims; runClaude rewired to use the plan; resolveClaudeBin switched to PATH-aware resolveCliBin("claude").
backfill-memory test coverage
tests/claude-code/backfill-memory.test.ts
Tests cover --verbose/-v parsing, executeBackfill failure reason propagation (explicit reason, unknown fallback, thrown errors), renderFailures output for tallies and verbose lines, summarizeExtract exit codes and diagnostics, runExtract empty-plan short-circuit, and releaseBackfillLock ownership-gated deletion.
planClaudeSpawn, shell-spawn, and embed tests
tests/claude-code/stage-memory.test.ts, tests/claude-code/stage-memory-shell-spawn.test.ts, tests/claude-code/stage-memory-embed.test.ts
Unit tests for planClaudeSpawn across Unix and Windows .cmd scenarios; end-to-end shell-spawn test using hoisted stdin reader; stageSession embedding persistence and opt-out via mocked EmbedClient and gating hooks.
Coverage thresholds and CLI test update
vitest.config.ts, tests/cli/cli-index.test.ts
Branch coverage thresholds raised to 90 for stage-memory.ts and backfill-memory.ts; expected stderr usage string updated for new backfill flags.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as runBackfillMemory
  participant runExtract as runExtract(plan, cwd, opts)
  participant executeBackfill as executeBackfill worker
  participant stageSession as stageSession / stage()
  participant summarizeExtract as summarizeExtract()
  participant renderFailures as renderFailures()

  CLI->>runExtract: plan, cwd, opts (verbose)
  runExtract->>executeBackfill: iterate sessions
  loop per session
    executeBackfill->>stageSession: stage(session)
    alt ok: false
      stageSession-->>executeBackfill: { ok:false, reason }
      executeBackfill->>executeBackfill: recordFailure(reason)
    else throws
      stageSession-->>executeBackfill: throws Error
      executeBackfill->>executeBackfill: recordFailure("threw: <msg>")
    end
  end
  executeBackfill-->>runExtract: ExtractSummary (failureReasons, failures)
  runExtract->>summarizeExtract: result, opts.verbose
  summarizeExtract->>renderFailures: result, verbose
  renderFailures-->>summarizeExtract: failure lines
  summarizeExtract-->>runExtract: { lines, exitCode }
  runExtract-->>CLI: exitCode
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

  • activeloopai/hivemind#250: Both PRs modify Claude CLI invocation for Windows by reusing shared resolveCliBin/buildClaudeInvocation logic—this PR adapts stage-memory.ts to the same PATH-aware binary resolution and spawn planning introduced for the wiki-worker.

  • activeloopai/hivemind#269: Main PR enhances the existing retro-backfill feature from PR #269 by extending src/commands/backfill-memory.ts failure aggregation/reporting (verbose, failureReasons, StageFn.reason, renderFailures/summarizeExtract) and by updating src/skillify/stage-memory.ts spawn planning + resolveClaudeBin resolution behavior—both directly touching the same core modules introduced in #269.

Suggested reviewers

  • kaghni
  • efenocchi

Poem

🐇 Hoppity-hop through the backfill queue,
Failures now tagged with a reason true!
Windows .cmd gets its stdin fed,
--verbose shines light on what went dead.
With coverage raised and all tests green,
The fluffiest pipeline ever seen! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the two main changes: fixing claude binary resolution on PATH and surfacing per-session failure reasons in the memory backfill command.
Description check ✅ Passed The description comprehensively covers the problem statement, root causes, reproduction steps, test coverage, and validation metrics, though it lacks an explicit version bump note per the template.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/backfill-surface-errors

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Coverage Report

Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via vitest.config.ts).

Status Category Percentage Covered / Total
🟢 Lines 90.59% (🎯 90%) 356 / 393
🟢 Statements 91.90% (🎯 90%) 431 / 469
🟢 Functions 97.83% (🎯 90%) 45 / 46
🟢 Branches 90.30% (🎯 90%) 270 / 299
File Coverage — 3 files changed
File Stmts Branches Functions Lines
src/cli/index.ts 🔴 86.2% 🔴 85.5% 🟢 100.0% 🔴 83.9%
src/commands/backfill-memory.ts 🟢 99.3% 🟢 97.8% 🟢 94.7% 🟢 99.2%
src/skillify/stage-memory.ts 🟢 100.0% 🟢 94.7% 🟢 100.0% 🟢 100.0%

Generated for commit 1f0bf9d.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/claude-code/backfill-memory.test.ts (1)

297-299: ⚡ Quick win

Prefer exact output assertions over substring checks in these tests.

Several checks here rely on generic toContain(...) matches. Please assert exact expected lines/messages (or exact rendered arrays) so regressions in wording/structure are caught deterministically.

As per coding guidelines, "tests/**: Prefer asserting on specific values (paths, messages) over generic substrings."

Also applies to: 321-323, 328-329, 360-360

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/claude-code/backfill-memory.test.ts` around lines 297 - 299, The test
assertions in the backfill-memory.test.ts file use generic toContain() substring
matching (at lines 297-299, 321-323, 328-329, and 360), which won't catch
regressions in output wording or structure. Replace these generic substring
checks with exact assertions that verify specific complete lines or the exact
rendered output format. For example, instead of checking if the joined output
contains a substring like "--verbose", assert that a specific expected line
appears in the output array, or verify the complete output matches an expected
format exactly.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/commands/backfill-memory.ts`:
- Around line 339-343: The try-finally block currently only wraps the runExtract
call, but the backfill lock is acquired before this block in the planBackfill
and renderPlan operations. If either of those operations throws an error, the
lock will not be released. Restructure the code by moving the beginning of the
try block to encompass the planBackfill and renderPlan calls as well, so that
the releaseBackfillLock call in the finally block will properly execute and
release the lock regardless of whether the failure occurs during planning,
rendering, or execution.

In `@tests/claude-code/stage-memory-shell-spawn.test.ts`:
- Line 90: Replace the generic substring assertion toContain("from stdin") with
an exact content assertion using toBe() in the readFileSync expectation on line
90. Since the file content is deterministic, assert the complete expected file
content rather than just checking if it contains a substring. Capture the full
expected markdown content that should be written to the claude_code-s1.md file
and compare it exactly.

---

Nitpick comments:
In `@tests/claude-code/backfill-memory.test.ts`:
- Around line 297-299: The test assertions in the backfill-memory.test.ts file
use generic toContain() substring matching (at lines 297-299, 321-323, 328-329,
and 360), which won't catch regressions in output wording or structure. Replace
these generic substring checks with exact assertions that verify specific
complete lines or the exact rendered output format. For example, instead of
checking if the joined output contains a substring like "--verbose", assert that
a specific expected line appears in the output array, or verify the complete
output matches an expected format exactly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: cee94838-18f8-4bf8-9b52-e1775d34406a

📥 Commits

Reviewing files that changed from the base of the PR and between 94b66dd and 641d15f.

📒 Files selected for processing (9)
  • src/cli/index.ts
  • src/commands/backfill-memory.ts
  • src/skillify/stage-memory.ts
  • tests/claude-code/backfill-memory.test.ts
  • tests/claude-code/stage-memory-embed.test.ts
  • tests/claude-code/stage-memory-shell-spawn.test.ts
  • tests/claude-code/stage-memory.test.ts
  • tests/cli/cli-index.test.ts
  • vitest.config.ts

Comment thread src/commands/backfill-memory.ts
Comment thread tests/claude-code/stage-memory-shell-spawn.test.ts Outdated
khustup2 added 2 commits June 18, 2026 16:37
…rings

CodeRabbit review on #277:
- Widen runBackfillMemory's try so the finally always releases the lock —
  a planBackfill/renderPlan throw must not strand a worker-owned lock.
  Safe: releaseBackfillLock no-ops unless this process owns the lock, so the
  dry-run early return under it is unaffected.
- Add docstrings to the exported backfill/stage-memory functions and
  interfaces to satisfy the docstring-coverage pre-merge check.
CodeRabbit review on #277: replace toContain with an exact toBe match on the
deterministic summary content.
@khustup2 khustup2 requested a review from efenocchi June 18, 2026 18:12
@khustup2 khustup2 merged commit 3dc256a into main Jun 18, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

memory backfill stages 0/N (0 embedded, N failed) with no surfaced error on clean supported install

2 participants