fix(store): preserve corrupt DB instead of deleting (#557)#620
Conversation
The startup/query integrity check unlink()'d a project's .db (plus its -wal/-shm sidecars) on any anomaly, causing total data loss with no backup, no recovery path, and no visible warning. The trigger is often a false positive: a WAL leftover after a SIGKILL, a binary variant swap, or a stale-lock break can all make a recoverable DB look corrupt. Replace the destructive unlink with a rename to <name>.db.corrupt.<ts> and emit a prominent stderr message so the user can inspect, recover, or attach the file to a bug report. Two call sites are fixed: - mcp.c resolve_store(): the reported path, hit on first query. - pipeline.c try_incremental_or_delete_db(): only preserves on real corruption; a healthy DB rebuilt due to a mode change is still deleted so .corrupt files don't accumulate on normal reindexes. Add a regression test (resolve_store_preserves_corrupt_db_issue557) that seeds a corrupt projects row, issues a query, and asserts the original .db is gone while a .corrupt sidecar remains. Signed-off-by: yasku <aguss.cba@gmail.com>
|
Working on fixing this |
CodeQL flagged a time-of-check/time-of-use race in try_incremental_or_delete_db(): the file at db_path was probed with stat() and then operated on by name (rename/unlink), so the underlying file could change between the two. Drop the redundant stat() existence probe and derive existence from a query-mode open (cbm_store_open_path_query, which never creates the file): a NULL result means absent/unreadable, so we return without touching the path. This removes the stat()->rename() dataflow CodeQL reports and matches resolve_store() in mcp.c, which already uses this pattern. Behavior is unchanged (absent -> skip, healthy -> reindex, corrupt -> preserve as .corrupt). Signed-off-by: yasku <aguss.cba@gmail.com>
Resolves conflict in src/pipeline/pipeline.c between #557 (preserve corrupt DB via rename) and #516 (preserve ADRs across reindex). ADR capture now runs before the delete/rename split so it covers both the healthy mode-change reindex and the corrupt-DB path. Opened read-only (cbm_store_open_path_query) so a WAL leftover is not checkpointed into the .db before it is preserved as evidence. Signed-off-by: yasku <aguss.cba@gmail.com>
|
Merged the latest Conflict resolution The two changes are complementary, not competing:
I moved the ADR capture so it runs before the delete/rename split, which means it now covers both the healthy mode-change reindex and the corrupt-DB path — matching the prior unconditional behavior on Verification
No functional change to the #557 fix itself; this update only integrates the latest |
|
Closing this PR — I'm continuing this work in a private repository, outside the upstream contribution flow. Thanks to the maintainers for the review and the CI feedback on the corrupt-DB preservation change. Issue #557 remains valid and is open for anyone who wants to pick it up independently. |
What does this PR do?
Fixes #557. The startup/query integrity check
unlink()'d a project's.db(plus its-wal/-shmsidecars) on any anomaly, causing total data loss with no backup, no recovery path, and no visible warning. The trigger is frequently a false positive: a WAL leftover after aSIGKILL, a binary variant swap (standard↔ui), or a stale-lock break can all make a recoverable DB look corrupt.This replaces the destructive
unlinkwith arenameto<name>.db.corrupt.<timestamp>and emits a prominentstderrmessage, matching the issue's suggested fix. The user can now inspect, recover, or attach the file to a bug report.Two call sites are fixed:
src/mcp/mcp.cresolve_store()— the reported path, hit on the first query against a project.src/pipeline/pipeline.ctry_incremental_or_delete_db()— only preserves on real corruption; a healthy DB rebuilt due to a mode change is still deleted, so.corruptfiles don't accumulate on normal reindexes.If
renamefails (permissions, cross-device), it falls back tounlinkso no orphan files are left behind.Checklist
git commit -s) — DCO verified viascripts/check-dco.shmake -f Makefile.cbm test) — 5684 passedresolve_store_preserves_corrupt_db_issue557seeds a corruptprojectsrow, issues a query, and asserts the original.dbis gone while a.corruptsidecar remains.Notes on the lint checklist item (transparency)
A few constraints worth making explicit so the maintainer isn't surprised:
make lint-cidoes not pass cleanly locally, but only on pre-existing code I did not touch. My local cppcheck is 2.21.0, which is newer/stricter than CI's and flags pre-existing findings insrc/cypher/cypher.candsrc/cli/cli.c(e.g.knownConditionTrueFalse,compareValueOutOfTypeRangeError). Running cppcheck scoped to just my three changed files exits clean (0). I deliberately did not touch those files to keep this PR focused (per CONTRIBUTING's "avoid unrelated reformatting").clang-format-20; I only haveclang-format-22locally. Both modified source files (mcp.c,pipeline.c) are whole-file clean under v22, and since the surrounding pre-existing code shows no diff under v22 either, v20 and v22 appear to agree on this file's style. If CI's v20 disagrees on any of my lines, I'm happy to reformat.strncmp(..., 19)/strncmp(..., 10)literals, matching the existing style intests/(e.g.test_cli.c,test_httpd.c,test_pipeline.c). This is fine because test sources are not inLINT_SRCS, and CI's lint step runs cppcheck + clang-format only (clang-tidy is enforced locally, not in CI).bug,stability/performance) to match cbm v0.8.1 silently deletes project DBs on "corrupt" detection — data loss with no recovery #557. Please add whatever is appropriate.