Skip to content

fix(juicefs): correct stale PV fuse label key after Dataset recreation#6090

Closed
jakharmonika364 wants to merge 1 commit into
fluid-cloudnative:masterfrom
jakharmonika364:fix/issue-6089-fuse-label-key-consistency
Closed

fix(juicefs): correct stale PV fuse label key after Dataset recreation#6090
jakharmonika364 wants to merge 1 commit into
fluid-cloudnative:masterfrom
jakharmonika364:fix/issue-6089-fuse-label-key-consistency

Conversation

@jakharmonika364

Copy link
Copy Markdown
Contributor

Ⅰ. Describe what this PR does

When namespace+name exceeds ~52 characters, GetFuseLabelName falls back to a UID-based label key (fluid.io/f-<ownerDatasetUID>). If the Dataset gets deleted and recreated, the ownerDatasetUID changes. SyncRuntime picks up the new UID and updates the fuse daemonset's nodeSelector, but CreatePersistentVolumeForRuntime only creates PVs and never updates them, so the PV's mount_pod_node_selector_key keeps pointing to the old key. The CSI node plugin reads this stale key from the PV to label nodes, the fuse daemonset nodeSelector no longer matches, fuse pods don't get scheduled, and mounts fail.

This PR adds syncPVFuseLabelKey() which is called unconditionally in syncFuseSpec() on every reconcile. It reads the PV's current mount_pod_node_selector_key and patches it if it doesn't match runtimeInfo.GetFuseLabelName(). Running it unconditionally covers the case where the daemonset was already updated in a prior reconcile but the PV patch hadn't happened yet.

Ⅱ. Does this pull request fix one issue?

fixes #6089

Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

Added TestSyncPVFuseLabelKey in pkg/ddc/juicefs/fuse_label_consistency_test.go:

  • Creates a PV with a stale label key (based on ownerDatasetUID_A)
  • Initializes the engine with a new ownerDatasetUID_B (simulating Dataset recreation)
  • Calls syncPVFuseLabelKey() and asserts the PV's mount_pod_node_selector_key is updated to the UID_B based key

Ⅳ. Describe how to verify it

Run the unit test:

go test ./pkg/ddc/juicefs/ -run TestSyncPVFuseLabelKey -v

Expected output:

image

Ⅴ. Special notes for reviews

The fix runs a PV GET on every syncFuseSpec call. If the key already matches (common case), no update is made. The slight extra cost per reconcile is acceptable given mounts would otherwise fail silently.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a fix for issue #6089 by adding syncPVFuseLabelKey to keep the PV's mount_pod_node_selector_key CSI attribute in sync with the current fuse label key, preventing mount failures when the owner dataset UID changes. It also includes a unit test to verify this behavior. The review feedback points out a critical issue where assigning a value to pv.Spec.CSI.VolumeAttributes could cause a runtime panic if the map is nil, and provides a code suggestion to safely initialize it.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread pkg/ddc/juicefs/sync_runtime.go
…ative#6089)

When namespace+name exceeds ~52 chars, GetFuseLabelName falls back to a
UID-based key (fluid.io/f-<ownerDatasetUID>). If the Dataset is deleted
and recreated the ownerDatasetUID changes, so SyncRuntime updates the
fuse daemonset's nodeSelector to the new key while the PV's
mount_pod_node_selector_key retains the old key. The CSI node plugin
reads the stale key from the PV to label nodes, but the fuse daemonset
has a different nodeSelector, so fuse pods never get scheduled and
mounts fail.

Fix: add syncPVFuseLabelKey() called unconditionally in syncFuseSpec()
on every reconcile. It compares the PV's mount_pod_node_selector_key
against runtimeInfo.GetFuseLabelName() and patches the PV if stale.
Running it unconditionally ensures the PV is corrected even when no
other fuse spec change is detected in the same reconcile cycle.

Fixes: fluid-cloudnative#6089

Signed-off-by: Monika Jakhar <jakharmonika364@gmail.com>
@jakharmonika364 jakharmonika364 force-pushed the fix/issue-6089-fuse-label-key-consistency branch from b13e76f to e26857d Compare July 1, 2026 05:02
@fluid-e2e-bot

fluid-e2e-bot Bot commented Jul 1, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zwwhdls for approval by writing /assign @zwwhdls in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sonarqubecloud

sonarqubecloud Bot commented Jul 1, 2026

Copy link
Copy Markdown

@codecov

codecov Bot commented Jul 1, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 48.48485% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.76%. Comparing base (ccd52dc) to head (e26857d).

Files with missing lines Patch % Lines
pkg/ddc/juicefs/sync_runtime.go 48.48% 12 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6090      +/-   ##
==========================================
- Coverage   64.77%   64.76%   -0.02%     
==========================================
  Files         484      484              
  Lines       33892    33925      +33     
==========================================
+ Hits        21954    21970      +16     
- Misses      10215    10227      +12     
- Partials     1723     1728       +5     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@TrafalgarZZZ

Copy link
Copy Markdown
Member

@jakharmonika364 Thanks for your PR that tries to fix the issue! But I think the changes won't fix the issue because PV is immutable after creation. So syncing PV fuse label key is not possible. We had some internal discussion with JuiceFSRuntime's maintainer @zwwhdls , and we decide to fix that issue by disable syncing fuse nodeSelector (I've opened a PR #6092 ). So I'll close this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Inconsistent fuse node selector cause mount failure

2 participants