fix log backup state query failure handling#6950
Conversation
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@RidRisR: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What changed
This updates log backup tracker state queries to distinguish critical PD etcd query failures from confirmed missing metadata:
infokey query failures as unknown state instead of task-not-found.infoquery failures and PD etcd client creation failures.BackupFailed / LogBackupStateQueryFailedonly once per continuous failure window, and retry reporting if the status update itself fails.pausekey query failures as partial state: keep usableinfo/checkpoint data, skip kernel state sync, and avoid marking the backup failed.Why
A transient PD/etcd/DNS issue could previously leave
InfoExists=falseand be interpreted asLogBackupTaskNotFound, incorrectly failing log backup even though task existence was unknown.Validation
GOCACHE=/tmp/go-cache go test ./pkg/backup/backup -count=1GOCACHE=/tmp/go-cache go test -race ./pkg/backup/backup -count=1