fix log backup state query failure handling by RidRisR · Pull Request #6950 · pingcap/tidb-operator

RidRisR · 2026-06-15T09:50:33Z

What changed

This updates log backup tracker state queries to distinguish critical PD etcd query failures from confirmed missing metadata:

Treat info key query failures as unknown state instead of task-not-found.
Start a 10-minute critical query failure countdown for info query failures and PD etcd client creation failures.
Report BackupFailed / LogBackupStateQueryFailed only once per continuous failure window, and retry reporting if the status update itself fails.
Treat pause key query failures as partial state: keep usable info/checkpoint data, skip kernel state sync, and avoid marking the backup failed.
Preserve checkpoint updates when pause state is unknown.

Why

Previously, a transient PD/etcd/DNS issue could leave InfoExists=false, which was then interpreted as LogBackupTaskNotFound. This incorrectly failed the log backup even though the task's existence was unknown.
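
One way to picture the distinction is a three-valued key state instead of a boolean. The sketch below is illustrative only: `keyState`, `classify`, `snapshot`, and its methods are hypothetical names, not identifiers from the operator, and the rules encode the behavior described in "What changed" under that assumption.

```go
package main

import "errors"

// keyState is a hypothetical tri-state replacing a plain "exists" boolean.
type keyState int

const (
	keyUnknown keyState = iota // query failed; existence is not confirmed
	keyMissing                 // query succeeded and the key is absent
	keyPresent                 // query succeeded and the key was found
)

// errEtcdQuery is a hypothetical stand-in for a transient PD/etcd/DNS error.
var errEtcdQuery = errors.New("etcd query failed")

// classify maps a raw query result to a keyState. A query error never
// collapses into "missing".
func classify(found bool, err error) keyState {
	if err != nil {
		return keyUnknown
	}
	if found {
		return keyPresent
	}
	return keyMissing
}

// snapshot holds the state of the info key and the pause key.
type snapshot struct {
	info  keyState
	pause keyState
}

// taskNotFound is true only when the info key is confirmed missing.
func (s snapshot) taskNotFound() bool {
	return s.info == keyMissing
}

// canUpdateCheckpoint allows checkpoint updates whenever info is usable,
// even if the pause state is unknown.
func (s snapshot) canUpdateCheckpoint() bool {
	return s.info == keyPresent
}

// canSyncKernelState requires both keys to have a known state.
func (s snapshot) canSyncKernelState() bool {
	return s.info == keyPresent && s.pause != keyUnknown
}
```

With this shape, `classify(false, errEtcdQuery)` yields `keyUnknown`, so a transient error can no longer be read as a missing task, while a failed pause query only disables kernel state sync.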

Validation

GOCACHE=/tmp/go-cache go test ./pkg/backup/backup -count=1
GOCACHE=/tmp/go-cache go test -race ./pkg/backup/backup -count=1

ti-chi-bot · 2026-06-15T09:50:36Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

ti-chi-bot · 2026-06-15T09:50:37Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign sdojjy for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2026-06-15T10:34:44Z

@RidRisR: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-e2e-kind-scale-simultaneously | `02407b6` | link | false | `/test pull-e2e-kind-scale-simultaneously` |
| pull-e2e-kind-tngm | `02407b6` | link | false | `/test pull-e2e-kind-tngm` |
| pull-e2e-kind-dmcluster | `02407b6` | link | false | `/test pull-e2e-kind-dmcluster` |
| pull-e2e-kind-basic | `02407b6` | link | false | `/test pull-e2e-kind-basic` |
| pull-e2e-kind-tidbcluster | `02407b6` | link | false | `/test pull-e2e-kind-tidbcluster` |
| pull-e2e-kind-br | `02407b6` | link | false | `/test pull-e2e-kind-br` |
| pull-e2e-kind-across-kubernetes | `02407b6` | link | false | `/test pull-e2e-kind-across-kubernetes` |
| pull-e2e-kind-serial | `02407b6` | link | false | `/test pull-e2e-kind-serial` |


fix log backup state query failure handling

3859143

ti-chi-bot Bot added the do-not-merge/work-in-progress label Jun 15, 2026

ti-chi-bot Bot requested a review from howardlau1999 June 15, 2026 09:50

ti-chi-bot Bot added the size/XXL label Jun 15, 2026

RidRisR marked this pull request as ready for review June 15, 2026 10:01

ti-chi-bot Bot removed the do-not-merge/work-in-progress label Jun 15, 2026

test: consolidate log backup query failure coverage

02407b6

RidRisR wants to merge 2 commits into pingcap:release-1.x from RidRisR:codex/log-backup-query-failure
