raft: expose controller leader_for gauge on public metrics #30826
Open
travisdowns wants to merge 2 commits into
The raft leader_for 0/1 leadership gauge was only registered on the internal metrics endpoint. Expose it on the public endpoint as redpanda_raft_leader_for, restricted to the controller group so external consumers can identify the controller leader without adding a per-partition public series for every raft group.
Extend cluster_metrics_reported_only_by_leader_test to check the raft leader_for gauge for the controller group: it must read 1 on the controller leader and 0 on every other running node, on both the internal (vectorized_raft_leader_for) and public (redpanda_raft_leader_for) endpoints, across the restart, failover, and no-quorum transitions the test already drives.
Pull request overview
This PR makes the controller leader externally observable via /public_metrics by exposing the existing Raft leader_for gauge on the public metrics handle (as redpanda_raft_leader_for), limited to the controller Raft group to avoid high-cardinality public series.
Changes:
- Register a public `raft.leader_for` gauge for the controller NTP only, using `metrics::public_metric_groups` and `redpanda_`-prefixed label names.
- Extend the ducktape cluster metrics test to assert the controller `leader_for` gauge is `1` on the controller leader and `0` on all other running nodes, for both internal and public endpoints.
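The naming and gating rule in these changes can be modelled in a few lines of Python: every raft group has an internal series with bare label names, while only the controller group gets a public series, with each label name prefixed by `redpanda_`. This is a minimal sketch of the resulting series names, not the Seastar registration code. The helper names are hypothetical, and the controller NTP (`redpanda/controller/0`) and the exact label set are assumptions; the real internal series may carry further labels, such as a shard label.

```python
from typing import Dict, Optional

# Assumption: the controller group's NTP is redpanda/controller/0 and the gauge
# is labelled by namespace/topic/partition; the real label set may differ.
CONTROLLER_NTP = {"namespace": "redpanda", "topic": "controller", "partition": "0"}


def format_series(name: str, labels: Dict[str, str]) -> str:
    """Render a Prometheus series identifier with labels sorted by name."""
    body = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{body}}}"


def internal_series(labels: Dict[str, str]) -> str:
    # Every raft group gets an internal series with bare label names.
    return format_series("vectorized_raft_leader_for", labels)


def public_series(labels: Dict[str, str]) -> Optional[str]:
    # Only the controller group is exposed publicly, so the public endpoint
    # does not gain one series per partition.
    if labels != CONTROLLER_NTP:
        return None
    public_labels = {f"redpanda_{k}": v for k, v in labels.items()}
    return format_series("redpanda_raft_leader_for", public_labels)


print(internal_series(CONTROLLER_NTP))
# -> vectorized_raft_leader_for{namespace="redpanda",partition="0",topic="controller"}
print(public_series(CONTROLLER_NTP))
# -> redpanda_raft_leader_for{redpanda_namespace="redpanda",redpanda_partition="0",redpanda_topic="controller"}
print(public_series({"namespace": "kafka", "topic": "orders", "partition": "3"}))
# -> None
```

Gating on the NTP keeps the public series count at one per node, regardless of how many partitions the cluster hosts.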
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/rptest/tests/cluster_metrics_test.py | Adds assertions that the controller raft_leader_for gauge is correct on both /metrics and /public_metrics through restart/failover/no-quorum transitions. |
| src/v/raft/consensus.h | Adds a metrics::public_metric_groups member to allow consensus-owned public metric registration. |
| src/v/raft/consensus.cc | Registers leader_for on the public metrics handle for the controller group only, with public label names. |
CI test results on build #85870
bharathv approved these changes on Jun 16, 2026
```cpp
}

// Public metrics carry redpanda_-prefixed label names, unlike the internal
// (bare) labels used by setup_metrics.
    [this] { return is_elected_leader(); },
    sm::description("Indicates if this node is the controller leader"),
    labels)
  .aggregate({sm::shard_label})});
```
Member review comment on the `.aggregate({sm::shard_label})` line:

Remove, doesn't aggregate.
The raft `leader_for` 0/1 leadership gauge is only registered on the internal metrics endpoint (`vectorized_raft_leader_for`). There is no public-metrics signal for which node is the controller leader, so external consumers (dashboards, alerts) that only scrape `/public_metrics` cannot identify the controller leader.
This exposes the gauge on the public endpoint as `redpanda_raft_leader_for`, restricted to the controller group so we don't add a per-partition public series for every raft group. The registration lives in `consensus::setup_public_metrics()` next to where the internal `leader_for` gauge is set up, using a consensus-owned public metrics handle and the `redpanda_`-prefixed public label names. A value of 1 on a node means that node is the controller leader; 0 otherwise.
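With the gauge on the public endpoint, an external consumer can locate the controller leader by scraping `/public_metrics` on each node and picking the node that reports 1. A minimal sketch, assuming the Prometheus text for each node has already been fetched and that label values never contain `}`; `gauge_value` and `controller_leaders` are hypothetical names. Since the public series exists only for the controller group, the sketch assumes one `redpanda_raft_leader_for` sample per node and does no label filtering.

```python
import re
from typing import Dict, List, Optional

# One sample line of the Prometheus text format:
#   metric_name{label="value",...} <value>
# Assumption: label values never contain '}', so [^}]* spans the label block.
_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{[^}]*\})?"
    r"\s+(?P<value>\S+)"
)


def gauge_value(exposition: str, metric: str) -> Optional[float]:
    """Return the first sample value for `metric`, or None if it is absent."""
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = _SAMPLE.match(line)
        if m and m.group("name") == metric:
            return float(m.group("value"))
    return None


def controller_leaders(scrapes: Dict[str, str]) -> List[str]:
    """Nodes whose /public_metrics report redpanda_raft_leader_for == 1."""
    return sorted(
        node
        for node, text in scrapes.items()
        if gauge_value(text, "redpanda_raft_leader_for") == 1.0
    )


scrapes = {
    "node-1": 'redpanda_raft_leader_for{redpanda_namespace="redpanda"} 0\n',
    "node-2": 'redpanda_raft_leader_for{redpanda_namespace="redpanda"} 1\n',
}
print(controller_leaders(scrapes))  # ['node-2']
```

Values are parsed as floats because Prometheus sample values are floating point, so `1` and `1.0` compare equal. An empty list means no scraped node currently reports 1.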
The ducktape test `cluster_metrics_reported_only_by_leader_test` is extended to assert the controller `leader_for` gauge reads 1 on the leader and 0 on every other running node, on both the internal and public endpoints, across the restart, failover, and no-quorum transitions the test already drives.
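The invariant checked at each transition can be sketched as a standalone function. This is not the code in `tests/rptest/tests/cluster_metrics_test.py`: `check_controller_leader_for` and its input shape are hypothetical, and the sketch assumes the caller has already scraped both endpoints on every running node and knows the current controller leader by other means.

```python
from typing import Dict


def check_controller_leader_for(
    values_by_metric: Dict[str, Dict[str, float]], leader: str
) -> None:
    """Raise AssertionError unless each metric reads 1 on `leader`, 0 elsewhere.

    values_by_metric maps a metric name (internal or public) to a
    {node: gauge value} dict that should cover every running node.
    """
    for metric, values in values_by_metric.items():
        if leader not in values:
            raise AssertionError(f"{metric}: no sample from leader {leader}")
        for node, value in values.items():
            expected = 1.0 if node == leader else 0.0
            if value != expected:
                raise AssertionError(
                    f"{metric} on {node}: expected {expected}, got {value}"
                )


# Both endpoints agree that n1 is the controller leader, so this passes.
check_controller_leader_for(
    {
        "vectorized_raft_leader_for": {"n1": 1.0, "n2": 0.0, "n3": 0.0},
        "redpanda_raft_leader_for": {"n1": 1.0, "n2": 0.0, "n3": 0.0},
    },
    leader="n1",
)
```

Running the same expected leader against both metric names means a disagreement on either endpoint fails the check.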
Backports Required
Release Notes
Improvements
`redpanda_raft_leader_for` gauge (1 on the controller leader, 0 otherwise).