Add a metastore read replica role for read-only routing#6548
Add a metastore read replica role for read-only routing#6548shuheiktgw wants to merge 16 commits into
Conversation
c0ae4dd to
d14a422
Compare
| # # DataFusion when enabled, to nodes running the `metastore_read_replica` | ||
| # # service. Searchers require at least one `metastore_read_replica` node at | ||
| # # startup and do not fall back to the primary metastore. | ||
| # use_metastore_read_replica: false |
There was a problem hiding this comment.
The reason we need this flag in addition to metastore_read_replica_uri is that users can set metastore_read_replica_uri via the QW_METASTORE_READ_REPLICA_URI environment variable. Without this flag, searchers would need access to that same environment variable even though they don’t need the secret itself.
Introduce the `metastore_read_replica` service role and the `metastore_read_replica_uri` node config option: the foundation for routing read-only metastore traffic to a PostgreSQL read replica. - Add `QuickwitService::MetastoreReadReplica`, parsed from `metastore_read_replica` / `metastore-read-replica`. - Split `QuickwitService::default_services()` from `supported_services()` so the new role is opt-in and never enabled implicitly on all-in-one nodes. - Add the optional `metastore_read_replica_uri` field (env `QW_METASTORE_READ_REPLICA_URI`), redacted alongside `metastore_uri`. - Validate that the role requires a PostgreSQL `metastore_read_replica_uri` and cannot be co-located with the `metastore` role.
Add the resolution plumbing for connecting to a PostgreSQL read replica
over a read-only connection.
- Add `MetastoreFactoryOptions { read_only }` and thread it through
`MetastoreFactory::resolve`.
- Add `MetastoreResolver::resolve_read_only`, which rejects any non-
PostgreSQL backend.
- Key the PostgreSQL factory cache on `(uri, options)` so the read-write
and read-only clients get distinct connection pools.
- Add `PostgresqlMetastore::new_read_only`: a read-only connection pool
with migrations skipped (the replica is migrated by the primary).
Resolve and expose a metastore gRPC server when the `metastore_read_replica` role is enabled, backed by a read-only connection to `metastore_read_replica_uri`. - The read replica server reuses the metrics + load-shed layers but omits the control-plane event layers, which only wrap write RPCs. - A read-replica node is exempted from the control-plane connectivity wait, like a primary metastore node, so dedicated replica pods start independently. - Extract `metastore_max_in_flight_requests` shared by both roles.
Add `ReadReplicaRoutingMetastore`, a `MetastoreService` wrapper that routes the stale-tolerant reads issued by the search and analytics paths (`index_metadata`, `indexes_metadata`, `list_indexes_metadata`, `list_splits`, `list_metrics_splits`, `list_sketch_splits`) to read replica nodes when any are connected, and everything else (writes and non-hot-path reads) to the primary. - Routing is decided per request from the read replica balance channel's live connection set, so it degrades to the primary when no replica is deployed. The check is a synchronous `watch` read with no borrow held across an await. - Wire it into the searcher service and the DataFusion session builder, so all search (REST, Elasticsearch, gRPC) and metrics analytics benefit, while REST admin handlers keep read-your-writes against the primary.
- Document `metastore_read_replica_uri` in the node config reference and the example `quickwit.yaml`. - Add a PostgreSQL-gated test that resolves a read-only metastore against a real database and verifies it serves read RPCs.
Replace the full `MetastoreService` implementation on `ReadReplicaRoutingMetastore` (which forced ~45 delegating methods, most of them writes) with a narrow, read-only `MetastoreReadService` trait — the read-only subset of the metastore RPCs (`index_metadata`, `list_indexes_metadata`, `list_splits`, `list_metrics_splits`, `list_sketch_splits`). - `MetastoreServiceClient` implements `MetastoreReadService`; `MetastoreReadServiceClient = Arc<dyn MetastoreReadService>`. - `ReadReplicaRoutingMetastore` now implements only the 5-method trait, so writes are excluded at the type level rather than delegated. - The search and DataFusion read paths take `MetastoreReadServiceClient` / `&dyn MetastoreReadService`; `list_parquet_splits_*` take `&dyn MetastoreReadService`. `single_node_search` keeps its concrete `MetastoreServiceClient` parameter and adapts internally. Addresses review feedback that the wrapper should expose a Go-style read-only interface instead of reimplementing the whole service.
1bf29fc to
09c32ad
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 09c32ada6a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| use crate::IndexerConfig; | ||
|
|
||
| #[test] | ||
| fn test_node_config_redact_metastore_uris() { |
| match &node_config.metastore_read_replica_uri { | ||
| Some(_) => Ok(()), | ||
| None => { | ||
| bail!( | ||
| "the `metastore_read_replica` service requires `metastore_read_replica_uri` to be \ | ||
| set" | ||
| ) | ||
| } | ||
| } |
There was a problem hiding this comment.
| match &node_config.metastore_read_replica_uri { | |
| Some(_) => Ok(()), | |
| None => { | |
| bail!( | |
| "the `metastore_read_replica` service requires `metastore_read_replica_uri` to be \ | |
| set" | |
| ) | |
| } | |
| } | |
| if node_config.metastore_read_replica_uri.is_none() { | |
| bail!( | |
| "the `metastore_read_replica` service requires `metastore_read_replica_uri` to be set" | |
| ); | |
| } | |
| Ok(()) | |
| } |
| } | ||
|
|
||
| #[tokio::test] | ||
| async fn test_metastore_read_replica_role_accepts_postgres_uri() { |
There was a problem hiding this comment.
I think this test is redundant.
| } | ||
|
|
||
| #[tokio::test] | ||
| async fn test_metastore_read_replica_uri_can_be_set_from_env() { |
There was a problem hiding this comment.
I think this one is overkill. This is basically ConfigValue which is already tested.
|
|
||
| #[tokio::test] | ||
| async fn test_metastore_read_replica_role_must_run_standalone() { | ||
| for service in [ |
There was a problem hiding this comment.
I think the for-loop is overkill, especially since the branch that tests this is != 1
| QuickwitService::Searcher => "searcher", | ||
| QuickwitService::Janitor => "janitor", | ||
| QuickwitService::Metastore => "metastore", | ||
| QuickwitService::MetastoreReadReplica => "metastore_read_replica", |
There was a problem hiding this comment.
| QuickwitService::MetastoreReadReplica => "metastore_read_replica", | |
| QuickwitService::MetastoreReadReplica => "metastore-read-replica", |
control_plane was a mistake :) I'd rather follow the k8s convention. I agree this is super nit and arbitrary/debatable. I'm basically abusing my co-BDFL title.
There was a problem hiding this comment.
I'd stick to _. This is not worth the effort and debatable.
One interesting argument pro "_" : - is often considered something you tokenize on, while _ does not.
So for instance, if you double click on a word to select it, or do maj arrow thingy to select stuff, or do b in vim, it will stop at the - but not at the _.
snake case vs kebab case
| /// This is every supported service except [`QuickwitService::MetastoreReadReplica`], which is | ||
| /// opt-in: it requires a dedicated read replica URI and is meant to be deployed as its own set | ||
| /// of nodes, so it must never be enabled implicitly by an all-in-one node. | ||
| pub fn default_services() -> HashSet<QuickwitService> { |
| "searcher" => Ok(QuickwitService::Searcher), | ||
| "janitor" => Ok(QuickwitService::Janitor), | ||
| "metastore" => Ok(QuickwitService::Metastore), | ||
| "metastore-read-replica" | "metastore_read_replica" => { |
| /// should depend on this trait rather than on the full [`MetastoreService`], so that writes are | ||
| /// excluded at the type level. Add a method here only for RPCs that tolerate replication lag. | ||
| #[async_trait] | ||
| pub trait MetastoreReadService: fmt::Debug + Send + Sync + 'static { |
There was a problem hiding this comment.
Even if we have the MetastoreReadService trait, I'd like us to properly catch the Posgres error "read only" (code 25006) and return it properly instead of a generic MetastoreError::Db error.
Description
This PR adds support for deploying metastore read replicas as a dedicated
metastore_read_replicaservice and lets searchers opt into using them throughsearcher.use_metastore_read_replica.How it works
metastore_read_replicanodes withmetastore_read_replica_uripointing to a PostgreSQL read replica. Ifsearcher.use_metastore_read_replicais true, searchers usemetastore_read_replicainstead of the primary metastore.MetastoreReadServiceClient, which exposes only the read-only subset of metastore RPCs to prevent write methods from being used.Limitations / Future Improvements
metastore_read_replicaneeds to run as a standalone service to keep the implementation simple. It shares the same gRPC service asmetastore, so it cannot run alongside it, and running it separately also simplifies the metadata service connection handling.