[improve][pip] PIP-486: Scalable Topic Key-Shared Consumption#26077

Open
merlimat wants to merge 1 commit into apache:master from merlimat:mmerli/pip-486

@merlimat

Contributor

Motivation

PIP-468 and PIP-483 give scalable topics a DAG of range segments whose steady-state
ordered-consumption model is one consumer per segment. Several
situations need key-shared (per-message-key) ordered consumption instead: draining a sealed-segment
backlog after a scale-up, consolidating many low-throughput topics, and getting consumer parallelism
beyond the max-segments ceiling.

Today, key-shared dispatch couples the producer's batching mode to the consumer's subscription mode
(the producer must disable batching or use the key-based batcher). This PIP removes that coupling for
scalable topics.

Modifications

Adds the design document pip/pip-486.md. In summary:

  • Entry-bucketing: a second, independent 16-bit hash (hashB, the low half of the same 32-bit key
    hash whose high half drives segment routing) divides each segment into a configurable number of
    buckets; the producer keeps each batch within one bucket and stamps the bucket's hashB range in the
    outer MessageMetadata.
  • Routing by range: the broker dispatches a whole entry to the consumer that owns that bucket — no
    per-key hashing, no decompression, one entry to exactly one consumer.
  • Per-topic bucket budget, divided across segments (a split halves a segment's buckets); bucket
    count is immutable per segment, changed only by a controller-driven "rebucket rollover" (a no-op
    split reusing the PIP-468 seal/redirect flow).
  • Controller-driven bucket→consumer assignment; reassignment reuses the existing Key_Shared
    blocked-hash handling, tracking pending state per bucket. No shared-entry ack machinery.
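
The hash split and bucket lookup summarized above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation proposed in the design document: the class and method names are hypothetical, the bucket count is assumed to be a power of two so that buckets are exactly equal in width, and bucket owners are modeled as a plain array.

```java
// Hypothetical sketch: names and the power-of-two restriction are assumptions,
// not APIs or rules taken from PIP-486.
final class EntryBucketing {
    /** hashB is a 16-bit value, so the bucket ring covers [0, 65536). */
    static final int HASH_B_SPACE = 1 << 16;

    private EntryBucketing() {}

    /** High 16 bits of the 32-bit key hash: the half that drives segment routing. */
    static int segmentHash(int keyHash32) {
        return keyHash32 >>> 16;
    }

    /** Low 16 bits of the same hash: hashB, used only for intra-segment bucketing. */
    static int hashB(int keyHash32) {
        return keyHash32 & 0xFFFF;
    }

    /** Width of each bucket when the ring is divided into numBuckets equal parts. */
    static int bucketWidth(int numBuckets) {
        if (numBuckets <= 0 || numBuckets > HASH_B_SPACE || Integer.bitCount(numBuckets) != 1) {
            throw new IllegalArgumentException("numBuckets must be a power of two in [1, 65536]");
        }
        return HASH_B_SPACE / numBuckets;
    }

    /** Index of the bucket whose [start, end) range contains the given hashB value. */
    static int bucketIndex(int hashBValue, int numBuckets) {
        if (hashBValue < 0 || hashBValue >= HASH_B_SPACE) {
            throw new IllegalArgumentException("hashB must be in [0, 65536)");
        }
        return hashBValue / bucketWidth(numBuckets);
    }

    /** Half-open hashB range {start, end} that a producer would stamp for this bucket. */
    static int[] bucketRange(int index, int numBuckets) {
        int width = bucketWidth(numBuckets);
        if (index < 0 || index >= numBuckets) {
            throw new IllegalArgumentException("index out of range");
        }
        return new int[] {index * width, (index + 1) * width};
    }

    /**
     * Routing by range: the owner is found from the stamped range start alone,
     * without hashing individual keys or opening the batch.
     */
    static String ownerOf(int stampedRangeStart, int numBuckets, String[] bucketOwners) {
        return bucketOwners[bucketIndex(stampedRangeStart, numBuckets)];
    }
}
```

For example, with 16 buckets a key whose 32-bit hash is `0xABCD1234` has `hashB = 0x1234` (4660) and falls in bucket 1, whose range is `[4096, 8192)`.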

This is a documentation-only PR (the design proposal). Implementation will follow in separate PRs.
A discussion thread will be started on dev@pulsar.apache.org.

Adds the PIP-486 design document: key-shared (per-message-key) ordered
consumption on scalable topics via producer-side entry-bucketing, decoupling
the producer's batching mode from the consumer's subscription mode.

Discussion to follow on dev@pulsar.apache.org.
github-actions bot added the PIP label Jun 22, 2026
Comment thread pip/pip-486.md
Comment on lines +99 to +101
2. **Intra-segment bucketing (this PIP).** A **separate, independent** hash `hashB(key)` maps keys onto
a second ring, and each segment divides *that* ring into `N` equal **buckets** — a bucket is a
contiguous `hashB` sub-range `[start, end)`. `hashB` must be independent of the segment-routing hash

Member

Rather than fixing the solution to equal-sized hash buckets, it would be useful to support splitting a hot shard on arbitrary boundaries based on key statistics. Message keys are often derived from business entities (tenant, account, or device IDs) whose traffic is heavily skewed—a handful of keys can dominate throughput while most contribute little. Equal-sized buckets can't isolate such hot keys, so adapting split boundaries to the observed distribution would balance load far more effectively. This wouldn't have to be implemented from the start, but it would be great if the design could be kept open for extension in this area.
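
To make the suggestion concrete, here is a minimal sketch of a bucket lookup keyed by explicit start boundaries rather than a fixed width. It is an illustration only: the class name, constructor, and boundary values are hypothetical, and nothing here is taken from the design document. Equal-sized buckets are just one choice of boundaries in this representation, so the same lookup would serve both layouts.

```java
import java.util.TreeMap;

// Hypothetical sketch: a bucket table defined by sorted start boundaries on the
// 16-bit hashB ring. Each bucket ends where the next one starts.
final class BoundaryBuckets {
    private static final int HASH_B_SPACE = 1 << 16;

    // Maps each bucket's inclusive start to its bucket id.
    private final TreeMap<Integer, Integer> startToBucket = new TreeMap<>();

    BoundaryBuckets(int... starts) {
        if (starts.length == 0 || starts[0] != 0) {
            throw new IllegalArgumentException("the first bucket must start at 0");
        }
        for (int i = 0; i < starts.length; i++) {
            if (starts[i] >= HASH_B_SPACE || (i > 0 && starts[i] <= starts[i - 1])) {
                throw new IllegalArgumentException("starts must be strictly increasing and below 65536");
            }
            startToBucket.put(starts[i], i);
        }
    }

    /** Returns the id of the bucket whose [start, nextStart) range contains hashB. */
    int bucketFor(int hashB) {
        if (hashB < 0 || hashB >= HASH_B_SPACE) {
            throw new IllegalArgumentException("hashB must be in [0, 65536)");
        }
        // floorEntry finds the greatest start boundary that is <= hashB.
        return startToBucket.floorEntry(hashB).getValue();
    }
}
```

With boundaries `0, 20000, 20001, 49152`, the single value `hashB = 20000` gets a bucket of its own, which is how a hot key could be isolated. The isolation is at `hashB` granularity, so any other key that hashes to the same value would share that bucket.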
