Skip to content

fix(rollout): apply rollout sample filter in the rollout manager#2061

Open
EazyReal wants to merge 1 commit into
THUDM:mainfrom
EazyReal:upstream-pr/manager-side-sample-filter
Open

fix(rollout): apply rollout sample filter in the rollout manager#2061
EazyReal wants to merge 1 commit into
THUDM:mainfrom
EazyReal:upstream-pr/manager-side-sample-filter

Conversation

@EazyReal

@EazyReal EazyReal commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Problem

Custom rollout functions wired through --rollout-function-path did not honor --rollout-sample-filter-path. The filter was invoked only inside the default sglang rollout (generate_rollout_async in slime/rollout/sglang_rollout.py), so any user-supplied rollout function silently bypassed it. The filter's job is to set Sample.remove_sample = True on samples that should not participate in loss (per the --rollout-sample-filter-path help text); downstream in RolloutManager (slime/ray/rollout.py, the loss-mask loop), remove_sample=True zeroes that sample's loss_mask. With a custom rollout, the filter never ran, remove_sample stayed at its default False, so samples that should have been dropped kept a non-zero loss mask and still contributed to the training loss.

Before vs After

Same input: a custom rollout function returns 2 groups of 2 samples, and a filter marks the last sample of each group for removal.

def drop_last_sample_filter(args, groups):  # --rollout-sample-filter-path
    for group in groups:
        group[-1].remove_sample = True

args.rollout_function_path = "my_module.custom_rollout"   # not the default sglang path
args.rollout_sample_filter_path = "my_module.drop_last_sample_filter"

Before — the filter is never reached, so nothing is marked; every sample keeps a non-zero loss mask:

remove_sample per group: [[False, False], [False, False]]   # filter silently ignored

After — RolloutManager applies the filter to the custom rollout's output:

remove_sample per group: [[False, True], [False, True]]     # last sample of each group dropped (loss_mask zeroed)

Fix

Apply the filter in RolloutManager after the rollout function returns, independent of which rollout function produced the samples.

  • New helper apply_rollout_sample_filter(args, samples) in slime/rollout/base_types.py loads the configured filter and applies it in place; it returns immediately when --rollout-sample-filter-path is unset.
  • RolloutManager._generate calls it on the grouped train output right after the rollout-id contract is validated (_validate_rollout_id_annotated) and before the groups are flattened.
  • The default sglang path (generate_rollout_async) no longer applies the filter itself, so there is no double application.
  • --rollout-all-samples-process-path stays in the sglang path on purpose: that hook is documented to see all samples, including filtered ones.

Why this is the right fix

  • Default-path safety / no-op when unset. When --rollout-sample-filter-path is None (the default), apply_rollout_sample_filter returns immediately, so behavior is bit-identical for both the default and custom rollout paths.
  • No double application. Removing the filter call from generate_rollout_async and moving it to the single manager-side call site means the default path's behavior is unchanged while the custom path now gets the same treatment, at exactly one place.
  • Matches the documented contract. --rollout-sample-filter-path is documented to decide loss participation for rollout samples; enforcing it in RolloutManager makes it apply to every rollout function, which is what the flag already promised.
  • Minimal, no new abstraction. One small helper plus a one-line call; the remove_sample -> zeroed loss_mask mechanism is untouched.
  • CI-verifiable, no-GPU. Two counting-filter tests in tests/plugin_contracts/test_plugin_rollout_contracts.py run the real RolloutManager._get_rollout_data on CPU (the contract-test stubs keep @ray.remote classes plain and stub the sglang constants the module imports) with a filter that records every invocation. test_manager_applies_sample_filter_to_custom_rollout_exactly_once drives a custom rollout function through the manager and asserts the filter ran exactly once and its remove_sample marks survive flattening — it fails on the unfixed code, where the filter never runs for custom rollouts. test_manager_applies_sample_filter_to_builtin_rollout_exactly_once drives the built-in generate_rollout (generation replaced by an immediate stub state) through the same manager path and asserts exactly one invocation — it fails if the filter is also applied inside generate_rollout_async, i.e. if the hoist ever regresses into double-filtering on the default path. This file is already in the no-GPU CI matrix, so both run on every PR with no workflow change.

Custom rollout functions (--rollout-function-path) did not honor
--rollout-sample-filter-path. The filter was invoked only inside the
default sglang rollout (generate_rollout_async), so any user-supplied
rollout function silently bypassed it and trained on samples that should
have been dropped.

Move the filter to the RolloutManager so it runs after the rollout
function returns, regardless of which rollout function produced the
samples. A new helper, apply_rollout_sample_filter(args, samples) in
slime/rollout/base_types.py, loads and applies the configured filter in
place; the manager calls it on the grouped output right after the
rollout_id contract is validated and before the groups are flattened.
The default sglang path no longer applies the filter itself, avoiding a
double application. The rollout_all_samples_process_path hook stays in
the sglang path because it intentionally needs to see all samples,
including filtered ones.

Verified by two counting-filter tests in
tests/plugin_contracts/test_plugin_rollout_contracts.py that run the
real RolloutManager._get_rollout_data with a filter that records every
invocation: the custom rollout path test fails on the unfixed code
(the filter never runs), and the built-in rollout path test fails if
the filter runs twice (manager plus generate_rollout_async). This test
file is already in the no-GPU CI matrix, so both run on every PR.

When --rollout-sample-filter-path is unset, the helper returns
immediately and behavior is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@EazyReal EazyReal force-pushed the upstream-pr/manager-side-sample-filter branch from 5579d17 to 2bdd8de Compare June 12, 2026 08:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant