fix(rollout): apply rollout sample filter in the rollout manager#2061
Open
EazyReal wants to merge 1 commit into
Open
fix(rollout): apply rollout sample filter in the rollout manager#2061EazyReal wants to merge 1 commit into
EazyReal wants to merge 1 commit into
Conversation
Custom rollout functions (--rollout-function-path) did not honor --rollout-sample-filter-path. The filter was invoked only inside the default sglang rollout (generate_rollout_async), so any user-supplied rollout function silently bypassed it and trained on samples that should have been dropped. Move the filter to the RolloutManager so it runs after the rollout function returns, regardless of which rollout function produced the samples. A new helper, apply_rollout_sample_filter(args, samples) in slime/rollout/base_types.py, loads and applies the configured filter in place; the manager calls it on the grouped output right after the rollout_id contract is validated and before the groups are flattened. The default sglang path no longer applies the filter itself, avoiding a double application. The rollout_all_samples_process_path hook stays in the sglang path because it intentionally needs to see all samples, including filtered ones. Verified by two counting-filter tests in tests/plugin_contracts/test_plugin_rollout_contracts.py that run the real RolloutManager._get_rollout_data with a filter that records every invocation: the custom rollout path test fails on the unfixed code (the filter never runs), and the built-in rollout path test fails if the filter runs twice (manager plus generate_rollout_async). This test file is already in the no-GPU CI matrix, so both run on every PR. When --rollout-sample-filter-path is unset, the helper returns immediately and behavior is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
5579d17 to
2bdd8de
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Custom rollout functions wired through
--rollout-function-pathdid not honor--rollout-sample-filter-path. The filter was invoked only inside the default sglang rollout (generate_rollout_asyncinslime/rollout/sglang_rollout.py), so any user-supplied rollout function silently bypassed it. The filter's job is to setSample.remove_sample = Trueon samples that should not participate in loss (per the--rollout-sample-filter-pathhelp text); downstream inRolloutManager(slime/ray/rollout.py, the loss-mask loop),remove_sample=Truezeroes that sample'sloss_mask. With a custom rollout, the filter never ran,remove_samplestayed at its defaultFalse, so samples that should have been dropped kept a non-zero loss mask and still contributed to the training loss.Before vs After
Same input: a custom rollout function returns 2 groups of 2 samples, and a filter marks the last sample of each group for removal.
Before — the filter is never reached, so nothing is marked; every sample keeps a non-zero loss mask:
After —
RolloutManagerapplies the filter to the custom rollout's output:Fix
Apply the filter in
RolloutManagerafter the rollout function returns, independent of which rollout function produced the samples.apply_rollout_sample_filter(args, samples)inslime/rollout/base_types.pyloads the configured filter and applies it in place; it returns immediately when--rollout-sample-filter-pathis unset.RolloutManager._generatecalls it on the grouped train output right after the rollout-id contract is validated (_validate_rollout_id_annotated) and before the groups are flattened.generate_rollout_async) no longer applies the filter itself, so there is no double application.--rollout-all-samples-process-pathstays in the sglang path on purpose: that hook is documented to see all samples, including filtered ones.Why this is the right fix
--rollout-sample-filter-pathisNone(the default),apply_rollout_sample_filterreturns immediately, so behavior is bit-identical for both the default and custom rollout paths.generate_rollout_asyncand moving it to the single manager-side call site means the default path's behavior is unchanged while the custom path now gets the same treatment, at exactly one place.--rollout-sample-filter-pathis documented to decide loss participation for rollout samples; enforcing it inRolloutManagermakes it apply to every rollout function, which is what the flag already promised.remove_sample-> zeroedloss_maskmechanism is untouched.tests/plugin_contracts/test_plugin_rollout_contracts.pyrun the realRolloutManager._get_rollout_dataon CPU (the contract-test stubs keep@ray.remoteclasses plain and stub the sglang constants the module imports) with a filter that records every invocation.test_manager_applies_sample_filter_to_custom_rollout_exactly_oncedrives a custom rollout function through the manager and asserts the filter ran exactly once and itsremove_samplemarks survive flattening — it fails on the unfixed code, where the filter never runs for custom rollouts.test_manager_applies_sample_filter_to_builtin_rollout_exactly_oncedrives the built-ingenerate_rollout(generation replaced by an immediate stub state) through the same manager path and asserts exactly one invocation — it fails if the filter is also applied insidegenerate_rollout_async, i.e. if the hoist ever regresses into double-filtering on the default path. This file is already in the no-GPU CI matrix, so both run on every PR with no workflow change.