feat: Tree-Search BFS, Distributed GRPO & TRACE Masking #451
Open
RUFFY-369 wants to merge 4 commits into
… TRACE masking, and OpenRLHF Ray wrapper
…try into evaluation engine
…coring boundaries
📎 Housekeeping Note for Maintainers: Once this PR is evaluated, I’d be happy to submit a quick subsequent |
What does this PR do?
This PR upgrades the OpenGauss agent framework by transitioning the core execution engine from a sequential ReAct loop to a concurrent Tree-Search Breadth-First Search (BFS) architecture. It introduces high-speed containerized branching, Best-of-N trajectory scoring, a standalone Group Relative Policy Optimization (GRPO) mathematical engine, and deterministic TRACE reward-path masking.
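As a rough illustration of how Best-of-N scoring and group-relative advantages relate to a group of G parallel branches, here is a minimal sketch. It assumes the usual GRPO formulation, in which each branch reward is standardized against the mean and standard deviation of its own group. The function names `group_relative_advantages` and `best_of_n`, the `eps` parameter, and the sample rewards are hypothetical and are not taken from `GaussGRPOEngine` or any other code in this PR.

```python
from statistics import fmean, pstdev
from typing import Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> list[float]:
    """Standardize each reward against its own group: (r_i - mean) / (std + eps).

    Assumption: this mirrors the common GRPO normalization; the PR's engine
    may differ in details such as the choice of standard deviation.
    """
    if not rewards:
        return []
    mean = fmean(rewards)
    std = pstdev(rewards)  # population standard deviation over the group
    return [(r - mean) / (std + eps) for r in rewards]


def best_of_n(rewards: Sequence[float]) -> int:
    """Return the index of the highest-scoring branch (first one wins ties)."""
    if not rewards:
        raise ValueError("need at least one branch reward")
    return max(range(len(rewards)), key=rewards.__getitem__)


# Hypothetical scores for G = 4 branches spawned from the same turn.
branch_rewards = [0.0, 1.0, 1.0, 0.0]
advantages = group_relative_advantages(branch_rewards)
winner = best_of_n(branch_rewards)
```

Because the advantages are centered on the group mean, they sum to approximately zero, and a group in which every branch receives the same reward contributes no learning signal. The `eps` term only guards against division by zero in that degenerate case.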
Related Issue
Fixes #450
Type of Change
Changes Made
- environments/agent_loop.py: Replaced the linear generation loop with a concurrent BFS generation engine capable of spawning G parallel branches per turn.
- tools/environments/docker.py: Implemented a sub-second Unix tar-pipe sandbox cloning mechanism for low-overhead, localized branch container provisioning.
- environments/gauss_base_env.py: Refactored the base evaluation layer to dynamically resolve physical branch context bindings (res.task_id) and execute concurrent Best-of-N trajectory selection.
- tools/rl_training_tool.py: Created the GaussGRPOEngine to natively calculate group-relative advantages (…
- agent/trace_masking.py: Introduced a linear-time trajectory sanitizer that isolates user/assistant tool interactions from environment stdout noise prior to backpropagation.
- tests/test_trace_masking.py: Delivered an isolated, deterministic test suite validating the TRACE reward assignment heuristic across varied telemetry topologies.
How to Test
Run the newly introduced unit tests (for example, pytest tests/test_trace_masking.py -q) to verify linear-time trajectory sanitization.
Initiate a multi-branch pilot execution (e.g., G=2) over the baseline cohort:
In a separate terminal shell, verify the creation, parallel execution, and automatic garbage-collection of guest containers:
docker ps --filter "name=-branch-"
Checklist
Code
- … fix(scope):, feat(scope):, etc.)
- … pytest tests/ -q and all tests pass
Documentation & Housekeeping
- … docs/, docstrings), or N/A
- … cli-config.yaml.example if I added/changed config keys, or N/A
- … CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows, or N/A
📊 Comparative Performance Report
We executed a rigorous head-to-head pilot benchmark isolating OpenGauss 2.0 performance against the legacy baseline:
📋 TL;DR (Bottom Line Up Front)
🔬 Methodology & Scope Selection
To maximize statistical signal while protecting production API limits, we evaluated both branches across an identical 5-task representative cohort selected from the TBLite benchmark, spanning distinct operational domains (System Admin, Scientific Computing, ML, Software Eng, Debugging).
📈 High-Level Telemetry Comparison
🔬 Task-by-Task Turn Allocation
- system-administration
- scientific-computing
- general-ml
- software-engineering
- debugging
cc @gauss-math-inc @jesse-michael-han