Dynamic Decomposition Improves LLM Inference Scaling
DISC adaptively partitions reasoning traces during inference so language models spend more compute on the most difficult steps. By subdividing challenging segments and prioritizing their sampling, DISC delivers better solutions with fewer tokens across coding and math benchmarks.
Automatically adjusts step sizes at inference time, zooming in on pivotal reasoning steps without handcrafted heuristics.
Integrates with greedy, beam, or Monte Carlo tree search, controlling which prefixes receive more sampling budget.
Allocates tokens to hard steps, improving sample and token efficiency across both proprietary and open-source LLMs.
DISC is a recursive inference procedure. It repeatedly proposes candidate prefixes, compares their rewards, and dynamically decides whether to advance or contract the step size. The result is a search process that focuses on the most uncertain parts of the trajectory while skipping past the easy ones.
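The loop below is a minimal sketch of that procedure, assuming placeholder `sample_continuations` and `score` helpers in place of a real LLM sampling API and outcome reward model; the halving and doubling rules are illustrative stand-ins, not the paper's exact update.

```python
import random

# Stand-ins for real components: actual usage would call an LLM sampling API
# and a trained outcome reward model. Both helpers here are assumptions.
def sample_continuations(prefix, n, max_tokens):
    return [f"<~{max_tokens}-token continuation {i}>" for i in range(n)]

def score(text):
    return random.random()

def disc_search(prefix="", step=64, max_step=256, min_step=8,
                n_samples=4, budget=64):
    """Sketch of DISC-style dynamic decomposition.

    Advance in large steps while progress is easy; when no sampled
    continuation improves the reward, halve the step size to zoom in
    on the hard region of the trajectory.
    """
    best_reward = score(prefix)
    while budget > 0 and step >= min_step:
        candidates = sample_continuations(prefix, n_samples, max_tokens=step)
        budget -= n_samples
        reward, cont = max((score(prefix + c), c) for c in candidates)
        if reward > best_reward:
            prefix, best_reward = prefix + cont, reward
            step = min(step * 2, max_step)  # easy region: expand the step
        else:
            step //= 2  # hard region: contract and decompose more finely
    return prefix
```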
A rectangle has a perimeter of 24 inches. What is the maximum possible area of the rectangle?
The algorithm expands, contracts, and replays prefixes until it stabilizes on a high-reward final answer.
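The target answer itself is easy to verify by hand: with the perimeter fixed, the square maximizes area.

```latex
\[
2(l + w) = 24 \;\Rightarrow\; w = 12 - l, \qquad
A(l) = l(12 - l), \qquad
A'(l) = 12 - 2l = 0 \;\Rightarrow\; l = w = 6, \quad
A_{\max} = 36 \text{ in}^2.
\]
```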
DISC samples continuations from the current prefix and scores them with an outcome reward model to locate promising solution regions.
Difficult prefixes trigger finer-grained decomposition, while easier prefixes advance in larger chunks to conserve budget.
Sampling focuses on high-importance tokens, driving additional rollouts only when they improve the z-score over existing prefixes (see the sketch after this list).
The same decomposition policy controls node expansion for greedy, beam, or MCTS search, making DISC a drop-in upgrade.
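As a rough illustration of the z-score gate mentioned above (an assumed formulation; the paper's exact criterion may differ), extra rollouts are spent only on prefixes whose reward stands out from what has already been explored:

```python
import statistics

def worth_more_rollouts(candidate_reward, prefix_rewards, z_threshold=1.0):
    """Illustrative z-score gate for allocating extra rollouts.

    Spend additional samples on a prefix only when its reward sits at
    least `z_threshold` standard deviations above previously explored
    prefixes.
    """
    if len(prefix_rewards) < 2:
        return True  # too little history to estimate spread; keep exploring
    mu = statistics.mean(prefix_rewards)
    sigma = statistics.stdev(prefix_rewards)
    if sigma == 0:
        return candidate_reward > mu
    return (candidate_reward - mu) / sigma >= z_threshold
```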
Across APPS, MATH500, and LiveCodeBench, DISC lowers pass@10 error by 5.0%, 6.7%, and 10.5% relative to the best static baseline. The gains compound on the hardest competition problems where sampling budget is scarce.
DISC repeatedly splits critical prefixes such as connective words and control tokens. These pivots steer the downstream reasoning, so the algorithm allocates more sampling budget there instead of overspending on settled regions.
From lightweight LLaMA and Mistral checkpoints to proprietary reasoning models such as R1, DISC consistently boosts accuracy. For LLaMA the pass@10 rate jumps from 0.01 to 0.04, a fourfold gain (300% relative increase).
The same decomposition operator drives greedy search, beam search, or MCTS. DISC expands nodes until rewards plateau, then contracts, providing precise control over inference-time compute without new hyperparameters.
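A sketch of that drop-in property, with hypothetical `expand` and `score` callables standing in for DISC's decomposition operator and reward model:

```python
from typing import Callable, List, Tuple

Node = Tuple[str, int]  # (partial solution prefix, current step size)

def greedy_search(root: Node,
                  expand: Callable[[Node], List[Node]],
                  score: Callable[[str], float],
                  max_depth: int = 32) -> str:
    """Generic greedy driver; `expand` is the decomposition operator."""
    node = root
    for _ in range(max_depth):
        children = expand(node)  # DISC chooses each child's step size
        if not children:
            break
        node = max(children, key=lambda n: score(n[0]))
    return node[0]
```

A beam or MCTS driver would call the same `expand` unchanged, which is what makes the decomposition a drop-in node-expansion policy.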
Under the same token budget, DISC recovers higher accuracy than sentence-level or token-level decomposition. When the sample budget is fixed, DISC returns better solutions with fewer tokens.
DISC reallocates inference budget toward the most uncertain prefixes, consistently outperforming static token, sentence, and single-step decomposition strategies across coding and mathematics benchmarks.
pass@10 error ↓ (APPS)
Down from 0.50 to 0.475 with a fixed 1.5k token budget.
5.0% relative reduction versus the strongest static decomposition.
pass@10 error ↓ (MATH500)
Drops from 0.15 to 0.14 using a verifier-trained outcome reward model.
6.7% relative reduction with fewer sampled continuations.
pass@10 error ↓ (LiveCodeBench)
Falls from 0.57 to 0.51 despite strict sandboxed unit tests.
10.5% relative reduction on freshly collected problems.
pass@10 accuracy ↑
Accuracy rises by over 85% with only ten sampled reasoning traces.
Maintains a 33% relative improvement when matched to the base model token budget.
pass@10 accuracy ↑
pass@10 rate jumps from 0.01 to 0.04 on APPS coding problems.
4x gain (300% relative increase) within the same low sampling budget.
pass@10 accuracy ↑
Scores climb from 0.095 to 0.17 across budget-constrained runs.
79% relative increase while preserving resource efficiency.