NVIDIA Nemotron-Cascade 2: High-Density 30B MoE with Gold Medal Reasoning
These articles are AI-generated summaries. Please check the original sources for full details.
NVIDIA Releases Nemotron-Cascade 2: An Open 30B MoE with 3B Active Parameters, Delivering Better Reasoning and Strong Agentic Capabilities
NVIDIA has launched Nemotron-Cascade 2, an open-weight 30B Mixture-of-Experts (MoE) model with 3B activated parameters. It is the second open-weight LLM to reach Gold Medal-level performance at the 2025 International Mathematical Olympiad (IMO) and ICPC World Finals.
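To see why only about 3B of the 30B parameters are active per token (roughly one tenth, going by the headline numbers), here is a generic top-k expert-routing sketch in plain Python. It is not Nemotron-Cascade 2's actual architecture: the eight toy experts, k=2, and the linear router are illustrative assumptions. The point it shows is that the router scores every expert, but only the k selected experts are actually evaluated for a given token.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class Expert:
    """Toy expert (hypothetical): scales its input and counts its invocations."""

    def __init__(self, scale):
        self.scale = scale
        self.calls = 0

    def __call__(self, v):
        self.calls += 1
        return [self.scale * vi for vi in v]


def moe_forward(x, router_weights, experts, k=2):
    """Route one token vector x to its top-k experts and mix their outputs."""
    # One router logit per expert: dot product of the token with that expert's router row.
    logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in router_weights]
    # Keep only the k highest-scoring experts; all others are skipped entirely.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the gate weights over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)
        out = [o + gate * y_j for o, y_j in zip(out, y)]
    return out, top


random.seed(0)
dim, num_experts = 4, 8
experts = [Expert(scale=s + 1) for s in range(num_experts)]
router = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
token = [0.5, -0.2, 0.1, 0.9]

output, chosen = moe_forward(token, router, experts, k=2)
print("experts used:", chosen)
print("expert calls:", sum(e.calls for e in experts))  # only the k chosen experts ran
```

Scaled up, the total parameter count grows with the number of experts, while per-token compute tracks only the k experts chosen (plus any shared layers). That is the trade-off a "30B total, 3B active" label describes.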
Why This Matters
Frontier-level reasoning has typically required massive parameter counts, which drive up inference cost and latency. Nemotron-Cascade 2 shifts the focus to ‘intelligence density,’ showing that domain-specific reinforcement learning and on-policy distillation can deliver state-of-the-art reasoning in math and coding at a fraction of the size of 100B+ parameter models.
Key Insights
- Strong mathematical reasoning: Nemotron-Cascade 2 scored 92.4 on AIME 2025, edging out Qwen3.5-35B-A3B’s score of 91.9.
- Enhanced coding performance: The model achieved 439.28 on IOI 2025, significantly higher than Qwen3.5-35B-A3B’s 348.6.
- Multi-Domain On-Policy Distillation (MOPD): This technique recovered the teacher's AIME25 performance within 30 steps, making it more sample-efficient than GRPO.
- Extended context training: During the SFT phase, NVIDIA used a curated dataset with sequences packed to lengths of up to 256K tokens, including 1.9M Python reasoning traces.
- Instruction following excellence: The model scored 83.5 on ArenaHard v2, surpassing the larger Nemotron-3-Super-120B-A12B in alignment benchmarks.
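The summary does not give MOPD's exact objective. A common formulation of on-policy distillation works like this: the student samples its own tokens, a teacher scores those same tokens, and the per-token signal log p_teacher minus log p_student is used as a dense reward, which amounts to minimizing the reverse KL divergence KL(student || teacher). The sketch below assumes that formulation and adds a "multi-domain" twist by picking a teacher per prompt domain. The three-token vocabulary, the fixed (context-free) distributions, and the `math`/`code` teachers are all hypothetical.

```python
import math
import random

VOCAB = ["a", "b", "c"]  # toy vocabulary (hypothetical)

# Toy next-token distributions, assumed context-independent for brevity.
STUDENT = [0.5, 0.3, 0.2]
TEACHERS = {
    "math": [0.7, 0.2, 0.1],  # hypothetical math-domain teacher
    "code": [0.2, 0.3, 0.5],  # hypothetical code-domain teacher
}


def sample_from_student(n, rng):
    """On-policy: tokens are drawn from the student's own distribution."""
    return rng.choices(range(len(VOCAB)), weights=STUDENT, k=n)


def per_token_advantage(token, domain):
    """Dense per-token signal: log p_teacher - log p_student for a sampled token."""
    return math.log(TEACHERS[domain][token]) - math.log(STUDENT[token])


def reverse_kl_estimate(domain, n=20000, seed=0):
    """Monte Carlo estimate of KL(student || teacher) from student samples."""
    rng = random.Random(seed)
    tokens = sample_from_student(n, rng)
    # Reverse KL = E_student[log p_s - log p_t] = -E_student[advantage].
    return -sum(per_token_advantage(t, domain) for t in tokens) / n


def reverse_kl_exact(domain):
    """Closed-form KL(student || teacher) for the toy distributions."""
    teacher = TEACHERS[domain]
    return sum(p * math.log(p / q) for p, q in zip(STUDENT, teacher))


for domain in TEACHERS:
    print(domain, round(reverse_kl_estimate(domain), 3), round(reverse_kl_exact(domain), 3))
```

In actual training, the student would be updated to raise the advantage-weighted log-probability of its own samples, driving this KL toward zero. Because every sampled token gets a signal, rather than one scalar reward per response as in outcome-reward methods like GRPO, this family of methods tends to be sample-efficient, consistent with the claim above.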
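The summary does not say how the SFT sequences were packed. A common approach concatenates several shorter examples into one training sequence up to a fixed token budget, so long-context batches are not mostly padding. The sketch below uses greedy first-fit-decreasing bin packing, a generic heuristic swapped in for illustration, with a small `max_len` for readability; a 256K-token setup would pass its real budget. It also restarts position ids at each example boundary, which packed-training pipelines typically pair with an attention mask that blocks cross-example attention (not shown).

```python
def pack_sequences(lengths, max_len):
    """First-fit-decreasing packing of sequence lengths into bins of at most max_len tokens.

    Returns a list of bins, each a list of indices into `lengths`.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins = []       # indices of the sequences placed in each packed example
    remaining = []  # free token budget left in each packed example
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"sequence {i} has {n} tokens, exceeds max_len={max_len}")
        # Place the sequence in the first packed example with enough room.
        for b, free in enumerate(remaining):
            if n <= free:
                bins[b].append(i)
                remaining[b] -= n
                break
        else:
            # No existing example fits: open a new one.
            bins.append([i])
            remaining.append(max_len - n)
    return bins


def position_ids(bin_indices, lengths):
    """Position ids that restart at 0 for each example inside a packed sequence."""
    ids = []
    for i in bin_indices:
        ids.extend(range(lengths[i]))
    return ids


lengths = [900, 300, 700, 100, 500, 400]
bins = pack_sequences(lengths, max_len=1024)
for b in bins:
    print(b, sum(lengths[i] for i in b))  # 3 packed examples instead of 6 padded ones
```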
Practical Applications
- Competitive Programming and Math: Use Thinking Mode by starting prompts with the model's thinking-mode token for complex logic. Pitfall: Using direct (non-thinking) responses for multi-step proofs may bypass the model’s specialized reasoning traces.
- Agentic Tool Interaction: Implement structured tool-calling within <tool_call> tags for verifiable software engineering workflows. Pitfall: Failing to provide the tool list within the designated tags in the system prompt prevents the model from correctly formatting requests.
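On the application side, structured tool-calling means extracting each <tool_call> span from the model's output, validating it, and dispatching it only to tools you have registered. The sketch below assumes each span holds a JSON object with `name` and `arguments` keys, as in several open chat templates; that schema, the `run_tests` tool, and the sample output are assumptions, so check the model card's chat template for the real format.

```python
import json
import re

# Non-greedy match so that multiple <tool_call> spans in one response are captured separately.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return (calls, errors) parsed from <tool_call>...</tool_call> spans.

    Assumes each span holds a JSON object with "name" and "arguments" keys.
    """
    calls, errors = [], []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"invalid JSON: {exc.msg}")
            continue
        if not isinstance(obj, dict) or "name" not in obj:
            errors.append("missing 'name' field")
            continue
        args = obj.get("arguments", {})
        if not isinstance(args, dict):
            errors.append(f"arguments for {obj['name']!r} are not an object")
            continue
        calls.append((obj["name"], args))
    return calls, errors


def dispatch(calls, registry):
    """Run each parsed call against a registry of allowed Python functions."""
    results = []
    for name, args in calls:
        if name not in registry:
            results.append({"name": name, "error": "unknown tool"})
        else:
            results.append({"name": name, "result": registry[name](**args)})
    return results


sample_output = (
    "Let me check the test suite first.\n"
    '<tool_call>\n{"name": "run_tests", "arguments": {"path": "tests/"}}\n</tool_call>\n'
    '<tool_call>{"name": "read_file", "arguments": {"path": "setup.py"}}</tool_call>\n'
    "<tool_call>not json</tool_call>"
)
registry = {"run_tests": lambda path: f"ran tests in {path}"}  # hypothetical tool

calls, errors = extract_tool_calls(sample_output)
results = dispatch(calls, registry)
print(calls)
print(errors)
print(results)
```

Keeping parse errors and unknown-tool results as explicit values, rather than raising, lets an agent loop feed them back to the model as tool responses so it can correct its request.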