Teaching LLMs to Count: IBM's PD-SSM Breakthrough
These articles are AI-generated summaries. Please check the original sources for full details.
The quest to teach LLMs how to count
At NeurIPS 2025, IBM researchers introduced PD-SSM, a state-space model (SSM) reported to achieve 98.5% accuracy on state-tracking tasks. The work targets a known weakness of transformers: keeping track of state over long sequences.
Why This Matters
Transformers excel at parallel processing but struggle with state tracking, an inherently sequential task that underpins logical reasoning. The weakness shows up in simple tasks such as counting the “r”s in “strawberry” or computing parity (whether a binary string contains an even or odd number of 1s). Workarounds such as chain-of-thought prompting exist, but they increase computational cost. IBM’s PD-SSM tackles the problem directly by improving state tracking in hybrid transformer-SSM models, which matters for tasks such as code generation and time-series forecasting.
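To make "state tracking" concrete, here is a minimal Python sketch of the two toy tasks mentioned above. It is only an illustration of why these tasks are sequential, not anything taken from IBM's work; the function names `parity` and `count_char` are hypothetical.

```python
def parity(bits: str) -> int:
    """Return 1 if `bits` contains an odd number of '1' characters, else 0."""
    state = 0  # one bit of state: 0 = even so far, 1 = odd so far
    for ch in bits:
        if ch == "1":
            state ^= 1  # every '1' flips the state
    return state


def count_char(text: str, target: str) -> int:
    """Count occurrences of `target` by carrying a running total."""
    count = 0  # the running total is the state being tracked
    for ch in text:
        if ch == target:
            count += 1
    return count


if __name__ == "__main__":
    print(parity("1101"))
    print(count_char("strawberry", "r"))
```

In both functions the answer at each step depends on the state carried over from the previous step, which is the kind of dependency that a purely parallel computation has trouble reproducing.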
Key Insights
- PD-SSM achieves 98.5% accuracy on state-tracking tasks, outperforming other SSM variants by 15 percentage points (IBM, 2025).
- State tracking is critical for logical reasoning, and transformers struggle with parity problems (Hahn, 2020).
- IBM’s Granite models integrate PD-SSM to improve efficiency in code generation and long-sequence analysis.
Practical Applications
- Use Case: IBM’s Granite models use PD-SSM for code generation and ethanol demand forecasting.
- Pitfall: Restricting an SSM’s state-transition matrices to diagonal form leads to poor state tracking, including failure on parity checks.
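The pitfall can be illustrated with a toy linear recurrence, h_t = A(x_t) h_{t-1}, over a two-dimensional one-hot state (index 0 = even, index 1 = odd). This is a simplified sketch, not IBM's PD-SSM implementation. The matrices `SWAP` and `DIAG`, the helper `run`, the one-hot encoding, and the restriction to positive diagonal entries are all assumptions made for the example.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

IDENTITY: Matrix = [[1.0, 0.0], [0.0, 1.0]]
SWAP: Matrix = [[0.0, 1.0], [1.0, 0.0]]   # permutation: exchanges the two coordinates
DIAG: Matrix = [[0.5, 0.0], [0.0, 0.9]]   # hypothetical positive diagonal transition


def matvec(a: Matrix, h: Vector) -> Vector:
    """Multiply matrix `a` by vector `h`."""
    return [sum(a_ij * h_j for a_ij, h_j in zip(row, h)) for row in a]


def run(bits: str, on_one: Matrix) -> int:
    """Apply `on_one` for each '1' and the identity for each '0'.

    Returns the index of the larger state coordinate (ties go to 0).
    """
    h: Vector = [1.0, 0.0]  # start in the "even" state
    for ch in bits:
        a = on_one if ch == "1" else IDENTITY
        h = matvec(a, h)
    return 0 if h[0] >= h[1] else 1


if __name__ == "__main__":
    for bits in ["1", "11", "1101"]:
        print(bits, "swap:", run(bits, SWAP), "diag:", run(bits, DIAG))
```

Under these assumptions, the swap transition moves the state between the two coordinates on every 1 and so follows parity, while the diagonal transition can only rescale each coordinate independently and never leaves the starting coordinate.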