Transformers

2 articles in this category

AI News, Transformers, Attention Mechanisms

Differential Transformer V2: Faster Decoding and Improved Stability

Microsoft's Differential Transformer V2 achieves decoding speeds comparable to standard Transformers while reducing language modeling loss by 0.02-0.03 at 1T tokens.

Jan 20, 2026

AI News, NLP, Transformers

Tokenization in Transformers v5: Simpler, Clearer, and More Modular

Transformers v5 redesigns tokenization by separating the tokenizer architecture from the trained vocabulary, which allows more customization and reduces code duplication across models by 20%.

Dec 1, 2025