AI Infrastructure
183 articles in this category (Page 2 of 8)
LightSeek Foundation Releases TokenSpeed: An Open-Source Inference Engine for Agentic AI
LightSeek Foundation's TokenSpeed is an open-source LLM inference engine that outperforms TensorRT-LLM by 11% in throughput on NVIDIA B200 GPUs for agentic coding workloads.
DOCN 5-Day Outlook: Blowout Q1 Earnings Clash with Extreme Overbought Technicals and Dilution Risks
DigitalOcean's blowout Q1 earnings and raised guidance face immediate headwinds from an extreme 90.80 RSI and a massive secondary share offering.
Qwen-Scope: Open-Source Sparse AutoEncoders for LLM Interpretability and Steering
Qwen AI releases Qwen-Scope, an open-source suite of 14 Sparse AutoEncoders (SAEs) for Qwen3/3.5 models, enabling inference-time steering and benchmark analysis without running the model.
NVIDIA NeMo RL Accelerates LLM Post-Training with Lossless Speculative Decoding
NVIDIA Research integrates speculative decoding into NeMo RL v0.6.0, achieving a 1.8x rollout generation speedup at 8B scale and projecting a 2.5x end-to-end training speedup for 235B models.
DeepSeek-V4: 1M-Token Contexts via Compressed Sparse Attention and Hybrid Architecture
DeepSeek-AI releases DeepSeek-V4, featuring hybrid CSA/HCA attention that reduces KV cache size to 10% of that of previous models while supporting one-million-token contexts.
Google DeepMind’s Decoupled DiLoCo: Scaling AI Training with 88% Goodput and Asynchronous Fault Tolerance
Google DeepMind's Decoupled DiLoCo achieves 88% goodput under high hardware failure rates and reduces inter-datacenter bandwidth from 198 Gbps to 0.84 Gbps.
Microsoft (MSFT) Pre-Earnings Consolidation: Overbought Technicals Meet AI CapEx Surge
Microsoft faces a pre-earnings holding pattern as overbought technicals clash with high-stakes AI infrastructure investments and an impending April 29 earnings catalyst.
Implementing Qwen 3.6-35B-A3B: Multimodal MoE with Thinking Control and Tool Calling
Deploy Qwen 3.6-35B-A3B, a 35B MoE model with 3B active parameters, featuring multimodal inference, thinking-budget control, and integrated tool calling for agentic AI workflows.