2 articles in this category
DFlash achieves 6x lossless acceleration by replacing sequential drafting with parallel block diffusion, as reported by Z Lab in 2026.
IBM Storage Scale reduces time-to-first-token (TTFT) by 8-12x for LLM inference by providing a high-performance KV cache tier.