Large Language Model

AI News, AI Infrastructure, Large Language Model

Sakana AI and NVIDIA Introduce TwELL: 20.5% Faster LLM Inference via Unstructured Sparsity

Sakana AI and NVIDIA introduced TwELL and custom CUDA kernels, achieving 20.5% inference and 21.9% training speedups in LLMs by exploiting activation sparsity.

May 11, 2026

AI News, Large Language Model, Software Engineering

Mastering LLM Distillation: Soft-Label, Hard-Label, and Co-distillation Strategies

LLM distillation uses teacher-student models to transfer reasoning capabilities, reducing costs while maintaining performance through techniques like soft-label and co-distillation.

May 11, 2026

AI News, Explainable AI, Large Language Model

Anthropic Introduces Natural Language Autoencoders to Decode Claude's Internal Activations

Anthropic’s Natural Language Autoencoders (NLAs) convert model activations into readable text, detecting evaluation awareness in up to 26% of benchmark transcripts.

May 8, 2026

AI News, AI Infrastructure, Large Language Model

Google AI Releases MTP Drafters for Gemma 4: Accelerating Inference by 3x

Google AI releases MTP drafters for Gemma 4, using speculative decoding to deliver up to 3x faster inference without quality loss.

May 6, 2026

AI News, Large Language Model, Machine Learning

TaskTrove: A Technical Workflow for Streaming Parsing and Verifier Detection

Efficiently stream and parse the multi-gigabyte TaskTrove dataset to detect RL-ready verifier signals using real-time binary decoding and automated visualization.

May 3, 2026

AI News, Agentic AI, Large Language Model

Mistral AI Unveils Mistral Medium 3.5 and Remote Agents for Vibe Coding Platform

Mistral AI launches Mistral Medium 3.5, a 128B model achieving a 77.6% SWE-Bench Verified score, alongside cloud-based remote coding agents.

May 2, 2026

AI News, AI Infrastructure, Large Language Model

NVIDIA NeMo RL Accelerates LLM Post-Training with Lossless Speculative Decoding

NVIDIA Research integrates speculative decoding into NeMo RL v0.6.0, achieving a 1.8x rollout generation speedup at 8B scale and projecting a 2.5x end-to-end training speedup for 235B models.

May 1, 2026

AI News, AI Infrastructure, Large Language Model

Top 10 KV Cache Compression Techniques for LLM Inference

KV cache compression reduces memory overhead by up to 93.3%, enabling larger batch sizes and higher throughput for long-context LLM inference.

Apr 29, 2026

AI News, Large Language Model, Artificial Intelligence

How to Build Traceable and Evaluated LLM Workflows with Promptflow and Prompty

Build production-grade LLM pipelines using Promptflow and Prompty, featuring automated evaluation cycles and deterministic tool integration for full traceability.

Apr 28, 2026

AI News, Large Language Model, Machine Learning

Talkie-1930: A 13B Vintage LLM Trained Exclusively on Pre-1931 Data

Researchers released Talkie-1930, a 13B-parameter open-weight LLM trained on 260 billion tokens of pre-1931 text to eliminate benchmark contamination and support research on historical reasoning.

Apr 27, 2026

AI News, Agentic AI, Large Language Model

Evaluating Agentic Reasoning: The 7 Benchmarks Defining Frontier LLM Performance

Frontier models now exceed 80% on SWE-bench Verified, yet reliability remains low with τ-bench pass^8 scores falling below 25% in retail domains.

Apr 26, 2026

AI News, Agentic AI, Large Language Model

How to Build a Fully Searchable AI Knowledge Base with OpenKB, OpenRouter, and Llama

Learn to build a local AI knowledge base using OpenKB and Llama 3.3, featuring automated wiki synthesis and programmatic graph analysis for structured information retrieval.

Apr 26, 2026

AI News, Large Language Model, AI Infrastructure

DeepSeek-V4: 1M-Token Contexts via Compressed Sparse Attention and Hybrid Architecture

DeepSeek-AI releases DeepSeek-V4, featuring hybrid CSA/HCA attention that reduces KV cache size to 10% of previous models while supporting one-million-token contexts.

Apr 24, 2026

AI News, Agentic AI, Large Language Model

OpenAI GPT-5.5: First Fully Retrained Agentic Model Hits 82.7% on Terminal-Bench

OpenAI releases GPT-5.5, a fully retrained agentic model scoring 82.7% on Terminal-Bench 2.0 and 84.9% on GDPval for autonomous task execution.

Apr 23, 2026

AI News, Agentic AI, Large Language Model

Xiaomi MiMo-V2.5-Pro: Frontier Agentic AI at 60% Lower Token Cost

Xiaomi releases MiMo-V2.5-Pro, matching GPT-5.4 benchmarks while reducing token costs by 60% for long-horizon agentic tasks.

Apr 22, 2026

AI News, Agentic AI, Large Language Model

Qwen3.6-35B-A3B: Sparse MoE Vision-Language Model with 3B Active Parameters

Alibaba releases Qwen3.6-35B-A3B, a sparse MoE model with 3B active parameters that outperforms larger models on Terminal-Bench 2.0 and SWE-bench.

Apr 16, 2026

AI News, Large Language Model, Artificial Intelligence

A Technical Deep Dive into Modern LLM Training, Alignment, and Deployment Pipelines

Modern LLM training uses multi-stage pipelines that run from raw pretraining through 4-bit QLoRA fine-tuning to GRPO-based reasoning optimization for production.

Apr 15, 2026

AI News, Agentic AI, Large Language Model

MiniMax M2.7: Open-Source Self-Evolving Model Matches GPT-5.3-Codex on SWE-Pro

MiniMax open-sources M2.7, a self-evolving MoE model achieving 56.22% on SWE-Pro and 57.0% on Terminal Bench 2, matching GPT-5.3-Codex in production-level software engineering.

Apr 12, 2026

AI News, AI Infrastructure, Large Language Model

TriAttention: MIT and NVIDIA's 10.7x KV Cache Compression for LLM Reasoning

TriAttention achieves 2.5x higher throughput and 10.7x KV memory reduction while matching full attention accuracy on the AIME25 benchmark.

Apr 11, 2026

AI News, Computer Vision, Large Language Model

TII Releases Falcon Perception: A Unified 0.6B-Parameter Early-Fusion Transformer

TII’s Falcon Perception 0.6B model achieves a +21.9 point gain in spatial understanding over SAM 3 using a unified early-fusion architecture.

Apr 3, 2026

AI News, Agentic AI, Large Language Model

Arcee AI Releases Trinity Large Thinking: An Apache 2.0 Open Reasoning Model for Long-Horizon Agents

Arcee AI releases Trinity Large Thinking, a 400B sparse MoE reasoning model under Apache 2.0 with a 262,144-token context window.

Apr 2, 2026

AI News, Large Language Model, Machine Learning

Alibaba Releases Qwen3.5-Omni: A Native Multimodal Model for Real-Time Audio and Video Interaction

Alibaba Qwen Team unveils Qwen3.5-Omni, a native multimodal model achieving SOTA results on 215 subtasks while supporting 256k long-context audio-visual inputs.

Mar 30, 2026

AI News, Large Language Model, Agentic AI

Implementing Qwen3.5 Claude-Style Reasoning with GGUF and 4-Bit Quantization

Run a 27B Qwen3.5 distilled reasoning model on a single GPU using 4-bit quantization and GGUF for optimized inference under 16.5 GB of VRAM.

Mar 26, 2026

AI News, Agentic AI, Large Language Model

Google Releases Gemini 3.1 Flash Live: Real-Time Multimodal Voice for AI Agents

Google launches Gemini 3.1 Flash Live, a low-latency multimodal model achieving 90.8% on ComplexFuncBench Audio for real-time voice-first AI agents.

Mar 26, 2026