LLMs

17 articles in this category

AI News · Reinforcement Learning · LLMs

Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective

LinkedIn enabled agentic reinforcement learning (RL) training for the GPT-OSS-20B model, reaching performance comparable to OpenAI’s o3-mini and o4-mini.

AI News · LLMs · Software Agents

Unrolling the Codex agent loop

A technical deep dive into the Codex agent loop, explaining how the Codex CLI orchestrates models, tools, and prompts, and how it manages performance to keep the agent efficient.

AI News · Game Development · LLMs

PiGym – LLM-Generated Pi Digit Memorization Game

PiGym demonstrates Claude Opus 4.5 independently building a functional game from natural-language descriptions.

AI News · LLMs · Cost Optimization

The $10K/Month Mistake: Stop Bleeding Money on Your AI Agents

AI agents built with Claude can quickly become expensive; optimizing system prompts and using Skills can cut costs by more than 60%.

AI News · LLMs · Adoption

OpenAI Surpasses One Million Customers, Enabling Novel Task Completion

OpenAI has passed one million customers globally, with 75% reporting that they can now complete tasks that were previously impossible.

AI News · Transformer Models · LLMs

Adapting Rotary Position Embeddings (RoPE) for Long Context Lengths

Llama 3 reaches a 131K-token context length by scaling RoPE frequencies, improving long-range stability without sacrificing local positional information.

AI News · LLMs · Agentic AI

Nemotron 3 Nano: A New Standard for Efficient, Open, and Intelligent Agentic Models

NVIDIA’s Nemotron 3 Nano 30B-A3B model delivers up to 3.3x higher throughput than leading models while maintaining best-in-class reasoning accuracy.

AI News · llama.cpp · LLMs

New llama.cpp Server Feature: Dynamic Model Management

The llama.cpp server introduces router mode, which lets it dynamically load and switch between multiple models without restarting.

AI News · LLMs · Customer Service

Salesforce's eVerse Simulates Realistic Customer Service Interactions

Salesforce’s eVerse simulation tool aims to improve AI agent performance in noisy, unpredictable call centers, achieving 84-88% coverage of routine inquiries.

AI News · LLMs · Evaluation

FACTS Benchmark Suite: A New Evaluation for LLM Factuality

The FACTS Benchmark Suite systematically evaluates LLM factuality across reasoning types and finds that every evaluated model scored below 70% accuracy.

AI News · Privacy · LLMs

Privacy in Action: Realistic mitigation and evaluation for agentic LLMs

New research from Microsoft demonstrates two approaches to reducing privacy leaks in AI agents, achieving up to a 25% reduction in information leakage while preserving task completion.

AI News · Machine Learning · LLMs

Salesforce AI Research Introduces xRouter: A Reinforcement Learning Router for Cost-Aware LLM Orchestration

Salesforce’s xRouter achieves near-GPT-5 accuracy on OlympiadBench while cutting evaluation cost by 87.5% relative to GPT-5.

AI News · Apple Development · LLMs

Introducing AnyLanguageModel: One API for Local and Remote LLMs on Apple Platforms

AnyLanguageModel simplifies LLM integration for Apple developers, offering a single API to seamlessly switch between local and remote models.

AI News · LLMs · Inference

Continuous batching from first principles

Continuous batching maximizes LLM throughput by scheduling prefill and decode work from different requests into the same batch, achieving up to a 2x speedup in token generation.

AI News · LLMs · AI Architecture

Teaching LLMs to Count: IBM's PD-SSM Breakthrough

IBM's PD-SSM model achieves 98.5% accuracy on state-tracking tasks, addressing LLM limitations in sequential reasoning.

AI News · Transparency · LLMs

IBM Granite Ranked the World’s Most Transparent Model

IBM Granite scored 95% on the Stanford Foundation Model Transparency Index, at least 23 percentage points ahead of every other model.

AI News · LLMs · AI Evaluation

IBM and Notre Dame Open-Source Benchmark Cards for LLMs

IBM and the University of Notre Dame released 105 validated benchmark cards, along with a dataset of 4,000 cards, to improve the transparency of LLM evaluation.
