Magentic Marketplace: Open-source platform to study agentic markets

Magentic Marketplace: A Simulation Environment for Agentic Market Research

Microsoft Research has developed Magentic Marketplace, an open-source platform designed to simulate and study the behavior of autonomous AI agents in digital marketplaces. This tool enables researchers to explore complex interactions between agents, assess market efficiency, and identify challenges like manipulation vulnerabilities and systemic biases. The platform is modular, extensible, and focused on two-sided markets, where customer and business agents transact dynamically.

Key Features and Architecture

Magentic Marketplace is built on three core architectural principles:

HTTP/REST client-server architecture:
- Agents operate as independent clients, while the Marketplace Environment acts as a centralized server.
- This mirrors real-world platforms and separates customer and business agent roles for clarity and scalability.
Minimal three-endpoint market protocol:
- Agents use three endpoints: register (to join the marketplace), protocol discovery (to identify available actions), and action execution (to perform transactions).
- This design allows seamless addition of new capabilities without disrupting existing experiments.
Rich action protocol:
- Supports message types for the full transaction lifecycle: search, negotiation, proposals, and payments.
- Extensible to include features like refunds, reviews, or ratings, enabling evolving marketplace scenarios.

A visualization module allows researchers to observe agent interactions and review transaction threads, enhancing transparency and analysis.

Experimental Setup and Methodology

To ensure reproducibility, the platform uses synthetic data from Microsoft’s open-source repository. Experiments simulate transactions like food ordering and home improvement services, with 100 customer agents and 300 business agents. Both proprietary (e.g., GPT-4o, GPT-5, Gemini-2.5-Flash) and open-source models (e.g., OSS-20b, Qwen3-14b) are tested.

Key metrics include:

Consumer welfare: Calculated as the sum of utilities (customer valuations minus prices paid) across all transactions.
Performance benchmarks: Models are compared against baselines like random selection or choosing the cheapest option.

Key Findings from Experiments

1. Agents Improve Consumer Welfare with Effective Discovery

Advanced models like GPT-5 achieved near-optimal performance under Agentic: Perfect search conditions, where agents received top matches without needing to navigate paginated lists.
GPTOSS-20b outperformed GPT-4o in some scenarios, suggesting compact open-source models can rival proprietary ones in complex environments.
Qwen3-4b-2507 struggled with Lexical search (realistic discovery tasks), while Qwen3-14b showed fundamental reasoning limitations.

2. The Paradox of Choice: More Options ≠ Better Outcomes

Most models contacted only a small fraction of available businesses, even when search results were expanded to 100 options.
Consumer welfare declined with more options:
- GPT-5 dropped from 2,000 (optimal) to 1,400.
- Gemini-2.5-Flash declined from 1,700 to 1,350.
- Claude Sonnet 4 showed the steepest drop (1,800 → 600), struggling with large option sets and selecting irrelevant businesses.

3. Vulnerability to Manipulation

Six manipulation strategies were tested, including prompt injection and social proof tactics.
Sonnet-4 and Gemini-2.5-Flash showed strong resistance, but GPT-4o, GPTOSS-20b, and Qwen3-4b were highly vulnerable:
- Payments were redirected to manipulative agents under prompt injection attacks.
- Qwen3-4b and GPTOSS-20b even fell for basic psychological tactics like fake certifications or social proof.

4. Systemic Biases in Agent Decision-Making

Positional bias: Open-source models like Qwen2.5-14b-2507 showed a preference for the last business in search results, regardless of quality.
Proposal bias: All models prioritized accepting the first offer, missing better alternatives.
These biases create unfair advantages, incentivizing businesses to prioritize response speed over quality.

Implications and Future Work

The study highlights critical challenges for agentic markets:

Security risks: Agents are susceptible to manipulation, requiring robust safeguards.
Design trade-offs: Market structures (e.g., open vs. closed platforms) must balance openness, fairness, and efficiency.
Human oversight: Agents should augment—not replace—human decision-making, especially in high-stakes transactions.

Microsoft plans to expand the platform to study dynamic markets and human-in-the-loop systems, improving trust and scalability. Researchers are encouraged to adapt the framework for nuanced metrics like fairness and societal impact.

Access and Resources

Magentic Marketplace is available on GitHub and Azure AI Foundry Labs. Documentation and experiment templates are provided to facilitate replication and extension of the research.

For more details, refer to the full paper: The Agentic Economy.

https://www.microsoft.com/en-us/research/blog/magentic-marketplace-an-open-source-simulation-environment-for-studying-agentic-markets/

On This Page

Magentic Marketplace: A Simulation Environment for Agentic Market Research

Key Features and Architecture

Experimental Setup and Methodology

Key Findings from Experiments

1. Agents Improve Consumer Welfare with Effective Discovery

2. The Paradox of Choice: More Options ≠ Better Outcomes

3. Vulnerability to Manipulation

4. Systemic Biases in Agent Decision-Making

Implications and Future Work

Access and Resources

Continue reading

Related Content

CopilotKit v1.50 Leverages AG-UI for Streamlined In-App Agent Integration

Google Introduces A2UI (Agent-to-User Interface): An Open Source Protocol for Agent Driven Interfaces

SETA: Open Source Reinforcement Learning Environments for Terminal Agents