Building Privacy-First AI Agents with Gemma 4 and Ollama

How to Implement Tool Calling with Gemma 4 and Python - MachineLearningMastery.com

Google recently released the Gemma 4 model family under an Apache 2.0 license to provide frontier-level capabilities for local infrastructure. The Gemma 4:e2b variant features native support for agentic workflows, enabling it to invoke functions through structured JSON outputs.

Why This Matters

Traditional language models are closed-loop systems that often hallucinate when asked for real-time data or external computations. Tool calling bridges this gap by allowing a 2-billion parameter model like Gemma 4:e2b to pause inference, request structured data from external APIs, and synthesize live context, effectively bypassing the limitations of static weights without the costs or privacy risks of cloud-based APIs.

Key Insights

The Gemma 4:e2b model (Google, 2026) activates an effective 2-billion parameter footprint during inference to achieve near-zero latency on consumer hardware.
Tool calling architecture serves as a bridge between static weights and dynamic autonomous agents by evaluating user prompts against a provided registry of programmatic tools.
Ollama serves as a local inference runner, allowing developers to maintain strict data privacy by executing tool-calling workflows entirely offline.
The gemma4:e2b model inherits multimodal properties and native function-calling capabilities from larger 31B models, despite its significantly smaller footprint.
A zero-dependency implementation using Python’s urllib and json libraries ensures maximum portability and transparency for local agent orchestration.

Working Examples

Python function implementing a two-stage API resolution pattern for real-time weather data.

def get_current_weather(city: str, unit: str = "celsius") -> str:
    try:
        geo_url = f"https://geocoding-api.open-meteo.com/v1/search?name={urllib.parse.quote(city)}&count=1"
        geo_req = urllib.request.Request(geo_url, headers={'User-Agent': 'Gemma4ToolCalling/1.0'})
        with urllib.request.urlopen(geo_req) as response:
            geo_data = json.loads(response.read().decode('utf-8'))
            if "results" not in geo_data or not geo_data["results"]:
                return f"Could not find coordinates for city: {city}."
            location = geo_data["results"][0]
            lat, lon = location["latitude"], location["longitude"]
        temp_unit = "fahrenheit" if unit.lower() == "fahrenheit" else "celsius"
        weather_url = f"https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lon}&current=temperature_2m,wind_speed_10m&temperature_unit={temp_unit}"
        weather_req = urllib.request.Request(weather_url, headers={'User-Agent': 'Gemma4ToolCalling/1.0'})
        with urllib.request.urlopen(weather_req) as response:
            weather_data = json.loads(response.read().decode('utf-8'))
            current = weather_data.get("current", {})
            return f"The current weather in {city.title()} is {current.get('temperature_2m')}{weather_data['current_units']['temperature_2m']}."
    except Exception as e:
        return f"Error: {e}"

The JSON schema registry used to inform the model about available programmatic tools.

{
  "type": "function",
  "function": {
    "name": "get_current_weather",
    "description": "Gets the current temperature for a given city.",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string",
          "description": "The city name, e.g. Tokyo"
        },
        "unit": {
          "type": "string",
          "enum": ["celsius", "fahrenheit"]
        }
      },
      "required": ["city"]
    }
  }
}

Practical Applications

Local Desktop Agents: Using Ollama and Gemma 4:e2b to handle real-time weather, news, and currency conversion without external cloud orchestration. Pitfall: Vague JSON schema descriptions can lead to the model generating incorrect function arguments or failing to trigger the tool.
IoT Edge Computing: Deploying gemma4:e2b on mobile or IoT devices to process sensor data locally via function calling. Pitfall: Failing to inject the tool result back into the chat history results in a ‘hallucinated’ response rather than one grounded in real-time data.

References:

https://machinelearningmastery.com/how-to-implement-tool-calling-with-gemma-4-and-python/

On This Page

How to Implement Tool Calling with Gemma 4 and Python - MachineLearningMastery.com

Why This Matters

Key Insights

Working Examples

Practical Applications

Continue reading

Related Content

How to Build a Fully Functional Custom GPT-style Conversational AI Locally Using Hugging Face Transformers

OpenAI’s Agent RFT: Reinforcement Fine-Tuning for Tool-Using Agents

Building Scalable ML Pipelines on Millions of Rows with Vaex