Skip to main content

On This Page

Building Privacy-First AI Agents with Gemma 4 and Ollama

3 min read
Share

These articles are AI-generated summaries. Please check the original sources for full details.

How to Implement Tool Calling with Gemma 4 and Python - MachineLearningMastery.com

Google recently released the Gemma 4 model family under an Apache 2.0 license to provide frontier-level capabilities for local infrastructure. The Gemma 4:e2b variant features native support for agentic workflows, enabling it to invoke functions through structured JSON outputs.

Why This Matters

Traditional language models are closed-loop systems that often hallucinate when asked for real-time data or external computations. Tool calling bridges this gap by allowing a 2-billion parameter model like Gemma 4:e2b to pause inference, request structured data from external APIs, and synthesize live context, effectively bypassing the limitations of static weights without the costs or privacy risks of cloud-based APIs.

Key Insights

  • The Gemma 4:e2b model (Google, 2026) activates an effective 2-billion parameter footprint during inference to achieve near-zero latency on consumer hardware.
  • Tool calling architecture serves as a bridge between static weights and dynamic autonomous agents by evaluating user prompts against a provided registry of programmatic tools.
  • Ollama serves as a local inference runner, allowing developers to maintain strict data privacy by executing tool-calling workflows entirely offline.
  • The gemma4:e2b model inherits multimodal properties and native function-calling capabilities from larger 31B models, despite its significantly smaller footprint.
  • A zero-dependency implementation using Python’s urllib and json libraries ensures maximum portability and transparency for local agent orchestration.

Working Examples

Python function implementing a two-stage API resolution pattern for real-time weather data.

def get_current_weather(city: str, unit: str = "celsius") -> str:
    try:
        geo_url = f"https://geocoding-api.open-meteo.com/v1/search?name={urllib.parse.quote(city)}&count=1"
        geo_req = urllib.request.Request(geo_url, headers={'User-Agent': 'Gemma4ToolCalling/1.0'})
        with urllib.request.urlopen(geo_req) as response:
            geo_data = json.loads(response.read().decode('utf-8'))
            if "results" not in geo_data or not geo_data["results"]:
                return f"Could not find coordinates for city: {city}."
            location = geo_data["results"][0]
            lat, lon = location["latitude"], location["longitude"]
        temp_unit = "fahrenheit" if unit.lower() == "fahrenheit" else "celsius"
        weather_url = f"https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lon}&current=temperature_2m,wind_speed_10m&temperature_unit={temp_unit}"
        weather_req = urllib.request.Request(weather_url, headers={'User-Agent': 'Gemma4ToolCalling/1.0'})
        with urllib.request.urlopen(weather_req) as response:
            weather_data = json.loads(response.read().decode('utf-8'))
            current = weather_data.get("current", {})
            return f"The current weather in {city.title()} is {current.get('temperature_2m')}{weather_data['current_units']['temperature_2m']}."
    except Exception as e:
        return f"Error: {e}"

The JSON schema registry used to inform the model about available programmatic tools.

{
  "type": "function",
  "function": {
    "name": "get_current_weather",
    "description": "Gets the current temperature for a given city.",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string",
          "description": "The city name, e.g. Tokyo"
        },
        "unit": {
          "type": "string",
          "enum": ["celsius", "fahrenheit"]
        }
      },
      "required": ["city"]
    }
  }
}

Practical Applications

  • Local Desktop Agents: Using Ollama and Gemma 4:e2b to handle real-time weather, news, and currency conversion without external cloud orchestration. Pitfall: Vague JSON schema descriptions can lead to the model generating incorrect function arguments or failing to trigger the tool.
  • IoT Edge Computing: Deploying gemma4:e2b on mobile or IoT devices to process sensor data locally via function calling. Pitfall: Failing to inject the tool result back into the chat history results in a ‘hallucinated’ response rather than one grounded in real-time data.

References:

Continue reading

Next article

How WebAssembly Enables Privacy-First Browser Tools Without Server-Side Accounts

Related Content