Gemma 4 E2B Exhibits Configuration-Deterministic Hallucinations at Low Context
These articles are AI-generated summaries. Please check the original sources for full details.
What num_ctx=2048 actually produces
Engineer Thehwang conducted a 15-run ablation study on the Gemma 4 E2B model. The tests showed that at a context window of 2048 tokens (num_ctx=2048), the model consistently produces a three-part response in sequence: a hallucinated summary, a self-disclaimer, and a cautious retry.
Why This Matters
This behavior highlights the difference between trained calibration and a configuration-deterministic artifact. The model appears to detect truncated input, but the effect triggers only at a specific context size (num_ctx=2048) and temperature (0.0). It is not a general semantic capability for detecting damaged data across configurations.
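To make the two settings concrete, here is a minimal sketch of how the same prompt could be sent to a local Ollama server at both context sizes. It assumes Ollama's default local endpoint and its `/api/generate` request body with an `options` object; the model tag `gemma4:e2b`, the helper names, and the placeholder prompt are hypothetical and are not taken from the study's harness.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(model: str, prompt: str, num_ctx: int, temperature: float = 0.0) -> dict:
    """Build a non-streaming /api/generate request body with explicit options."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx, "temperature": temperature},
    }


def generate(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload to a running Ollama server and return the response text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Same prompt, two context sizes; only the num_ctx option differs.
MODEL_TAG = "gemma4:e2b"  # hypothetical tag; substitute the tag of your local model
PROMPT = "Summarize this transcript: ..."  # placeholder, not a prompt from the study
low = build_payload(MODEL_TAG, PROMPT, num_ctx=2048)
high = build_payload(MODEL_TAG, PROMPT, num_ctx=32768)
```

Calling `generate(low)` and `generate(high)` against a running server would then give one output per configuration to compare. Holding the prompt and temperature fixed keeps `num_ctx` as the only variable.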
Key Insights
- Configuration over Input: The multi-pass hedge fires specifically at num_ctx=2048 and temperature=0.0, regardless of whether the input is syntactically broken or semantically mid-stream (Thehwang, 2026).
- Multi-Pass Response Pattern: The model appears to review its own output: it generates a templated hallucination, then a ‘Note:’ stating that the information is not in the transcript, then a cautious retry (Example: Gemma 4 E2B via Ollama).
- Null Result at High Context: At num_ctx=32768, the model does not hedge on any input shape, including tail-of-document signals or mid-word cuts (Ablation Rows 2, 3, 4, 6).
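The multi-pass pattern described above could be detected with a simple heuristic like the one sketched below. This is not the repository's `classify.py`, whose logic is not shown here. It assumes only that the disclaimer starts a line with `Note:`, as the summary states, and that the passes are separated by blank lines. The function name and the sample text are invented for illustration and are not actual model output.

```python
import re

# Assumed marker: the source only says the disclaimer begins with "Note:".
NOTE_RE = re.compile(r"^[ \t]*Note:", re.MULTILINE)


def split_passes(output: str) -> dict:
    """Split output into the text before the first 'Note:' line, the note, and the rest."""
    m = NOTE_RE.search(output)
    if m is None:
        # No disclaimer found: treat the whole output as a single-pass summary.
        return {"hedged": False, "summary": output.strip(), "note": "", "retry": ""}
    summary = output[:m.start()].strip()
    rest = output[m.start():].strip()
    # The first paragraph after the marker is the note; anything after is the retry.
    note, _, retry = rest.partition("\n\n")
    return {"hedged": True, "summary": summary, "note": note.strip(), "retry": retry.strip()}


# Invented sample with the three-part shape: summary, disclaimer, retry.
sample = (
    "Summary: The team agreed to ship on Friday.\n\n"
    "Note: This information is not in the transcript.\n\n"
    "Revised summary: The transcript is incomplete."
)
result = split_passes(sample)
```

A marker-based check like this only flags the shape of the response. It cannot tell whether the first pass was actually hallucinated, so that still has to be judged against the source transcript.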
Working Examples
Harness for replicating the calibration ablation study.
git clone https://github.com/thehwang/Scripta && cd Scripta/benchmarks/calibration-ablation
bash run.sh # rows 2, 3, 4, 6 at num_ctx=32768
NUM_CTX=2048 bash run.sh --rows row1 # the configuration-deterministic case
python3 classify.py > classification-report.md
Practical Applications
- Use case: Gemma 4 E2B via Ollama producing structured meeting summaries under strict context limits.
- Pitfall: Misinterpreting configuration artifacts as general model calibration, which leads to overconfident claims about model reliability.