Gemma 4 E2B Exhibits Configuration-Deterministic Hallucinations at Low Context


These articles are AI-generated summaries. Please check the original sources for full details.

What num_ctx=2048 actually produces

Engineer Thehwang ran a 15-run ablation study on the Gemma 4 E2B model. At a context window of 2048 tokens (num_ctx=2048) and temperature 0.0, the model consistently produced three outputs in sequence: a hallucinated summary, a self-disclaimer, and a cautious retry.

Why This Matters

This behavior highlights the difference between ‘trained calibration’ and a ‘configuration-deterministic’ artifact. The model appears to detect truncated input, but the effect triggers only under a specific context limit (num_ctx=2048) and temperature setting (0.0). It is not a general semantic ability to detect damaged data across configurations.

Key Insights

  • Configuration over Input: The multi-pass hedge fires specifically at num_ctx=2048 and temperature=0.0, regardless of whether the input is syntactically broken or semantically mid-stream (Thehwang, 2026).
  • Multi-Pass Response Pattern: The model in effect reviews its own output. It generates a templated hallucination, then appends a ‘Note:’ stating that the information is not in the transcript (Example: Gemma 4 E2B via Ollama).
  • Null Result at High Context: At num_ctx=32768, the model does not hedge on any input shape, including tail-of-document signals or mid-word cuts (Ablation Rows 2, 3, 4, 6).
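The multi-pass pattern described above can be checked mechanically. The following is a minimal sketch, not the classifier shipped in the study's repository: it assumes the disclaimer begins on its own line with "Note:" (the only marker the article names) and that a blank line separates the disclaimer from the retry. The function names split_multipass and is_multipass_hedge are hypothetical.

```python
import re

# Assumption: the disclaimer starts a line with "Note:". The article names
# this marker but does not specify the exact layout of the response.
NOTE_MARKER = re.compile(r"^[ \t]*Note:", re.MULTILINE)


def split_multipass(response: str) -> dict:
    """Split a response into summary, note, and retry segments.

    The summary is everything before the first "Note:" line, the note runs
    to the next blank line, and the retry is whatever follows.
    """
    match = NOTE_MARKER.search(response)
    if match is None:
        # No disclaimer found: treat the whole response as a plain summary.
        return {"summary": response.strip(), "note": "", "retry": ""}
    summary = response[: match.start()].strip()
    note, _, retry = response[match.start():].partition("\n\n")
    return {"summary": summary, "note": note.strip(), "retry": retry.strip()}


def is_multipass_hedge(response: str) -> bool:
    """Return True only when all three segments are non-empty."""
    return all(split_multipass(response).values())
```

A string match like this only flags the surface shape of the response. It cannot tell whether the summary was actually hallucinated, so a manual read of flagged outputs would still be needed.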

Working Examples

Harness for replicating the calibration ablation study.

git clone https://github.com/thehwang/Scripta && cd Scripta/benchmarks/calibration-ablation
bash run.sh # rows 2, 3, 4, 6 at num_ctx=32768
NUM_CTX=2048 bash run.sh --rows row1 # the configuration-deterministic case
python3 classify.py > classification-report.md
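
For readers who want to vary the two settings without the harness, the sketch below builds a request for Ollama's local /api/generate endpoint with num_ctx and temperature passed through the options field. It is an illustration under stated assumptions, not part of the Scripta repository: MODEL_TAG is a placeholder (the article does not give the Ollama tag for Gemma 4 E2B), and build_request and generate are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "your-gemma-e2b-tag"  # placeholder: substitute the tag you actually pulled


def build_request(prompt: str, num_ctx: int, temperature: float = 0.0) -> dict:
    """Build a non-streaming generate payload with explicit context and temperature."""
    return {
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx, "temperature": temperature},
    }


def generate(prompt: str, num_ctx: int) -> str:
    """Send the payload to a running Ollama server and return the generated text."""
    body = json.dumps(build_request(prompt, num_ctx)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling generate with the same prompt at num_ctx=2048 and again at num_ctx=32768 gives a side-by-side comparison of the two configurations the article contrasts.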

Practical Applications

  • Use case: Gemma 4 E2B via Ollama producing structured meeting summaries under strict context limits.
  • Pitfall: Misinterpreting configuration artifacts as general model calibration, which leads to overconfident claims about model reliability.
