
16 NOV 2025

David's problem: Gemini context window overflow killing lecture notes runs

David investigated why lecture notes generation was hitting Gemini's 1M token cap and pushed back on a proposed soft-threshold defusal strategy.


A lecture notes run failed with gemini_tpm_wait and then a Gemini HTTP 400 for exceeding the 1M input token cap. David questioned whether any truncation mechanism existed for oversized contexts:

is there no mechanism that cuts the earliest 200k or so input tokens to prevent this? although those cut 200k SHOULD NOT BE THE INJECTED CONTEXT/INSTRUCTIONS/TEACHINGS THAT WE INJECTED INTO THE CONVERSATION ARRAY

When an agent proposed a soft threshold at 700k tokens to trigger "defusal", David rejected it categorically:

no. never ever do that.

He also caught an incorrect TPM (tokens per minute) assumption: the production env used 19.5M TPM, but the agent had been reasoning as if the limit were 3M:

isnt the gemini TPM not meant to be 3M but rather 19.5M? go study production env variables and production codebase/db, and local. its meant to be 19.5M.

David's preferred fix was a targeted drop of the earliest message pairs (non-instruction tokens) only when a run actually approaches the context limit. The injected context, instructions, and teachings must never be truncated, and nothing should be trimmed early at a soft threshold.
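A minimal sketch of that trimming rule, to make the intended behaviour concrete. It is not the project's actual code: the `Message` type, the `injected` flag, the precomputed `tokens` field, and the name `drop_earliest_pairs` are all hypothetical, and it assumes conversation turns alternate so that two consecutive non-injected messages form one pair.

```python
from dataclasses import dataclass

CONTEXT_LIMIT = 1_000_000  # Gemini input token cap mentioned above


@dataclass
class Message:
    role: str
    text: str
    tokens: int             # hypothetical: token count computed elsewhere
    injected: bool = False  # True for injected context/instructions/teachings


def drop_earliest_pairs(messages, limit=CONTEXT_LIMIT):
    """Drop the earliest non-injected message pairs until the total fits.

    Injected messages are never dropped, and nothing is dropped at all
    while the conversation is still within the limit.
    """
    total = sum(m.tokens for m in messages)
    if total <= limit:
        return list(messages)  # no premature truncation

    # Indices of messages that may be dropped, oldest first.
    droppable = [i for i, m in enumerate(messages) if not m.injected]
    dropped = set()
    for start in range(0, len(droppable), 2):
        if total <= limit:
            break
        # Assumption: two consecutive droppable messages are one pair.
        for i in droppable[start:start + 2]:
            dropped.add(i)
            total -= messages[i].tokens

    if total > limit:
        raise ValueError("injected context alone exceeds the limit")
    return [m for i, m in enumerate(messages) if i not in dropped]


history = [
    Message("user", "teachings", 600_000, injected=True),
    Message("user", "q1", 150_000),
    Message("model", "a1", 150_000),
    Message("user", "q2", 100_000),
    Message("model", "a2", 100_000),
    Message("user", "q3", 50_000),
]
trimmed = drop_earliest_pairs(history)
print([m.text for m in trimmed])
```

In this example the history totals 1.15M tokens, so only the oldest pair (`q1`/`a1`) is removed and the injected teachings stay in place. Raising an error when the injected context alone is over the cap is one possible choice here; the alternative of silently cutting it is what David ruled out.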


kerra, lecture-notes, gemini, context-window, debug, claude-code