16 NOV 2025
David's problem: Gemini context window overflow killing lecture notes runs
David investigated why lecture notes generation was hitting Gemini's 1M token cap and pushed back on a proposed soft-threshold defusal strategy.
A lecture notes run failed with gemini_tpm_wait and then a Gemini HTTP 400 for exceeding the 1M input token cap. David questioned whether any truncation mechanism existed for oversized contexts:
is there no mechanism that cuts the earliest 200k or so input tokens to prevent this? although those cut 200k SHOULD NOT BE THE INJECTED CONTEXT/INSTRUCTIONS/TEACHINGS THAT WE INJECTED INTO THE CONVERSATION ARRAY
When an agent proposed a soft threshold at 700k tokens to trigger "defusal", David rejected it categorically:
no. never ever do that.
He also caught an incorrect TPM limit assumption: the production env used 19.5M TPM, but the agent had been reasoning as if the limit were 3M:
isnt the gemini TPM not meant to be 3M but rather 19.5M? go study production env variables and production codebase/db, and local. its meant to be 19.5M.
David's preferred fix was a targeted drop of the earliest message pairs (ordinary conversation turns, not injected instructions) when a request approaches the context limit, never any form of premature truncation of the injected teaching context.
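A minimal sketch of that kind of targeted drop, not the project's actual code. The message shape, the `injected` flag, and the `trim_to_budget` and `rough_token_count` names are all hypothetical. The 4-characters-per-token estimate is a crude stand-in for a real token count. For simplicity it drops one message at a time, oldest first, rather than strict pairs, and it trims only when the estimated total actually exceeds the cap passed in, so there is no soft threshold.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]  # hypothetical shape: {"role", "content", "injected"?}


def rough_token_count(text: str) -> int:
    # Assumption: ~4 characters per token. Replace with a real tokenizer count.
    return max(1, len(text) // 4)


def trim_to_budget(
    messages: List[Message],
    max_input_tokens: int,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> List[Message]:
    """Drop the earliest non-injected messages until the total fits the cap.

    Messages flagged with injected=True (instructions, teachings, injected
    context) are never dropped.
    """
    costs = [count_tokens(str(m["content"])) for m in messages]
    total = sum(costs)
    keep = [True] * len(messages)

    # Walk from oldest to newest, dropping unprotected messages only while
    # the conversation is still over the cap.
    for i, message in enumerate(messages):
        if total <= max_input_tokens:
            break
        if message.get("injected"):
            continue
        keep[i] = False
        total -= costs[i]

    if total > max_input_tokens:
        # Nothing droppable is left; fail loudly instead of cutting
        # protected context.
        raise ValueError("protected context alone exceeds the token budget")

    return [m for m, kept in zip(messages, keep) if kept]
```

Calling `trim_to_budget(conversation, max_input_tokens=1_000_000)` right before the request is built would leave a conversation that already fits untouched, and would otherwise remove only the oldest unflagged turns.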