02 APR 2026

David debugged: Monitor TLR model switch to Gemini 2.5 Flash Lite - semaphore investigation

David investigated whether the Monitor TLR was running the intended Gemini model and pushed back on a proposed rate-limiting semaphore that would have bottlenecked production.

David suspected the production Monitor TLR was still running on the old model rather than Gemini 2.5 Flash Lite, and asked for a production DB and codebase check:

wasnt it changed to gemini 2.5 flash lite? im pretty sure it was. go double check, and check to see if theres local changes that did that that havent been deployed

Before executing the change, he sent three Opus 4.7 subagents to investigate how the Gemini integration differs from the OpenAI one. When one agent flagged that moving Monitor onto the shared Gemini semaphore (GEMINI_RATE_LIMIT_MAX_IN_FLIGHT=25) could starve Monitor calls during lecture-notes spikes, David pushed back hard:

no i dont want a max in flight! and i dont want a semaphore that bottlenecks or limits anything. is there currently one thats actively used? go investigate if theres an actively used one for anything
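The starvation concern can be illustrated with a small simulation. This is only a sketch, not the project's code: the function names, the limit of 2 (standing in for 25), and the sleep durations are all assumptions chosen to make the effect visible. It shows a short monitor call finishing last when it shares an in-flight cap with a burst of lecture-notes calls, and finishing first when no cap is applied.

```python
import asyncio

async def call_model(name, duration, log, sem=None):
    """Simulate one model call; `sem` is an optional in-flight limiter (hypothetical)."""
    if sem is not None:
        async with sem:  # wait for a free slot before "calling" the model
            await asyncio.sleep(duration)
    else:
        await asyncio.sleep(duration)
    log.append(name)  # record completion order

async def _run(shared_limit):
    log = []
    sem = asyncio.Semaphore(shared_limit) if shared_limit else None
    # A burst of slow lecture-notes calls is queued first...
    tasks = [
        asyncio.create_task(call_model(f"notes-{i}", 0.05, log, sem))
        for i in range(4)
    ]
    # ...then a short monitor call arrives behind them.
    tasks.append(asyncio.create_task(call_model("monitor", 0.01, log, sem)))
    await asyncio.gather(*tasks)
    return log

def completion_order(shared_limit):
    """Return the order in which the simulated calls finished."""
    return asyncio.run(_run(shared_limit))
```

With `completion_order(2)` the monitor call waits behind every lecture-notes call; with `completion_order(None)` it completes first, which is the behaviour David asked to preserve.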

He also identified a scoping issue: the 19.5M TPM Gemini limit was being applied too broadly. It should apply only to gemini-flash-3 (the lecture notes model), not to Monitor or other agents.
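One way to express that scoping is a limit table keyed by model ID rather than by provider. The sketch below is an assumption about shape only: `TPM_LIMITS` and `tpm_limit_for` are hypothetical names, and `gemini-2.5-flash-lite` is assumed to be the ID Monitor would use.

```python
from typing import Optional

# Hypothetical per-model limits. Only the lecture notes model carries the
# 19.5M TPM cap; models not listed here are not limited by this table.
TPM_LIMITS = {"gemini-flash-3": 19_500_000}

def tpm_limit_for(model: str) -> Optional[int]:
    """Return the TPM cap for `model`, or None if no cap is scoped to it."""
    return TPM_LIMITS.get(model)
```

Under this layout, `tpm_limit_for("gemini-flash-3")` returns the cap, while a lookup for the Monitor model returns `None`, so no limit is inherited just because both models come from the same provider.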

kerra monitor-tlr gemini decision production claude-code