
31 MAR 2026

David debugged: PPTX files silently failed under page selection, exposing a missing conversion step

David pushed back when an agent suggested that 'zero extractable text' was the problem. That led to the finding that PPTX files skipped the normalizeForGemini conversion path whenever page selection was used.


When the agent reported that some failed lecture notes runs had "zero extractable text," David objected immediately:

what do you mean zero extractable text? since when is our system extracting text? isnt it meant to feed the file into gemini, not extract text then feed it in?

The investigation found a structural gap. resolvePdfBufferForFileId had been built on the assumption that everything passing through it was already a PDF, and the page-selection path was added to the codebase later without a convert-then-slice step. PPTX files sent through page selection hit the validation gate and silently failed, while the whole-file path worked because it triggered normalizeForGemini. David ultimately directed a fix: normalize PPTX to PDF at prefetch time so that every downstream path sees a PDF.
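The shape of that fix can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the PrefetchedFile type, prefetchAsPdf, assertPdf, and the ToPdf callback are all hypothetical names, and the callback merely stands in for whatever normalizeForGemini really does. It is kept synchronous for brevity, whereas a real conversion would most likely be async.

```typescript
// Sketch only: every name below is hypothetical and chosen for illustration.
type PrefetchedFile = { fileId: string; mimeType: string; bytes: Uint8Array };

const PDF_MIME = "application/pdf";
const PPTX_MIME =
  "application/vnd.openxmlformats-officedocument.presentationml.presentation";

// Stand-in for the real conversion step (assumed, signature not from the source).
type ToPdf = (bytes: Uint8Array) => Uint8Array;

// Normalize once at prefetch so the whole-file path and the
// page-selection path both receive a PDF.
function prefetchAsPdf(file: PrefetchedFile, toPdf: ToPdf): PrefetchedFile {
  if (file.mimeType === PDF_MIME) return file;
  if (file.mimeType === PPTX_MIME) {
    return { ...file, mimeType: PDF_MIME, bytes: toPdf(file.bytes) };
  }
  // Fail loudly here rather than letting a downstream gate reject the file silently.
  throw new Error(`prefetch: unsupported type ${file.mimeType} for ${file.fileId}`);
}

// Downstream gate: checks the "%PDF-" magic bytes and throws on mismatch.
function assertPdf(file: PrefetchedFile): void {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  const ok = magic.every((b, i) => file.bytes[i] === b);
  if (!ok) {
    throw new Error(`expected a PDF for ${file.fileId}, got ${file.mimeType}`);
  }
}
```

With conversion moved ahead of every consumer, a page slicer never has to know that PPTX exists, and a gate that throws makes any format that slips through visible instead of producing an empty result.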


kerra · debug · pptx · lecture-notes · gemini · prefetch · claude-code