pdf_handler
PDF text extraction handler supporting both native and scanned PDFs.
Handles three scenarios:
- Scanned PDFs (image-based) -> OCR extraction
- Native PDFs with valid text -> Native extraction (PyPDF)
- Native PDFs with garbled text (font encoding issues) -> Per-page OCR fallback
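
The three scenarios above can be read as a per-page decision made on whatever text the native extractor returns. The following is a minimal sketch of that decision, not the handler's actual implementation. The names `Strategy`, `clean_ratio`, `choose_strategy`, and `plan_document`, and the thresholds `min_chars` and `min_ratio`, are hypothetical and chosen for illustration only. The garbled-text check is an assumed heuristic: it counts how many non-whitespace characters are letters, digits, or common punctuation.

```python
from enum import Enum
from typing import List

# Punctuation treated as "normal" by this illustrative heuristic (assumption).
COMMON_PUNCT = set(".,;:!?'\"()-/%&$@#+=*[]{}<>_")


class Strategy(Enum):
    NATIVE = "native"  # keep the text returned by the native extractor
    OCR = "ocr"        # render the page to an image and run OCR instead


def clean_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are alphanumeric or common punctuation."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    ok = sum(1 for c in chars if c.isalnum() or c in COMMON_PUNCT)
    return ok / len(chars)


def choose_strategy(native_text: str, min_chars: int = 20, min_ratio: float = 0.8) -> Strategy:
    """Pick an extraction strategy for one page from its native text layer.

    min_chars and min_ratio are hypothetical thresholds, not values from the handler.
    """
    stripped = native_text.strip()
    if len(stripped) < min_chars:
        # Little or no text layer: treat the page as scanned.
        return Strategy.OCR
    if clean_ratio(stripped) < min_ratio:
        # Text layer exists but looks garbled (e.g. broken font encoding).
        return Strategy.OCR
    return Strategy.NATIVE


def plan_document(page_texts: List[str]) -> List[Strategy]:
    """Decide page by page, so one garbled page does not force OCR on the whole file."""
    return [choose_strategy(text) for text in page_texts]
```

Deciding per page rather than per document keeps the faster native path for pages with a valid text layer and limits OCR to the pages that need it. In practice the thresholds would have to be tuned against real documents.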