official_docling_client
Official Docling Client using IBM's DocumentConverter.
This is the primary document extraction method, using the official Docling library for end-to-end PDF processing.
Based on the official IBM Granite Docling documentation:
https://www.ibm.com/granite/docs/models/docling
https://huggingface.co/ibm-granite/granite-docling-258M
Classes
OfficialDoclingResult
Result from official Docling conversion.
OfficialDoclingClient
Client for official IBM Docling DocumentConverter.
This client uses the official Docling Python library for document extraction, providing a more reliable pipeline than the vLLM endpoint.
Constructor:
def __init__(self) -> None
Methods
convert_document
def convert_document(self, pdf_bytes: bytes, filename: str = 'document.pdf', estimated_page_count: int | None = None, progress_callback: Any | None = None, enable_table_detection: bool | None = None) -> OfficialDoclingResult
Convert PDF using official Docling library.
Uses the standard pipeline (proven reliable, ~2.3s/page).
Args:
    pdf_bytes: PDF file content
    filename: Original filename for logging
    estimated_page_count: Optional page count for dynamic timeout calculation
    progress_callback: Optional callback function called every 30 seconds during conversion. Callback signature: (progress_percent: int, elapsed_seconds: int, stage: str) -> None
Returns: OfficialDoclingResult with document, markdown, html, and metadata
Raises:
    ValueError: If file size exceeds limit, page count exceeds limit, or content quality is insufficient
    TimeoutError: If conversion exceeds timeout
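The call and its documented failure modes can be sketched as follows. This is a minimal illustration, not the module's own code: the helper names format_progress, log_progress, and convert_with_progress are hypothetical, and the client is passed in as an argument so the sketch makes no assumption about how the module is imported.

```python
from typing import Any, Optional


def format_progress(progress_percent: int, elapsed_seconds: int, stage: str) -> str:
    """Build a one-line message from the documented callback arguments."""
    return f"[{stage}] {progress_percent}% after {elapsed_seconds}s"


def log_progress(progress_percent: int, elapsed_seconds: int, stage: str) -> None:
    """Callback matching the documented signature; it only prints."""
    print(format_progress(progress_percent, elapsed_seconds, stage))


def convert_with_progress(
    client: Any, pdf_bytes: bytes, filename: str = "document.pdf"
) -> Optional[Any]:
    """Run the standard pipeline; return None if a documented error is raised."""
    try:
        return client.convert_document(
            pdf_bytes,
            filename=filename,
            progress_callback=log_progress,
        )
    except ValueError as exc:
        # Documented causes: file size limit, page count limit, or low content quality.
        print(f"{filename}: rejected ({exc})")
    except TimeoutError as exc:
        # Documented cause: conversion exceeded its timeout.
        print(f"{filename}: timed out ({exc})")
    return None
```

Returning None on failure is a choice made for this sketch only; a caller that needs to distinguish a rejected file from a timeout would let the exceptions propagate instead.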
convert_document_vlm
def convert_document_vlm(self, pdf_bytes: bytes, filename: str = 'document.pdf', estimated_page_count: int | None = None) -> OfficialDoclingResult | None
Convert PDF using VLM pipeline (time-boxed, single attempt).
This is a tool method: the handler decides whether to use it. The conversion logic is identical to convert_document(), except that it uses the VLM converter.
Args:
    pdf_bytes: PDF file content
    filename: Original filename
    estimated_page_count: Optional page count for timeout calculation
Returns: OfficialDoclingResult if successful, None if failed/timed out
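Because this method returns None rather than raising, the calling handler has to decide what happens next. The sketch below shows one possible handler policy, assumed here purely for illustration: try the VLM pipeline once, then fall back to the standard pipeline. The function name convert_with_vlm_fallback and the ordering of the two pipelines are assumptions, not behavior described by this module.

```python
from typing import Any, Optional, Tuple


def convert_with_vlm_fallback(
    client: Any,
    pdf_bytes: bytes,
    filename: str = "document.pdf",
    estimated_page_count: Optional[int] = None,
) -> Tuple[Any, str]:
    """Return (result, pipeline_name), preferring the VLM pipeline."""
    result = client.convert_document_vlm(
        pdf_bytes,
        filename=filename,
        estimated_page_count=estimated_page_count,
    )
    if result is not None:
        return result, "vlm"

    # None means the single VLM attempt failed or timed out.
    # The standard pipeline may still raise ValueError or TimeoutError,
    # which this sketch deliberately leaves to the caller.
    result = client.convert_document(
        pdf_bytes,
        filename=filename,
        estimated_page_count=estimated_page_count,
    )
    return result, "standard"
```

Returning the pipeline name alongside the result lets the handler record which path produced the output.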
health_check
def health_check(self) -> bool
Check if the official Docling client is healthy.
Returns: True if client is ready, False otherwise
Functions
get_official_docling_client
def get_official_docling_client() -> OfficialDoclingClient
Get or create singleton official Docling client.
close_official_docling_client
def close_official_docling_client() -> None
Close and release the singleton official Docling client.
Call this during shutdown to properly release resources.
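The get, health-check, and close steps can be tied together with a context manager. This is a minimal sketch: docling_client_session is a hypothetical helper, and the getter and closer are passed in as callables so the block stays self-contained. In an application they would be get_official_docling_client and close_official_docling_client.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def docling_client_session(
    get_client: Callable[[], Any],
    close_client: Callable[[], None],
) -> Iterator[Any]:
    """Yield a healthy client and always release it on exit."""
    client = get_client()
    try:
        if not client.health_check():
            raise RuntimeError("official Docling client is not ready")
        yield client
    finally:
        # Runs on normal exit, on errors in the with-body, and on a failed health check.
        close_client()
```

Since the closer releases the singleton, a session like this is best wrapped around the application's whole lifetime (for example, startup to shutdown) rather than around each conversion.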