official_docling_client

Official Docling Client using IBM's DocumentConverter.

This is the primary document extraction method, using the official Docling library for end-to-end PDF processing.

Based on the official IBM Granite Docling documentation:

https://www.ibm.com/granite/docs/models/docling
https://huggingface.co/ibm-granite/granite-docling-258M

Classes

OfficialDoclingResult

Result from official Docling conversion.

OfficialDoclingClient

Client for official IBM Docling DocumentConverter.

This client uses the official Docling Python library for document extraction, providing a more reliable pipeline than the vLLM endpoint.

Constructor:

def __init__(self) -> None

Methods

convert_document

def convert_document(self, pdf_bytes: bytes, filename: str = 'document.pdf', estimated_page_count: int | None = None, progress_callback: Any | None = None, enable_table_detection: bool | None = None) -> OfficialDoclingResult

Convert PDF using official Docling library.

Uses the standard pipeline (proven reliable, ~2.3s/page).
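As a rough illustration of what the ~2.3s/page figure implies, the sketch below estimates conversion time from a page count. The helper name `estimate_conversion_seconds` is hypothetical, and this is not the client's actual timeout formula, which is not documented here.

```python
# Approximate per-page cost quoted for the standard pipeline.
SECONDS_PER_PAGE = 2.3


def estimate_conversion_seconds(page_count: int,
                                seconds_per_page: float = SECONDS_PER_PAGE) -> float:
    """Return a rough conversion-time estimate (hypothetical helper).

    This only multiplies the page count by the quoted per-page figure;
    the real client may add overhead or apply a different formula.
    """
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return page_count * seconds_per_page


# A 100-page PDF would take on the order of a few minutes.
print(round(estimate_conversion_seconds(100)))
```

Passing estimated_page_count to convert_document lets the client size its timeout itself, so an estimate like this is mainly useful for setting user expectations.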

Args:
    pdf_bytes: PDF file content
    filename: Original filename for logging
    estimated_page_count: Optional page count for dynamic timeout calculation
    progress_callback: Optional callback function called every 30 seconds during conversion. Callback signature: (progress_percent: int, elapsed_seconds: int, stage: str) -> None

Returns: OfficialDoclingResult with document, markdown, html, and metadata

Raises:
    ValueError: If file size exceeds limit, page count exceeds limit, or content quality is insufficient
    TimeoutError: If conversion exceeds timeout
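A minimal sketch of how a caller might supply a progress callback and handle the two documented exceptions. The helper names (`format_progress`, `log_progress`, `convert_or_report`) and the stage strings are assumptions made for illustration; only the callback signature and the exception types come from the reference above.

```python
import logging
from typing import Any, Callable, Optional, Tuple

logger = logging.getLogger("docling.progress")


def format_progress(progress_percent: int, elapsed_seconds: int, stage: str) -> str:
    """Build a one-line progress message (hypothetical helper)."""
    minutes, seconds = divmod(elapsed_seconds, 60)
    return f"[{stage}] {progress_percent}% after {minutes}m{seconds:02d}s"


def log_progress(progress_percent: int, elapsed_seconds: int, stage: str) -> None:
    """Callback matching the documented signature; logs each update."""
    logger.info(format_progress(progress_percent, elapsed_seconds, stage))


def convert_or_report(
    convert: Callable[..., Any],
    pdf_bytes: bytes,
    filename: str = "document.pdf",
) -> Tuple[Optional[Any], Optional[str]]:
    """Call `convert` (for example client.convert_document) and turn the
    documented exceptions into a short error string instead of raising."""
    try:
        result = convert(pdf_bytes, filename=filename,
                         progress_callback=log_progress)
    except ValueError as exc:
        # Size limit, page limit, or content-quality rejection.
        return None, f"rejected: {exc}"
    except TimeoutError as exc:
        return None, f"timed out: {exc}"
    return result, None
```

Assumed usage: `result, error = convert_or_report(client.convert_document, pdf_bytes, "report.pdf")`, where `client` comes from get_official_docling_client().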

convert_document_vlm

def convert_document_vlm(self, pdf_bytes: bytes, filename: str = 'document.pdf', estimated_page_count: int | None = None) -> OfficialDoclingResult | None

Convert PDF using VLM pipeline (time-boxed, single attempt).

This is a tool method: the handler decides whether to use it. The conversion logic is identical to convert_document(), except that it uses the VLM converter.

Args:
    pdf_bytes: PDF file content
    filename: Original filename
    estimated_page_count: Optional page count for timeout calculation

Returns: OfficialDoclingResult if successful, None if failed/timed out

health_check

def health_check(self) -> bool

Check if the official Docling client is healthy.

Returns: True if client is ready, False otherwise
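One way the boolean from health_check might feed a readiness probe is sketched below. The function `readiness_response` and the status codes are illustrative assumptions; catching exceptions is a defensive choice, since the reference only documents a True/False return.

```python
from typing import Any, Tuple


def readiness_response(client: Any) -> Tuple[int, str]:
    """Map client.health_check() to an HTTP-style (status, message) pair."""
    try:
        healthy = bool(client.health_check())
    except Exception:
        # Treat an unexpected failure inside the check as "not ready".
        healthy = False
    return (200, "ready") if healthy else (503, "unavailable")
```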

Functions

get_official_docling_client

def get_official_docling_client() -> OfficialDoclingClient

Get or create singleton official Docling client.

close_official_docling_client

def close_official_docling_client() -> None

Close and release the singleton official Docling client.

Call this during shutdown to properly release resources.
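The get-or-create and close pair follows a common singleton lifecycle, which can be sketched as below. This is not the module's actual implementation: the placeholder class `_Client`, its `close()` method, and the use of a lock are all assumptions made for illustration.

```python
import threading
from typing import Optional


class _Client:
    """Placeholder for OfficialDoclingClient (assumed to hold resources)."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


_client: Optional[_Client] = None
_lock = threading.Lock()


def get_client() -> _Client:
    """Return the shared client, creating it on first use."""
    global _client
    with _lock:
        if _client is None:
            _client = _Client()
        return _client


def close_client() -> None:
    """Release the shared client; a later get_client() creates a new one."""
    global _client
    with _lock:
        if _client is not None:
            _client.close()
            _client = None
```

In an application, the close function would typically be registered with the framework's shutdown hook so that it runs exactly once at exit.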