main

Main entry point for the text extraction service.

Classes

Processes documents from Kafka messages.

Constructor:

def __init__(self) -> None

def process_document(self, message_data: dict[str, Any]) -> None

Process a single document with idempotency checks.

Args:

message_data: Kafka message data with the following keys:

document_id: str (required) - Document UUID
storage_path: str (required) - Relative path within bucket (e.g., "path/to/file.ext")
organization_id: str (required)
project_id: str (optional)
filename: str (optional)
trace_id: str (optional) - End-to-end trace ID from workspace-api
user_id: str (optional)
metadata: dict (optional) - JSON metadata to store in the database
file_size_bytes: int (optional)
content_type: str (optional)
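The key list above works as a message schema. Below is a minimal sketch of turning `message_data` into a typed object and rejecting messages that lack a required key. `DocumentMessage`, `REQUIRED_KEYS`, and `from_message` are hypothetical names and are not part of the documented API. Two further behaviours are assumptions: empty values count as missing, and unknown keys are ignored.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional

# Keys documented as (required) above
REQUIRED_KEYS = ("document_id", "storage_path", "organization_id")


@dataclass
class DocumentMessage:
    """Hypothetical typed view of the Kafka message payload."""

    document_id: str
    storage_path: str  # relative path within the bucket, e.g. "path/to/file.ext"
    organization_id: str
    project_id: Optional[str] = None
    filename: Optional[str] = None
    trace_id: Optional[str] = None
    user_id: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)
    file_size_bytes: Optional[int] = None
    content_type: Optional[str] = None

    @classmethod
    def from_message(cls, message_data: dict[str, Any]) -> "DocumentMessage":
        # Assumption: a missing key, None, or an empty string all count as missing
        missing = [k for k in REQUIRED_KEYS if not message_data.get(k)]
        if missing:
            raise ValueError(f"missing required keys: {', '.join(missing)}")
        known = {f.name for f in fields(cls)}
        # Assumption: unknown keys are dropped instead of raising an error
        kwargs = {k: v for k, v in message_data.items() if k in known}
        if kwargs.get("metadata") is None:
            kwargs["metadata"] = {}
        return cls(**kwargs)
```

Validating up front like this means a malformed message fails before any storage access or database write.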

Functions

def main() -> None

Main entry point.
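Below is a minimal sketch of how an entry point might pass consumed messages to `process_document` while applying an idempotency check. It assumes messages arrive as JSON-encoded bytes and swaps the Kafka consumer for a plain iterable. `DocumentProcessor`, `run`, and the in-memory set of processed IDs are all hypothetical. This reference does not describe the service's actual idempotency mechanism, so a production version would probably need durable state, such as a database lookup on `document_id`.

```python
import json
import logging
from typing import Any, Iterable

logger = logging.getLogger(__name__)


class DocumentProcessor:
    """Hypothetical stand-in for the documented processor class."""

    def __init__(self) -> None:
        # Assumption: in-memory only; state is lost when the process restarts
        self._processed: set[str] = set()
        self.extracted: list[str] = []

    def process_document(self, message_data: dict[str, Any]) -> None:
        document_id = message_data["document_id"]
        # Idempotency check: a redelivered message for the same document is skipped
        if document_id in self._processed:
            logger.info("skipping already-processed document %s", document_id)
            return
        # Placeholder for the real extraction step: record the storage path
        self.extracted.append(message_data["storage_path"])
        self._processed.add(document_id)


def run(raw_messages: Iterable[bytes], processor: DocumentProcessor) -> None:
    """Hypothetical loop; a real main() would iterate over a Kafka consumer."""
    for raw in raw_messages:
        try:
            message_data = json.loads(raw)
        except json.JSONDecodeError:
            # Drop undecodable payloads rather than crashing the loop
            logger.warning("dropping malformed message")
            continue
        processor.process_document(message_data)
```

Kafka consumers can receive the same message more than once, for example after a rebalance or a restart. Keying the check on `document_id` makes a redelivered message a no-op instead of a duplicate extraction.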