main

Main entry point for text extraction service.

Classes

DocumentProcessor

Processes documents from Kafka messages.

Constructor:

def __init__(self) -> None

Methods

process_document

def process_document(self, message_data: dict[str, Any]) -> None

Process a single document with idempotency checks.

Args:

message_data: Kafka message data with the following keys:

  • document_id: str (required) - Document UUID
  • storage_path: str (required) - Relative path within bucket (e.g., "path/to/file.ext")
  • organization_id: str (required)
  • project_id: str (optional)
  • filename: str (optional)
  • trace_id: str (optional) - End-to-end trace ID from workspace-api
  • user_id: str (optional)
  • metadata: dict (optional) - JSON metadata to store in database
  • file_size_bytes: int (optional)
  • content_type: str (optional)
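
As an illustration of the key list above, the following is a minimal sketch of a payload and a pre-flight check for the three required keys. The helper `missing_required_keys`, the constant `REQUIRED_KEYS`, and every value in `example_message` are assumptions made for this example. They are not part of the documented service, and the real `process_document` may validate its input differently.

```python
from typing import Any

# Required keys as listed in the process_document docstring.
REQUIRED_KEYS = ("document_id", "storage_path", "organization_id")


def missing_required_keys(message_data: dict[str, Any]) -> list[str]:
    """Return the required keys that are absent or empty (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if not message_data.get(key)]


# Example payload; all values are illustrative placeholders.
example_message: dict[str, Any] = {
    "document_id": "123e4567-e89b-12d3-a456-426614174000",
    "storage_path": "path/to/file.ext",  # relative path within the bucket
    "organization_id": "org-example",
    "filename": "file.ext",
    "metadata": {"source": "upload"},
    "file_size_bytes": 1024,
    "content_type": "application/pdf",
}

missing = missing_required_keys(example_message)
if missing:
    raise ValueError(f"message is missing required keys: {missing}")
```

A payload that passes this check has the shape that `DocumentProcessor().process_document(...)` is documented to accept. Optional keys such as `trace_id` or `user_id` can simply be omitted.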

Functions

main

def main() -> None

Main entry point.