markdown_chunker
Deprecated: MarkdownChunker stub for vLLM fallback path.
This module is deprecated and will be removed once the vLLM fallback path is retired. The docling-extractor service now uses only HybridChunker from the official Docling library.
TODO: Remove this file when vLLM fallback is fully removed from docling_handler.py
Classes
HeaderContext
Context for tracking headers across chunks (deprecated).
MarkdownChunk
A chunk of markdown content (deprecated).
MarkdownChunker
Deprecated: Markdown chunker for vLLM fallback path.
This class is deprecated. Use HybridChunker with DoclingDocument instead.
Constructor:
def __init__(self, chunk_size: int = 400, chunk_overlap: int = 50) -> None
Methods
chunk_markdown
def chunk_markdown(self, content: str, page_numbers: list[int] | None = None, header_context: HeaderContext | None = None) -> tuple[list[MarkdownChunk], HeaderContext]
Chunk markdown content (deprecated stub).
This is a minimal stub that returns empty results. The actual chunking should be done via HybridChunker.
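A minimal sketch of what a stub with this signature might look like is given here for orientation only. The fields on HeaderContext and MarkdownChunk are hypothetical (the reference above does not list them). Passing the incoming header context back unchanged is also an assumption; the source only states that the method returns empty results.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HeaderContext:
    # Hypothetical field: the real class's attributes are not documented here.
    headers: list[str] = field(default_factory=list)


@dataclass
class MarkdownChunk:
    # Hypothetical fields, for illustration only.
    text: str = ""
    page_numbers: list[int] = field(default_factory=list)


class MarkdownChunker:
    """Deprecated stub: keeps the old interface but performs no chunking."""

    def __init__(self, chunk_size: int = 400, chunk_overlap: int = 50) -> None:
        # Parameters are stored only so existing callers keep working.
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

    def chunk_markdown(
        self,
        content: str,
        page_numbers: list[int] | None = None,
        header_context: HeaderContext | None = None,
    ) -> tuple[list[MarkdownChunk], HeaderContext]:
        # Assumption: reuse the caller's context if given, else create an empty one.
        context = header_context if header_context is not None else HeaderContext()
        # No chunks are produced; real chunking is expected to go through HybridChunker.
        return [], context


chunker = MarkdownChunker()
chunks, context = chunker.chunk_markdown("# Title\n\nSome body text.")
```

Callers on the fallback path that iterate over the returned list therefore process nothing, so any code that still depends on this output should move to HybridChunker.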