markdown_chunker

Deprecated: MarkdownChunker stub for vLLM fallback path.

This module is deprecated and will be removed once the vLLM fallback path itself is removed. The docling-extractor service now uses HybridChunker from the official Docling library exclusively.

TODO: Remove this file when vLLM fallback is fully removed from docling_handler.py

Classes

HeaderContext

Context for tracking headers across chunks (deprecated).

MarkdownChunk

A chunk of markdown content (deprecated).

MarkdownChunker

Deprecated: Markdown chunker for vLLM fallback path.

This class is deprecated. Use HybridChunker with DoclingDocument instead.

Constructor:

def __init__(self, chunk_size: int = 400, chunk_overlap: int = 50) -> None

Methods

chunk_markdown

def chunk_markdown(self, content: str, page_numbers: list[int] | None = None, header_context: HeaderContext | None = None) -> tuple[list[MarkdownChunk], HeaderContext]

Chunk markdown content (deprecated stub).

This is a minimal stub that returns empty results and performs no chunking. Use HybridChunker for actual chunking.
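As an illustration of the stub behaviour described above, here is a minimal, self-contained sketch that mirrors the documented signatures. It is not the module's actual source: the fields of HeaderContext and MarkdownChunk are not documented on this page, so both are empty stand-ins, and the choice to return the passed-in header context (or a fresh one when none is given) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HeaderContext:
    """Stand-in for the deprecated class; real fields are not documented here."""


@dataclass
class MarkdownChunk:
    """Stand-in for the deprecated class; real fields are not documented here."""


class MarkdownChunker:
    """Sketch of the deprecated stub, following the documented signatures."""

    def __init__(self, chunk_size: int = 400, chunk_overlap: int = 50) -> None:
        # The stub accepts the sizing parameters but never uses them to split text.
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

    def chunk_markdown(
        self,
        content: str,
        page_numbers: list[int] | None = None,
        header_context: HeaderContext | None = None,
    ) -> tuple[list[MarkdownChunk], HeaderContext]:
        # Assumption: "empty results" means an empty chunk list, with the
        # header context passed through (or created fresh if None).
        if header_context is None:
            header_context = HeaderContext()
        return [], header_context


chunker = MarkdownChunker()
chunks, context = chunker.chunk_markdown("# Title\n\nSome text.")
print(chunks)  # []
```

Under these assumptions, callers on the fallback path always receive an empty chunk list, so any code that still depends on this class produces no chunks regardless of input.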