
chunker

Text chunking with token counting and overlap.

Classes

TextChunker

Splits text into overlapping chunks that stay within a token limit.

Constructor:

def __init__(self, chunk_size: int | None = None, chunk_overlap: int | None = None) -> None

Methods

chunk_text

def chunk_text(self, text: str, page_numbers: list[int] | None = None, section_name: str | None = None) -> list[dict[str, Any]]

Chunk text into smaller pieces with overlap.

Args:

- text: Full text to chunk
- page_numbers: List of page numbers this text spans (optional)
- section_name: Name of section (optional)

Returns: List of chunk dictionaries with keys:

- chunk_index: int (0-indexed)
- text: str
- token_count: int
- page_numbers: list[int] (optional)
- section_reference: dict (optional), of the form {name: str, number: int}; included if section_name is provided
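
To make the return shape and the effect of the overlap concrete, here is a minimal, self-contained sketch. It is not the library's implementation: `chunk_text_sketch` is a hypothetical stand-in, it counts whitespace-separated words as tokens (the real tokenizer is not documented here), the default sizes are arbitrary, and the `number` field is filled with a placeholder because its meaning is not specified above.

```python
from typing import Any, Optional


def chunk_text_sketch(
    text: str,
    chunk_size: int = 8,      # assumed default, for illustration only
    chunk_overlap: int = 2,   # assumed default, for illustration only
    page_numbers: Optional[list[int]] = None,
    section_name: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Hypothetical stand-in for TextChunker.chunk_text."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")

    tokens = text.split()  # assumption: one whitespace-separated word is one token
    stride = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[dict[str, Any]] = []
    for start in range(0, len(tokens), stride):
        window = tokens[start:start + chunk_size]
        chunk: dict[str, Any] = {
            "chunk_index": len(chunks),  # 0-indexed
            "text": " ".join(window),
            "token_count": len(window),
        }
        if page_numbers is not None:
            chunk["page_numbers"] = list(page_numbers)
        if section_name is not None:
            # Placeholder: the meaning of `number` is not documented.
            chunk["section_reference"] = {"name": section_name, "number": 0}
        chunks.append(chunk)
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "one two three four five six seven eight nine ten"
    for c in chunk_text_sketch(sample, chunk_size=4, chunk_overlap=1,
                               page_numbers=[3], section_name="Intro"):
        print(c["chunk_index"], c["token_count"], repr(c["text"]))
```

With a chunk size of 4 and an overlap of 1, each window advances by 3 tokens, so the last token of one chunk reappears as the first token of the next. The sketch rejects an overlap that is not smaller than the chunk size, since the window would then never advance; whether `TextChunker` performs the same check is not stated in this reference.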