chunker
Text chunking with token counting and overlap.
Classes
TextChunker
Splits text into chunks bounded by a token limit, with overlap between consecutive chunks.
Constructor:
def __init__(self, chunk_size: int | None = None, chunk_overlap: int | None = None) -> None
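Both constructor parameters are optional. This reference does not document which defaults apply when they are None. The helper below is a minimal sketch of how a chunk size and an overlap typically interact in a sliding window: each window advances by chunk_size - chunk_overlap tokens. It assumes this window behavior for illustration only and is not confirmed to be how TextChunker works. window_spans is a hypothetical helper that operates on token positions only.

```python
def window_spans(n_tokens: int, chunk_size: int, chunk_overlap: int) -> list[tuple[int, int]]:
    """Hypothetical helper: [start, end) token spans for an overlapping sliding window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # distance each window advances
    spans: list[tuple[int, int]] = []
    start = 0
    while start < n_tokens:
        end = min(start + chunk_size, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break  # window reached the end; avoid a redundant tail chunk
        start += step
    return spans


print(window_spans(10, 4, 1))  # [(0, 4), (3, 7), (6, 10)]
```

With chunk_size=4 and chunk_overlap=1, each chunk shares its last token with the first token of the next chunk. The overlap must be smaller than the chunk size, or the window would never advance.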
Methods
chunk_text
def chunk_text(self, text: str, page_numbers: list[int] | None = None, section_name: str | None = None) -> list[dict[str, Any]]
Chunk text into smaller pieces with overlap.
Args:
- text: Full text to chunk.
- page_numbers: List of page numbers this text spans (optional).
- section_name: Name of the section (optional).
Returns:
List of chunk dictionaries with keys:
- chunk_index: int (0-indexed)
- text: str
- token_count: int
- page_numbers: list[int] (optional)
- section_reference: dict of the form {name: str, number: int} (optional; included only if section_name is provided)