Yes. When the Title chunker runs before the Token chunker in the ingestion pipeline, the Title chunker first splits the document into chunks by heading level (H1, H2, H3, H4). The Token chunker then splits only those chunks that exceed the configured token size limit, so it acts as a fallback that ensures no chunk is too large after the initial hierarchical split. This order preserves the document structure as much as possible and breaks up only the oversized chunks. Placing the Title chunker after the Token chunker is not supported and may cause errors, so always keep the Title chunker before the Token chunker in your pipeline (Title chunker docs, Token chunker …
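
To make the ordering concrete, here is a minimal Python sketch of the two-stage idea. It is not the actual implementation of either chunker: `split_by_title` and `split_by_tokens` are hypothetical helpers, headings are assumed to be Markdown-style `#` to `####`, and whitespace-separated words stand in for real tokens.

```python
import re

# Assumption: headings are Markdown-style, H1 through H4.
HEADING = re.compile(r"^#{1,4}\s")


def split_by_title(text):
    """Stage 1 (hypothetical): start a new chunk at every H1-H4 heading."""
    chunks, current = [], []
    for line in text.splitlines():
        if HEADING.match(line) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


def split_by_tokens(chunks, max_tokens):
    """Stage 2 (hypothetical): re-split only chunks above max_tokens."""
    result = []
    for chunk in chunks:
        # Assumption: whitespace-separated words stand in for real tokens.
        tokens = chunk.split()
        if len(tokens) <= max_tokens:
            result.append(chunk)  # small enough: heading chunk kept untouched
            continue
        for start in range(0, len(tokens), max_tokens):
            result.append(" ".join(tokens[start:start + max_tokens]))
    return result


doc = "# Intro\nshort text\n## Details\n" + " ".join(["word"] * 25)
title_chunks = split_by_title(doc)
final_chunks = split_by_tokens(title_chunks, max_tokens=10)
```

In this sketch the short "Intro" section passes through the second stage unchanged, while the oversized "Details" section is the only one that gets split further, which mirrors the fallback behavior described above.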

Answer selected by Yeon1007