For chunker selection in the ingestion pipeline: if I use the Title chunker first and then the Token chunker (i.e., chain two different chunkers), will the Token chunker only take effect when a chunk produced by the hierarchy-based chunking exceeds the configured chunk size?
Yes. When you use the Title chunker first and then the Token chunker in the ingestion pipeline, the Title chunker splits the document into chunks based on heading levels (e.g., H1, H2, H3, H4). The Token chunker then further splits only those chunks that exceed the configured token size limit, so it acts as a fallback that keeps any chunk from being too large after the initial hierarchical split. This approach preserves the document structure as much as possible and breaks up only oversized chunks. Placing the Title chunker after the Token chunker is not supported and may cause errors, so always keep the Title chunker before the Token chunker in your pipeline (Title chunker docs, Token chunker docs).
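To make the "Title first, Token as fallback" behavior concrete, here is a minimal sketch of the two-stage logic. It is not the pipeline's actual implementation or API: the function names (`title_chunk`, `token_chunk`, `title_then_token`) are hypothetical, Markdown `#` to `####` headings stand in for detected titles, and whitespace-separated words stand in for real tokenizer tokens.

```python
import re
from typing import List

# Hypothetical illustration only; not the ingestion pipeline's real code.
# Matches Markdown H1-H4 headings at the start of a line.
HEADING = re.compile(r"^#{1,4}\s", re.MULTILINE)


def title_chunk(markdown: str) -> List[str]:
    """Stage 1: split at H1-H4 headings, keeping each heading with its body."""
    starts = [m.start() for m in HEADING.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts = [0] + starts  # keep any text before the first heading
    bounds = starts + [len(markdown)]
    chunks = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [c for c in chunks if c]


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokenizer tokens.
    return len(text.split())


def token_chunk(chunk: str, max_tokens: int) -> List[str]:
    """Stage 2: leave a chunk untouched if it fits, else split it by size."""
    if count_tokens(chunk) <= max_tokens:
        return [chunk]  # small enough: the Token stage has no effect
    words = chunk.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]


def title_then_token(markdown: str, max_tokens: int) -> List[str]:
    """Run the title split first, then apply the token limit per chunk."""
    out: List[str] = []
    for chunk in title_chunk(markdown):
        out.extend(token_chunk(chunk, max_tokens))
    return out
```

For example, with `max_tokens=10`, a short "# Intro" section passes through unchanged, while a "## Details" section holding 27 words is the only one that gets split (into pieces of 10, 10, and 7 words). This mirrors the answer above: the token limit only changes chunks that the heading-based split left oversized.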