RAGflow hybrid search hard-code weights #11096

VanBap · 2025-11-07T03:41:28Z


Hi everyone, I'm a backend engineer building a RAGFlow setup for my company. While digging into the code, I noticed that Hybrid Search hard-codes the weights it uses to combine:

Text Search (BM25/Full-text search) - weight 0.05 (5%)
Vector Search (Dense embedding search) - weight 0.95 (95%)

Could anyone explain why the author hard-coded these values? Are they based on a paper or some other source? In particular, why is the Text Search weight so much lower than the Vector Search weight? And if I change it, will it significantly affect the chatbot's responses?
Thank you very much

code path: ragflow/rag/nlp/search -> line 138
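To make the 0.05/0.95 split concrete, here is a minimal sketch of a weighted-sum fusion, assuming both the full-text and vector scores have already been scaled to [0, 1]. The function name `hybrid_score` is hypothetical and this is not RAGFlow's code; it only shows the arithmetic the two weights imply.

```python
# Hypothetical illustration of a weighted-sum hybrid score; not RAGFlow's actual code.
TEXT_WEIGHT = 0.05    # BM25 / full-text share, as quoted in the question
VECTOR_WEIGHT = 0.95  # dense-embedding share, as quoted in the question


def hybrid_score(text_score: float, vector_score: float,
                 text_weight: float = TEXT_WEIGHT,
                 vector_weight: float = VECTOR_WEIGHT) -> float:
    """Linear combination of two scores assumed to be normalized to [0, 1]."""
    return text_weight * text_score + vector_weight * vector_score


# A chunk with a perfect keyword match but weaker semantic similarity...
keyword_hit = hybrid_score(text_score=1.0, vector_score=0.40)
# ...versus a chunk with no keyword overlap but stronger semantic similarity.
semantic_hit = hybrid_score(text_score=0.0, vector_score=0.60)

print(round(keyword_hit, 3), round(semantic_hit, 3))  # 0.43 0.57
```

Under these weights a perfect keyword match adds at most 0.05 to the fused score, so against a chunk with no keyword match it wins only when the vector-score gap is smaller than about 0.05 / 0.95 ≈ 0.053.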

Answered by dosubot[bot]


dosubot[bot] · 2025-11-07T03:47:04Z · bot

Hi @VanBap! I'm Dosu and I’m helping the ragflow team.

The 0.05 (text/BM25) and 0.95 (vector/embedding) weights in RAGFlow's hybrid search are not based on any published paper or standard. They are an empirical default chosen by the authors, likely to prioritize semantic retrieval via embeddings, which often works better for open-ended or paraphrased queries in modern RAG setups. There's no comment or documentation in the code explaining this specific split, and no evidence it follows a particular source or benchmark [source].

Changing these weights will directly affect which documents are retrieved and thus what context the chatbot sees. Increasing the text search weight will make keyword matches more influential, while increasing the vector search weight will make semantic matches more dominant. The best ratio depends on your data and use case—it's common to tune these weights empirically for optimal performance [source]. If you notice retrieval quality issues, consider normalizing scores or adjusting the weights to balance between keyword and semantic relevance [source].
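The normalize-then-weight advice above can be sketched as follows. This assumes per-query min-max normalization (one common choice; RAGFlow or its document store may normalize differently), the function names `min_max` and `fuse` are hypothetical, and the chunk ids and scores are made up purely to show how the ranking shifts when the weights change.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1]; if all scores are equal, map them to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse(bm25: Dict[str, float], cosine: Dict[str, float],
         text_weight: float, vector_weight: float) -> List[str]:
    """Rank chunk ids by a weighted sum of min-max normalized scores.

    A chunk missing from one result list contributes 0 for that signal.
    """
    t, v = min_max(bm25), min_max(cosine)
    ids = sorted(set(t) | set(v))  # sorted first so ties keep a deterministic order
    combined = {i: text_weight * t.get(i, 0.0) + vector_weight * v.get(i, 0.0)
                for i in ids}
    return sorted(ids, key=lambda i: combined[i], reverse=True)


# Made-up scores: raw BM25 is unbounded, cosine similarity is not.
bm25 = {"a": 12.0, "b": 3.0, "c": 0.5}
cosine = {"a": 0.70, "b": 0.82, "c": 0.78}

print(fuse(bm25, cosine, 0.05, 0.95))  # ['b', 'c', 'a']  (vector-heavy split)
print(fuse(bm25, cosine, 0.7, 0.3))    # ['a', 'b', 'c']  (keyword-heavy split)
```

With the 0.05/0.95 split, the best semantic match `b` ranks first and the strong keyword hit `a` drops to last; shifting weight toward full-text search moves `a` to the top. Because raw BM25 scores are unbounded, skipping normalization would mean the nominal weights no longer reflect how much each signal actually contributes.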


yingfeng · 2025-11-07T06:43:49Z · Maintainer

This weight can be adjusted through the UI; it is not hard-coded.


VanBap · Nov 7, 2025 · Author

Thank you so much!
