Hi everyone, I'm a BE trying to build RAGFlow for my company. While digging into the code, I noticed a hard-coded setting in Hybrid Search that combines:
Could anyone explain why the author hard-coded it like this? Does it follow a paper or some other source? I mean, why is the weight of Text Search so much lower than that of Vector Search? If I change it, will it significantly affect the chatbot's responses?
Hi @VanBap! I'm Dosu and I'm helping the ragflow team.

The 0.05 (text/BM25) and 0.95 (vector/embedding) weights in RAGFlow's hybrid search are not based on any published paper or standard. They are an empirical default chosen by the authors, likely to prioritize semantic retrieval via embeddings, which often works better for open-ended or paraphrased queries in modern RAG setups. There is no comment or documentation in the code explaining this specific split, and no evidence that it follows a particular source or benchmark [source].

Changing these weights will directly affect which documents are retrieved and thus what context the chatbot sees. Increasing the text search weight makes keyword matches more influential, while increasing the vector search weight makes semantic matches more dominant. The best ratio depends on your data and use case, and it is common to tune these weights empirically [source].

If you notice retrieval quality issues, consider normalizing the scores or adjusting the weights to balance keyword and semantic relevance [source].
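To make the effect of the weights concrete, here is a minimal sketch of weighted score fusion. It is not RAGFlow's actual implementation: the function names (`min_max`, `hybrid_scores`, `rank`), the min-max normalization step, and the sample scores are all assumptions made for illustration. Only the 0.05/0.95 defaults come from the discussion above.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so BM25 and cosine scores are comparable.

    Assumption: min-max scaling is just one possible normalization.
    """
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All candidates tie on this signal; treat them as equally relevant.
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(
    text_scores: Sequence[float],
    vector_scores: Sequence[float],
    text_weight: float = 0.05,
    vector_weight: float = 0.95,
) -> List[float]:
    """Weighted sum of normalized keyword and embedding scores per document."""
    if len(text_scores) != len(vector_scores):
        raise ValueError("both score lists must cover the same documents")
    t = min_max(text_scores)
    v = min_max(vector_scores)
    return [text_weight * a + vector_weight * b for a, b in zip(t, v)]


def rank(doc_ids: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Return document ids ordered from highest to lowest fused score."""
    pairs = sorted(zip(doc_ids, scores), key=lambda p: p[1], reverse=True)
    return [doc_id for doc_id, _ in pairs]


# Hypothetical candidates: A is a strong keyword match, B and C are
# strong semantic matches with weak keyword overlap.
docs = ["A", "B", "C"]
bm25 = [12.0, 3.0, 0.5]
cosine = [0.60, 0.82, 0.80]

default_order = rank(docs, hybrid_scores(bm25, cosine))
keyword_heavy_order = rank(docs, hybrid_scores(bm25, cosine, 0.7, 0.3))
print(default_order, keyword_heavy_order)
```

With the default split the semantic matches lead the ranking, while a keyword-heavy split moves the exact-match document to the top. That reordering is how a weight change alters the context the chatbot receives.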
This weight is adjusted through the UI rather than being hard-coded.
