DeepDoc uses heuristics and spatial alignment to parse tables and flowcharts, but it doesn't explicitly model relationships between rows and columns, or between flowchart nodes and edges. As a result, complex tables and flowcharts often lose relational fidelity: flowcharts are treated as generic figures, and table relationships can be misaligned or incomplete, especially for advanced structures like spanning cells or hierarchical headers [source] [source].

Graph RAG in RAGFlow builds a knowledge graph from document chunks by extracting entities and relationships using LLM-based prompts. This graph-based approach is designed for multi-hop reasoning and complex relationship extraction, so it can improve retrieval accuracy for structured data like tables and flowcharts, especially when DeepDoc's heuristics fall short [source] [source].

For parser selection: DeepDoc is the default for formatted or image-based PDFs and is best for complex visual parsing, but it can be slow and struggles with advanced table/flowchart extraction. MinerU and Docling are experimental alternatives: MinerU converts PDFs to machine-readable formats, and Docling is designed for generative AI document processing. Both are less tested for complex extraction tasks, so their reliability for tables and flowcharts isn't guaranteed [source].

There aren't any published benchmarks or direct performance comparisons between DeepDoc, MinerU, and Docling for table or flowchart parsing in the codebase. The guidance is descriptive, so if you need quantitative results, it's worth monitoring for future updates or community feedback.
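To make the multi-hop idea behind a graph-based approach more concrete, here is a minimal sketch in plain Python. It is not RAGFlow's implementation and does not call any RAGFlow API. The triples are hypothetical stand-ins for what an LLM extraction step might produce from flowchart edges, and `build_graph` and `multi_hop` are illustrative helper names.

```python
from collections import defaultdict, deque

# Hypothetical (head, relation, tail) triples. Illustrative data only,
# not output from RAGFlow or any parser.
TRIPLES = [
    ("Order received", "next_step", "Validate payment"),
    ("Validate payment", "next_step", "Ship order"),
    ("Validate payment", "on_failure", "Notify customer"),
    ("Ship order", "handled_by", "Warehouse team"),
]


def build_graph(triples):
    """Build a directed adjacency list: head -> [(relation, tail), ...]."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def multi_hop(graph, start, max_hops=2):
    """Breadth-first traversal from `start`.

    Returns (hops, head, relation, tail) facts reachable within `max_hops`.
    """
    facts = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, tail in graph.get(node, []):
            facts.append((depth + 1, node, relation, tail))
            if tail not in seen:
                seen.add(tail)
                queue.append((tail, depth + 1))
    return facts


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    for fact in multi_hop(graph, "Order received", max_hops=3):
        print(fact)
```

A question such as "who handles the order after payment is validated?" needs facts from several edges at once. A graph traversal can collect them in one pass, whereas retrieval over isolated text chunks may surface only one of them. The result still depends on the relationships being extracted correctly in the first place.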
Using DeepDoc to parse files seems to fail on some charts and flowcharts. Even when it does parse them, it cannot accurately extract the relationships between rows and columns in a table. If it cannot accurately extract these relationships, can using Graph RAG improve retrieval accuracy? I would also like to know the performance differences between MinerU, Docling, and DeepDoc, as well as their applicable scenarios.