DeepDoc Can't Parse the Table and FlowChart #11473

Yeon1007 · 2025-11-24T05:36:10Z

When I use DeepDoc to parse files, it seems to fail on some charts and flowcharts. Even when it does parse them, it cannot accurately extract the relationships between rows and columns in a table. If it cannot extract these relationships accurately, can Graph RAG improve retrieval accuracy? I would also like to know how MinerU, Docling, and DeepDoc differ in performance, and which scenarios each is suited to.

Yeon1007 · 2025-11-24T05:41:14Z

dosubot[bot]
bot Nov 24, 2025

DeepDoc uses heuristics and spatial alignment to parse tables and flowcharts, but it doesn't explicitly model relationships between rows and columns or flowchart nodes and edges. This means complex tables and flowcharts often lose relational fidelity—flowcharts are treated as generic figures, and table relationships can be misaligned or incomplete, especially for advanced structures like spanning cells or hierarchical headers [source] [source].
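To make the spanning-cell problem concrete, here is a minimal Python sketch of what "keeping relational fidelity" means for a table with merged header cells. It is not DeepDoc's code or output format: the `(text, rowspan, colspan)` cell tuples, `expand_spans`, and `header_path` are hypothetical names used only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input format: each cell is (text, rowspan, colspan).
# This is an illustration only, not the structure DeepDoc emits.
Cell = Tuple[str, int, int]


def expand_spans(rows: List[List[Cell]]) -> List[List[Optional[str]]]:
    """Expand rowspan/colspan cells into a dense grid of cell texts."""
    n_rows = len(rows)
    filled: List[Dict[int, str]] = [dict() for _ in range(n_rows)]
    for r, row in enumerate(rows):
        col = 0
        for text, rowspan, colspan in row:
            # Skip columns already occupied by a cell spanning down from above.
            while col in filled[r]:
                col += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    if r + dr < n_rows:
                        filled[r + dr][col + dc] = text
            col += colspan
    width = max((max(f) + 1 for f in filled if f), default=0)
    return [[f.get(c) for c in range(width)] for f in filled]


def header_path(grid: List[List[Optional[str]]], header_rows: int, col: int) -> List[str]:
    """Return the chain of header labels above a column, top to bottom."""
    path: List[str] = []
    for r in range(header_rows):
        label = grid[r][col]
        # A header spanning several rows repeats in the grid; keep it once.
        if label is not None and (not path or path[-1] != label):
            path.append(label)
    return path


# Made-up table: "Region" spans two rows, "Sales" spans two columns.
table = [
    [("Region", 2, 1), ("Sales", 1, 2)],
    [("Q1", 1, 1), ("Q2", 1, 1)],
    [("EMEA", 1, 1), ("10", 1, 1), ("12", 1, 1)],
]
grid = expand_spans(table)
print(grid)
print(header_path(grid, header_rows=2, col=2))
```

If the spans are lost during parsing, the value `12` can no longer be tied back to both `Sales` and `Q2`, which is the kind of misalignment described above.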

Graph RAG in RAGFlow builds a knowledge graph from document chunks by extracting entities and relationships using LLM-based prompts. This graph-based approach is designed for multi-hop reasoning and complex relationship extraction, so it can improve retrieval accuracy for structured data like tables and flowcharts—especially when DeepDoc's heuristics fall short [source] [source].
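As a rough illustration of why a graph helps with multi-hop questions, the sketch below stores relationships as `(subject, relation, object)` triples and follows them with a breadth-first search. It does not use RAGFlow's code or data model: the triples are made up, and `build_graph` and `find_path` are hypothetical helpers. In Graph RAG the triples would come from LLM-based extraction rather than being written by hand.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def build_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Index triples by subject so outgoing edges can be looked up quickly."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def find_path(
    graph: Dict[str, List[Tuple[str, str]]],
    start: str,
    goal: str,
    max_hops: int = 3,
) -> Optional[List[Triple]]:
    """Breadth-first search for a chain of triples linking start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue  # do not expand beyond the hop limit
        for relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


# Made-up triples standing in for edges recovered from a flowchart.
triples = [
    ("Submit request", "leads_to", "Manager review"),
    ("Manager review", "leads_to", "Finance approval"),
    ("Finance approval", "leads_to", "Payment"),
]
graph = build_graph(triples)
print(find_path(graph, "Submit request", "Finance approval"))
```

A chunk-only retriever would need a single chunk that mentions both "Submit request" and "Finance approval"; the graph can connect them through the intermediate step.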

For parser selection: DeepDoc is the default for formatted or image-based PDFs and is best for complex visual parsing, but can be slow and struggles with advanced table/flowchart extraction. MinerU and Docling are experimental alternatives—MinerU converts PDFs to machine-readable formats, and Docling is designed for generative AI document processing. Both are less tested for complex extraction tasks, so their reliability for tables and flowcharts isn't guaranteed [source].

There aren't any published benchmarks or direct performance comparisons between DeepDoc, MinerU, and Docling for table or flowchart parsing in the codebase. The guidance is descriptive, so if you need quantitative results, it's worth monitoring for future updates or community feedback.



2 replies

Yeon1007 Nov 24, 2025
Author

How can I enable the Knowledge Graph and RAPTOR features? After processing the data, it still shows "not generated." Also, is the Knowledge Graph now limited to a maximum of 5 entity types? @dosu

dosubot[bot] bot Nov 24, 2025

To enable the Knowledge Graph or RAPTOR features, go to your dataset's Configuration page and turn on the relevant options—set entity types (required) and other parameters for Knowledge Graph, or adjust the RAPTOR settings as needed. After saving, you must manually trigger generation: go to the Files page, click the Generate button, and select either Knowledge Graph or RAPTOR. The status will change from "Not generated" to a timestamp once the process completes. If it still shows "not generated," the generation process likely hasn't started, is incomplete, or was deleted—make sure to trigger it and check for any resource or backend issues if it doesn't progress [docs: Knowledge Graph setup] [docs: RAPTOR setup].

There is no hard-coded maximum of 5 entity types for the Knowledge Graph—the default config lists 5, but you can add more as needed in the configuration UI or API. The only requirement is that the list is not empty when enabling the feature [source] [source].
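The rule described here (any number of entity types, but at least one when the feature is on) can be sketched as a small check. This is only an illustration of the constraint: `KnowledgeGraphSettings` and `validate` are hypothetical names, the entity type strings are placeholders, and nothing below calls RAGFlow or reflects its actual configuration schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KnowledgeGraphSettings:
    """Hypothetical stand-in for a dataset's Knowledge Graph options."""
    enabled: bool = False
    entity_types: List[str] = field(default_factory=list)


def validate(settings: KnowledgeGraphSettings) -> List[str]:
    """Return the cleaned entity types, or raise if the feature is on with none."""
    if not settings.enabled:
        return []
    cleaned = [t.strip() for t in settings.entity_types if t.strip()]
    if not cleaned:
        raise ValueError("entity_types must not be empty when the feature is enabled")
    # Drop duplicates while keeping the original order; no upper limit is applied.
    return list(dict.fromkeys(cleaned))


# Placeholder entity types: seven entries, i.e. more than the five defaults.
example_types = ["person", "organization", "location", "event",
                 "product", "regulation", "process step"]
print(validate(KnowledgeGraphSettings(enabled=True, entity_types=example_types)))
```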

If you change entity types or settings, you must regenerate the Knowledge Graph for updates to take effect. Both features require significant resources, so if generation fails or is slow, check your system's memory and compute availability.

