When running the pipeline alone to parse a file, the parser gets stuck with DeepDoc; also, how do I use different parsers? #11362

Yeon1007 · 2025-11-19T03:52:16Z

I created a pipeline in the agent to chunk PDF files. When I selected DeepDoc as the parser, the progress in the log got stuck at 10%, and Docker's total CPU usage went to 430%/0% (16 CPUs available). What could be causing this? As far as I understand, DeepDoc doesn't require any other prerequisites. I would also like to know the prerequisites for using MinerU and Docling: do I need to install them with pip, or modify the Dockerfile first? Finally, I'd like to ask about the UI for enabling pipeline process monitoring.

Answered by dosubot[bot]

dosubot[bot] · 2025-11-19T04:06:54Z

Hi @imkkk777! I'm Dosu and I’m helping the ragflow team.

The DeepDoc parser often gets stuck at 10% with high CPU usage because it runs OCR and layout analysis on the CPU by default, which is very resource-intensive, especially for image-heavy or complex PDFs. GPU acceleration is not used for DeepDoc parsing even if you set GPU options; the GPU is only used for embedding or LLM inference. If you don't need image recognition, switching to the "Naive" parser can significantly reduce resource usage and avoid these stalls. Also make sure your server has enough RAM and that both the main server and the task executor are running: parsing can hang if resources are low or a service is unhealthy. Upgrading to the latest nightly build may resolve some known bugs with stuck parsing and parser selection [source].
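
On CPU, a long pause at one percentage does not necessarily mean the job is dead, so it can help to decide explicitly when to treat a run as stuck before switching parsers. Below is a minimal, hypothetical watchdog sketch, not part of RAGFlow: `ProgressSample`, `is_stalled`, the 10-minute window and the 1% threshold are all assumptions, and how you collect the samples (for example by reading the progress shown in the pipeline log) is up to you.

```python
from dataclasses import dataclass


@dataclass
class ProgressSample:
    """One observation of a parsing run (hypothetical type, not a RAGFlow API)."""
    elapsed_s: float  # seconds since the run started
    progress: float   # progress as a fraction in [0, 1]


def is_stalled(samples: list[ProgressSample],
               window_s: float = 600.0,
               min_delta: float = 0.01) -> bool:
    """Return True if progress rose by less than `min_delta` during the last `window_s` seconds.

    Assumes `samples` is sorted by `elapsed_s`. The defaults are arbitrary assumptions.
    """
    if not samples:
        return False
    latest = samples[-1]
    # Baseline: the newest sample that is at least one full window older than the latest.
    baseline = None
    for s in samples:
        if latest.elapsed_s - s.elapsed_s >= window_s:
            baseline = s
    if baseline is None:
        return False  # not enough history yet to judge a full window
    return latest.progress - baseline.progress < min_delta
```

For example, samples of 0% at 0 s, 10% at 60 s and still 10% at 700 s would be flagged, while a run that reached 50% by 700 s would not. Comparing against the newest sample that is at least one window old, rather than the very first sample, keeps the check meaningful when samples arrive irregularly.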

For MinerU and Docling: both must be installed before use. In Docker, set USE_MINERU=true or USE_DOCLING=true in your docker/.env file; the entrypoint script will pip-install MinerU or Docling if it is not already present. For MinerU, if you want more control, you can instead install it manually in a virtual environment and set MINERU_EXECUTABLE to its path. After changing these settings, rebuild and restart your containers [source].
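
If you prefer to script the docker/.env change rather than edit the file by hand, a small helper like the one below would do it. This is a sketch under assumptions: `set_env_flags` is a hypothetical name, it only understands plain KEY=value lines (no `export` prefixes or quoting rules), and it leaves comments and unrelated settings untouched.

```python
from pathlib import Path


def set_env_flags(env_path: Path, flags: dict[str, str]) -> None:
    """Set KEY=value pairs in a dotenv-style file (hypothetical helper).

    Existing assignments are updated in place; keys not yet present are
    appended at the end. Comment lines are never rewritten.
    """
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    remaining = dict(flags)
    out = []
    for line in lines:
        key = line.split("=", 1)[0].strip()
        # Only rewrite real assignments, not comments such as "# USE_MINERU=false".
        if not line.lstrip().startswith("#") and "=" in line and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append any flags that were not already in the file.
    out.extend(f"{k}={v}" for k, v in remaining.items())
    env_path.write_text("\n".join(out) + "\n")
```

Usage, run from the repository root: `set_env_flags(Path("docker/.env"), {"USE_MINERU": "true", "USE_DOCLING": "true"})`, then rebuild and restart the containers. The helper does not care about key names, so it can set other variables the same way, but it does not validate them.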

To use a different parser, select it in the dataset configuration UI before uploading your PDF. If you change the parser later, you must delete and re-upload the PDF for the new parser to take effect; re-parsing does not update the parser for existing files [source].
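
Because a parser change only affects files uploaded afterwards, it can help to keep track of which files were parsed under the old setting. The sketch below is purely illustrative bookkeeping: `UploadedDoc`, its `parser` field and `docs_to_reupload` are hypothetical and do not mirror RAGFlow's actual document records; the parser names are just labels.

```python
from dataclasses import dataclass


@dataclass
class UploadedDoc:
    """Hypothetical record of an uploaded file, not a RAGFlow data structure."""
    name: str
    parser: str  # parser that was selected when the file was uploaded


def docs_to_reupload(docs: list[UploadedDoc], current_parser: str) -> list[str]:
    """Names of files whose parser differs from the dataset's current parser.

    Since re-parsing reportedly does not switch the parser for existing files,
    these are the files that would need to be deleted and uploaded again.
    """
    return [d.name for d in docs if d.parser != current_parser]
```

For instance, after switching a dataset from DeepDoc to Naive, only the files still recorded as DeepDoc would be returned.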

For pipeline process monitoring, there is a built-in log sheet in the UI that shows pipeline progress and logs, and lets you cancel the run or export the logs as JSON. It is available in the agent pipeline UI and gives real-time feedback on the parsing and chunking steps [source].
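
If you export the log sheet as JSON, a few lines of Python can show which step stopped advancing. The export format is not described in this thread, so this sketch assumes (hypothetically) a JSON array of objects with "component" and "progress" keys, with progress as a fraction in [0, 1]; `last_progress_per_step` and `stuck_steps` are illustrative names, and the key names would need adjusting to the real export.

```python
import json


def last_progress_per_step(log_json: str) -> dict[str, float]:
    """Latest reported progress for each pipeline step in an exported log.

    Assumes a hypothetical schema: a JSON array of {"component": str, "progress": number}.
    """
    latest: dict[str, float] = {}
    for entry in json.loads(log_json):
        # Later entries overwrite earlier ones, so the last value per step wins.
        latest[entry["component"]] = float(entry["progress"])
    return latest


def stuck_steps(latest: dict[str, float]) -> list[str]:
    """Steps whose last reported progress is below 100%."""
    return [name for name, p in latest.items() if p < 1.0]
```

With a log whose parser entries stop at 0.1, `stuck_steps` would list the parser step (and any downstream step that never finished), which points at the same 10% stall described in the question.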
