More elaborate description of how to build custom Docker images for Presidio #1663

@bfisseler

Is your feature request related to a problem? Please describe.
I am using the Presidio Docker images to implement PII analysis and removal in my R workflow. The data I am working with is in languages other than English, but English is the only language supported by the official Presidio Docker images. I am therefore building my own custom images and would like more detailed instructions on where to modify the existing YAML files.

Describe the solution you'd like
Presidio has fairly extensive documentation on how to use it directly in Python. There is also some documentation on how to run Presidio using Docker, but little to none on how to modify the YAML files to build custom Docker images.

I would like one or two pages on

  • which YAML files to modify in order to customize Presidio to support additional languages
  • typical pitfalls to avoid, e.g. I had no success adding 10+ languages at once, possibly because the Docker image ran out of memory
  • how to deal with warnings I get when running my customized Docker images, e.g. UserWarning: NLP recognizer (e.g. SpacyRecognizer, StanzaRecognizer) is not in the list of recognizers for language en.
