Description
Is your feature request related to a problem? Please describe.
I am using the Presidio Docker images to implement PII analysis and removal in my R workflow. The data I am working with is in languages other than English, but English is the only language supported by the official Presidio Docker images. So I am building my own custom images, and I would like more detailed instructions on where to modify the existing yaml files.
Describe the solution you'd like
There is fairly extensive documentation on Presidio and how to use it directly in Python. There is also some documentation on how to run Presidio using Docker, but little to no documentation on how to modify the yaml files to build custom Docker images.
I would like one or two pages on:
- which yaml files to modify in order to customize Presidio to support additional languages
- typical pitfalls to avoid, e.g. I had no success adding 10+ languages at once, apparently because the Docker image ran out of memory(?)
- I also get some warnings when running my customized Docker images that I don't know how to deal with, e.g.
UserWarning: NLP recognizer (e.g. SpacyRecognizer, StanzaRecognizer) is not in the list of recognizers for language en.
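
To make the request concrete, here is a minimal sketch of the kind of per-language configuration I have in mind. It only builds a plain dictionary with a `nlp_engine_name` / `models` / `lang_code` / `model_name` layout, which is how I understand the NLP configuration yaml to be structured. The helper name `build_nlp_configuration` is hypothetical, and the language/model pairs are assumptions for illustration, not a recommendation.

```python
from typing import Dict, List


def build_nlp_configuration(models_by_language: Dict[str, str]) -> dict:
    """Build an NLP engine configuration with one spaCy model per language.

    Hypothetical helper: it mirrors the layout I assume the yaml file uses.
    """
    if not models_by_language:
        raise ValueError("at least one language is required")
    models: List[dict] = [
        {"lang_code": lang_code, "model_name": model_name}
        for lang_code, model_name in models_by_language.items()
    ]
    return {"nlp_engine_name": "spacy", "models": models}


# Assumed language/model pairs; each extra model increases image size
# and memory use, so adding many languages at once may not be feasible.
config = build_nlp_configuration(
    {
        "en": "en_core_web_lg",
        "es": "es_core_news_md",
        "de": "de_core_news_md",
    }
)

# The languages the analyzer would then need to be told about.
supported_languages = [model["lang_code"] for model in config["models"]]
print(supported_languages)
```

As far as I understand, a dictionary of this shape can be passed to `NlpEngineProvider(nlp_configuration=...)` when using Presidio directly in Python. What I am missing is the documented equivalent for the Docker images, including how the recognizer list has to be kept in sync with the configured languages.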