Hello,
I have a general, rudimentary question (sorry in advance).
I have reviewed (though not fully) many parts of the code here. I'd like to test the proposed embedding on new data, but I'm not sure where to begin.
I have a simple two-column dataset: the first column is the patient ID (assume 1M unique patients) and the second is the ICD10 diagnosis code (assume 10K categories). The data contain repeated measurements, meaning that a diagnosis can recur within a given patient and across many patients.
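For concreteness, this is how I picture turning the long-format data into a model input. It is a minimal sketch of my own, assuming numpy and scipy are available: it builds a sparse patients × codes count matrix (repeated diagnoses are summed) plus a 0/1 "multi-hot" version. `build_patient_code_matrix` and `binarize` are hypothetical helper names, not functions from this repo. A dense 1M × 10K matrix would have 10^10 cells, which is why the sketch keeps it sparse.

```python
import numpy as np
from scipy import sparse


def build_patient_code_matrix(patient_ids, codes):
    """Turn long-format (patient_id, icd10_code) rows into a sparse count matrix.

    Row i is a patient, column j is a distinct code, and entry (i, j) counts how
    often code j was recorded for patient i.
    """
    # Map every distinct patient / code to a contiguous integer index.
    patients, row = np.unique(np.asarray(patient_ids), return_inverse=True)
    vocab, col = np.unique(np.asarray(codes), return_inverse=True)
    ones = np.ones(len(row), dtype=np.float32)
    # Converting COO -> CSR sums duplicate (row, col) pairs,
    # so repeated diagnoses become counts instead of extra rows.
    X = sparse.coo_matrix(
        (ones, (row.ravel(), col.ravel())), shape=(len(patients), len(vocab))
    ).tocsr()
    return X, patients, vocab


def binarize(X):
    """0/1 'multi-hot' version: 1 if the patient ever had the code."""
    B = X.copy()
    B.data[:] = 1.0
    return B


# Toy usage (made-up rows):
X, patients, vocab = build_patient_code_matrix(
    ["p1", "p1", "p2", "p1", "p3"],
    ["E11.9", "I10", "I10", "E11.9", "J45.909"],
)
print(X.shape)  # (3, 3): patients p1..p3 x codes E11.9, I10, J45.909
print(binarize(X).toarray())
```

With this layout there is no separate one-hot step per row: each patient becomes one sparse multi-hot (or count) row.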
I tested Multiple Correspondence Analysis for categorical data from this link, but the results were not very useful.
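For reference, MCA is correspondence analysis (CA) applied to an indicator matrix. Below is a minimal numpy sketch of plain CA on a small dense patients × codes count matrix. It is my own illustration, not the code from the link; it assumes no all-zero rows or columns and is only practical at toy scale because it uses a dense SVD.

```python
import numpy as np


def correspondence_analysis(N, k=2):
    """Plain CA of a (patients x codes) count matrix N.

    Returns (row_coords, col_coords, singular_values); col_coords[j] is a
    k-dimensional principal coordinate for code j. Assumes no all-zero rows
    or columns.
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()              # correspondence matrix
    r = P.sum(axis=1)            # row masses
    c = P.sum(axis=0)            # column masses
    expected = np.outer(r, c)
    # Standardized residuals (p_ij - r_i c_j) / sqrt(r_i c_j).
    S = (P - expected) / np.sqrt(expected)
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: scale singular vectors by s and by 1/sqrt(mass).
    row_coords = U[:, :k] * s[:k] / np.sqrt(r)[:, None]
    col_coords = Vt[:k].T * s[:k] / np.sqrt(c)[:, None]
    return row_coords, col_coords, s


# Toy usage with made-up counts (3 patients x 3 codes):
N = np.array([[5, 1, 0], [1, 4, 2], [0, 2, 6]])
rows, cols, s = correspondence_analysis(N, k=2)
```

A handy sanity check: the total inertia (sum of squared singular values) equals the chi-square statistic of the table divided by its grand total.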
Similar to the German States example in the repo, my goal is to perform (unsupervised) dimensionality reduction (such as what you'd get from a denoising autoencoder that minimizes reconstruction error).
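To make the goal concrete, here is a minimal numpy sketch of the kind of denoising autoencoder I mean: masking noise zeroes out a random fraction of each patient's codes, a tanh encoder maps the row to k dimensions, and a sigmoid decoder is trained with binary cross-entropy to reconstruct the clean row. This is a generic technique, not the repository's method; all names (`train_dae`, `init_params`, etc.) are hypothetical, and reading row j of `W_enc` as the embedding of code j is my assumption. It works on a dense 0/1 array, so with 1M patients one would densify one mini-batch of the sparse matrix at a time.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_codes, k, rng):
    # Small random weights; row j of W_enc is read as the embedding of code j.
    return {
        "W_enc": rng.normal(0.0, 0.1, size=(n_codes, k)),
        "b_enc": np.zeros(k),
        "W_dec": rng.normal(0.0, 0.1, size=(k, n_codes)),
        "b_dec": np.zeros(n_codes),
    }


def forward(p, x):
    h = np.tanh(x @ p["W_enc"] + p["b_enc"])        # k-dim patient code
    x_hat = _sigmoid(h @ p["W_dec"] + p["b_dec"])   # per-code probability
    return h, x_hat


def reconstruction_loss(p, X, eps=1e-7):
    """Mean binary cross-entropy between X and its (uncorrupted) reconstruction."""
    _, x_hat = forward(p, X)
    x_hat = np.clip(x_hat, eps, 1 - eps)
    return float(-np.mean(np.sum(X * np.log(x_hat) + (1 - X) * np.log(1 - x_hat), axis=1)))


def train_dae(X, k=2, corruption=0.3, lr=0.1, epochs=200, batch_size=32, seed=0):
    """Train a denoising autoencoder on a dense 0/1 (patients x codes) array."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    p = init_params(d, k, rng)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            x = X[order[start:start + batch_size]]
            # Masking noise: drop a random fraction of each patient's codes...
            x_noisy = x * (rng.random(x.shape) >= corruption)
            h, x_hat = forward(p, x_noisy)
            # ...and train the network to reconstruct the clean row.
            # For sigmoid + BCE, dLoss/dLogits = x_hat - x (averaged over the batch).
            g_out = (x_hat - x) / len(x)
            g_h = (g_out @ p["W_dec"].T) * (1.0 - h ** 2)   # back through tanh
            p["W_dec"] -= lr * h.T @ g_out
            p["b_dec"] -= lr * g_out.sum(axis=0)
            p["W_enc"] -= lr * x_noisy.T @ g_h
            p["b_enc"] -= lr * g_h.sum(axis=0)
    return p


# Toy data (made up): codes 0-2 tend to co-occur, as do codes 3-5.
rng = np.random.default_rng(1)
group = rng.integers(0, 2, size=500)
X = np.zeros((500, 6))
for i, g in enumerate(group):
    X[i, 3 * g:3 * g + 3] = rng.random(3) < 0.8
params = train_dae(X, k=2)
code_vectors = params["W_enc"]   # shape (6, 2): one 2-d vector per code
```

With real data, `X` would be the binarized patient × code matrix and k would be much larger than 2.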
- Where should I start? Do I need to one-hot encode the data beforehand?
- Which functions should I use, after loading my raw data, to generate such an embedding?
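Once some k-dimensional vector per code exists (from whichever model), a quick sanity check, similar in spirit to the German States plot, is to look up each code's nearest neighbors by cosine similarity. A small numpy sketch; `nearest_codes` is a hypothetical helper and the vectors in the usage lines are made up.

```python
import numpy as np


def nearest_codes(embeddings, vocab, query, top_n=3):
    """Return the top_n codes whose vectors are most cosine-similar to `query`."""
    vocab = list(vocab)
    E = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    E = E / np.where(norms == 0, 1.0, norms)   # leave all-zero rows as zeros
    q = E[vocab.index(query)]
    sims = E @ q
    order = np.argsort(-sims)                  # most similar first
    return [(vocab[j], float(sims[j])) for j in order if vocab[j] != query][:top_n]


# Toy usage with made-up 2-d vectors:
codes = ["E11.9", "E11.65", "I10", "J45.909"]
vectors = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [-0.7, 0.4]]
print(nearest_codes(vectors, codes, "E11.9", top_n=2))
```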
I'd appreciate any words of wisdom you can share.