How to use the embedding on new categorical data #24

Hello,

I have a general, rudimentary question (sorry in advance).

I have reviewed many (though not all) parts of the code here. I'd like to test the proposed embedding on a new dataset, but I am not sure where to begin.

I have a simple two-column dataset: the first column is the patient ID (assume 1M unique patients) and the second column is the ICD-10 diagnosis code (assume 10K categories). The data contain repeated measurements, meaning that a diagnosis can recur within a given patient and across many patients.
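
To make the layout concrete, here is a minimal sketch of the data with made-up values. The names `rows`, `build_vocab`, `code_vocab`, and `patient_counts` are my own for illustration and are not from this repo. It only shows the integer-index and per-patient count form I can produce today, assuming that is a sensible starting point.

```python
from collections import Counter, defaultdict

# Toy stand-in for the real two-column data: (patient_id, ICD-10 code).
# All values are invented for illustration only.
rows = [
    ("p1", "E11.9"),
    ("p1", "I10"),
    ("p1", "E11.9"),    # same diagnosis repeated within one patient
    ("p2", "I10"),      # same diagnosis shared across patients
    ("p3", "J45.909"),
]


def build_vocab(values):
    """Map each distinct value to an integer index, in first-seen order."""
    vocab = {}
    for value in values:
        if value not in vocab:
            vocab[value] = len(vocab)
    return vocab


code_vocab = build_vocab(code for _, code in rows)

# Sparse patient -> {code index: count} table.
# No dense 1M x 10K one-hot matrix is ever materialized.
patient_counts = defaultdict(Counter)
for patient, code in rows:
    patient_counts[patient][code_vocab[code]] += 1

for patient, counts in patient_counts.items():
    print(patient, dict(counts))
```

With the real data, the same two structures would hold roughly 10K code indices and 1M patient entries.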

I tested **Multiple Correspondence Analysis** with categorical data from this [link](http://vxy10.github.io/2016/06/10/intro-MCA/), but the results are not very useful.

Similar to the German States example in the repo, my goal is to perform (unsupervised) dimensionality reduction (such as what you'd see in a denoising AE that minimizes reconstruction error).

- Where should I start? Do I need to one-hot encode the data beforehand?
- Which functions should I use after loading my raw data to generate such an embedding?

Appreciate any words of wisdom you may be able to share.



 
