How to use the embedding on new categorical data #24

@isaac2lord

Hello,

I have a general, rudimentary question (sorry in advance).

I have reviewed (not fully) many parts of the code here. I'd like to test the proposed embedding on new data, but I am not sure where to begin.

I have a simple 2-column dataset: the first column is the patient ID (assume 1M unique patients) and the second column is the ICD10 diagnosis code (assume 10K categories). The data contains repeated measurements, meaning that a diagnosis can be repeated within a given patient and across many patients.
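To make the layout concrete, here is a toy sketch of what the data looks like. The patient IDs, the rows, and the variable names are all made up for illustration. It only uses the Python standard library and is not the real dataset or anything from the repo.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows: (patient_id, icd10_code). All values are made up.
rows = [
    ("p001", "E11.9"),
    ("p001", "I10"),
    ("p001", "E11.9"),    # same diagnosis repeated within one patient
    ("p002", "I10"),      # same diagnosis shared across patients
    ("p003", "J45.909"),
]

# Count how often each code occurs for each patient.
codes_per_patient = defaultdict(Counter)
for patient_id, code in rows:
    codes_per_patient[patient_id][code] += 1

n_patients = len(codes_per_patient)
n_codes = len({code for _, code in rows})

print(n_patients, n_codes)
print(dict(codes_per_patient["p001"]))
```

In the real data, `n_patients` would be around 1M and `n_codes` around 10K, so each patient ends up with a small bag of codes drawn from a large vocabulary.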

I tested Multiple Correspondence Analysis with categorical data from this link, but the results are not very useful.

Similar to the German States example in the repo, my goal is to perform (unsupervised) dimensionality reduction (such as what you'd see in a denoising AE that minimizes reconstruction error).

  • Where should I start? Do I need to one-hot encode the data beforehand?
  • What functions should I use after loading my raw data to generate such an embedding?

I'd appreciate any words of wisdom you may be able to share.
