
Feature Request: K-Prototypes or similar implementation for clustering mixed data #118

Description

@plutonium-239

I would love to have an optimised (or at least CUDA) implementation of the K-Prototypes algorithm (the package I currently use is kmodes). A lot of data science deals with categorical data, and it would be great not to have to use TargetEncoders or, worse, pd.get_dummies() for categorical variables with many categories.
Right now, my workaround is to apply a TargetEncoder to the categorical variables and then use the kmeans/knn in this package. This feels like a hack: numerical data is continuous and its values have a natural ordering, whereas the values of a categorical variable need not have any ordering (greater than/less than) between them.
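To illustrate the difference, here is a minimal sketch of the kind of mixed dissimilarity K-Prototypes is usually described with: squared Euclidean distance on the numeric features plus a weighted count of mismatches on the categorical ones. The function name `mixed_dissimilarity`, the weight `gamma=0.5`, and the sample values are made up for this example; it is not code from kmodes or from this package.

```python
from typing import Sequence


def mixed_dissimilarity(
    num_a: Sequence[float],
    cat_a: Sequence[str],
    num_b: Sequence[float],
    cat_b: Sequence[str],
    gamma: float = 1.0,
) -> float:
    """Squared Euclidean distance on numeric features plus gamma times
    the number of categorical features whose values differ."""
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical_part = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric_part + gamma * categorical_part


# Hypothetical sample: two numeric features and two categorical features.
point = ([1.0, 2.0], ["red", "suv"])
proto_a = ([1.0, 2.5], ["red", "sedan"])  # close numerically, one category differs
proto_b = ([3.0, 2.0], ["red", "suv"])    # same categories, far numerically

d_a = mixed_dissimilarity(point[0], point[1], proto_a[0], proto_a[1], gamma=0.5)
d_b = mixed_dissimilarity(point[0], point[1], proto_b[0], proto_b[1], gamma=0.5)

# Assign the point to the prototype with the smaller dissimilarity.
nearest = "proto_a" if d_a < d_b else "proto_b"
print(d_a, d_b, nearest)
```

Categories are only ever compared for equality here, so no ordering is imposed on them, which is exactly what an encoder followed by plain k-means cannot guarantee.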
