PCA for dimensionality reduction combined with K-means
Goal
The goal of this notebook is to apply dimensionality reduction to a large dataset in order to remove noise, then to apply the K-means algorithm to divide the songs into clusters (music genres), and finally to interpret the results using pivot tables.
Overview
- Preparing the dataset
- PCA
- K-means
- Pivot tables
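The four steps above can be sketched end to end as follows. This is only a minimal illustration, not the notebook's actual code: it assumes scikit-learn and pandas are available, uses synthetic data in place of the real tracks, and the column names (`track_id`, `genre`, `cluster`), the 80% variance threshold, and the choice of 3 clusters are all made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# 1. Preparing the dataset (synthetic stand-in: 300 "songs" x 20 features,
#    with hypothetical genre labels that shift the feature means).
genres = rng.choice(["rock", "jazz", "hiphop"], size=300)
centers = {"rock": 0.0, "jazz": 3.0, "hiphop": -3.0}
X = np.vstack([rng.normal(centers[g], 1.0, size=20) for g in genres])
X_scaled = StandardScaler().fit_transform(X)

# 2. PCA: keep just enough components to explain 80% of the variance
#    (the threshold is an assumption, not a value from the notebook).
pca = PCA(n_components=0.8)
X_reduced = pca.fit_transform(X_scaled)

# 3. K-means on the reduced data (scikit-learn's KMeans is used here;
#    the repository ships its own functions in functions.py).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_reduced)

# 4. Pivot table: count songs per (genre, cluster) pair.
df = pd.DataFrame({"track_id": np.arange(300), "genre": genres, "cluster": labels})
table = df.pivot_table(
    index="genre", columns="cluster", values="track_id",
    aggfunc="count", fill_value=0,
)
print(table)
```

If the clusters line up with genres, each row of the pivot table concentrates in a single column, which is the kind of reading the last step is meant to support.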
Content
- main.ipynb is the main notebook
- functions.py contains the K-means functions
- data is a folder that contains:
  - pca.pickle: the dataset with PCA already applied
  - tracks.pickle.zip: a zipped pickle file of the tracks dataset used for the pivot tables
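As a rough sketch of how the files in the data folder could be read, the hypothetical helper `load_pickle` below uses only the standard library. It assumes the zip archive holds a single pickle file, which this README does not confirm. The demo writes and reads back a small stand-in object in a temporary folder rather than touching the real data.

```python
import pickle
import tempfile
import zipfile
from pathlib import Path


def load_pickle(path):
    """Load a pickle file, transparently handling a .zip wrapper.

    Hypothetical helper: assumes a zipped file contains exactly one pickle.
    Only unpickle files you trust, since pickle can execute arbitrary code.
    """
    path = Path(path)
    if path.suffix == ".zip":
        with zipfile.ZipFile(path) as archive:
            member = archive.namelist()[0]  # first (assumed only) entry
            with archive.open(member) as fh:
                return pickle.load(fh)
    with path.open("rb") as fh:
        return pickle.load(fh)


# Round-trip demo with stand-in content in a temporary "data" folder.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    sample = {"track_id": [1, 2, 3]}

    # Plain pickle, mirroring the layout of data/pca.pickle.
    with (data_dir / "pca.pickle").open("wb") as fh:
        pickle.dump(sample, fh)

    # Zipped pickle, mirroring the layout of data/tracks.pickle.zip.
    with zipfile.ZipFile(data_dir / "tracks.pickle.zip", "w") as archive:
        archive.writestr("tracks.pickle", pickle.dumps(sample))

    loaded_plain = load_pickle(data_dir / "pca.pickle")
    loaded_zipped = load_pickle(data_dir / "tracks.pickle.zip")
```

With the repository checked out, the same helper would be called as `load_pickle("data/pca.pickle")` and `load_pickle("data/tracks.pickle.zip")`. If the files were written with pandas, pandas must be installed for unpickling to succeed.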