PCA clearly explained — When, Why, How to use it and feature importance: A guide in Python
In this post I explain what PCA is, when and why to use it, and how to implement it in Python using scikit-learn. I also explain how to get the feature importance after a PCA analysis.
1. Introduction & Background
Principal Components Analysis (PCA) is a well-known unsupervised dimensionality reduction technique that constructs relevant features/variables through linear (linear PCA) or non-linear (kernel PCA) combinations of the original variables (features). In this post, we will only focus on the famous and widely used linear PCA method.
The construction of relevant features is achieved by linearly transforming correlated variables into a smaller number of uncorrelated variables. This is done by projecting (via a dot product) the original data onto the reduced PCA space using the eigenvectors of the covariance/correlation matrix, also known as the principal components (PCs).
The resulting projected data are essentially linear combinations of the original data capturing most of the variance in the data (Jolliffe 2002).
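As a minimal sketch of the idea above, the snippet below (using the Iris dataset purely as an example) computes the eigenvectors of the covariance matrix by hand, projects the centred data onto the top two principal components via a dot product, and checks that scikit-learn's `PCA` gives the same projection (up to the sign of each component, which is arbitrary):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Example dataset: 150 samples, 4 correlated features
X = load_iris().data

# Centre the data, then eigendecompose its covariance matrix
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # sort by explained variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project (dot product) onto the top-2 principal components
X_manual = Xc @ eigvecs[:, :2]

# scikit-learn's PCA performs the same linear projection internally
X_sklearn = PCA(n_components=2).fit_transform(X)

# The two projections agree up to the sign of each component
assert np.allclose(np.abs(X_manual), np.abs(X_sklearn))
```

Sorting the eigenvectors by eigenvalue ensures the first PC captures the largest share of the variance, which is why keeping only the leading components preserves most of the information in the data.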