TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


PCA clearly explained —When, Why, How to use it and feature importance: A guide in Python

7 min read · May 31, 2020

Handmade sketch made by the author.

1. Introduction & Background

Principal Components Analysis (PCA) is a well-known unsupervised dimensionality reduction technique that constructs new, relevant features as linear (linear PCA) or non-linear (kernel PCA) combinations of the original variables (features). This post focuses only on the widely used linear PCA method.

The relevant features are constructed by linearly transforming correlated variables into a smaller number of uncorrelated variables. This is done by projecting the original data (via a dot product) onto the reduced PCA space, using the eigenvectors of the covariance or correlation matrix, also known as the principal components (PCs).

The resulting projected data are linear combinations of the original variables that capture most of the variance in the data (Jolliffe 2002).
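The projection described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's own code: the synthetic dataset, the variable names, and the choice of `k = 2` components are all assumptions made for the example. It centers the data, takes the eigenvectors of the covariance matrix, and projects the data onto the top `k` of them with a dot product.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 200 samples, 3 features,
# where the first two features are strongly correlated.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([
    latent + 0.1 * rng.normal(size=(200, 1)),
    2.0 * latent + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

# 1. Center each feature at zero mean.
X_centered = X - X.mean(axis=0)

# 2. Covariance matrix of the features (columns are variables).
cov = np.cov(X_centered, rowvar=False)

# 3. Eigendecomposition. eigh is meant for symmetric matrices and returns
#    eigenvalues in ascending order, so we reorder them to descending.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# 4. Keep the top-k eigenvectors and project the data (dot product).
k = 2  # hypothetical choice for this example
W = eigenvectors[:, :k]
X_projected = X_centered @ W

# Fraction of the total variance carried by each component.
explained_variance_ratio = eigenvalues / eigenvalues.sum()

# Covariance of the projected data: diagonal if the new variables are uncorrelated.
projected_cov = np.cov(X_projected, rowvar=False)
```

Inspecting `projected_cov` shows off-diagonal entries that are numerically zero, which is the "uncorrelated variables" property, and `explained_variance_ratio[:k].sum()` shows how much of the total variance the retained components keep. Using the correlation matrix instead would amount to standardizing each feature before step 2.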

Written by Serafeim Loukas, PhD

Data Scientist @ IATA (Switzerland). PhD, MSc, M.Eng. Bespoke services on demand

Responses (7)


Thank you Serafeim, very useful explanation!! I use PCA to combine several variables into the Sustainable Territorial Development Index, and it is a powerful and reliable tool!! Your explanation is quite welcome under the additional understanding…


I find this article about Principal Component Analysis to be detailed and well-explained.


Hey, how did you get the colours to display for each class? Are the classes named by colour? I imported an iris dataset and the classes were called 'Iris-setosa', 'Iris-versicolor' etc. Are your y values colours?
