TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Feature Engineering — deep dive into Encoding and Binning techniques

Illustration of feature encoding and feature binning techniques

5 min read · Aug 26, 2020
Image by Author

Feature engineering is the most important aspect of data science model development. A raw dataset contains several categories of features: they can be text, date/time, categorical, or continuous variables. To train a machine learning model, the dataset must be processed into numerical vectors that an ML algorithm can consume.

The objective of this article is to demonstrate feature engineering techniques for converting a continuous feature into a categorical one, and a categorical feature into numerical features.

  • Feature Binning: conversion of a continuous variable into a categorical feature.
  • Feature Encoding: conversion of a categorical variable into numerical features (see the sketch after this list).
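
To make the encoding side concrete, the short sketch below one-hot encodes a toy categorical column with pandas. It is only an illustration of the idea: the column names ("city", "income") and the values are made up and do not come from this article.

```python
import pandas as pd

# Toy data: "city" is a categorical feature, "income" is continuous.
# Both the column names and the values are hypothetical.
df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Chennai"],
    "income": [42000, 55000, 61000, 48000],
})

# One-hot encoding: every category of "city" becomes its own 0/1 indicator column,
# turning the categorical feature into numerical features.
encoded = pd.get_dummies(df, columns=["city"], prefix="city")
print(encoded)
```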

Feature Binning:

Binning, or discretization, is used to transform a continuous or numerical variable into a categorical feature. Binning a continuous variable introduces non-linearity and tends to improve model performance. It can also be used to identify missing values or outliers.

There are two types of binning:

  • Unsupervised Binning: Equal width binning, Equal frequency…
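
As a minimal sketch of the two unsupervised schemes named above (equal width and equal frequency), pandas' pd.cut and pd.qcut can produce the bins. The data and bin labels here are made up for illustration and are not taken from the article.

```python
import pandas as pd

# Hypothetical continuous feature: customer ages.
ages = pd.Series([18, 22, 25, 31, 38, 45, 52, 60, 67, 73], name="age")

# Equal-width binning: the value range is split into 3 intervals of equal width.
equal_width = pd.cut(ages, bins=3, labels=["low", "mid", "high"])

# Equal-frequency binning: each bin holds roughly the same number of observations.
equal_freq = pd.qcut(ages, q=3, labels=["low", "mid", "high"])

print(pd.DataFrame({"age": ages, "equal_width": equal_width, "equal_freq": equal_freq}))
```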
