Feature Engineering — deep dive into Encoding and Binning techniques
Illustration of feature encoding and feature binning techniques
Feature engineering is one of the most important aspects of data science model development. A raw dataset contains several categories of features: text, date/time, categorical, and continuous variables. Before a machine learning algorithm can be trained, the dataset must be transformed into numerical vectors.
The objective of this article is to demonstrate feature engineering techniques that transform a categorical feature into a continuous one and vice versa.
- Feature Binning: Conversion of a continuous variable to categorical.
- Feature Encoding: Conversion of a categorical variable to numerical features.
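As a minimal sketch of the encoding direction, the snippet below one-hot encodes a categorical column into numeric indicator columns using pandas. The `color` feature and its values are hypothetical example data, not from the article.

```python
import pandas as pd

# Hypothetical example data: a single categorical feature "color".
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Feature encoding: expand the categorical column into one numeric
# indicator (0/1) column per category value.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
print(encoded)
```

`get_dummies` names the new columns `color_blue`, `color_green`, and `color_red`; each row has exactly one `1` marking its original category.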
Feature Binning:
Binning, or discretization, transforms a continuous or numerical variable into a categorical feature. Binning a continuous variable introduces non-linearity and tends to improve model performance. It can also be used to handle outliers and missing values, for example by assigning them to a dedicated bin.
There are two types of binning:
- Unsupervised Binning: Equal width binning, Equal frequency…
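The two unsupervised strategies above can be sketched with pandas: `cut` performs equal-width binning (bins of equal value range) and `qcut` performs equal-frequency binning (bins holding roughly equal row counts). The `ages` data and the bin labels are hypothetical examples.

```python
import pandas as pd

# Hypothetical example data: a continuous "age" feature.
ages = pd.Series([22, 25, 31, 38, 45, 52, 60, 71])

# Equal-width binning: split the overall value range into 3 bins
# of equal width; bin populations may be uneven.
equal_width = pd.cut(ages, bins=3, labels=["young", "middle", "senior"])

# Equal-frequency binning: choose bin edges from quantiles so each
# bin holds roughly the same number of observations.
equal_freq = pd.qcut(ages, q=3, labels=["low", "mid", "high"])

print(equal_width.value_counts())
print(equal_freq.value_counts())
```

Note the difference: with skewed data, equal-width bins can end up nearly empty, while equal-frequency bins keep counts balanced at the cost of uneven bin widths.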