Feature Engineering — deep dive into Encoding and Binning techniques
Illustration of feature encoding and feature binning techniques
Feature engineering is one of the most important aspects of data science model development. A raw dataset contains several categories of features: text, date/time, categorical, and continuous variables. Before a machine learning algorithm can be trained, the dataset must be transformed into numerical vectors.
The objective of this article is to demonstrate feature engineering techniques that transform a categorical feature into a continuous one and vice versa.
- Feature Binning: Conversion of a continuous variable to categorical.
- Feature Encoding: Conversion of a categorical variable to numerical features.
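As a minimal sketch of the encoding direction, the snippet below one-hot encodes a categorical column into numeric indicator columns using pandas. The `color` feature and its values are hypothetical example data, not from the article.

```python
import pandas as pd

# Hypothetical example data: a single categorical feature "color".
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Feature encoding: expand the categorical column into one numeric
# indicator (0/1) column per category value.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
print(encoded)
```

`get_dummies` names the new columns `color_blue`, `color_green`, and `color_red`; each row has exactly one `1` marking its original category.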
Feature Binning:
Binning, or discretization, transforms a continuous or numerical variable into a categorical feature. Binning a continuous variable introduces non-linearity and tends to improve model performance. It can also be used to handle outliers and missing values, for example by assigning them to a dedicated bin.
There are two types of binning:
- Unsupervised Binning: Equal width binning, Equal frequency…
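The two unsupervised strategies above can be sketched with pandas: `cut` performs equal-width binning (bins of equal value range) and `qcut` performs equal-frequency binning (bins holding roughly equal row counts). The `ages` data and the bin labels are hypothetical examples.

```python
import pandas as pd

# Hypothetical example data: a continuous "age" feature.
ages = pd.Series([22, 25, 31, 38, 45, 52, 60, 71])

# Equal-width binning: split the overall value range into 3 bins
# of equal width; bin populations may be uneven.
equal_width = pd.cut(ages, bins=3, labels=["young", "middle", "senior"])

# Equal-frequency binning: choose bin edges from quantiles so each
# bin holds roughly the same number of observations.
equal_freq = pd.qcut(ages, q=3, labels=["low", "mid", "high"])

print(equal_width.value_counts())
print(equal_freq.value_counts())
```

Note the difference: with skewed data, equal-width bins can end up nearly empty, while equal-frequency bins keep counts balanced at the cost of uneven bin widths.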