Who Will Win IPL 2025? Using Machine Learning to Predict the Season 18 Champion


After my previous attempt 2 years ago to predict the champions of S16, I’m back with another ML project to predict this year's winners. Last time I got some great engagement on LinkedIn and here, and I have tried to implement the inputs I received in this project. Unlike last time, I made this attempt earlier in the season, with 19 of 74 matches completed. If you’re curious about my previous attempt, more details here.

So, if you’re ready, let's get started with the journey to predict who will be crowned champions this year. But before that,

Disclaimer: Please do not use these results to place bets. I created this as a simple mathematical exercise to better grasp the capabilities of ML, driven by my passion for the game.

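The standard machine learning workflow, which this post follows, contains these steps: data collection, data exploration and cleaning, data pre-processing, model development, and prediction.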

Where is the code?

The notebook for the project can be found here. I also uploaded the newly created dataset to Kaggle. Have a look at the notebook for an in-depth analysis of the project.

1. Data Collection

I used the dataset here, which has records of all the matches from 2008–2024. The data for the 2025 season was recorded manually from completed matches on Cricbuzz, and a new prediction dataset was prepared for the remaining fixtures.

Here is a snapshot of the data:

2. Data Exploration and Cleaning

A few of the columns had a high share of null values. I addressed these nulls and also dropped columns that are not relevant to this analysis. (More details in the notebook.)

We also had old names of the franchises that had to be updated to make them consistent across the dataset.
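The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on a toy table, not the notebook's code: the column names (`team1`, `team2`, `winner`, `method`, `umpire1`), the choice of columns to drop, and the `clean_matches` helper are all assumptions. The two renames shown are real franchise renames.

```python
import pandas as pd

# Toy stand-in for the matches table; column names are assumptions.
matches = pd.DataFrame({
    "team1": ["Delhi Daredevils", "Kings XI Punjab ", "Mumbai Indians"],
    "team2": ["Mumbai Indians", "Delhi Capitals", " Kings XI Punjab"],
    "winner": ["Delhi Daredevils", None, "Mumbai Indians"],
    "method": [None, None, "D/L"],
    "umpire1": ["A", "B", "C"],
})

# Old franchise names mapped to their current names.
RENAMES = {
    "Delhi Daredevils": "Delhi Capitals",
    "Kings XI Punjab": "Punjab Kings",
}
TEAM_COLS = ["team1", "team2", "winner"]


def clean_matches(df: pd.DataFrame) -> pd.DataFrame:
    """Drop unused columns and result-less rows, then normalise team names."""
    out = df.drop(columns=["method", "umpire1"])  # assumed irrelevant here
    out = out.dropna(subset=["winner"]).copy()     # keep only decided matches
    for col in TEAM_COLS:
        # Strip stray whitespace first, otherwise the rename silently misses.
        out[col] = out[col].str.strip().replace(RENAMES)
    return out.reset_index(drop=True)


cleaned = clean_matches(matches)
print(cleaned)
```

Stripping whitespace before renaming matters: a value like `" Kings XI Punjab"` would otherwise survive as a separate team.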

3. Data Pre-processing

To make the dataset more meaningful and help the model learn the patterns efficiently, I created a ‘Team Rank’ column that ranks the IPL teams much like the ICC rankings, but based on their win percentage over the years. I added this column to my dataset.
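One possible way to derive such a rank is sketched below. The exact formula used in the notebook may differ; here win percentage is simply wins divided by matches played, and the column names and the `team_ranks` helper are hypothetical.

```python
import pandas as pd

# Toy match history with placeholder teams; column names are assumptions.
history = pd.DataFrame({
    "team1": ["A", "A", "B", "C"],
    "team2": ["B", "C", "C", "A"],
    "winner": ["A", "C", "B", "A"],
})


def team_ranks(df: pd.DataFrame) -> pd.DataFrame:
    """Rank teams by overall win percentage (rank 1 = highest)."""
    played = pd.concat([df["team1"], df["team2"]]).value_counts()
    wins = df["winner"].value_counts().reindex(played.index, fill_value=0)
    table = (wins / played * 100).rename("win_pct").to_frame()
    # method="min" gives tied teams the same (best) rank.
    table["rank"] = table["win_pct"].rank(ascending=False, method="min").astype(int)
    return table.sort_values("rank")


ranks = team_ranks(history)

# Attach the rank of each side to every match row.
history["team1_rank"] = history["team1"].map(ranks["rank"])
history["team2_rank"] = history["team2"].map(ranks["rank"])
print(ranks)
```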

I concatenated the S18 records up to Match 19 to the main dataset and kept the remaining fixtures ready as a separate dataset for predicting game winners.

Once the dataset was curated and ready, it was time to encode the data to make it machine-readable. Since most of my data has categorical values, I used label encoding to turn them into numbers the model can work with.
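A minimal sketch of label encoding with scikit-learn's `LabelEncoder` is shown below. The toy data and column names are assumptions. Sharing one encoder across all team columns is a design choice made for this illustration, so that a team gets the same integer wherever it appears.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy training data; column names are assumptions.
train = pd.DataFrame({
    "team1": ["CSK", "MI", "GT"],
    "team2": ["MI", "GT", "CSK"],
    "venue": ["Chennai", "Mumbai", "Ahmedabad"],
    "winner": ["CSK", "GT", "GT"],
})
TEAM_COLS = ["team1", "team2", "winner"]

# Fit a single encoder on every team name seen in any team column.
team_encoder = LabelEncoder()
team_encoder.fit(pd.concat([train[col] for col in TEAM_COLS]))

encoded = train.copy()
for col in TEAM_COLS:
    encoded[col] = team_encoder.transform(train[col])

# Venues get their own encoder.
venue_encoder = LabelEncoder()
encoded["venue"] = venue_encoder.fit_transform(train["venue"])

print(encoded)
# Decode predictions back to names with inverse_transform.
print(team_encoder.inverse_transform(encoded["winner"]))
```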

Once the data is encoded, it's time to feed the data to the model to train it.

4. Model development

My use case is a classification task on a dataset of categorical values with complex relationships, so tree-based models such as Decision Tree, Random Forest, and Gradient Boosting are good choices. I also tried training KNN and Logistic Regression models on my data.

After training and testing multiple models, XGBoost, which handles null values well, gave the best performance, with an accuracy of 74%.
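A comparison like this can be sketched as a simple loop over candidate models. The example below runs on synthetic data from `make_classification`, so its scores say nothing about the IPL results. It also uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost to avoid an extra dependency; `xgboost.XGBClassifier` could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded match features and winner labels.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Train each model on the same split and record held-out accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2f}")
```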

Hyperparameter tuning

Hyperparameter tuning is the process of selecting the optimal set of hyperparameters for a machine learning algorithm to improve its performance on a given dataset. Hyperparameters are parameters that are not learned during training but are instead set before training begins.

I used scikit-learn's RandomizedSearchCV to find the best hyperparameters for the model. It works by:

Randomly sampling combinations of parameters from a defined grid
Evaluating them using cross-validation
Returning the best-performing configuration
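The three steps above map directly onto `RandomizedSearchCV`. The sketch below is illustrative only: it tunes a `GradientBoostingClassifier` on synthetic data, and the search space, `n_iter`, and `cv` values are assumptions rather than the settings used in the notebook.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the encoded training data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Search space: distributions are sampled, lists are picked from uniformly.
param_distributions = {
    "n_estimators": randint(20, 80),
    "max_depth": randint(2, 6),
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
}

search = RandomizedSearchCV(
    estimator=GradientBoostingClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,            # number of random combinations to try
    cv=3,                # 3-fold cross-validation per combination
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
print(best_params)
print(f"best CV accuracy: {search.best_score_:.2f}")
```

After fitting, `search.best_estimator_` holds a model refit on all the data passed to `fit` with the best configuration.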

5. Prediction on the testing set

Now comes the part that you’ve been waiting for: getting prediction results for the upcoming matches (from Game 20 onwards). I encoded my testing data with the same encoders used in training, so that each team name maps to the same value in both sets. The challenges here were keeping the train and test data consistent and avoiding data leakage by not fitting the model on the full data.
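Reusing the training encoder looks roughly like this: the encoder is fitted once on training data, and the upcoming fixtures are only transformed, never refitted. The data and column names are assumptions. The explicit check for unseen labels is an addition for illustration, since `LabelEncoder.transform` raises an error on values it has not seen.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Team names seen during training (toy data).
train_teams = pd.Series(["CSK", "MI", "GT", "MI"])

# Upcoming fixtures to predict; column names are assumptions.
upcoming = pd.DataFrame({"team1": ["GT", "MI"], "team2": ["CSK", "GT"]})

# Fit on training data only.
encoder = LabelEncoder().fit(train_teams)

# Fail early, with a readable message, if a fixture has an unknown team.
fixture_teams = set(upcoming["team1"]) | set(upcoming["team2"])
unseen = fixture_teams - set(encoder.classes_)
if unseen:
    raise ValueError(f"Teams missing from training data: {sorted(unseen)}")

# transform (not fit_transform) keeps the train-time mapping intact.
upcoming_encoded = upcoming.copy()
for col in ["team1", "team2"]:
    upcoming_encoded[col] = encoder.transform(upcoming[col])

print(upcoming_encoded)
```

Calling `fit_transform` on the test data instead would rebuild the mapping from scratch, and the same team could end up with a different integer than in training.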

Once the test data was ready, it was time to ask my best-fitted model to predict the results and voila, I had predictions for all the upcoming matches in IPL 2025!

I then used these results to prepare the final points table at the end of the season.
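Turning match winners into a points table can be sketched as below, assuming the IPL's 2 points per win. This toy version uses placeholder results, ignores no-result matches, and does not break ties on net run rate; the `points_table` helper and column names are hypothetical.

```python
import pandas as pd

POINTS_PER_WIN = 2  # IPL league-stage scoring for a win

# Completed plus predicted results (toy data, not the model's output).
results = pd.DataFrame({
    "team1": ["GT", "GT", "DC"],
    "team2": ["DC", "RCB", "RCB"],
    "winner": ["GT", "GT", "DC"],
})


def points_table(df: pd.DataFrame) -> pd.DataFrame:
    """Count wins per team and convert them to league points."""
    teams = pd.concat([df["team1"], df["team2"]]).unique()
    played = pd.concat([df["team1"], df["team2"]]).value_counts().reindex(teams)
    # Teams without a win still appear, with zero wins.
    wins = df["winner"].value_counts().reindex(teams, fill_value=0)
    table = pd.DataFrame({
        "played": played,
        "wins": wins,
        "points": wins * POINTS_PER_WIN,
    })
    return table.sort_values("points", ascending=False)


table = points_table(results)
print(table)
```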

predicted points table at the end of the season

In the IPL, the top 4 teams from the league stage advance to the playoffs, and the teams here are: DC, GT, RCB and PK (ironically, 3 of these teams haven't won the title before!)

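The playoff stage in the IPL follows this order of fixtures: Qualifier 1 (1st vs 2nd), the Eliminator (3rd vs 4th), Qualifier 2 (loser of Qualifier 1 vs winner of the Eliminator), and the Final (winner of Qualifier 1 vs winner of Qualifier 2).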

I took the top 4 from the predicted points table and asked the model to predict the winner of each playoff game in that order. The same encoding and prediction steps were used as for the league matches.
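The playoff bracket can be expressed as a small function that takes the top 4 in table order and any "who wins this game" predictor. In the sketch below the predictor is a hypothetical stand-in based on made-up ratings for placeholder teams; in the project it would wrap the trained model.

```python
from typing import Callable, Sequence


def simulate_playoffs(
    top4: Sequence[str],
    predict_winner: Callable[[str, str], str],
) -> str:
    """Run the IPL playoff format and return the predicted champion."""
    first, second, third, fourth = top4

    # Qualifier 1: 1st vs 2nd. The winner goes straight to the final.
    q1_winner = predict_winner(first, second)
    q1_loser = second if q1_winner == first else first

    # Eliminator: 3rd vs 4th. The loser is knocked out.
    eliminator_winner = predict_winner(third, fourth)

    # Qualifier 2: loser of Qualifier 1 vs winner of the Eliminator.
    q2_winner = predict_winner(q1_loser, eliminator_winner)

    # Final: winner of Qualifier 1 vs winner of Qualifier 2.
    return predict_winner(q1_winner, q2_winner)


# Hypothetical predictor: the higher made-up rating wins.
ratings = {"A": 3, "B": 4, "C": 1, "D": 2}


def rating_predictor(team_a: str, team_b: str) -> str:
    return team_a if ratings[team_a] >= ratings[team_b] else team_b


champion = simulate_playoffs(["A", "B", "C", "D"], rating_predictor)
print(champion)
```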

According to this model, the Gujarat Titans are likely to win IPL 2025.

Learnings:

This project gave me good exposure to wrangling unclean data. At one point I dug into a function for 30 minutes, only to find that stray whitespace in a column was the root cause
It refreshed my understanding of tree-based models like Decision Tree and Random Forest, along with the topics of entropy and information gain
This was my first time implementing Gradient Boosting and XGBoost

Future enhancements and limitations:

Model accuracy can be improved by adding more features when training the model (e.g. head-to-head stats of the teams, venue performance stats, etc.)
Label encoding can impose an artificial ordering on categorical data (e.g. CSK -> 1, MI -> 3), which might confuse the tree-based models
This model doesn't take net run rate (NRR) into account, which is a major factor in deciding which teams make the playoffs.
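For reference, NRR is the team's average runs scored per over minus the average runs conceded per over across the season. A minimal sketch of that calculation is below; the function names are hypothetical. It expects overs in cricket notation ("19.3" means 19 overs and 3 balls), and it assumes the caller passes the full quota of overs when a side was bowled out, as the standard NRR rule requires.

```python
def overs_to_balls(overs: str) -> int:
    """Convert cricket overs notation, e.g. '19.3', to a number of balls."""
    whole, _, part = overs.partition(".")
    return int(whole) * 6 + int(part or 0)


def net_run_rate(
    runs_for: int, overs_faced: str, runs_against: int, overs_bowled: str
) -> float:
    """Run rate scored minus run rate conceded, in runs per over."""
    rate_for = runs_for / (overs_to_balls(overs_faced) / 6)
    rate_against = runs_against / (overs_to_balls(overs_bowled) / 6)
    return rate_for - rate_against


# Example: 180 scored in 20 overs, 160 conceded in 20 overs.
example_nrr = net_run_rate(180, "20", 160, "20")
print(example_nrr)
```

In a points table, NRR would serve as the tie-breaker between teams that finish level on points.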

If you have any other inputs or suggestions, please feel free to hit me up.

Again, thanks for reading till here!

Aditya


Written by Aditya Bharadwaj
