How I Built a Custom GPT-Based Chatbot in Under 10 Minutes with LlamaIndex
Overview and Implementation with Python
Not long ago, I read an article by Jerry Liu introducing LlamaIndex, an interface that uses GPT to synthesize responses to queries from information provided by the user.
It immediately got my attention, and I knew I had to try it out for myself.
After all, many of the large language models (LLMs) that have taken the world by storm have limited use cases since they aren’t trained with the data available to their users.
Thus, there is still a demand for customizable LLMs tailored to the needs of their users. Simply put, businesses need LLMs that can perform tasks like text summarization, question answering (Q&A), and text generation with their own information.
To see if LlamaIndex has the potential to meet this demand, I played around with its features and was genuinely surprised at what I was able to do. I even ended up building a Q&A chatbot in around 10 minutes!
Here, I walk you through my journey.
What Does LlamaIndex Do?
LlamaIndex generates responses to queries by connecting LLMs to the information provided by users.
As detailed in the documentation, the usage of LlamaIndex entails the following steps:
- Load in the documents
- Parse the documents into nodes (optional)
- Construct the index
- Build indices on top of the constructed indices (optional)
- Query the index
Essentially, LlamaIndex loads your data into a document object and then converts it into an index. When the index is provided a query, it passes the query to a GPT prompt to synthesize a response. By default, this is done with OpenAI's text-davinci-003 model.
While this process sounds pretty convoluted, it can be executed with very little code, as you are about to find out.
Setup
To test the versatility of LlamaIndex, I ended up building three different chatbots, each constructed from a different data source. For the sake of brevity, the previously mentioned optional steps (i.e., steps 2 and 4) will be omitted.
First, let’s take care of the prerequisites.
LlamaIndex and OpenAI can be installed with pip using the following commands:
pip install llama-index
pip install openai
Users will also need an API key from OpenAI:
import os
os.environ['OPENAI_API_KEY'] = 'API_KEY'  # replace 'API_KEY' with your own key
The project will also need the following imports:
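Assuming the LlamaIndex release available at the time of writing (the pre-0.6 API), the essentials look like this:

# GPTSimpleVectorIndex builds and queries the index, Document wraps raw text,
# and download_loader fetches data loaders from LlamaHub
from llama_index import GPTSimpleVectorIndex, Document, download_loader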
Loading the Data
Data can be loaded either manually or through a data loader. For this project, I loaded three types of data:
- a local .txt file in which I write about my favorite fruits (accessible in the GitHub repository)
- a Wikipedia page on apples
- a YouTube video showing a recipe for a vanilla cake
The first index will be created with the local .txt file, which is located in a folder named data. This data will be loaded manually.
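Reading the file and wrapping its contents in a Document object is all it takes (the file name below is a placeholder):

# read the local .txt file and wrap it in a Document object manually
with open('data/fruits.txt') as f:
    text = f.read()
documents_txt = [Document(text)]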
The second index will be created using data from the Wikipedia page on apples. It can be loaded with one of LlamaIndex’s data loaders.
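The loader is fetched from LlamaHub with download_loader (under the hood, the Wikipedia loader relies on the wikipedia package, which may need to be installed separately):

# fetch the Wikipedia loader from LlamaHub and pull the page on apples
WikipediaReader = download_loader('WikipediaReader')
documents_wiki = WikipediaReader().load_data(pages=['Apple'])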
The third index will be constructed with the YouTube video showing how to bake a vanilla cake. This data will also be loaded with a data loader.
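The YouTube loader works the same way (the video URL below is a placeholder; this loader relies on the youtube_transcript_api package):

# fetch the YouTube transcript loader and pull the video's transcript
YoutubeTranscriptReader = download_loader('YoutubeTranscriptReader')
documents_yt = YoutubeTranscriptReader().load_data(ytlinks=['<YOUTUBE_VIDEO_URL>'])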
Construct the Indices
With all the data loaded into document objects, we can construct the index for each chatbot.
Each index can be constructed from the document object with a one-liner.
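With the pre-0.6 API, that one-liner is the from_documents constructor:

# build one vector index per data source
index_txt = GPTSimpleVectorIndex.from_documents(documents_txt)
index_wiki = GPTSimpleVectorIndex.from_documents(documents_wiki)
index_yt = GPTSimpleVectorIndex.from_documents(documents_yt)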
Surprised? From loading the data to creating the index, using LlamaIndex requires just a few lines of code!
Query the Index
The constructed indices can now generate responses for any given query. Once again, this step can be completed with a one-liner.
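The pattern is the same for every index (the question text below is just an example):

# pass a natural-language question to the index; GPT synthesizes the answer
response = index_txt.query('Which fruit do I like the most?')
print(response)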
1. Querying the index built with the .txt file (topic: my favorite fruits)
In case you were wondering, this is the right answer.
2. Querying the index built with the Wikipedia page (topic: apples)
3. Querying the index built with the YouTube video (topic: vanilla cake recipe)
Finally, it’s also worth noting that an index will only provide an answer to a query when its underlying data contains the needed context.
Here’s how the same index created with the YouTube video’s data would respond to a completely irrelevant query.
Thankfully, it seems that LlamaIndex has taken measures against hallucination (i.e., when a model confidently gives an answer not justified by the given data).
Deploying the Chatbots with a Web App
Finally, we can create a web app in order to share the constructed indices with end users.
To do so, we need to first save the indices using the save_to_disk method.
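One call per index does the job (the file names below are placeholders):

# persist each index to a local JSON file (pre-0.6 API)
index_txt.save_to_disk('index_txt.json')
index_wiki.save_to_disk('index_wiki.json')
index_yt.save_to_disk('index_yt.json')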
These indices will be used in a Streamlit app. A minimal version of the app (saved as app.py) could look like the following; the index file names match the placeholders above, and the widget labels are my own:
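import os
import streamlit as st
from llama_index import GPTSimpleVectorIndex

os.environ['OPENAI_API_KEY'] = 'API_KEY'  # replace with your own key

# map each data source shown to the user to its saved index file
INDEX_FILES = {
    'My favorite fruits (.txt file)': 'index_txt.json',
    'Wikipedia page on apples': 'index_wiki.json',
    'YouTube vanilla cake recipe': 'index_yt.json',
}

st.title('Q&A Chatbot')

# let the user pick a data source and type a question
source = st.selectbox('Select a data source:', list(INDEX_FILES))
query = st.text_input('Enter your query:')

if query:
    # load the chosen index from disk and let it synthesize a response
    index = GPTSimpleVectorIndex.load_from_disk(INDEX_FILES[source])
    response = index.query(query)
    st.write(str(response))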
In the app, users can select the data source that they wish to base their questions on and write their query in the provided box.
We can see how the indices perform after running the app:
streamlit run app.py
Querying the index built with the .txt file (my favorite fruits):
Querying the index built with the Wikipedia page (apples):
Querying the index built with the YouTube video (vanilla cake recipe):
Pretty cool, right? We’ve built a functioning web app in just 10 minutes!
Final Remarks
So far, I have only implemented the basic functionalities of the LlamaIndex interface. There are many areas that haven’t been explored in this project, such as customizing LLMs and using non-default settings.
For more information, I invite you to visit the documentation.
If you plan on experimenting with this tool yourself, do be wary of the costs that are incurred from the use of OpenAI’s API. This project cost me a mere $1, but that can be attributed to my working with small documents (the pricing is based on the number of tokens you use). If you get too carried away, you may end up picking up a nasty bill.
Finally, all of the source code used to create the Q&A chatbots can be accessed in this GitHub repository:
Thank you for reading!