
LLMs Can’t Calculate: Why You Should Use Tools for Math

6 min read · Apr 12, 2025

Large Language Models like GPT have amazed the world with their ability to write, code, and reason. But ask one to divide 51234 by 6, and you might be surprised by the answer, because it may well be wrong.

Why? Because LLMs don’t actually calculate anything. They’re not calculators or symbolic math engines. They’re probabilistic models that predict the next token in a sequence based on massive text data.


This blog post explores why LLMs are bad at math — and how you can use LangChain’s “PythonREPL” tool to make them great at it. I'll walk through real examples, show you how to set it up, and explain how this simple tool transforms LLMs into powerful, math-savvy agents.

Why Do LLMs Struggle with Math?

Despite their impressive abilities in language, summarization, and even basic reasoning, Large Language Models (LLMs) have a fundamental weakness: they’re not good at math.

And the reason is simple, but often misunderstood: LLMs don’t actually calculate — they predict.

At their core, LLMs are trained to do one thing: predict the next token in a sequence, based on the context of previous tokens. When you ask a question like:

“Divide 51234 by 6”

The model doesn’t “compute” this like a calculator. Instead, it generates the answer token by token, guessing each digit from statistical patterns it learned from similar text during training.

Sometimes it gets it right; sometimes it doesn’t.
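To make the contrast concrete, here is what actually computing the answer looks like. This is a minimal Python sketch (not part of the original setup) that divides 51234 by 6 with integer arithmetic and then checks the result by multiplying back, a verification step that a next-token predictor does not perform.

```python
# Compute 51234 / 6 deterministically instead of guessing digits.
dividend = 51234
divisor = 6

# divmod returns the integer quotient and the remainder in one step.
quotient, remainder = divmod(dividend, divisor)

# Verify: quotient * divisor + remainder must reproduce the dividend.
assert quotient * divisor + remainder == dividend

print(f"{dividend} / {divisor} = {quotient} remainder {remainder}")
# prints: 51234 / 6 = 8539 remainder 0
```

The result is the same on every run, which is exactly the property we want when the numbers matter.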

LLMs Need Tools to Handle Math Reliably

This is why many modern LLM applications integrate external tools, such as Python interpreters, Wolfram Alpha, or code execution environments, to offload the actual computation.

One such tool is LangChain’s PythonREPL, which allows the model to delegate the math to Python, get the correct result, and then present it as part of its response.

In the next section, we’ll look at how PythonREPL works and how it can be seamlessly integrated into your AI-powered workflows.

What Is LangChain’s PythonREPL?

LangChain is an open-source framework that enables LLMs to interact with external tools like Python interpreters, databases, search engines, and more. One such tool is PythonREPL, which gives LLMs access to a live Python execution environment.

PythonREPL is a LangChain tool that acts as a bridge between the LLM and the Python runtime. It allows the language model to:

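  • Write Python code as part of its reasoning
  • Execute that code in a real Python interpreter (note that PythonREPL runs the code on the host machine and is not sandboxed by default, so use it only with trusted inputs or inside an isolated environment)
  • Use the output to continue or revise its answer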

In short, it transforms your LLM from a token guesser into a code-powered problem solver.
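The idea behind such a tool can be sketched in a few lines of plain Python. The helper below, `run_python`, is a hypothetical name used only for illustration; it is not LangChain's implementation. It assumes the tool's job is simply to execute a code string and hand back whatever was printed, which also shows why the model needs to call `print(...)` to see a value.

```python
import contextlib
import io


def run_python(code: str) -> str:
    """Hypothetical helper: execute a code string and return what it printed."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # NOT sandboxed: runs with the caller's privileges
    except Exception as exc:
        # Return the error as text so a model could read it and retry.
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue()


# A bare expression prints nothing; print(...) makes the value visible.
silent = run_python("51234 / 6")
visible = run_python("print(51234 / 6)")
error = run_python("print(1 / 0)")

print(repr(silent))   # ''
print(repr(visible))  # '8539.0\n'
print(repr(error))    # 'ZeroDivisionError: division by zero'
```

Because `exec` runs arbitrary code, a sketch like this should only ever be fed trusted input or be run inside an isolated environment.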

Setting Up LangChain with PythonREPL and Ollama (Internal LLM)

Refer to my previous blog post for how to set up an internal LLM and expose its API.

Step 1: Install Dependencies

First, make sure you have the necessary Python packages installed:

pip install langchain langchain-core langchain-community langchain-experimental

Step 2: Load Your Internal LLM (via Ollama)

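# Define your remote Ollama server URL
OLLAMA_API_BASE = "http://xx.1xx.1xx.1xx:11434"  # Replace with your actual server IP

# Load LLaMA model hosted on Ollama
llm = ChatOllama(
    base_url=OLLAMA_API_BASE,  # API endpoint of Ollama server
    model="llama3.1:8b",
    temperature=0.7,
)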

Step 3: Add the PythonREPL Tool


python_repl = PythonREPL()

repl_tool = Tool(
    name="python_repl",
    description="A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you should print it out with `print(...)`.",
    func=python_repl.run,
)

Step 4: Create a Math-Agent

# Step 4: Create the agent
agent = initialize_agent(
    tools=[repl_tool],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,  # Set to False if you don't want to see the Thought/Action trace
)

Now our math agent is ready. Let's run the whole code and see the response.


I gave the same task “Divide 51234 by 6” but this time, using the LangChain framework. The LLM delegated the task to our Math Agent, which in turn executed the calculation using the PythonREPL tool and returned the correct answer.

For details, see the LangChain documentation on the Python REPL tool (https://python.langchain.com/docs/integrations/tools/python/) and on security (https://python.langchain.com/docs/security/).

Here’s a step-by-step breakdown of how the flow works:

  1. Input: You ask a math-related question.
  2. Agent Reasoning: The LLM decides it needs to use the PythonREPL tool.
  3. Tool Execution: It generates Python code and passes it to the REPL.
  4. Output Retrieval: The result is fetched and returned to the agent.
  5. Final Answer: The agent uses the result to respond accurately.

Note: If you set verbose=True, you may observe multiple iterations of the calculation process. This occurs because LangChain’s agent with AgentType.ZERO_SHOT_REACT_DESCRIPTION uses an internal reasoning loop that looks like this:

Thought → Action → Observation → (repeat) → Final Answer
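The loop can be illustrated without any LLM at all. The sketch below is a simplified stand-in, not LangChain's agent code: `scripted_model`, `python_tool`, and `react_loop` are hypothetical names, and the "model" is a hard-coded function that first requests the tool and then answers once it sees an observation.

```python
import contextlib
import io
from typing import Callable, Dict, List


def python_tool(code: str) -> str:
    """Hypothetical stand-in for a Python REPL tool: returns printed output."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # not sandboxed; for illustration only
    return buffer.getvalue().strip()


def scripted_model(transcript: List[str]) -> str:
    """Hypothetical fake LLM: asks for the tool, then answers from the observation."""
    observations = [entry for entry in transcript if entry.startswith("Observation:")]
    if not observations:
        return ("Thought: I should compute this with Python.\n"
                "Action: python_repl\n"
                "Action Input: print(51234 // 6)")
    result = observations[-1].split(": ", 1)[1]
    return f"Thought: I now know the answer.\nFinal Answer: {result}"


def react_loop(question: str,
               model: Callable[[List[str]], str],
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 5) -> str:
    """Run Thought -> Action -> Observation until a Final Answer appears."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = model(transcript)
        transcript.append(reply)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        # Parse the requested tool name and its input from the reply.
        tool_name = reply.split("Action:", 1)[1].splitlines()[0].strip()
        tool_input = reply.split("Action Input:", 1)[1].strip()
        transcript.append(f"Observation: {tools[tool_name](tool_input)}")
    raise RuntimeError("No final answer within max_steps")


answer = react_loop("Divide 51234 by 6", scripted_model, {"python_repl": python_tool})
print(answer)  # 8539
```

A real model may take several passes through the loop, for example when its first piece of code raises an error, which is why the verbose trace can show repeated Thought/Action/Observation blocks.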

For more details, refer to the paper ReAct: Synergizing Reasoning and Acting in Language Models (https://arxiv.org/abs/2210.03629).

Full code for your reference

# Step 1: Import all required libraries
from langchain_community.chat_models import ChatOllama
from langchain_core.tools import Tool
from langchain_experimental.utilities import PythonREPL
from langchain.agents import initialize_agent
from langchain.agents.agent_types import AgentType

# Step 2: Load your internal LLM (via Ollama)
# Define your remote Ollama server URL
OLLAMA_API_BASE = "http://xx.1xx.1xx.1xx:11434"  # Replace with your actual server IP

# Load LLaMA model hosted on Ollama
llm = ChatOllama(
    base_url=OLLAMA_API_BASE,  # API endpoint of Ollama server
    model="llama3.1:8b",
    temperature=0.7,
)

# Step 3: Add the PythonREPL tool
python_repl = PythonREPL()

repl_tool = Tool(
    name="python_repl",
    description="A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you should print it out with `print(...)`.",
    func=python_repl.run,
)

# Step 4: Create a math agent
agent = initialize_agent(
    tools=[repl_tool],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

# Step 5: Run the agent on the example question and print its answer
response = agent.invoke({"input": "Divide 51234 by 6"})
print(response["output"])

Conclusion

While Large Language Models (LLMs) are powerful for understanding math problems and suggesting solutions, they are not inherently designed to calculate with precision. Every number they generate is a prediction, not the result of actual computation. This can lead to subtle but critical errors, especially in tasks involving:

  • Complex arithmetic
  • Floating-point precision
  • Domain-specific math logic (like logs, matrix ops, or financial formulas)
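These categories are where an interpreter is especially useful. The short sketch below uses only the Python standard library, with made-up figures chosen purely for illustration (a 1000 principal at 5% for 10 years), to show a floating-point pitfall, exact decimal arithmetic for a financial formula, and a logarithm.

```python
import math
from decimal import Decimal

# Binary floats cannot represent 0.1 exactly, so the sum is slightly off.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)  # False

# Decimal keeps exact decimal digits, which matters for money.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True

# Compound interest: principal * (1 + rate) ** years, rounded to cents.
principal = Decimal("1000")
rate = Decimal("0.05")
years = 10
balance = (principal * (1 + rate) ** years).quantize(Decimal("0.01"))
print(balance)  # 1628.89

# Logarithms come from the math library rather than from guessed digits.
bits_needed = math.log2(1024)
print(bits_needed)
```

An agent with a Python tool can generate code like this and read the printed results back, instead of predicting the digits itself.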

By integrating LangChain’s PythonREPL tool, we bring the best of both worlds:

  • The reasoning and natural language understanding of LLMs
  • The mathematical accuracy of Python’s computation engine

This hybrid approach not only improves the reliability of results but also enhances the transparency of the reasoning process through Thought → Action → Observation chains, as described in the ReAct paper.

It’s important to acknowledge that recent advancements have led to LLMs specifically optimized for math and code.

These models incorporate built-in computation or are fine-tuned on high-quality math datasets, often reducing hallucinations and improving symbolic reasoning.

However, even with these advances:

  • No model is perfect at math without grounding in an execution environment.
  • Using tools like LangChain’s PythonREPL or Wolfram Alpha still provides a layer of reliability for critical tasks.

So, while the ecosystem is evolving rapidly, the LLM + PythonREPL combo remains a practical, transparent, and extensible approach for doing math with confidence.

Thanks for reading! If you found it relevant, don’t forget to leave a clap.

References:

[2103.03874] Measuring Mathematical Problem Solving With the MATH Dataset

[2201.11903] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

[2302.04761] Toolformer: Language Models Can Teach Themselves to Use Tools

[2210.03629] ReAct: Synergizing Reasoning and Acting in Language Models

Written by Manu Madhavan

Professional with over 10 years of expertise | Master's in Data Science. |Skilled in Python, SQL, AI/ML, and Automation, driving impactful business growth.
