Pricing

Simple and flexible. Only pay for what you use.

Base models

Model | Price
Ada (fastest) | $0.0008 / 1K tokens
Babbage | $0.0012 / 1K tokens
Curie | $0.0060 / 1K tokens
Davinci (most powerful) | $0.0600 / 1K tokens

Multiple models, each with different capabilities and price points. Ada is the fastest model, while Davinci is the most powerful.

Prices are per 1,000 tokens. You can think of tokens as pieces of words, where 1,000 tokens is about 750 words. This paragraph is 35 tokens.
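
As a rough sketch of that arithmetic (the 750-words-per-1,000-tokens figure is an approximation, the prices are the base rates above, and the function name is illustrative):

# Base rates from the table above; 1,000 tokens ~ 750 words of English text.
BASE_PRICE_PER_1K = {"ada": 0.0008, "babbage": 0.0012, "curie": 0.0060, "davinci": 0.0600}

def estimated_cost(word_count, model):
    """Rough cost estimate for processing `word_count` English words."""
    tokens = word_count / 0.75  # ~0.75 words per token
    return tokens / 1000 * BASE_PRICE_PER_1K[model]

print(round(estimated_cost(750, "davinci"), 4))  # 750 words ~ 1,000 tokens -> 0.06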

Start for free

Start experimenting with $18 in free credit that can be used during your first 3 months.

Pay as you go

To keep things simple and flexible, pay only for the resources you use.

Choose your model

Use the right model for the job. We offer a spectrum of capabilities and price points.

Fine-tuned models

Create your own custom models by fine-tuning our base models with your training data. Tokens used to train a model are billed at 50% of our base prices. Once you fine-tune a model, you’ll be billed only for the tokens you use in requests to that model.

Model | Training | Usage
Ada | $0.0004 / 1K tokens | $0.0016 / 1K tokens
Babbage | $0.0006 / 1K tokens | $0.0024 / 1K tokens
Curie | $0.0030 / 1K tokens | $0.0120 / 1K tokens
Davinci | $0.0300 / 1K tokens | $0.1200 / 1K tokens

Embedding models

Build advanced search, clustering, topic modeling, and classification functionality with our embeddings offering.

Model | Usage
Ada | $0.0080 / 1K tokens
Babbage | $0.0120 / 1K tokens
Curie | $0.0600 / 1K tokens
Davinci | $0.6000 / 1K tokens

Usage quotas

Because this technology is new, we also want to make sure that rollouts are done responsibly. When you sign up, you’ll be granted an initial spend limit, or quota, and we’ll increase that limit over time as you build a track record with your application.

If you need more tokens, you can always request a quota increase. When you’re ready to go live, you’ll submit a Pre-launch Review Request, which will also cover any additional quota increase requests.

Frequently Asked Questions


What’s a token?

You can think of tokens as pieces of words used for natural language processing. For English text, 1 token is approximately 4 characters or 0.75 words. As a point of reference, the collected works of Shakespeare are about 900,000 words or 1.2M tokens.

To learn more about how tokens work and estimate your usage…

  • Experiment with our interactive Tokenizer tool.
  • Log in to your account and enter text into the Playground. The counter in the footer will display how many tokens are in your text.
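
For a quick back-of-the-envelope check without those tools, the ~4-characters-per-token figure can be turned into a tiny heuristic (a rough sketch only; real counts come from the Tokenizer or Playground):

def approx_token_count(text):
    """Very rough heuristic for English text: ~4 characters per token."""
    return max(1, round(len(text) / 4))

print(approx_token_count("Hello, world!"))  # 3 with this heuristic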

Which model should I use?

While Davinci is generally the most capable model, the other models can perform certain tasks extremely well and, in some cases, significantly faster. They also have cost advantages. For example, Curie can perform many of the same tasks as Davinci, but faster and for 1/10th the cost. We encourage developers to experiment to find the model that’s most efficient for their application. Visit our documentation for a more detailed model comparison.


How will I know how many tokens I’ve used?

Log in to your account to view your usage tracking dashboard. This page will show you how many tokens you’ve used during the current and past billing cycles.


How can I manage my spending?

You can configure a usage hard limit in your billing settings, after which we’ll stop serving your requests. You may also configure a soft limit to receive an email alert once you pass a certain usage threshold.


Does Playground usage count toward my bill?

Yes, we treat Playground usage the same as regular API usage.


How is pricing calculated for Completions?

Completions requests are billed based on the number of tokens sent in your prompt plus the number of tokens in the completion(s) returned by the API.

The best_of and n parameters may also impact costs. Because these parameters generate multiple completions per prompt, they act as multipliers on the number of tokens returned.

Your request may use up to num_tokens(prompt) + max_tokens * max(n, best_of) tokens, which will be billed at the per-engine rates outlined at the top of this page.

In the simplest case, if your prompt contains 10 tokens and you request a single 90-token completion from the davinci engine, your request will use 100 tokens and will cost $0.006.

You can limit costs by reducing prompt length or maximum response length, limiting usage of best_of/n, adding appropriate stop sequences, or using engines with lower per-token costs.
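
That upper bound translates directly into a few lines of code. A minimal sketch, assuming the token counts are already known (in practice the prompt count comes from a tokenizer) and using the davinci base rate from the worked example above:

def max_billed_tokens(prompt_tokens, max_tokens, n=1, best_of=1):
    """Upper bound on tokens billed for one completions request."""
    return prompt_tokens + max_tokens * max(n, best_of)

def max_cost(prompt_tokens, max_tokens, price_per_1k, n=1, best_of=1):
    return max_billed_tokens(prompt_tokens, max_tokens, n, best_of) / 1000 * price_per_1k

# Worked example above: 10-token prompt, single 90-token davinci completion.
print(round(max_cost(10, 90, 0.0600), 4))  # 0.006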


How is pricing calculated for Fine-tuning?

There are two components to fine-tuning pricing: training and usage.

When training a fine-tuned model, the total tokens used will be billed according to our training rates (50% of our base model rates). Note that the number of training tokens depends on the number of tokens in your training dataset and your chosen number of training epochs. The default number of epochs is 4.

(Tokens in your training file * Number of training epochs) = Total training tokens
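
For illustration, here is the same arithmetic as a short sketch (training rates from the fine-tuned models table above; the default of 4 epochs and the names used here are assumptions for the example):

# Training rates are 50% of base rates (see the fine-tuned models table).
TRAINING_PRICE_PER_1K = {"ada": 0.0004, "babbage": 0.0006, "curie": 0.0030, "davinci": 0.0300}

def training_cost(file_tokens, model, epochs=4):
    """Tokens in the training file * epochs, billed at the training rate."""
    total_training_tokens = file_tokens * epochs
    return total_training_tokens / 1000 * TRAINING_PRICE_PER_1K[model]

print(round(training_cost(100_000, "curie"), 2))  # 400,000 training tokens -> 1.2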


Once you fine-tune a model, you’ll be billed only for the tokens you use. Requests sent to fine-tuned models are billed at our usage rates.


How is pricing calculated for Classifications?

Classifications requests are billed based on the number of tokens in the inputs you provide. Internally, this endpoint makes calls to the search and completions endpoints, so its costs are a function of the costs of those endpoints.

The actual cost per token is based upon which models you select to perform both the search and the completion, which are controlled by the search_model and model parameters respectively.

You may provide a file containing the examples to search over, or you can explicitly specify examples in your request. Providing a file makes search faster and more cost effective when the number of examples you’d like to search over is greater than max_examples. In this scenario, costs are largely based on the number of examples reranked (controlled by max_examples) and the total length of those examples. If you pass examples in your request instead, costs are based on the total length of all those examples.

The length of the query passed into the model as well as the final classification label that is generated will also factor into costs.

You can use the return_prompt debugging flag to understand the length of the final combined prompt that will be sent to the completions endpoint to generate the classification label.


How is pricing calculated for Search?

Search requests are billed based on the total number of tokens in the documents you provide, plus the tokens in the query and the tokens needed to instruct the model on how to perform the operation. The API also uses a reference document to generate a response, adding 1 to the total document count. These tokens are billed at the per-engine rates outlined at the top of this page.

You may provide a file containing the documents to search over, or you can explicitly specify documents in your request. Providing a file makes search faster and more cost effective when the number of documents you’d like to search over is greater than max_rerank. In this scenario, costs are largely based on the number of documents reranked (controlled by max_rerank) and the total length of those documents. If you pass documents in your request instead, costs are based on the total length of all those documents.

Below you’ll find the formula for calculating overall token consumption. The 14 represents the additional tokens the API uses per document to accomplish the Semantic Search task, and the added 1 is a reference document:

Number of tokens in all of your documents
+ (Number of documents + 1) * 14
+ (Number of documents + 1) * Number of tokens in your query

= Total tokens

As an example, if you had 5 documents (plus one added by the API) with token lengths of 12, 34, 22, 33, 78 (179 total) and your query was 8 tokens, the total tokens consumed would be: 179 + (6 * 14) + (6 * 8) = 311
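
The formula and worked example can be checked with a short sketch (the function name is illustrative):

def search_tokens(document_token_lengths, query_tokens):
    """Total tokens consumed by a Search request, per the formula above."""
    docs = len(document_token_lengths) + 1  # +1 for the reference document
    return sum(document_token_lengths) + docs * 14 + docs * query_tokens

print(search_tokens([12, 34, 22, 33, 78], 8))  # 311, matching the example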

You may use the Search Token Estimator or see the code from the Python Estimator to further understand search token usage.


How is pricing calculated for Answers?

Answers requests are billed based on the number of tokens in the inputs you provide and the answer that the model generates. Internally, this endpoint makes calls to the Search and Completions APIs, so its costs are a function of the costs of those endpoints.

The actual cost per token is based upon which models you select to perform both the search and the completion, which are controlled by the search_model and model parameters respectively.

You may provide a file containing the documents to search over, or you can explicitly specify documents in your request. Providing a file makes search faster and more cost effective when the number of documents you’d like to search over is greater than max_rerank. In this scenario, costs are largely based on the number of documents reranked (controlled by max_rerank) and the total length of those documents. If you pass documents in your request instead, costs are based on the total length of all those documents.

The length of examples, examples_context, and question, as well as the length of the generated answer (controlled by max_tokens/stop), will also impact costs.

You can use the return_prompt debugging flag to understand the length of the final combined prompt that will be sent to the completions endpoint to generate the answer.


Do you offer an SLA?

We will be publishing an SLA soon. In the meantime, you can visit our Status page to monitor service availability and view historical uptime. If your company or application has specific requirements, please contact our sales team.


Is the OpenAI API available on Microsoft Azure?

Yes. Azure customers can access the OpenAI API on Azure with the compliance, regional support, and enterprise-grade security that Azure offers. Learn more or contact sales@openai.com.