Gemini API

The Vertex AI Gemini API supports multimodal prompts as input and outputs text or code.

POST https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{REGION}/publishers/google/models/gemini-pro:streamGenerateContent
POST https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{REGION}/publishers/google/models/gemini-pro-vision:streamGenerateContent

The following regions are supported:

  • Iowa (us-central1)
  • Las Vegas, Nevada (us-west4)
  • Montréal, Canada (northamerica-northeast1)
  • Northern Virginia (us-east4)
  • Oregon (us-west1)
  • Seoul, Korea (asia-northeast3)
  • Singapore (asia-southeast1)
  • Tokyo, Japan (asia-northeast1)

To use the latest model version, specify the model name without a version number, for example gemini-pro or gemini-pro-vision.

For more information, see Model versions and lifecycle.

The request body contains data with the following structure:

{
  "contents": [
    {
      "role": string,
      "parts": [
        {
          // Union field data can be only one of the following:
          "text": string,
          "inlineData": {
            "mimeType": string,
            "data": string
          },
          "fileData": {
            "mimeType": string,
            "fileUri": string
          },
          // End of list of possible types for union field data.

          "videoMetadata": {
            "startOffset": {
              "seconds": integer,
              "nanos": integer
            },
            "endOffset": {
              "seconds": integer,
              "nanos": integer
            }
          }
        }
      ]
    }
  ],
  "tools": [
    {
      "functionDeclarations": [
        {
          "name": string,
          "description": string,
          "parameters": {
            object (OpenAPI Object Schema)
          }
        }
      ]
    }
  ],
  "safetySettings": [
    {
      "category": enum (HarmCategory),
      "threshold": enum (HarmBlockThreshold)
    }
  ],
  "generationConfig": {
    "temperature": number,
    "topP": number,
    "topK": number,
    "candidateCount": integer,
    "maxOutputTokens": integer,
    "stopSequences": [
      string
    ]
  }
}

Use the following parameters:

Parameter Description
role The role in a conversation associated with the content. Specifying a role is required even in single-turn use cases. Acceptable values include the following:
  • USER: Specifies content that's sent by you.
  • MODEL: Specifies the model's response.
parts Ordered parts that make up the input. Parts may have different MIME types.

For gemini-pro, only the text field is valid. The token limit is 32k.

For gemini-pro-vision, you may specify either text only, text and up to 16 images, or text and 1 video. The token limit is 16k.
text The text instructions or chat dialogue to include in the prompt.
inlineData Serialized bytes data of the image or video. You can specify at most 1 image with inlineData. To specify up to 16 images, use fileData.
mimeType The media type of the image or video specified in the data or fileUri fields. Acceptable values include the following:
  • image/png
  • image/jpeg
  • image/webp
  • image/heic
  • image/heif
  • video/mov
  • video/mpeg
  • video/mp4
  • video/mpg
  • video/avi
  • video/wmv
  • video/mpegps
  • video/flv


Maximum video length: 2 minutes.

No limit on image resolution.
data The base64 encoding of the image or video to include inline in the prompt. When including media inline, you must also specify mimeType.

Size limit: 20 MB
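As a sketch, building an inlineData part in Python might look like the following; make_inline_image_part is a hypothetical helper, not part of the API:

```python
import base64

# Illustrative sketch: build an inlineData part from raw image bytes.
# The "data" field must hold the base64 encoding of the raw bytes.
def make_inline_image_part(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    return {
        "inlineData": {
            "mimeType": mime_type,
            "data": base64.b64encode(image_bytes).decode("utf-8"),
        }
    }

part = make_inline_image_part(b"fake-image-bytes", "image/jpeg")
```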

fileUri The Cloud Storage URI of the image or video to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that's sending the request. You must also specify mimeType.

Size limit: 20 MB

videoMetadata Optional. For video input, the start and end offset of the video in Duration format. For example, to specify a 10 second clip starting at 1:00, set "start_offset": { "seconds": 60 } and "end_offset": { "seconds": 70 }.
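The Duration-format offsets in the example above can be assembled with a small helper; duration is a hypothetical name used only for illustration:

```python
# Hypothetical helper: convert a second count into the Duration-format
# dict used by startOffset and endOffset ("seconds" plus "nanos").
def duration(total_seconds: float) -> dict:
    secs = int(total_seconds)
    nanos = int(round((total_seconds - secs) * 1e9))
    return {"seconds": secs, "nanos": nanos}

# A 10-second clip starting at 1:00, as in the example above.
video_metadata = {"startOffset": duration(60), "endOffset": duration(70)}
```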
tools A piece of code that enables the system to interact with external systems to perform an action, or set of actions, outside of the knowledge and scope of the model.
functionDeclarations One or more function declarations. Each function declaration contains information about one function that includes the following:
  • name The name of the function to call. Must start with a letter or an underscore, and may contain only the characters a-z, A-Z, 0-9, underscores, and dashes. Maximum length: 64 characters.
  • description (optional). The description and purpose of the function. The model uses this to decide how and whether to call the function. For the best results, we recommend that you include a description.
  • parameters The parameters of this function in a format that's compatible with the OpenAPI schema format.

For more information, see Function calling.
category The safety category to configure a threshold for. Acceptable values include the following:
  • HARM_CATEGORY_SEXUALLY_EXPLICIT
  • HARM_CATEGORY_HATE_SPEECH
  • HARM_CATEGORY_HARASSMENT
  • HARM_CATEGORY_DANGEROUS_CONTENT
threshold The threshold for blocking responses that could belong to the specified safety category based on probability.
  • BLOCK_NONE
  • BLOCK_LOW_AND_ABOVE
  • BLOCK_MED_AND_ABOVE
  • BLOCK_HIGH_AND_ABOVE
temperature The temperature is used for sampling during the response generation, which occurs when topP and topK are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 is deterministic: the highest probability response is always selected.

Range: 0.0 - 1.0

Default for gemini-pro: 0.9

Default for gemini-pro-vision: 0.4
maxOutputTokens Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words.

Specify a lower value for shorter responses and a higher value for potentially longer responses.


Range for gemini-pro: 1-8192 (default: 8192)

Range for gemini-pro-vision: 1-2048 (default: 2048)
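The four-characters-per-token heuristic above can be sketched as a quick estimate; approximate_token_count is illustrative only, not the model's real tokenizer:

```python
# Rough heuristic from the text above: a token is ~4 characters, so
# roughly len(text) / 4 tokens. An approximation, not the tokenizer.
def approximate_token_count(text: str) -> int:
    return max(1, len(text) // 4)

print(approximate_token_count("a" * 400))
# prints: 100
```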
topK Top-K changes how the model selects tokens for output. A top-K of 1 means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.

For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P with the final token selected using temperature sampling.

Specify a lower value for less random responses and a higher value for more random responses.


Range: 1-40

Default for gemini-pro-vision: 32

Default for gemini-pro: none
topP Top-P changes how the model selects tokens for output. Tokens are selected from the most (see top-K) to least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-P value is 0.5, then the model selects either A or B as the next token by using temperature and excludes C as a candidate.

Specify a lower value for less random responses and a higher value for more random responses.


Range: 0.0 - 1.0

Default: 1.0
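The interaction of top-K, top-P, and temperature described above can be sketched as follows. This is an illustrative decoder over a toy vocabulary, not the model's actual implementation:

```python
import math
import random

def sample_next_token(logits, top_k, top_p, temperature, rng=random.Random(0)):
    # Convert raw scores to probabilities (softmax).
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    probs = {t: p / total for t, p in exps.items()}

    # 1. Top-K: keep the K most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # 2. Top-P: keep the smallest prefix whose probabilities reach top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # 3. Temperature: 0 is greedy; otherwise sample from rescaled weights.
    if temperature == 0:
        return kept[0][0]
    weights = [p ** (1.0 / temperature) for _, p in kept]
    return rng.choices([t for t, _ in kept], weights=weights, k=1)[0]

# Top-K of 1 is greedy decoding: always the most probable token.
print(sample_next_token({"A": 2.0, "B": 1.0, "C": 0.5}, top_k=1, top_p=1.0, temperature=0))
# prints: A
```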
candidateCount The number of response variations to return.

This value must be 1.
stopSequences Specifies a list of strings that tells the model to stop generating text if one of the strings is encountered in the response. If a string appears multiple times in the response, then the response truncates where it's first encountered. The strings are case-sensitive.

For example, if the following is the returned response when stopSequences isn't specified:

public static string reverse(string myString)

Then the returned response with stopSequences set to ["Str", "reverse"] is:

public static string

Maximum 5 items in the list.
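The truncation behavior in the example above can be sketched as (apply_stop_sequences is a hypothetical client-side illustration, not the server's code):

```python
# Cut the text at the first occurrence of any stop string (case-sensitive).
def apply_stop_sequences(text: str, stop_sequences: list) -> str:
    cut = len(text)
    for s in stop_sequences:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]

print(apply_stop_sequences("public static string reverse(string myString)", ["Str", "reverse"]))
# prints: "public static string " (truncated before "reverse")
```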

Response body

{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": string
          }
        ]
      },
      "finishReason": enum (FinishReason),
      "safetyRatings": [
        {
          "category": enum (HarmCategory),
          "probability": enum (HarmProbability),
          "blocked": boolean
        }
      ],
      "citationMetadata": {
        "citations": [
          {
            "startIndex": integer,
            "endIndex": integer,
            "uri": string,
            "title": string,
            "license": string,
            "publicationDate": {
              "year": integer,
              "month": integer,
              "day": integer
            }
          }
        ]
      }
    }
  ],
  "usageMetadata": {
    "promptTokenCount": integer,
    "candidatesTokenCount": integer,
    "totalTokenCount": integer
  }
}
Response element Description
text The generated text.
finishReason The reason why the model stopped generating tokens. If empty, the model has not stopped generating tokens.
  • FINISH_REASON_UNSPECIFIED The finish reason is unspecified.
  • FINISH_REASON_STOP Natural stop point of the model or provided stop sequence.
  • FINISH_REASON_MAX_TOKENS The maximum number of tokens as specified in the request was reached.
  • FINISH_REASON_SAFETY The token generation was stopped as the response was flagged for safety reasons. Note that Candidate.content is empty if content filters block the output.
  • FINISH_REASON_RECITATION The token generation was stopped as the response was flagged for unauthorized citations.
  • FINISH_REASON_OTHER All other reasons that stopped token generation.
category The safety category to configure a threshold for. Acceptable values include the following:
  • HARM_CATEGORY_SEXUALLY_EXPLICIT
  • HARM_CATEGORY_HATE_SPEECH
  • HARM_CATEGORY_HARASSMENT
  • HARM_CATEGORY_DANGEROUS_CONTENT
probability The harm probability levels in the content.
  • HARM_PROBABILITY_UNSPECIFIED
  • NEGLIGIBLE
  • LOW
  • MEDIUM
  • HIGH
blocked A boolean flag associated with a safety attribute that indicates if the model's input or output was blocked. If blocked is true, then the errors field in the response contains one or more error codes. If blocked is false, then the response doesn't include the errors field.
startIndex An integer that specifies where a citation starts in the content.
endIndex An integer that specifies where a citation ends in the content.
uri The URI of a citation source. Examples of a URI source might be a news website or a GitHub repository.
title The title of a citation source. Examples of source titles might be that of a news article or a book.
license The license associated with a citation.
publicationDate The date a citation was published. Its valid formats are YYYY, YYYY-MM, and YYYY-MM-DD.
promptTokenCount Number of tokens in the request.
candidatesTokenCount Number of tokens in the response(s).
totalTokenCount Number of tokens in the request and response(s).

Sample requests

To test a text prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
For other fields, see the Request body table.

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro:streamGenerateContent

Request JSON body:

{
  "contents": {
    "role": "user",
    "parts": {
        "text": "Give me a recipe for banana bread."
    }
  },
  "safety_settings": {
    "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "threshold": "BLOCK_LOW_AND_ABOVE"
  },
  "generation_config": {
    "temperature": 0.2,
    "topP": 0.8,
    "topK": 40,
    "maxOutputTokens": 200,
    "stopSequences": [".", "?", "!"]
  }
}

To send your request, choose one of these options:

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro:streamGenerateContent"

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro:streamGenerateContent" | Select-Object -Expand Content

You should receive a JSON response similar to the sample response.
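The same call can be sketched in Python using only the standard library, assuming the gcloud CLI is installed and authenticated; build_request and stream_generate are hypothetical helper names:

```python
import json
import subprocess
import urllib.request

PROJECT_ID = "PROJECT_ID"  # replace with your project ID
ENDPOINT = (
    "https://us-central1-aiplatform.googleapis.com/v1/projects/"
    f"{PROJECT_ID}/locations/us-central1/publishers/google/models/"
    "gemini-pro:streamGenerateContent"
)

def build_request(prompt: str) -> dict:
    # Mirrors the request JSON body shown above.
    return {
        "contents": {"role": "user", "parts": {"text": prompt}},
        "generation_config": {"temperature": 0.2, "maxOutputTokens": 200},
    }

def stream_generate(prompt: str) -> bytes:
    # Fetch an access token from gcloud, then POST the request body.
    token = subprocess.check_output(
        ["gcloud", "auth", "print-access-token"], text=True
    ).strip()
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```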

Also see Send chat prompt requests (Gemini).

To test a chat prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
For other fields, see the Request body table.

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro:streamGenerateContent

Request JSON body:

{
  "contents": [
    {
      "role": "USER",
      "parts": { "text": "Hello!" }
    },
    {
      "role": "ASSISTANT",
      "parts": { "text": "Argh! What brings ye to my ship?" }
    },
    {
      "role": "USER",
      "parts": { "text": "Wow! You are a real-life priate!" }
    }
  ],
  "safety_settings": {
    "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "threshold": "BLOCK_LOW_AND_ABOVE"
  },
  "generation_config": {
    "temperature": 0.2,
    "topP": 0.8,
    "topK": 40,
    "maxOutputTokens": 200,
  }
}

To send your request, choose one of these options:

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro:streamGenerateContent"

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro:streamGenerateContent" | Select-Object -Expand Content

You should receive a JSON response similar to the sample response.
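Multi-turn contents like the array above can be assembled programmatically; make_chat_contents is a hypothetical helper shown only to illustrate the shape:

```python
# Hypothetical helper: build the multi-turn "contents" array used by
# the chat request above from (role, text) pairs.
def make_chat_contents(turns):
    return [{"role": role, "parts": {"text": text}} for role, text in turns]

contents = make_chat_contents([
    ("USER", "Hello!"),
    ("MODEL", "Argh! What brings ye to my ship?"),
    ("USER", "Wow! You are a real-life pirate!"),
])
```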

Also see Send multimodal prompt requests.

To test a multimodal prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
For other fields, see the Request body table.

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro-vision:streamGenerateContent

Request JSON body:

{
  "contents": {
    "role": "user",
    "parts": [
      {
        "fileData": {
          "mimeType": "image/png",
          "fileUri": "gs://cloud-samples-data/ai-platform/flowers/daisy/10559679065_50d2b16f6d.jpg"
        }
      },
      {
        "text": "Describe this picture."
      }
    ]
  },
  "safety_settings": {
    "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "threshold": "BLOCK_LOW_AND_ABOVE"
  },
  "generation_config": {
    "temperature": 0.4,
    "topP": 1.0,
    "topK": 32,
    "maxOutputTokens": 2048
  }
}

To send your request, choose one of these options:

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro-vision:streamGenerateContent"

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro-vision:streamGenerateContent" | Select-Object -Expand Content

You should receive a JSON response similar to the sample response.
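The mixed parts array in the multimodal request above can be built with small helpers; make_file_part and make_text_part are hypothetical names, and the bucket path is a placeholder:

```python
# Hypothetical helpers mirroring the multimodal request above:
# one part referencing a Cloud Storage file, one text part.
def make_file_part(gcs_uri: str, mime_type: str) -> dict:
    return {"fileData": {"mimeType": mime_type, "fileUri": gcs_uri}}

def make_text_part(text: str) -> dict:
    return {"text": text}

parts = [
    make_file_part("gs://my-bucket/image.png", "image/png"),  # placeholder URI
    make_text_part("Describe this picture."),
]
```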

Also see Function calling.

To test a function prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
For other fields, see the Request body table.

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro:streamGenerateContent

Request JSON body:

{
  "contents": {
    "role": "user",
    "parts": {
      "text": "Which theaters in Mountain View show Barbie movie?"
    }
  },
  "tools": [
    {
      "function_declarations": [
        {
          "name": "find_movies",
          "description": "find movie titles currently playing in theaters based on any description, genre, title words, etc.",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
              },
              "description": {
                "type": "string",
                "description": "Any kind of description including category or genre, title words, attributes, etc."
              }
            },
            "required": [
              "description"
            ]
          }
        },
        {
          "name": "find_theaters",
          "description": "find theaters based on location and optionally movie title which are is currently playing in theaters",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
              },
              "movie": {
                "type": "string",
                "description": "Any movie title"
              }
            },
            "required": [
              "location"
            ]
          }
        },
        {
          "name": "get_showtimes",
          "description": "Find the start times for movies playing in a specific theater",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
              },
              "movie": {
                "type": "string",
                "description": "Any movie title"
              },
              "theater": {
                "type": "string",
                "description": "Name of the theater"
              },
              "date": {
                "type": "string",
                "description": "Date for requested showtime"
              }
            },
            "required": [
              "location",
              "movie",
              "theater",
              "date"
            ]
          }
        }
      ]
    }
  ]
}

To send your request, choose one of these options:

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro:streamGenerateContent"

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro:streamGenerateContent" | Select-Object -Expand Content

You should receive a JSON response similar to the sample response.

Sample responses

[{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "**Ingredients:**\n\n* 2 "
          }
        ]
      },
      "finishReason": "FINISH_REASON_STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 8,
    "candidatesTokenCount": 8,
    "totalTokenCount": 16
  }
}]
[{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "Avast there, landlubber! Ye be mistaken. I be but a"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "LOW"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ]
},
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " humble sailor, livin' the pirate's life for the thrill of it"
          }
        ]
      },
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ]
},
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": ". No treasure for me, just the freedom of the open seas and the company of me hearty crew. What brings ye to our shores?"
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 23,
    "candidatesTokenCount": 60,
    "totalTokenCount": 83
  }
}]
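Because streamGenerateContent returns the response as a JSON array of chunks, a client typically concatenates the text parts across chunks. A minimal sketch (collect_text is a hypothetical helper):

```python
def collect_text(chunks: list) -> str:
    # Concatenate the text parts of the first candidate in each chunk;
    # chunks carrying only metadata or errors contribute nothing.
    out = []
    for chunk in chunks:
        for candidate in chunk.get("candidates", [])[:1]:
            for part in candidate.get("content", {}).get("parts", []):
                out.append(part.get("text", ""))
    return "".join(out)

chunks = [
    {"candidates": [{"content": {"parts": [{"text": "Avast there, "}]}}]},
    {"candidates": [{"content": {"parts": [{"text": "landlubber!"}]}}]},
]
print(collect_text(chunks))
# prints: Avast there, landlubber!
```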
[{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " A daisy is growing up through a pile of brown and yellow fall leaves"
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 262,
    "candidatesTokenCount": 14,
    "totalTokenCount": 276
  }
},
{
  "error": {
    "code": 499,
    "message": "The operation was cancelled.",
    "status": "CANCELLED"
  }
}]
[{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "functionCall": {
              "name": "find_theaters",
              "args": {
                "movie": "Barbie",
                "location": "Mountain View, CA"
              }
            }
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 9,
    "totalTokenCount": 9
  }
}]
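When the model responds with a functionCall part instead of text, the caller is expected to invoke the named function locally and return the result to the model in a follow-up turn. A hedged sketch of the dispatch step (dispatch_function_call and the registry are hypothetical):

```python
def dispatch_function_call(response_chunks: list, registry: dict):
    # Find the first functionCall part and invoke the matching local function.
    for chunk in response_chunks:
        for candidate in chunk.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                call = part.get("functionCall")
                if call is not None:
                    return registry[call["name"]](**call.get("args", {}))
    return None

registry = {
    "find_theaters": lambda movie, location: f"Theaters showing {movie} in {location}",
}
chunks = [{"candidates": [{"content": {"parts": [{"functionCall": {
    "name": "find_theaters",
    "args": {"movie": "Barbie", "location": "Mountain View, CA"},
}}]}}]}]
print(dispatch_function_call(chunks, registry))
# prints: Theaters showing Barbie in Mountain View, CA
```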