Vertex AI can log samples of requests and responses for Gemini and supported partner models. The logs are saved to a BigQuery table for viewing and analysis. This page describes how to configure request-response logs for base foundation models and fine-tuned models.
Supported API methods for logging
Request-response logs are supported for all Gemini models that use generateContent or streamGenerateContent. The following partner models that use rawPredict or streamRawPredict are also supported:
- Anthropic Claude
Request-response logs for base foundation models
You can configure request-response logs for base foundation models by using the REST API or Python SDK. Logging configurations can take a few minutes to take effect.
Enable request-response logging
Select one of the following tabs for instructions on enabling request-response logs for a base foundation model.
For Anthropic models, only REST is supported for logging configuration. Enable the logging configuration through the REST API by setting the publisher to anthropic and setting the model name to one of the supported Claude models.
Python SDK
This method can be used to create or update a PublisherModelConfig.
from vertexai.generative_models import GenerativeModel

publisher_model = GenerativeModel("gemini-2.0-pro-001")

# Set the logging configuration
publisher_model.set_request_response_logging_config(
    enabled=True,
    sampling_rate=1.0,
    bigquery_destination="bq://PROJECT_ID.DATASET_NAME.TABLE_NAME",
    enable_otel_logging=True,
)
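The sampling_rate value controls what fraction of requests are logged. The sampling decision can be pictured as an independent coin flip per request; the following is an illustrative sketch, not the service's actual implementation:

```python
import random

def should_log(sampling_rate: float) -> bool:
    # Each request is logged independently with probability sampling_rate.
    return random.random() < sampling_rate

# With sampling_rate=0.1, roughly 10% of requests end up in the log table.
random.seed(0)
logged = sum(should_log(0.1) for _ in range(10_000))
print(f"{logged} of 10000 requests logged")
```

Because sampling is probabilistic, the logged fraction only approximates the configured rate over large request volumes.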
REST API
Create or update a PublisherModelConfig using setPublisherModelConfig:
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The location of the model resource. Request-response logging is supported in all regions that the model supports.
- PUBLISHER: The publisher name. For example, google.
- MODEL: The foundation model name. For example, gemini-2.0-flash-001.
- SAMPLING_RATE: To reduce storage costs, set a number between 0 and 1 to define the fraction of requests to log. For example, a value of 1 logs all requests, and a value of 0.1 logs 10% of requests.
- BQ_URI: The BigQuery table to be used for logging. If you only specify a project name, a new dataset is created with the name logging_ENDPOINT_DISPLAY_NAME_ENDPOINT_ID, where ENDPOINT_DISPLAY_NAME follows the BigQuery naming rules. If you don't specify a table name, a new table is created with the name request_response_logging.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/publishers/PUBLISHER/models/MODEL:setPublisherModelConfig
Request JSON body:
{
  "publisherModelConfig": {
    "loggingConfig": {
      "enabled": true,
      "samplingRate": SAMPLING_RATE,
      "bigqueryDestination": {
        "outputUri": "BQ_URI"
      },
      "enableOtelLogging": true
    }
  }
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/publishers/PUBLISHER/models/MODEL:setPublisherModelConfig"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/publishers/PUBLISHER/models/MODEL:setPublisherModelConfig" | Select-Object -Expand Content
If the request is successful, you receive a JSON response.
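Rather than editing request.json by hand, the body can also be generated programmatically. A minimal sketch, using placeholder project, dataset, and table names:

```python
import json

# Placeholder values -- replace with your own project, dataset, and table.
config = {
    "publisherModelConfig": {
        "loggingConfig": {
            "enabled": True,
            "samplingRate": 0.5,
            "bigqueryDestination": {
                "outputUri": "bq://my-project.my_dataset.request_response_logging"
            },
            "enableOtelLogging": True,
        }
    }
}

# Write the body to request.json for use with the curl command above.
with open("request.json", "w") as f:
    json.dump(config, f, indent=2)
```

Generating the file this way guarantees the body is valid JSON, which avoids errors from stray trailing commas.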
Get logging configuration
Get the request-response logging configuration for the foundation model by using the REST API.
REST API
Get the request-response logging configuration using fetchPublisherModelConfig:
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The location of the model resource.
- PUBLISHER: The publisher name. For example, google.
- MODEL: The foundation model name. For example, gemini-2.0-flash-001.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/publishers/PUBLISHER/models/MODEL:fetchPublisherModelConfig
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/publishers/PUBLISHER/models/MODEL:fetchPublisherModelConfig"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/publishers/PUBLISHER/models/MODEL:fetchPublisherModelConfig" | Select-Object -Expand Content
If the request is successful, you receive a JSON response that contains the logging configuration.
Disable logging
Disable request-response logging for the foundation model by using the REST API or Python SDK.
Python SDK
# Disable request-response logging
publisher_model.set_request_response_logging_config(
    enabled=False,
    sampling_rate=0,
    bigquery_destination="",
)
REST API
Use setPublisherModelConfig to disable logging:
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The location of the model resource.
- PUBLISHER: The publisher name. For example, google.
- MODEL: The foundation model name. For example, gemini-2.0-flash-001.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/publishers/PUBLISHER/models/MODEL:setPublisherModelConfig
Request JSON body:
{
  "publisherModelConfig": {
    "loggingConfig": {
      "enabled": false
    }
  }
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/publishers/PUBLISHER/models/MODEL:setPublisherModelConfig"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/publishers/PUBLISHER/models/MODEL:setPublisherModelConfig" | Select-Object -Expand Content
If the request is successful, you receive a JSON response.
Request-response logs for fine-tuned models
You can configure request-response logs for fine-tuned models by using the REST API or Python SDK.
Enable request-response logs
Select one of the following tabs for instructions on enabling request-response logs for a fine-tuned model.
Python SDK
This method can be used to update the request-response logging configuration for an endpoint.
from vertexai.generative_models import GenerativeModel

tuned_model = GenerativeModel("projects/PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID")

# Set the logging configuration
tuned_model.set_request_response_logging_config(
    enabled=True,
    sampling_rate=1.0,
    bigquery_destination="bq://PROJECT_ID.DATASET_NAME.TABLE_NAME",
    enable_otel_logging=True,
)
REST API
You can only enable request-response logging when you create an endpoint using projects.locations.endpoints.create or patch an existing endpoint using projects.locations.endpoints.patch.
Requests and responses are logged at the endpoint level, so requests sent to any deployed models under the same endpoint are logged.
When you create or patch an endpoint, populate the predictRequestResponseLoggingConfig field of the Endpoint resource with the following entries:
- enabled: Set to true to enable request-response logging.
- samplingRate: To reduce storage costs, set a number between 0 and 1 to define the fraction of requests to log. For example, a value of 1 logs all requests, and a value of 0.1 logs 10% of requests.
- bigqueryDestination: The BigQuery table to be used for logging. If you only specify a project name, a new dataset is created with the name logging_ENDPOINT_DISPLAY_NAME_ENDPOINT_ID, where ENDPOINT_DISPLAY_NAME follows the BigQuery naming rules. If you don't specify a table name, a new table is created with the name request_response_logging.
- enableOtelLogging: Set to true to enable OpenTelemetry (OTEL) logging in addition to the default request-response logging.
To view the BigQuery table schema, see Logging table schema.
The following is an example configuration:
{
  "predictRequestResponseLoggingConfig": {
    "enabled": true,
    "samplingRate": 0.5,
    "bigqueryDestination": {
      "outputUri": "bq://PROJECT_ID.DATASET_NAME.TABLE_NAME"
    },
    "enableOtelLogging": true
  }
}
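Before sending a create or patch request, a client-side sanity check of the logging configuration can catch common mistakes. The following is a sketch that assumes the field names shown in the example configuration; validate_logging_config is a hypothetical helper, not part of the Vertex AI SDK:

```python
def validate_logging_config(config: dict) -> list[str]:
    """Return a list of problems found in a predictRequestResponseLoggingConfig."""
    problems = []
    rate = config.get("samplingRate", 1.0)
    if not 0 <= rate <= 1:
        problems.append("samplingRate must be between 0 and 1")
    uri = config.get("bigqueryDestination", {}).get("outputUri", "")
    if uri and not uri.startswith("bq://"):
        problems.append("outputUri must start with bq://")
    return problems

cfg = {
    "enabled": True,
    "samplingRate": 0.5,
    "bigqueryDestination": {"outputUri": "bq://PROJECT_ID.DATASET_NAME.TABLE_NAME"},
    "enableOtelLogging": True,
}
print(validate_logging_config(cfg))  # []
```

Catching an out-of-range sampling rate or a malformed BigQuery URI locally is faster than waiting for the API to reject the request.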
Get logging configuration
Get the request-response logging configuration for the fine-tuned model by using the REST API.
REST API
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The location of the endpoint resource.
- MODEL: The foundation model name. For example, gemini-2.0-flash-001.
- ENDPOINT_ID: The ID of the endpoint.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID" | Select-Object -Expand Content
If the request is successful, you receive a JSON response that contains the endpoint's logging configuration.
Disable logging configuration
Disable the request-response logging configuration for the endpoint.
Python SDK
tuned_model = GenerativeModel("projects/PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID")

# Disable request-response logging
tuned_model.set_request_response_logging_config(
    enabled=False,
    sampling_rate=1.0,
    bigquery_destination="bq://PROJECT_ID.DATASET_NAME.TABLE_NAME",
    enable_otel_logging=False,
)
REST API
{
  "predictRequestResponseLoggingConfig": {
    "enabled": false
  }
}
Logging table schema
In BigQuery, the logs are recorded using the following schema:
| Field name | Type | Notes |
|---|---|---|
| endpoint | STRING | Resource name of the endpoint to which the tuned model is deployed. |
| deployed_model_id | STRING | Deployed model ID for a tuned model deployed to an endpoint. |
| logging_time | TIMESTAMP | The time that logging is performed. This is roughly the time that the response is returned. |
| request_id | NUMERIC | The auto-generated integer request ID based on the API request. |
| request_payload | STRING | Included for partner model logging and for backward compatibility with the Vertex AI endpoint request-response log. |
| response_payload | STRING | Included for partner model logging and for backward compatibility with the Vertex AI endpoint request-response log. |
| model | STRING | Model resource name. |
| model_version | STRING | The model version. This is often "default" for Gemini models. |
| api_method | STRING | generateContent, streamGenerateContent, rawPredict, or streamRawPredict. |
| full_request | JSON | The full GenerateContentRequest. |
| full_response | JSON | The full GenerateContentResponse. |
| metadata | JSON | Metadata of the call, including the request latency. |
| otel_log | JSON | Logs in OpenTelemetry schema format. Only available if otel_logging is enabled in the logging configuration. |
Note that request-response pairs larger than the BigQuery write API 10 MB row limit are not recorded.
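Because oversized rows are silently dropped, it can help to estimate the serialized size of a request-response pair before relying on the logs. The following is a rough sketch; fits_in_log_row is a hypothetical helper, and the serialized JSON size only approximates the actual BigQuery row size:

```python
import json

BQ_ROW_LIMIT_BYTES = 10 * 1024 * 1024  # BigQuery write API row limit (~10 MB)

def fits_in_log_row(request: dict, response: dict) -> bool:
    # Approximate the logged row size by the serialized request and response.
    size = len(json.dumps(request).encode()) + len(json.dumps(response).encode())
    return size < BQ_ROW_LIMIT_BYTES

small = {"contents": [{"role": "user", "parts": [{"text": "hello"}]}]}
print(fits_in_log_row(small, {"candidates": []}))  # True
```

Requests with large inline media (for example, base64-encoded images or audio) are the most likely to exceed the limit.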
What's next
- Estimate pricing for online prediction logging.
- Deploy a model using the Google Cloud console or using the Vertex AI API.
- Learn how to create a BigQuery table.