Sends a request for a model response for the given chat conversation. Supports both streaming and non-streaming modes.
Authentication
Authorization: Bearer
API key as bearer token in Authorization header
Request
This endpoint expects an object.
messages (list of objects, Required)
List of messages for the conversation
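A minimal request body can be sketched as a plain dictionary. The message shape shown ({"role", "content"} pairs) and the model name are assumptions based on common chat-completion APIs; per the table above, only messages is required.

```python
# Minimal request payload for a non-streaming completion.
# The {"role", "content"} message shape is an assumption based on
# comparable chat-completion APIs; "messages" is the only required field.
payload = {
    "model": "example-model",  # hypothetical model name
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

assert "messages" in payload  # the only required field above
assert all("role" in m and "content" in m for m in payload["messages"])
```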
provider (object or null, Optional)
When multiple model providers are available, optionally indicate your routing preference.
plugins (list of objects, Optional)
Plugins you want to enable for this request, including their settings.
user (string, Optional)
Unique user identifier
session_id (string, Optional, <=256 characters)
A unique identifier for grouping related requests (e.g., a conversation or agent workflow) for observability. If provided in both the request body and the x-session-id header, the body value takes precedence. Maximum of 256 characters.
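The precedence rule above (body value wins over the x-session-id header) can be sketched as a small resolver; the helper name is hypothetical.

```python
def effective_session_id(body, headers):
    """Resolve the session id used for observability grouping: the
    request-body value takes precedence over the x-session-id header."""
    sid = body.get("session_id")
    if sid is None:
        sid = headers.get("x-session-id")
    if sid is not None and len(sid) > 256:
        raise ValueError("session_id must be at most 256 characters")
    return sid

# Body value wins when both are present.
assert effective_session_id({"session_id": "conv-1"}, {"x-session-id": "hdr-9"}) == "conv-1"
# Header is used when the body omits session_id.
assert effective_session_id({}, {"x-session-id": "hdr-9"}) == "hdr-9"
```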
trace (object, Optional)
Metadata for observability and tracing. Known keys (trace_id, trace_name, span_name, generation_name, parent_span_id) have special handling. Additional keys are passed through as custom metadata to configured broadcast destinations.
model (string, Optional)
Model to use for completion
models (list of objects, Optional)
Models to use for completion
frequency_penalty (double, Optional)
Frequency penalty (-2.0 to 2.0)
logit_bias (map from strings to doubles, or null, Optional)
Token logit bias adjustments
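The map's keys are token IDs encoded as strings, each mapped to a bias value. The -100 to 100 range enforced below is an assumption carried over from similar APIs, not stated above.

```python
# Token IDs are illustrative; keys are token IDs encoded as strings.
logit_bias = {"50256": -100.0, "1234": 5.0}

def check_logit_bias(bias):
    """Client-side sanity check for a logit_bias map (helper is hypothetical)."""
    for token_id, value in bias.items():
        assert token_id.isdigit(), "keys must be token IDs as strings"
        assert -100.0 <= value <= 100.0  # assumed range, not stated above

check_logit_bias(logit_bias)
```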
logprobs (boolean or null, Optional)
Return log probabilities
top_logprobs (integer, Optional)
Number of top log probabilities to return (0-20)
max_completion_tokens (integer, Optional)
Maximum tokens in completion
max_tokens (integer, Optional)
Maximum tokens (deprecated, use max_completion_tokens). Note: some providers enforce a minimum of 16.
metadata (map from strings to strings, Optional)
Key-value pairs for additional object information (max 16 pairs, 64 char keys, 512 char values)
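The limits above (16 pairs, 64-character keys, 512-character values) can be checked client-side before sending; the validator name is hypothetical.

```python
def validate_metadata(metadata):
    """Check the documented metadata limits: at most 16 pairs,
    64-character keys, and 512-character values."""
    if len(metadata) > 16:
        raise ValueError("metadata allows at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            raise ValueError(f"metadata key too long: {key!r}")
        if len(value) > 512:
            raise ValueError(f"metadata value too long for key {key!r}")

validate_metadata({"request_source": "docs-example"})  # within limits
```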
presence_penalty (double, Optional)
Presence penalty (-2.0 to 2.0)
reasoning (object, Optional)
Configuration options for reasoning models
response_format (object, Optional)
Response format configuration
seed (integer, Optional)
Random seed for deterministic outputs
stop (string, list of strings, or any, Optional)
Stop sequences (up to 4)
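Since stop accepts either a single sequence or a list of up to four, a client-side normalizer can be sketched as follows (helper name hypothetical):

```python
def normalize_stop(stop):
    """Accept a single string or a list of strings, capped at 4 sequences
    per the field description above."""
    if stop is None:
        return None
    sequences = [stop] if isinstance(stop, str) else list(stop)
    if len(sequences) > 4:
        raise ValueError("at most 4 stop sequences are allowed")
    return sequences

assert normalize_stop("\n\n") == ["\n\n"]
assert normalize_stop(["END", "STOP"]) == ["END", "STOP"]
```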
stream (boolean, Optional, defaults to false)
Enable streaming response
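When stream is true, the response typically arrives as server-sent events: each event is a `data:` line carrying a JSON chunk, terminated by `data: [DONE]`. This wire format and the chunk shape (choices[0].delta.content) are assumptions based on comparable streaming APIs, not guarantees from the table above.

```python
import json

def iter_stream_content(sse_lines):
    """Yield content deltas from SSE 'data:' lines (assumed chunk shape:
    choices[0].delta.content, as in comparable streaming APIs)."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip comments, blank keep-alive lines, etc.
        body = line[len("data: "):]
        if body == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(body)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
assert "".join(iter_stream_content(sample)) == "Hello"
```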
stream_options (object, Optional)
Streaming configuration options
temperature (double, Optional)
Sampling temperature (0-2)
parallel_tool_calls (boolean or null, Optional)
Whether the model may call multiple tools in parallel
tool_choice (enum or object, Optional)
Tool choice configuration
tools (list of objects, Optional)
Available tools for function calling
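Tool entries commonly follow the function-calling schema sketched below, with a JSON Schema parameters block; the exact schema accepted by this endpoint, and the tool name used, are assumptions.

```python
# A single function tool; the name and parameters are hypothetical.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# tool_choice as an object can force a specific tool; the enum form
# (per the field description above) selects a mode instead.
tool_choice = {"type": "function", "function": {"name": "get_weather"}}

assert tools[0]["function"]["name"] == tool_choice["function"]["name"]
```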
top_p (double, Optional)
Nucleus sampling parameter (0-1)
debug (object, Optional)
Debug options for inspecting request transformations (streaming only)
image_config (map from strings to strings, doubles, or lists of any, Optional)
Output modalities for the response. Supported values are "text", "image", and "audio".
cache_control (object, Optional)
Enable automatic prompt caching. When set, the system automatically applies cache breakpoints to the last cacheable block in the request. Currently supported for Anthropic Claude models.
service_tier (enum or null, Optional)
The service tier to use for processing this request.
Allowed values:
Response
Successful chat completion response
id (string)
Unique completion identifier
choices (list of objects)
List of completion choices
created (integer)
Unix timestamp of creation
model (string)
Model used for completion
object (enum)
Allowed values:
system_fingerprint (string or null)
System fingerprint
service_tier (string or null)
The service tier used by the upstream provider for this request
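Reading a completion uses the top-level fields documented above. The inner shape of each choice (an index, a message with role and content, and a finish reason) is an assumption based on comparable APIs and is not specified in this table.

```python
# Sample response using the documented top-level fields; the inner
# choice/message shape is assumed, not specified above.
response = {
    "id": "cmpl-123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "example-model",
    "system_fingerprint": None,
    "service_tier": None,
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        },
    ],
}

# Extract the assistant's text from the first choice.
text = response["choices"][0]["message"]["content"]
assert text == "Hello!"
```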