Sends a request for a model response for the given chat conversation. Supports both streaming and non-streaming modes.
Authentication
Authorization: Bearer
API key as bearer token in Authorization header
Request
This endpoint expects an object.
messages (list of objects, Required)
List of messages for the conversation
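A minimal request body can be sketched as a plain dictionary. The message shape shown ({"role", "content"} pairs) and the model name are assumptions based on common chat-completion APIs; per the table above, only messages is required.

```python
# Minimal request payload for a non-streaming completion.
# The {"role", "content"} message shape is an assumption based on
# comparable chat-completion APIs; "messages" is the only required field.
payload = {
    "model": "example-model",  # hypothetical model name
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

assert "messages" in payload  # the only required field above
assert all("role" in m and "content" in m for m in payload["messages"])
```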
provider (object or null, Optional)
When multiple model providers are available, optionally indicate your routing preference.
plugins (list of objects, Optional)
Plugins you want to enable for this request, including their settings.
user (string, Optional)
Unique user identifier
session_id (string, Optional, <=256 characters)
A unique identifier for grouping related requests (e.g., a conversation or agent workflow) for observability. If provided in both the request body and the x-session-id header, the body value takes precedence. Maximum of 256 characters.
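The precedence rule above (body value wins over the x-session-id header) can be sketched as a small resolver; the helper name is hypothetical.

```python
def effective_session_id(body, headers):
    """Resolve the session id used for observability grouping: the
    request-body value takes precedence over the x-session-id header."""
    sid = body.get("session_id")
    if sid is None:
        sid = headers.get("x-session-id")
    if sid is not None and len(sid) > 256:
        raise ValueError("session_id must be at most 256 characters")
    return sid

# Body value wins when both are present.
assert effective_session_id({"session_id": "conv-1"}, {"x-session-id": "hdr-9"}) == "conv-1"
# Header is used when the body omits session_id.
assert effective_session_id({}, {"x-session-id": "hdr-9"}) == "hdr-9"
```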
trace (object, Optional)
Metadata for observability and tracing. Known keys (trace_id, trace_name, span_name, generation_name, parent_span_id) have special handling. Additional keys are passed through as custom metadata to configured broadcast destinations.
model (string, Optional)
Model to use for completion
models (list of objects, Optional)
Models to use for completion
frequency_penalty (double, Optional)
Frequency penalty (-2.0 to 2.0)
logit_bias (map from strings to doubles, or null, Optional)
Token logit bias adjustments
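The map's keys are token IDs encoded as strings, each mapped to a bias value. The -100 to 100 range enforced below is an assumption carried over from similar APIs, not stated above.

```python
# Token IDs are illustrative; keys are token IDs encoded as strings.
logit_bias = {"50256": -100.0, "1234": 5.0}

def check_logit_bias(bias):
    """Client-side sanity check for a logit_bias map (helper is hypothetical)."""
    for token_id, value in bias.items():
        assert token_id.isdigit(), "keys must be token IDs as strings"
        assert -100.0 <= value <= 100.0  # assumed range, not stated above

check_logit_bias(logit_bias)
```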
logprobs (boolean or null, Optional)
Return log probabilities
top_logprobs (integer, Optional)
Number of top log probabilities to return (0-20)
max_completion_tokens (integer, Optional)
Maximum tokens in completion
max_tokens (integer, Optional)
Maximum tokens (deprecated, use max_completion_tokens). Note: some providers enforce a minimum of 16.
metadata (map from strings to strings, Optional)
Key-value pairs for additional object information (max 16 pairs, 64 char keys, 512 char values)
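The limits above (16 pairs, 64-character keys, 512-character values) can be checked client-side before sending; the validator name is hypothetical.

```python
def validate_metadata(metadata):
    """Check the documented metadata limits: at most 16 pairs,
    64-character keys, and 512-character values."""
    if len(metadata) > 16:
        raise ValueError("metadata allows at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            raise ValueError(f"metadata key too long: {key!r}")
        if len(value) > 512:
            raise ValueError(f"metadata value too long for key {key!r}")

validate_metadata({"request_source": "docs-example"})  # within limits
```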
presence_penalty (double, Optional)
Presence penalty (-2.0 to 2.0)
reasoning (object, Optional)
Configuration options for reasoning models
response_format (object, Optional)
Response format configuration
seed (integer, Optional)
Random seed for deterministic outputs
stop (string, list of strings, or any, Optional)
Stop sequences (up to 4)
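Since stop accepts either a single sequence or a list of up to four, a client-side normalizer can be sketched as follows (helper name hypothetical):

```python
def normalize_stop(stop):
    """Accept a single string or a list of strings, capped at 4 sequences
    per the field description above."""
    if stop is None:
        return None
    sequences = [stop] if isinstance(stop, str) else list(stop)
    if len(sequences) > 4:
        raise ValueError("at most 4 stop sequences are allowed")
    return sequences

assert normalize_stop("\n\n") == ["\n\n"]
assert normalize_stop(["END", "STOP"]) == ["END", "STOP"]
```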
stream (boolean, Optional, defaults to false)
Enable streaming response
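When stream is true, the response typically arrives as server-sent events: each event is a `data:` line carrying a JSON chunk, terminated by `data: [DONE]`. This wire format and the chunk shape (choices[0].delta.content) are assumptions based on comparable streaming APIs, not guarantees from the table above.

```python
import json

def iter_stream_content(sse_lines):
    """Yield content deltas from SSE 'data:' lines (assumed chunk shape:
    choices[0].delta.content, as in comparable streaming APIs)."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip comments, blank keep-alive lines, etc.
        body = line[len("data: "):]
        if body == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(body)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
assert "".join(iter_stream_content(sample)) == "Hello"
```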
stream_options (object, Optional)
Streaming configuration options
temperature (double, Optional)
Sampling temperature (0-2)
parallel_tool_calls (boolean or null, Optional)
Whether the model may call multiple tools in parallel
tool_choice (enum or object, Optional)
Tool choice configuration
tools (list of objects, Optional)
Available tools for function calling
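Tool entries commonly follow the function-calling schema sketched below, with a JSON Schema parameters block; the exact schema accepted by this endpoint, and the tool name used, are assumptions.

```python
# A single function tool; the name and parameters are hypothetical.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# tool_choice as an object can force a specific tool; the enum form
# (per the field description above) selects a mode instead.
tool_choice = {"type": "function", "function": {"name": "get_weather"}}

assert tools[0]["function"]["name"] == tool_choice["function"]["name"]
```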
top_p (double, Optional)
Nucleus sampling parameter (0-1)
debug (object, Optional)
Debug options for inspecting request transformations (streaming only)
image_config (map from strings to strings, doubles, or lists of any, Optional)
Output modalities for the response. Supported values are "text", "image", and "audio".
cache_control (object, Optional)
Enable automatic prompt caching. When set, the system automatically applies cache breakpoints to the last cacheable block in the request. Currently supported for Anthropic Claude models.
service_tier (enum or null, Optional)
The service tier to use for processing this request.
Allowed values:
Response
Successful chat completion response
id (string)
Unique completion identifier
choices (list of objects)
List of completion choices
created (integer)
Unix timestamp of creation
model (string)
Model used for completion
object (enum)
Allowed values:
system_fingerprint (string or null)
System fingerprint
service_tier (string or null)
The service tier used by the upstream provider for this request
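Reading a completion uses the top-level fields documented above. The inner shape of each choice (an index, a message with role and content, and a finish reason) is an assumption based on comparable APIs and is not specified in this table.

```python
# Sample response using the documented top-level fields; the inner
# choice/message shape is assumed, not specified above.
response = {
    "id": "cmpl-123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "example-model",
    "system_fingerprint": None,
    "service_tier": None,
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        },
    ],
}

# Extract the assistant's text from the first choice.
text = response["choices"][0]["message"]["content"]
assert text == "Hello!"
```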