Request Headers
The authorization token (required).
Request Body
A list of messages comprising the conversation so far. Depending on the model you use, different message types (modalities) are supported, like text, images, and audio.
Items
Properties
Variants
Items
Properties
Properties
Variants
Items
Properties
Properties
Variants
Items
Properties
Properties
Properties
Variants
Properties
Properties
Variants
Properties
Properties
Properties
Variants
Items
Properties
Properties
Items
Properties
Properties
Properties
Variants
Items
Properties
The Query Model to use for the query completion.
Variants
The base62 22-character unique identifier for the Query Model.
The provided Query Model object.
Properties
The LLMs which make up the Model.
Items
An LLM which is part of the Model.
Properties
Model ID used to generate the response.
The mode of the model, which determines whether it generates a response or selects from the generated options.
Variants
The model generates a response.
The model selects a Generate ID. The model will output reasoning, even if the LLM is not a reasoning model. Best for non-reasoning models.
The model selects a Generate ID.
The model selects one or more Generate IDs as a probability distribution. The model will output reasoning, even if the LLM is not a reasoning model. Best for non-reasoning models.
The model selects one or more Generate IDs as a probability distribution.
If the mode is one of the select logprobs modes, this controls how many of the top options are returned with their probabilities.
This setting aims to control the repetition of tokens based on how often they appear in the input. It tries to use less frequently those tokens that appear more in the input, proportional to how frequently they occur. Token penalty scales with the number of occurrences. Negative values will encourage token reuse.
Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
This setting aims to control the presence of tokens in the output. It tries to encourage the model to use tokens that are less present in the input, proportional to their presence in the input. Token presence scales with the number of occurrences. Negative values will encourage more diverse token usage.
Constrains effort on reasoning for some reasoning models.
Variants
Stop generation immediately if the model encounters any token specified in the stop array.
Variants
Items
This setting influences the variety in the model’s responses. Lower values lead to more predictable and typical responses, while higher values encourage more diverse and less common responses. At 0, the model always gives the same response for a given input.
This setting limits the model’s choices to a percentage of likely tokens: only the top tokens whose probabilities add up to P. A lower value makes the model’s responses more predictable, while the default setting allows for a full range of token choices. Think of it like a dynamic Top-K.
This sets the upper limit for the number of tokens the model can generate in response. It won’t produce more than this limit. The maximum value is the context length minus the prompt length.
Represents the minimum probability for a token to be considered, relative to the probability of the most likely token. (The value changes depending on the confidence level of the most probable token.) If your Min-P is set to 0.1, that means it will only allow for tokens that are at least 1/10th as probable as the best possible option.
OpenRouter provider preferences.
Properties
List of provider slugs to try in order.
Items
Whether to allow backup providers when the primary is unavailable.
Only use providers that support all parameters in your request.
Control whether to use providers that may store data.
Variants
List of provider slugs to allow for this request.
Items
List of provider slugs to skip for this request.
Items
List of quantization levels to filter by.
Items
Sort providers by price or throughput.
OpenRouter reasoning configuration.
Properties
An upper bound for the number of tokens that can be generated for reasoning.
Constrains effort on reasoning for some reasoning models.
Variants
Whether reasoning is enabled for this request.
Helps to reduce the repetition of tokens from the input. A higher value makes the model less likely to repeat tokens, but too high a value can make the output less coherent (often with run-on sentences that lack small words). Token penalty scales based on original token’s probability.
Consider only the top tokens with “sufficiently high” probabilities based on the probability of the most likely token. Think of it like a dynamic Top-P. A lower Top-A value focuses the choices based on the highest probability token but with a narrower scope. A higher Top-A value does not necessarily affect the creativity of the output, but rather refines the filtering process based on the maximum probability.
This limits the model’s choice of tokens at each step, making it choose from a smaller set. A value of 1 means the model will always pick the most likely next token, leading to predictable results. By default this setting is disabled, making the model to consider all choices.
Controls the verbosity and length of the model response. Lower values produce more concise responses, while higher values produce more detailed and comprehensive responses.
Variants
Fallback models. Will be tried in order if the first one fails.
Items
The weight of the model, which determines its influence on the Confidence Score. Must match the weight strategy of the parent Model.
Variants
A static weight value.
Properties
The static weight value.
A dynamic weight value based on training table data.
Properties
The base weight value, uninfluenced by training table data.
The minimum weight value. A model that never matches the correct answer will have this weight.
The maximum weight value. A model that always matches the correct answer will have this weight.
The weight strategy for the Model, which determines how the Confidence Score is calculated.
Variants
Each LLM has a fixed weight.
Properties
Each LLM has a dynamic weight based on training table data.
Properties
The embedding model used to compute prompt embeddings for a training table vector search.
The number of most similar training table entries to consider when computing the dynamic weight.
Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
How many query completion choices to generate for each LLM in the Model. For example, if the model contains 4 LLMs, setting this to 2 will generate 8 choices.
The predicted output from the model.
An object specifying the format that the model must output.
Variants
Responses will have no specific format.
Properties
Responses will be JSON Objects.
Properties
Responses will adhere to the provided JSON Schema. May include custom "_confidence" or "_preserveOrder" fields to control Confidence ID computation.
"_confidence_" may be applied to any typed field in the schema to indicate whether the field should be included when computing Confidence ID (true by default).
"_preserveOrder" may be applied to array fields to indicate whether the order of items in the array should be preserved when computing Confidence ID (false by default).
If the Query Model contains only "select" LLMs, this field is required, and must contain only "object", "boolean", or "string" with an "enum" fields.
Properties
Properties
Whether to strictly enforce the schema. If true, the model will only output properties defined in the schema. If false, the model may output additional properties.
The JSON Schema object defining the expected structure.
If specified, the inferencing will sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed for some models.
Specifies the processing type used for serving the request.
Variants
If set to true, the model response data will be streamed to the client as it is generated using server-sent events.
Options for streaming response.
Properties
If set, an additional chunk will be streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, as well as the cost, if requested.
A list of tools the model may call.
Items
Properties
Properties
The JSON Schema object defining the expected structure.
Whether to strictly enforce the schema. If true, the model will only output properties defined in the schema. If false, the model may output additional properties.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.
OpenRouter accounting configuration.
Properties
Whether to include Cost in the response usage.
If specified, an embedding of each choice outputted by a "generate" LLM will include an embedding vector of the text content.
If true, Response Format must be "json_schema". The Schema must contain only object, string enum, or boolean properties.
A choice will be generated for each possible JSON output that can be constructed from the provided schema. The "model" field of these choices will be the "name" of the "json_schema". "select" LLMs will be able to vote on these choices.
If the Query Model contains only "select" LLMs, this field must be set to true.
Response Body (Unary)
A unique identifier for the chat completion.
An array of choices returned by the Query Model.
Items
The message generated by the model for this choice.
Properties
The content of the message generated by the model.
The refusal information if the model refused to generate a message.
The role of the message, which is always assistant for model-generated messages.
The annotations added by the model in this message.
Items
Properties
Properties
The end index of the citation in the message content.
The start index of the citation in the message content.
The title of the cited webpage.
The URL of the cited webpage.
The audio generated by the model in this message.
Properties
The tool calls made by the model in this delta.
Items
The tool call ID.
Properties
The name of the function being called.
The arguments passed to the function.
The reasoning text generated by the model in this message.
The images generated by the model in this message.
Items
Properties
Properties
The reason why the model finished generating the response.
Variants
The model finished generating because it reached a natural stopping point.
The model finished generating because it reached the maximum token limit.
The model finished generating because it made one or more tool calls.
The model finished generating because it triggered a content filter.
The model finished generating because an error occurred.
The index of the choice in the list of choices.
The log probabilities of the tokens in the delta.
Properties
An array of log probabilities for each token in the content.
Items
Properties
The token text.
The byte representation of the token.
Items
A byte in the token's byte representation.
The log probability of the token.
Items
Properties
The token text.
The byte representation of the token.
Items
A byte in the token's byte representation.
The log probability of the token.
An array of log probabilities for each token in the refusal.
Items
Properties
The token text.
The byte representation of the token.
Items
A byte in the token's byte representation.
The log probability of the token.
Items
Properties
The token text.
The byte representation of the token.
Items
A byte in the token's byte representation.
The log probability of the token.
A hash of the text content of the choice. Only present for "generate" LLMs.
For "generate" LLMs, a hash of the text content of the choice. When the content is JSON, and Response Format was provided, the content may be modified prior to computing the hash. "_confidence": false (defaults to true) properties will be omitted. Omitting allows choices which only differ on unimportant properties to be treated as the same. "_preserveOrder": false (defaults to false) object property keys or array property items will be sorted. Sorting allows choices with the same content but in different orders to be treated as the same.
For "select" LLMs, the Confidence ID is the Confidence ID of the Generate ID that was selected. If using select logprobs, and the LLM selected multiple Generate IDs, then Confidence ID will be a probability distribution.
Variants
A single Confidence ID.
A map of Confidence IDs to their probabilities. All probabilities will sum to 1.
The weight of the LLM that produced this choice. For "static" weight, will always be a fixed value. For "training_table" weight, will depend on the training table data.
The confidence of the choice. Each choice with the same Confidence ID will have the same Confidence. Computed by dividing the total weight of LLMs that produced a choice with the same Confidence ID by the total weight of all Confidence IDs.
If the "embeddings" field was specified in the request, an embedding vector of the text content of the choice, or an error if the embedding failed.
Variants
The embedding response.
Properties
An array of embedding objectst.
Items
An embedding vector.
Properties
The embedding vector as an array of floats.
Items
A float in the embedding vector.
The name of the model used to generate the embeddings.
An object containing token usage statistics for the chat completion.
Properties
The number of tokens generated in the completion.
The number of tokens in the input prompt.
The total number of tokens used (prompt + completion).
Properties
The number of audio tokens generated.
The number of reasoning tokens generated.
Properties
The number of audio tokens in the input prompt.
The number of cached tokens in the input prompt.
The cost incurred for this chat completion, in Credits.
Properties
The cost charged by the upstream LLM provider, in Credits.
The cost charged by the upstream LLM provider's own upstream LLM provider, in Credits.
An error occurred while generating the embedding.
Properties
The HTTP status code for the error.
A JSON message describing the error. Typically, either a string or an object.
If an error occurred while generating this choice, the error object.
Properties
The HTTP status code for the error.
A JSON message describing the error. Typically, either a string or an object.
The base62 22-character unique identifier for the LLM that produced this choice. If this choice was produced by the Response Format (with "select_deterministic": true), it will contain the name of the "json_schema"
The index of the LLM in the Query Model that produced this choice. May be missing if this choice was produced by the Response Format.
Details about the chat completion which produced this choice.
Properties
A unique identifier for the chat completion.
The Unix timestamp (in seconds) when the chat completion was created.
The model used for the chat completion.
The service tier used for the chat completion.
Variants
A fingerprint representing the system configuration used for the chat completion.
An object containing token usage statistics for the chat completion.
Properties
The number of tokens generated in the completion.
The number of tokens in the input prompt.
The total number of tokens used (prompt + completion).
Properties
The number of audio tokens generated.
The number of reasoning tokens generated.
Properties
The number of audio tokens in the input prompt.
The number of cached tokens in the input prompt.
The cost incurred for this chat completion, in Credits.
Properties
The cost charged by the upstream LLM provider, in Credits.
The cost charged by the upstream LLM provider's own upstream LLM provider, in Credits.
The upstream (or upstream upstream) LLM provider used for the chat completion.
The Unix timestamp (in seconds) when the chat completion was created.
The model which generated the completion. Will be prefixed by "objectiveai/".
The service tier used for the chat completion.
Variants
A fingerprint representing the system configuration used for the chat completion.
An object containing token usage statistics for the chat completion.
Properties
The number of tokens generated in the completion.
The number of tokens in the input prompt.
The total number of tokens used (prompt + completion).
Properties
The number of audio tokens generated.
The number of reasoning tokens generated.
Properties
The number of audio tokens in the input prompt.
The number of cached tokens in the input prompt.
The cost incurred for this chat completion, in Credits.
Properties
The cost charged by the upstream LLM provider, in Credits.
The cost charged by the upstream LLM provider's own upstream LLM provider, in Credits.
Training table data associated with the completion, if applicable.
Properties
The hash of the response format used to generate the completion. Each response format has separate training table data.
The embeddings response computed from the request messages.
Properties
An array of embedding objectst.
Items
An embedding vector.
Properties
The embedding vector as an array of floats.
Items
A float in the embedding vector.
The name of the model used to generate the embeddings.
An object containing token usage statistics for the chat completion.
Properties
The number of tokens generated in the completion.
The number of tokens in the input prompt.
The total number of tokens used (prompt + completion).
Properties
The number of audio tokens generated.
The number of reasoning tokens generated.
Properties
The number of audio tokens in the input prompt.
The number of cached tokens in the input prompt.
The cost incurred for this chat completion, in Credits.
Properties
The cost charged by the upstream LLM provider, in Credits.
The cost charged by the upstream LLM provider's own upstream LLM provider, in Credits.
Response Body (Streaming)
A unique identifier for the chat completion.
An array of choices returned by the Query Model.
Items
An object containing the incremental updates to the chat message.
Properties
The content of the message delta.
The refusal reason if the model refused to generate a response.
The role of the message delta.
The tool calls made by the model in this delta.
Items
The index of the tool call in the message.
The tool call ID.
Properties
The name of the function being called.
The arguments passed to the function.
The reasoning text generated by the model in this delta.
The images generated by the model in this delta.
Items
Properties
Properties
The reason why the model finished generating the response.
Variants
The model finished generating because it reached a natural stopping point.
The model finished generating because it reached the maximum token limit.
The model finished generating because it made one or more tool calls.
The model finished generating because it triggered a content filter.
The model finished generating because an error occurred.
The index of the choice in the list of choices.
The log probabilities of the tokens in the delta.
Properties
An array of log probabilities for each token in the content.
Items
Properties
The token text.
The byte representation of the token.
Items
A byte in the token's byte representation.
The log probability of the token.
Items
Properties
The token text.
The byte representation of the token.
Items
A byte in the token's byte representation.
The log probability of the token.
An array of log probabilities for each token in the refusal.
Items
Properties
The token text.
The byte representation of the token.
Items
A byte in the token's byte representation.
The log probability of the token.
Items
Properties
The token text.
The byte representation of the token.
Items
A byte in the token's byte representation.
The log probability of the token.
A hash of the text content of the choice. Only present for "generate" LLMs.
For "generate" LLMs, a hash of the text content of the choice. When the content is JSON, and Response Format was provided, the content may be modified prior to computing the hash. "_confidence": false (defaults to true) properties will be omitted. Omitting allows choices which only differ on unimportant properties to be treated as the same. "_preserveOrder": false (defaults to false) object property keys or array property items will be sorted. Sorting allows choices with the same content but in different orders to be treated as the same.
For "select" LLMs, the Confidence ID is the Confidence ID of the Generate ID that was selected. If using select logprobs, and the LLM selected multiple Generate IDs, then Confidence ID will be a probability distribution.
Variants
A single Confidence ID.
A map of Confidence IDs to their probabilities. All probabilities will sum to 1.
The weight of the LLM that produced this choice. For "static" weight, will always be a fixed value. For "training_table" weight, will depend on the training table data.
The confidence of the choice. Each choice with the same Confidence ID will have the same Confidence. Computed by dividing the total weight of LLMs that produced a choice with the same Confidence ID by the total weight of all Confidence IDs.
If the "embeddings" field was specified in the request, an embedding vector of the text content of the choice, or an error if the embedding failed.
Variants
The embedding response.
Properties
An array of embedding objectst.
Items
An embedding vector.
Properties
The embedding vector as an array of floats.
Items
A float in the embedding vector.
The name of the model used to generate the embeddings.
An object containing token usage statistics for the chat completion.
Properties
The number of tokens generated in the completion.
The number of tokens in the input prompt.
The total number of tokens used (prompt + completion).
Properties
The number of audio tokens generated.
The number of reasoning tokens generated.
Properties
The number of audio tokens in the input prompt.
The number of cached tokens in the input prompt.
The cost incurred for this chat completion, in Credits.
Properties
The cost charged by the upstream LLM provider, in Credits.
The cost charged by the upstream LLM provider's own upstream LLM provider, in Credits.
An error occurred while generating the embedding.
Properties
The HTTP status code for the error.
A JSON message describing the error. Typically, either a string or an object.
If an error occurred while generating this choice, the error object.
Properties
The HTTP status code for the error.
A JSON message describing the error. Typically, either a string or an object.
The base62 22-character unique identifier for the LLM that produced this choice. If this choice was produced by the Response Format (with "select_deterministic": true), it will contain the name of the "json_schema"
The index of the LLM in the Query Model that produced this choice. May be missing if this choice was produced by the Response Format.
Details about the chat completion which produced this choice.
Properties
A unique identifier for the chat completion.
The Unix timestamp (in seconds) when the first chat completion chunk was created.
The model used for the chat completion.
The service tier used for the chat completion chunk.
Variants
A fingerprint representing the system configuration used for the chat completion chunk.
An object containing token usage statistics for the chat completion.
Properties
The number of tokens generated in the completion.
The number of tokens in the input prompt.
The total number of tokens used (prompt + completion).
Properties
The number of audio tokens generated.
The number of reasoning tokens generated.
Properties
The number of audio tokens in the input prompt.
The number of cached tokens in the input prompt.
The cost incurred for this chat completion, in Credits.
Properties
The cost charged by the upstream LLM provider, in Credits.
The cost charged by the upstream LLM provider's own upstream LLM provider, in Credits.
The upstream (or upstream upstream) LLM provider used for the chat completion chunk.
The Unix timestamp (in seconds) when the first chat completion chunk was created.
The model which generated the completion. Will be prefixed by "objectiveai/".
The service tier used for the chat completion chunk.
Variants
A fingerprint representing the system configuration used for the chat completion chunk.
An object containing token usage statistics for the chat completion.
Properties
The number of tokens generated in the completion.
The number of tokens in the input prompt.
The total number of tokens used (prompt + completion).
Properties
The number of audio tokens generated.
The number of reasoning tokens generated.
Properties
The number of audio tokens in the input prompt.
The number of cached tokens in the input prompt.
The cost incurred for this chat completion, in Credits.
Properties
The cost charged by the upstream LLM provider, in Credits.
The cost charged by the upstream LLM provider's own upstream LLM provider, in Credits.
Training table data associated with the completion, if applicable.
Properties
The hash of the response format used to generate the completion. Each response format has separate training table data.
The embeddings response computed from the request messages.
Properties
An array of embedding objectst.
Items
An embedding vector.
Properties
The embedding vector as an array of floats.
Items
A float in the embedding vector.
The name of the model used to generate the embeddings.
An object containing token usage statistics for the chat completion.
Properties
The number of tokens generated in the completion.
The number of tokens in the input prompt.
The total number of tokens used (prompt + completion).
Properties
The number of audio tokens generated.
The number of reasoning tokens generated.
Properties
The number of audio tokens in the input prompt.
The number of cached tokens in the input prompt.
The cost incurred for this chat completion, in Credits.
Properties
The cost charged by the upstream LLM provider, in Credits.
The cost charged by the upstream LLM provider's own upstream LLM provider, in Credits.