WiseRouter provides two primary endpoints for generation: completions and chat completions.
Both are OpenAI-compatible and powered by vLLM.
/completions
Best for simple generation tasks. Takes a single text prompt and continues it.
/chat/completions
Better for conversational interactions. Supports multiple messages with roles (user/assistant) and maintains conversation context.
curl -X POST "https://api.wiserouter.ai/v1/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $YOUR_API_KEY" \
-d '{
"prompt": "Write a poem about coding in Python:",
"max_tokens": 400,
"temperature": 0.7,
"repetition_penalty": 1.1
}'
Response:
{
"id": "cmpl-2h38kn309nvb1",
"object": "text_completion",
"created": 1703262150,
"model": "redacted",
"choices": [{
"text": "In lines of indented grace,\nPython slithers into place.\nWith modules, functions, and loops galore,\nBuilding programs we can't ignore.\n\nSimple syntax, readable and clear,\nMaking complex tasks appear near...",
"index": 0,
"logprobs": null,
"finish_reason": "length"
}],
"usage": {
"prompt_tokens": 8,
"completion_tokens": 400,
"total_tokens": 408
}
}
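The same request in Python with the requests library (a minimal sketch; only the endpoint, headers, and body shown above are assumed):

import requests

# Mirror of the curl example above.
resp = requests.post(
    "https://api.wiserouter.ai/v1/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "prompt": "Write a poem about coding in Python:",
        "max_tokens": 400,
        "temperature": 0.7,
        "repetition_penalty": 1.1,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])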
curl -X POST "https://api.wiserouter.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $YOUR_API_KEY" \
-d '{
"messages": [
{
"role": "system",
"content": "You are a helpful coding assistant."
},
{
"role": "user",
"content": "Write a Python function to calculate Fibonacci numbers."
}
],
"temperature": 0.7,
"max_tokens": 400,
"repetition_penalty": 1.1
}'
Response:
{
"id": "chatcmpl-20osd348shx1",
"object": "chat.completion",
"created": 1703262150,
"model": "redacted",
"choices": [{
"message": {
"role": "assistant",
"content": "Here's a Python function to calculate Fibonacci numbers recursively:\n\ndef fibonacci(n):\nif n <= 1:\nreturn n\nreturn fibonacci(n-1) + fibonacci(n-2)\n\n# Example usage:\nfor i in range(10):\nprint(f'fibonacci({i}) = {fibonacci(i)}')"
},
"logprobs": null,
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 33,
"completion_tokens": 89,
"total_tokens": 122
}
}
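The API is stateless, so conversation context is maintained by resending the full message history on each turn. A minimal Python sketch of a two-turn exchange (assuming only the endpoint and headers shown above):

import requests

URL = "https://api.wiserouter.ai/v1/chat/completions"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# The message history carries the conversation context across turns.
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to calculate Fibonacci numbers."},
]
reply = requests.post(URL, headers=HEADERS, json={"messages": messages, "max_tokens": 400}, timeout=60).json()
messages.append(reply["choices"][0]["message"])

# Follow-up turn: append a new user message and resend the whole history.
messages.append({"role": "user", "content": "Now make it iterative instead of recursive."})
reply = requests.post(URL, headers=HEADERS, json={"messages": messages, "max_tokens": 400}, timeout=60).json()
print(reply["choices"][0]["message"]["content"])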
Note: to prevent reverse engineering, the model field is redacted in responses by default.
prompt
The prompt to generate completions for.
messages
Array of message objects containing:
role: "system", "user", or "assistant"
content: message text
router_preference
Set a routing preference for model selection:
balanced (default) - Optimal balance between powerful and efficient
powerful - Favors more powerful models for best results but uses more credits
efficient - Prioritizes lighter models when possible to minimize credit usage
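For example, a request that biases routing toward lighter models (a sketch; the body is otherwise identical to the chat example above, and the prompt text is illustrative):

import requests

# Bias routing toward lighter models with router_preference.
resp = requests.post(
    "https://api.wiserouter.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "messages": [{"role": "user", "content": "Give a one-line summary of vLLM."}],
        "router_preference": "efficient",
        "max_tokens": 100,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])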
max_tokens
Maximum number of completion tokens to generate per output sequence. Defaults to 16.
temperature
Float that controls the randomness of the sampling. Lower values make the model more deterministic, while higher values make the model more random. Zero means greedy sampling. Defaults to 1.0.
repetition_penalty
Float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values greater than 1 encourage the model to use new tokens, while values less than 1 encourage the model to repeat tokens. Defaults to 1.0.
top_p
Float that controls the cumulative probability of the top tokens to consider. Must be in [0, 1]. Set to 1 to consider all tokens. Defaults to 1.0.
stream
Whether to stream back partial progress. Defaults to false.
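With stream set to true, partial output arrives as server-sent events. A sketch of consuming the stream in Python, assuming the standard OpenAI/vLLM event format ("data: {...}" chunks terminated by "data: [DONE]", with incremental chat text under choices[0].delta):

import json
import requests

# Consume a streamed chat response chunk by chunk.
with requests.post(
    "https://api.wiserouter.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "messages": [{"role": "user", "content": "Write a haiku about routers."}],
        "max_tokens": 64,
        "stream": True,
    },
    stream=True,
    timeout=60,
) as resp:
    for line in resp.iter_lines():
        if not line.startswith(b"data: "):
            continue  # skip keep-alives and blank separator lines
        payload = line[len(b"data: "):]
        if payload.strip() == b"[DONE]":
            break
        delta = json.loads(payload)["choices"][0].get("delta", {})
        print(delta.get("content") or "", end="", flush=True)
print()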
Check out the vLLM documentation for a full list of sampling parameters.