Docs
Messages API
The native Anthropic Messages API for the Claude family. Send the real Anthropic request shape — cache_control, tools, and extended thinking pass straight through, untranslated.
Endpoint: POST /v1/messages. Request and response match Anthropic's Messages API exactly, so the Anthropic SDK — or any tool that speaks it — works against Kunavo by changing only the base URL and the key.
Which endpoint should I use?
Kunavo exposes the Claude family two ways. They bill identically; pick by which request shape your code already speaks.
| Endpoint | Shape | Use it when |
|---|---|---|
/v1/chat/completions | OpenAI chat | You already use the OpenAI SDK, or you want one code path across Claude, Gemini and GPT. |
/v1/messages | Anthropic Messages | You use the Anthropic SDK / Claude Code, or you want native cache_control, tools and thinking with no translation layer. |
Authentication
Point the base URL at https://api.kunavo.com — the Anthropic SDK appends /v1/messages itself. Pass your Kunavo key (sk-kn-...) as the API key; the endpoint accepts it via either the x-api-key header (Anthropic SDK default) or Authorization: Bearer.
from anthropic import Anthropic
client = Anthropic(
api_key="sk-kn-...", # your Kunavo key
base_url="https://api.kunavo.com", # SDK appends /v1/messages
)
msg = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
system="You are a senior staff engineer.",
messages=[
{"role": "user", "content": "Pros and cons of event sourcing?"},
],
)
print(msg.content[0].text)Prompt caching
This is the main reason to use the native endpoint. Attach a cache_control breakpoint to any system block, message content block, or tool definition. The marked prefix is written to cache once, then replayed on later calls that share it.
# Mark a large, stable prefix with cache_control. Reuse it across calls and
# that span is read back from cache at ~10% of the input token price.
msg = client.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
system=[
{
"type": "text",
"text": LONG_STYLE_GUIDE, # tens of thousands of tokens
"cache_control": {"type": "ephemeral"},
},
],
messages=[{"role": "user", "content": "Review this PR against the guide."}],
)
u = msg.usage
print(u.cache_creation_input_tokens, u.cache_read_input_tokens)Pricing for the Claude family on Kunavo:
cache_read_input_tokens— served from cache, billed at 0.10× the input rate (a 90% discount).cache_creation_input_tokens— written to cache, billed at the plain input rate. Kunavo does not pass through Anthropic's upstream cache-write surcharge.
Streaming
Set stream: true (or use the SDK stream() helper). Kunavo forwards Anthropic's native event stream — message_start, content_block_delta, message_delta, message_stop — through verbatim.
with client.messages.stream(
model="claude-sonnet-4-6",
max_tokens=1024,
messages=[{"role": "user", "content": "Explain CRDTs in one paragraph."}],
) as stream:
for text in stream.text_stream:
print(text, end="", flush=True)Usage object
Every response (and the streaming message_delta / message_start events) carries a native Anthropic usage object:
| Field | Meaning |
|---|---|
input_tokens | Fresh (uncached) input tokens. |
cache_read_input_tokens | Tokens served from cache (billed 0.10× input). |
cache_creation_input_tokens | Tokens written to cache this call (billed at the plain input rate). |
output_tokens | Output tokens, including extended thinking. |
credits_consumed | Kunavo addition. Raw cost in kie credits (1 credit = $0.005). |
Supported models
Any enabled Claude model accepts /v1/messages. Send the Kunavo slug as model:
claude-opus-4-7,claude-opus-4-6claude-sonnet-4-6,claude-sonnet-4-5claude-haiku-4-5
Gemini and GPT models are not available here — call them via /v1/chat/completions.
Raw HTTP
No SDK required — the endpoint is a plain JSON POST:
curl https://api.kunavo.com/v1/messages \
-H "x-api-key: sk-kn-..." \
-H "content-type: application/json" \
-d '{
"model": "claude-sonnet-4-6",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello, Claude"}]
}'