Messages API

The native Anthropic Messages API for the Claude family. Send the real Anthropic request shape — cache_control, tools, and extended thinking pass straight through, untranslated.

This page is the English version. For a quick-start guide in Traditional Chinese, see: Traditional Chinese Quick Start: Calling the Claude API from Taiwan →

Endpoint: POST /v1/messages. Request and response match Anthropic's Messages API exactly, so the Anthropic SDK — or any tool that speaks it — works against Kunavo by changing only the base URL and the key.

Which endpoint should I use?

Kunavo exposes the Claude family two ways. They bill identically; pick by which request shape your code already speaks.

Endpoint	Shape	Use it when
`/v1/chat/completions`	OpenAI chat	You already use the OpenAI SDK, or you want one code path across Claude, Gemini and GPT.
`/v1/messages`	Anthropic Messages	You use the Anthropic SDK / Claude Code, or you want native cache_control, tools and thinking with no translation layer.

Authentication

Point the base URL at https://api.kunavo.com — the Anthropic SDK appends /v1/messages itself. Pass your Kunavo key (sk-kn-...) as the API key; the endpoint accepts it via either the x-api-key header (Anthropic SDK default) or Authorization: Bearer.

from anthropic import Anthropic

client = Anthropic(
    api_key="sk-kn-...",                  # your Kunavo key
    base_url="https://api.kunavo.com",    # SDK appends /v1/messages
)

msg = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a senior staff engineer.",
    messages=[
        {"role": "user", "content": "Pros and cons of event sourcing?"},
    ],
)
print(msg.content[0].text)

Prompt caching

This is the main reason to use the native endpoint. Attach a cache_control breakpoint to any system block, message content block, or tool definition. The marked prefix is written to cache once, then replayed on later calls that share it.

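# Mark a large, stable prefix with cache_control. Reuse it across calls and
# that span is read back from cache at ~10% of the input token price.
# Reuses `client` from the Authentication example; LONG_STYLE_GUIDE stands in
# for your own large, stable text (a str) defined elsewhere.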
msg = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_STYLE_GUIDE,            # tens of thousands of tokens
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Review this PR against the guide."}],
)
u = msg.usage
print(u.cache_creation_input_tokens, u.cache_read_input_tokens)

Pricing for the Claude family on Kunavo:

cache_read_input_tokens — served from cache, billed at 0.10× the input rate (a 90% discount).
cache_creation_input_tokens — written to cache, billed at the plain input rate. Kunavo does not pass through Anthropic's upstream cache-write surcharge.
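
To make the arithmetic concrete, here is a minimal sketch that weights each input category by the multipliers listed above. The `InputUsage` class, the helper names, and the `usd_per_million_input` parameter are hypothetical, introduced only for illustration; pass in whatever input rate applies to your model.

```python
from dataclasses import dataclass

# Multipliers taken from the pricing list above.
CACHE_READ_MULTIPLIER = 0.10    # cache reads: 0.10x the input rate
CACHE_WRITE_MULTIPLIER = 1.00   # cache writes: plain input rate, no surcharge


@dataclass
class InputUsage:
    """Hypothetical container mirroring the three input-side usage fields."""
    input_tokens: int
    cache_creation_input_tokens: int = 0
    cache_read_input_tokens: int = 0


def billable_input_tokens(usage: InputUsage) -> float:
    """Input tokens weighted by how each category is billed."""
    return (
        usage.input_tokens
        + CACHE_WRITE_MULTIPLIER * usage.cache_creation_input_tokens
        + CACHE_READ_MULTIPLIER * usage.cache_read_input_tokens
    )


def input_cost_usd(usage: InputUsage, usd_per_million_input: float) -> float:
    """Input-side cost for one call; the rate is supplied by the caller."""
    return billable_input_tokens(usage) * usd_per_million_input / 1_000_000


# First call writes a 50,000-token prefix; the second call reads it back.
first_call = InputUsage(input_tokens=200, cache_creation_input_tokens=50_000)
second_call = InputUsage(input_tokens=200, cache_read_input_tokens=50_000)

print(billable_input_tokens(first_call))
print(billable_input_tokens(second_call))
```

Under these assumptions the first call is billed for about 50,200 input-equivalent tokens and the second for about 5,200, which is where the saving on repeated prefixes comes from. Output tokens are billed separately and are not modelled here.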

Kunavo keeps your cache warm automatically: requests that share a prompt prefix are routed to the same upstream account, with no configuration on your side.

Streaming

Set stream: true (or use the SDK's stream() helper). Kunavo forwards Anthropic's native event stream verbatim, including the message_start, content_block_delta, message_delta, and message_stop events.

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain CRDTs in one paragraph."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
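
If you consume the event stream without the SDK helper, the events named above can be folded into a final text and usage record. The following is a rough sketch that works on already-parsed event dicts; `collect_stream` and the sample events are hypothetical, and the field layout follows Anthropic's published streaming format as I understand it, so verify it against a real stream before relying on it.

```python
from typing import Iterable


def collect_stream(events: Iterable[dict]) -> tuple[str, dict]:
    """Assemble the text and the merged usage from parsed stream events."""
    text_parts: list[str] = []
    usage: dict = {}
    for event in events:
        kind = event.get("type")
        if kind == "message_start":
            # Initial usage (input-side counts) arrives on the message itself.
            usage.update(event["message"].get("usage", {}))
        elif kind == "content_block_delta":
            delta = event["delta"]
            # Only text deltas carry visible text; other delta types are skipped.
            if delta.get("type") == "text_delta":
                text_parts.append(delta["text"])
        elif kind == "message_delta":
            # Later usage values (such as output_tokens) overwrite earlier ones.
            usage.update(event.get("usage", {}))
        elif kind == "message_stop":
            break
    return "".join(text_parts), usage


# Hand-written sample events, for illustration only.
sample_events = [
    {"type": "message_start",
     "message": {"usage": {"input_tokens": 12, "output_tokens": 1}}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Hello, "}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "world"}},
    {"type": "message_delta",
     "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 5}},
    {"type": "message_stop"},
]

text, usage = collect_stream(sample_events)
print(text, usage)
```

Event types the function does not recognise are ignored, which keeps the loop tolerant of events it does not need.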

Usage object

Every response (and the streaming message_delta / message_start events) carries a native Anthropic usage object:

Field	Meaning
`input_tokens`	Fresh (uncached) input tokens.
`cache_read_input_tokens`	Tokens served from cache (billed 0.10× input).
`cache_creation_input_tokens`	Tokens written to cache this call (billed at the plain input rate).
`output_tokens`	Output tokens, including extended thinking.
`credits_consumed`	Kunavo addition. Raw cost in kie credits (1 credit = $0.005).
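
As a small worked example of the credit conversion in the table, the sketch below totals `credits_consumed` across several usage records and converts the sum to dollars at the stated rate of $0.005 per credit. It assumes the usage objects have already been converted to plain dicts and that `credits_consumed` is numeric; the helper name is hypothetical.

```python
from decimal import Decimal

USD_PER_CREDIT = Decimal("0.005")  # conversion rate stated in the table above


def total_cost_usd(usages: list[dict]) -> Decimal:
    """Sum credits_consumed over usage dicts and convert to US dollars."""
    credits = sum(
        (Decimal(str(u.get("credits_consumed", 0))) for u in usages),
        Decimal("0"),
    )
    return credits * USD_PER_CREDIT


# Hand-written usage records, for illustration only.
usages = [
    {"input_tokens": 200, "output_tokens": 350, "credits_consumed": 12},
    {"input_tokens": 180, "output_tokens": 90, "credits_consumed": 8},
]
print(total_cost_usd(usages))
```

Decimal is used so that small per-call amounts add up without floating-point drift.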

Supported models

Every enabled Claude model is available through /v1/messages. Send the Kunavo slug as model:

claude-fable-5, claude-sonnet-5
claude-opus-4-7, claude-opus-4-6
claude-sonnet-4-6, claude-haiku-4-5

Gemini and GPT models are not available here — call them via /v1/chat/completions.
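
For code that handles several model families, the routing rule above can be captured in a small helper. This is only a sketch: it assumes every Claude slug starts with `claude-`, as the slugs listed here do, and the function name and `prefer_native` flag are hypothetical.

```python
def endpoint_for(model: str, prefer_native: bool = True) -> str:
    """Pick the request path for a model slug.

    Assumption: Claude slugs start with "claude-". Anything else is sent to
    the OpenAI-style endpoint, since /v1/messages serves Claude models only.
    """
    if model.startswith("claude-") and prefer_native:
        return "/v1/messages"
    return "/v1/chat/completions"


print(endpoint_for("claude-sonnet-4-6"))
print(endpoint_for("claude-sonnet-4-6", prefer_native=False))
```

Setting `prefer_native=False` keeps Claude traffic on the OpenAI-style path, which may suit code that wants a single request shape across all models.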

Raw HTTP

No SDK required — the endpoint is a plain JSON POST:

curl https://api.kunavo.com/v1/messages \
  -H "x-api-key: sk-kn-..." \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, Claude"}]
  }'
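
The same request can be built with nothing but the Python standard library. The sketch below uses the Authorization: Bearer form described under Authentication instead of x-api-key; `build_messages_request` is a hypothetical helper, and the block only constructs the request without sending it.

```python
import json
import urllib.request


def build_messages_request(
    api_key: str,
    payload: dict,
    base_url: str = "https://api.kunavo.com",
) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/messages with Bearer auth."""
    return urllib.request.Request(
        url=f"{base_url}/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "content-type": "application/json",
        },
        method="POST",
    )


req = build_messages_request(
    "sk-kn-...",  # your Kunavo key
    {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Hello, Claude"}],
    },
)
print(req.get_method(), req.full_url)
```

To send it, pass `req` to `urllib.request.urlopen` and parse the response body with `json.load`.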