Gemini is among the best value in frontier AI, and Kunavo prices it about 70% below Google's list rate behind one OpenAI-compatible API. This guide gives the current per-model rates, worked cost examples you can sanity-check, and the cheapest way to call Gemini in production.
Gemini pricing at a glance
Rates are per 1M tokens, in USD, as billed on Kunavo. The “Google list” column is Google's published rate for the same model, shown so you can see the delta.
| Model | Input / 1M | Output / 1M | Google list (in / out) | You save |
|---|---|---|---|---|
gemini-2-5-flash | $0.09 | $0.75 | $0.30 / $2.50 | ~70% |
gemini-2-5-pro | $0.375 | $3.00 | $1.25 / $10.00 | ~70% |
Flash is the high-volume workhorse; Pro is for harder reasoning, vision and long-context jobs. Live rates always show on the pricing page and each model page (gemini-2-5-flash, gemini-2-5-pro).
How Gemini token pricing works
You pay for input tokens (everything you send — system prompt, retrieved context, the user message) and output tokens (what the model generates). Output is the more expensive side, so the single biggest lever on a Gemini bill is how much text you let the model write. Images and audio are converted to token-equivalents and billed on the same meter.
Worked cost examples
Real numbers at Kunavo's Gemini 2.5 Flash rate, except the last row which uses Gemini 2.5 Pro:
| Workload | Tokens (in / out) | Model | Cost |
|---|---|---|---|
| Chatbot turn | 1,000 / 300 | Flash | $0.0003 |
| RAG answer | 8,000 / 500 | Flash | $0.0011 |
| Batch classify (per doc) | 500 / 20 | Flash | $0.00006 |
| Long-context analysis | 20,000 / 2,000 | Pro | $0.0135 |
At those rates a 100,000-document classification batch on Flash runs about $6, and a million chatbot turns about $315. The math, runnable:
# Kunavo Gemini 2.5 Flash rates (USD per 1M tokens)
IN_RATE, OUT_RATE = 0.09, 0.75
def cost(in_tokens: int, out_tokens: int) -> float:
return in_tokens / 1_000_000 * IN_RATE + out_tokens / 1_000_000 * OUT_RATE
print(cost(1_000, 300)) # one chatbot turn -> $0.000315
print(cost(8_000, 500)) # one RAG answer -> $0.001095
print(cost(500, 20) * 100_000) # 100k-doc batch -> ~$6.00Kunavo pricing and Stripe billing
There is no subscription and no Google Cloud project. You top up a balance (Stripe or local payment methods), and calls draw down from it at the per-token rates above. New accounts start with $2 of free credit, and larger top-ups carry bonus credit. One balance covers Gemini and every other model — Claude, GPT, image, video and audio — so you are not reconciling a separate invoice per provider.
Which Gemini model should I choose?
- gemini-2-5-flash — default for chat, extraction, classification, summarization and most RAG. Fast and the cheapest capable option.
- gemini-2-5-pro — reach for it when Flash is not accurate enough: multi-step reasoning, code, vision and very long context.
A good pattern is to route by difficulty: Flash for the common case, escalate to Pro only when a check fails. See the AI cost optimization guide for the routing pattern in code.
Cutting your Gemini bill
- Tier down. Send the easy 80% to Flash; reserve Pro for the hard 20%.
- Cap output. Set
max_tokensand stop sequences — output is the pricey side of the meter. - Trim input. Retrieve fewer, better RAG chunks instead of stuffing the whole knowledge base into context.
- Batch. Group independent calls to keep latency down and avoid retry storms.
FAQ
Is the Gemini API free?
Google AI Studio has a rate-limited free tier for prototyping; production is pay-per-token. On Kunavo you get $2 of free credit at sign-up, then pay the per-token rates above — no Google Cloud billing account required.
How much does Gemini 2.5 Flash cost?
$0.09 per 1M input tokens and $0.75 per 1M output tokens on Kunavo — about 70% under Google's $0.30 / $2.50 list price. A typical chatbot turn costs roughly $0.0003.
Is Gemini cheaper than Claude or GPT?
Gemini 2.5 Flash is one of the cheapest capable models anywhere — under Claude Haiku and most GPT tiers for high-volume work. Compare the full table on the pricing page.
How do I reduce Gemini API cost?
Tier to Flash, cap output, trim retrieved context, and batch. Details in the cost optimization guide. To start calling Gemini, see how to get a Gemini API key.