Guide · May 23, 2026 · 7 min read

Veo 3 and Sora API quickstart — text-to-video and image-to-video in five minutes

First Veo 3 and Sora video generation calls through an OpenAI-style API: text-to-video, image-to-video with first/last frame control, file uploads, and a production-ready Python and Node example. No waitlist, no per-provider billing.

Veo 3 and Sora put text-to-video at the same quality bar where text-to-image was eighteen months ago. The catch: both providers gate access behind waitlists, regional restrictions and bespoke billing flows that don't play well with the rest of your AI stack.

This guide walks through making your first Veo 3 and Sora API calls through Kunavo: no waitlist, OpenAI-compatible auth, billing per second of generated video, and results served from a permanent URL. Total time: about five minutes.

Setup

  1. Sign up at kunavo.com/app/signup. You get $2 in credit on sign-up — enough for a couple of 5-second test clips.
  2. Create a key in /app/keys. It starts with sk-kunavo-.
  3. Export it: export KUNAVO_API_KEY=sk-kunavo-....

Text-to-video with Veo 3

Veo 3 is currently the best text-to-video model on the market for cinematic shots — it understands camera language (dolly, push-in, rack focus), produces stable lighting across cuts, and handles 24fps motion correctly. Generations take 30 seconds to a few minutes; the HTTP response is synchronous — set a long client timeout.

text_to_video.py
import os

import requests

KEY = os.environ["KUNAVO_API_KEY"]

resp = requests.post(
    "https://api.kunavo.com/v1/video/generations",
    headers={"Authorization": f"Bearer {KEY}"},
    json={
        "model": "veo-3",
        "prompt": "A drone shot pulling back from a quiet mountain lake at dawn, mist rising off the water. Cinematic, 24fps, soft golden light.",
        "duration": 8,
        "aspect_ratio": "16:9",
        "resolution": "1080p",
    },
    timeout=600,   # generations take 30s to several minutes
)
resp.raise_for_status()
data = resp.json()
print(data["data"][0]["url"])

The response is OpenAI-style: { data: [{ url: '...' }] }. The URL is permanent, served from files.kunavo.com — download it once into your own storage if you need long-term hosting.
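
Since the advice above is to copy the clip into your own storage, here is a minimal sketch of pulling the URL out of the response and streaming the file to disk. It assumes the response shape shown above (a `data` list whose first item has a `url`). The helper names `extract_url` and `download_video` and the local filename are illustrative and not part of the Kunavo API; the sketch uses only the Python standard library.

```python
import shutil
import urllib.request


def extract_url(payload: dict) -> str:
    """Return the first video URL from a {"data": [{"url": ...}]} body."""
    items = payload.get("data") or []
    if not items or "url" not in items[0]:
        raise ValueError(f"no video URL in response: {payload!r}")
    return items[0]["url"]


def download_video(url: str, dest_path: str) -> str:
    """Stream the video to dest_path without holding it all in memory."""
    with urllib.request.urlopen(url, timeout=120) as src, open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dest_path
```

With the `resp` object from text_to_video.py, usage would look like `download_video(extract_url(resp.json()), "lake_dawn.mp4")`.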

Image-to-video

Anchoring with an image is usually where you get production-quality results. Veo 3 supports two image modes:

  • image_mode: "frame" — single image is the first frame; two images is first + last frame. Default for image_url.
  • image_mode: "reference" — up to 3 style references for character / wardrobe consistency without forcing frames.

image_to_video.py
# image-to-video: pass an image_url to anchor the first frame.
import os

import requests

KEY = os.environ["KUNAVO_API_KEY"]

resp = requests.post(
    "https://api.kunavo.com/v1/video/generations",
    headers={"Authorization": f"Bearer {KEY}"},
    json={
        "model": "veo-3",
        "prompt": "She smiles, then walks out of frame to the left",
        "image_url": "https://files.kunavo.com/<your-upload>.jpg",
        "image_mode": "frame",      # one image => first frame
        "duration": 6,
        "aspect_ratio": "9:16",     # vertical, mobile-native
    },
    timeout=600,
)
resp.raise_for_status()   # surface 4xx / 5xx before reading the body
print(resp.json()["data"][0]["url"])
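
The image-count rules for the two modes (one or two images for `frame`, up to three for `reference`) can be checked client-side before a request is sent. The sketch below is one way to do that. Note the assumptions: this guide only shows the single-image `image_url` field, so the `image_urls` list used for multiple images is a hypothetical field name, and `build_image_payload` is an illustrative helper. Check the API reference for the real multi-image parameter before relying on it.

```python
# Maximum number of images per mode, as described in the list above.
IMAGE_LIMITS = {"frame": 2, "reference": 3}


def build_image_payload(prompt, image_urls, image_mode="frame", **extra):
    """Build a veo-3 request body, validating the image count for the mode."""
    if image_mode not in IMAGE_LIMITS:
        raise ValueError(f"unknown image_mode: {image_mode!r}")
    limit = IMAGE_LIMITS[image_mode]
    if not 1 <= len(image_urls) <= limit:
        raise ValueError(f"{image_mode} mode takes 1 to {limit} images")

    payload = {"model": "veo-3", "prompt": prompt, "image_mode": image_mode, **extra}
    if len(image_urls) == 1:
        payload["image_url"] = image_urls[0]        # field shown in this guide
    else:
        # ASSUMPTION: hypothetical field name for more than one image.
        payload["image_urls"] = list(image_urls)
    return payload
```

The returned dict can be passed as the `json=` argument of the same `requests.post` call used above.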

If you don't already have a public URL for your anchor image, post the bytes to /v1/files and Kunavo hosts the file for you under files.kunavo.com:

upload_anchor.py
# If you don't have a public URL, upload bytes; Kunavo hosts the file.
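# Reuses `requests` and `KEY` from the snippets above.
with open("anchor.jpg", "rb") as f:
    up = requests.post(
        "https://api.kunavo.com/v1/files",
        headers={"Authorization": f"Bearer {KEY}"},
        files={"file": f},
        timeout=60,   # requests has no default timeout
    )
up.raise_for_status()
image_url = up.json()["url"]   # permanent files.kunavo.com URL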

Sora and other models

The same endpoint shape works for every video model in the catalog — pass the relevant model slug:

  • veo-3 — cinematic, 1080p, supports image-to-video.
  • sora-2 — OpenAI Sora.
  • seedance-1-5-pro, seedance-2-0-pro — ByteDance, very strong on character motion.

See /models for the live list and the per-second price on each.
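
Because only the model slug changes between providers, comparing models on the same prompt comes down to building one request body per slug. A small sketch follows, using the slugs listed above. `payloads_for` is an illustrative helper, and the default of 5 seconds is an assumption: this guide does not say which durations each model accepts.

```python
# Slugs taken from the list above; /models has the live catalog.
VIDEO_MODELS = ["veo-3", "sora-2", "seedance-1-5-pro", "seedance-2-0-pro"]


def payloads_for(prompt, models=VIDEO_MODELS, duration=5):
    """Return one request body per model, identical except for the slug."""
    # ASSUMPTION: every listed model accepts this duration.
    return [{"model": m, "prompt": prompt, "duration": duration} for m in models]
```

Each body can then be posted to `/v1/video/generations` exactly as in the Veo 3 example.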

From Node / TypeScript

sora.mjs
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.KUNAVO_API_KEY,
  baseURL: "https://api.kunavo.com/v1",
});

// /v1/video/generations isn't in OpenAI's SDK shape, but the same auth
// header works — call it with fetch:
const resp = await fetch(
  "https://api.kunavo.com/v1/video/generations",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.KUNAVO_API_KEY}`,
    },
    body: JSON.stringify({
      model: "sora-2",
      prompt: "A red origami crane unfolding into a paper plane and flying away through a window",
      duration: 5,
      resolution: "1080p",
    }),
  },
);
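// fetch does not reject on 4xx / 5xx, so check the status explicitly.
if (!resp.ok) {
  throw new Error(`Video generation failed: ${resp.status} ${await resp.text()}`);
}
const { data } = await resp.json();
console.log(data[0].url);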

Pricing model

Video models bill per second of output, not per token. Kunavo publishes the per-second rate on /pricing for every video model. A common 8-second Veo 3 1080p clip costs a few cents. Failed generations (4xx / 5xx) are never billed.
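
Per-second billing makes budgeting simple arithmetic: cost is clip duration multiplied by the model's per-second rate. The sketch below shows the calculation. The rate in the usage example is a placeholder for illustration only, not a real Kunavo price; take the actual figure from /pricing. Both function names are illustrative.

```python
from decimal import Decimal


def estimate_cost(duration_seconds: int, rate_per_second: Decimal) -> Decimal:
    """Cost of one clip: seconds of output times the per-second rate."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return Decimal(duration_seconds) * rate_per_second


def clips_affordable(balance: Decimal, duration_seconds: int,
                     rate_per_second: Decimal) -> int:
    """How many whole clips of this length a balance covers."""
    return int(balance // estimate_cost(duration_seconds, rate_per_second))
```

For example, with a placeholder rate of `Decimal("0.10")` per second, `estimate_cost(8, Decimal("0.10"))` is `Decimal("0.80")`. `Decimal` is used instead of `float` so that currency amounts do not pick up binary rounding errors.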

Production checklist

  • Set a 10-minute HTTP timeout. The gateway polls upstream for up to 540 s and returns 504 if the model is still working past that. If a long job times out, retry; generations are idempotent per prompt.
  • Persist the result URL. Even though files.kunavo.com URLs are permanent, your product should own its own copy in the storage you control.
  • Handle 429s with backoff. Video models are GPU-bound; brief contention is normal. When the response carries a Retry-After header, wait at least that long before retrying.
  • Cache by prompt hash if reasonable. The same prompt + seed + model returns near-identical video — paying twice for it is wasteful.
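
Two items from the checklist, backoff on 429 / 504 and caching by prompt hash, can be sketched as follows. This is one possible shape, not Kunavo's client library: `send` is any zero-argument callable that performs the request and returns an object with `status_code` and `headers` (a `requests.Response` fits), and the retry count, base delay and cap are assumed defaults to tune for your workload.

```python
import hashlib
import json
import random
import time

RETRYABLE = {429, 504}  # rate-limited, or gateway gave up waiting upstream


def cache_key(model, prompt, seed=None):
    """Stable SHA-256 key over model + prompt + seed for a result cache."""
    blob = json.dumps({"model": model, "prompt": prompt, "seed": seed}, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def retry_delay(attempt, retry_after=None, base=2.0, cap=60.0):
    """Seconds to wait before the next try; `attempt` is 0-based."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall back to backoff
    # Exponential backoff with up to one second of jitter.
    return min(cap, base * 2 ** attempt) + random.uniform(0, 1)


def post_with_retry(send, max_attempts=4, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or tries run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        resp = send()
        last_try = attempt == max_attempts - 1
        if resp.status_code not in RETRYABLE or last_try:
            return resp
        sleep(retry_delay(attempt, resp.headers.get("Retry-After")))
```

In practice `send` would be something like `lambda: requests.post(url, headers=headers, json=payload, timeout=600)`, and `cache_key(...)` would index whatever store holds your own copy of the result URL.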

Questions: contact@kunavo.com. The team behind the gateway reads every email and replies within 24 hours.