Google Direct vs OpenRouter for Gemini: Same Price, Very Different Deal
Gemini 2.5 Flash costs $0.30/1M tokens on both. So why does Google direct win for solo builders? Free tier, Batch API at 50% off, and context caching that OpenRouter does not support.
You are adding Gemini to your SaaS. Someone tells you OpenRouter is easier. They are right that it is easier. They are wrong that it is cheaper. Here is the full breakdown so you can pick the right path for where you are right now.
- Gemini 2.5 Flash costs the same on Google and OpenRouter: $0.30 input / $2.50 output per million tokens.
- Google direct wins for MVP: Free tier covers the whole MVP. Batch API (50% off) and context caching (up to 90% off) are both unavailable on OpenRouter.
- OpenRouter wins for multi-model: GPT, Claude, Llama, and Gemini through one API key. Switch models without touching your code.
- If you are building solo and your app only needs Gemini, Google direct is always cheaper. Switch to OpenRouter when you need more than one provider.
The pricing, side by side
As of June 20, 2026, Gemini model pricing is identical between Google AI Studio and OpenRouter. OpenRouter is not marking up Gemini pricing.
| Model | Google Direct (input) | Google Direct (output) | OpenRouter (input) | OpenRouter (output) |
|---|---|---|---|---|
| Gemini 2.5 Flash Lite | $0.10 | $0.40 | $0.10 | $0.40 |
| Gemini 2.5 Flash | $0.30 | $2.50 | $0.30 | $2.50 |
| Gemini 2.5 Pro | $1.25 | $10.00 | $1.25 | $10.00 |
| Prices per 1M tokens. Source: Google AI Studio pricing page and OpenRouter model page, June 2026. | ||||
The sticker price is the same. What differs is what sits around the price.
What is actually different
Advantage: Google Direct
Free tier
Google AI Studio includes a free tier for Gemini 2.5 Flash and Flash Lite. Rate limits apply (roughly 15 requests per minute, 1,500 per day on Flash), but for an MVP with low traffic this is enough to run the entire product at zero cost. OpenRouter has no free tier. Every request is a paid request.
For a solo builder validating product-market fit, the free tier is the entire early runway. You do not pay anything until you have paying users.
Advantage: Google Direct
Batch API: 50% off for non-urgent work
Google offers a Batch API that prices requests at 50% of the standard rate. The tradeoff is latency: batch jobs complete within 24 hours, not seconds. That is fine for any workload that is not user-facing and real-time: nightly re-scans, bulk classification, report generation, async enrichment pipelines.
OpenRouter does not offer batch pricing. Every request goes through the same endpoint at the same rate.
Advantage: Google Direct
Context caching: up to 90% off for long repeated contexts
If you send the same large context on every call (a long system prompt, a reference document, a large schema), Google's context caching stores it server-side and charges a fraction of the token price for cached hits. For apps with a large, stable system prompt that does not change between calls, this can cut input token costs by 75 to 90%.
OpenRouter does not support Google's context caching. You pay full input price on every call, every time.
Advantage: OpenRouter
One API for every model
OpenRouter uses the OpenAI API format and routes to over 200 models: GPT, Claude, Llama, Mistral, and all the Gemini variants. You change a single parameter to switch models. You manage one API key instead of one per provider. When a provider has an outage, you can reroute in seconds.
Google direct locks you to Gemini. If you want to add Claude or GPT alongside it, you need separate SDKs and separate API keys.
Advantage: OpenRouter
Automatic fallback routing
OpenRouter can automatically retry failed requests with a different provider. If Gemini is down, it can fall over to a configured backup model without you writing the fallback logic. Google direct requires you to handle provider failure yourself.
Full comparison
| Factor | Google Direct | OpenRouter |
|---|---|---|
| Sticker price | Same | Same |
| Free tier | Yes (Flash, Flash Lite) | No |
| Batch API (50% off) | Yes | No |
| Context caching (up to 90% off) | Yes | No |
| Model selection | Gemini only | 200+ models |
| API format | google-genai SDK | OpenAI-compatible |
| Automatic fallback | Build it yourself | Configurable |
| Outage dependency | Google only | OpenRouter + provider |
The decision
| Your situation | Use this | Why |
|---|---|---|
| MVP or early stage, solo builder | Google Direct | Free tier covers the MVP. No cost until you have paying users. |
| Production, Gemini only, cost-sensitive | Google Direct | Batch API and context caching are both unavailable on OpenRouter. The savings are real. |
| Need GPT, Claude, or Llama alongside Gemini | OpenRouter | One API key, one codebase, switch models in one line. |
| Need automatic provider fallback | OpenRouter | Fallback routing without custom code. |
| High-volume, Gemini-only, async workloads | Google Direct | Batch API at 50% off OpenRouter pricing is significant at scale. |
The SDK difference
The one practical friction with Google direct is the SDK. Google uses its own@google/genai SDK, which is not OpenAI-compatible. OpenRouter uses the OpenAI SDK format, so switching models means changing one string, not rewriting imports.
If you build on Google direct and later want to add OpenRouter, the migration is straightforward: replace the SDK import, point the base URL at OpenRouter, and update the model name format. The actual prompt and response handling code stays the same. This is a few hours of work, not a rebuild.
FAQ
Is OpenRouter marking up Gemini pricing?
As of June 2026, no. The per-token price for Gemini 2.5 Flash, Flash Lite, and Pro is identical on Google AI Studio and OpenRouter. OpenRouter makes money on other models where margins exist, not Gemini.
What is the Google Batch API and when does it apply?
The Batch API processes requests asynchronously, returning results within 24 hours at 50% of the standard price. It applies to any workload that is not user-facing and real-time: scheduled re-scans, bulk tagging, nightly enrichment. It does not apply to live user requests where you need a response in seconds.
How much does context caching actually save?
It depends on your cache hit rate and context size. Google charges roughly 25% of the input token price for cached tokens (the exact rate varies by model). If your system prompt is 10K tokens and you send it on every call, you would save roughly 75% of the input cost for that context once it is cached. For apps with a large, stable context, savings add up quickly at scale.
Can I start on Google direct and move to OpenRouter later?
Yes. The migration requires changing the SDK import, updating the base URL, and reformatting the model name string. Your prompt and response logic stays unchanged. Budget a few hours for the change plus testing.
Does OpenRouter support context caching at all?
OpenRouter supports prompt caching for some providers (Anthropic's prompt cache, for example) where the provider exposes it in their API. Google's context caching for Gemini is not available through OpenRouter as of June 2026.
What is the Gemini free tier rate limit?
As of June 2026, Gemini 2.5 Flash allows roughly 10 requests per minute and 500 requests per day on the free tier. Flash Lite limits are similar. These limits are enough for an MVP in early testing but will block you once you have real daily active users. At that point you upgrade to a paid API key and the rate limits increase significantly.
The model cost only matters if the feature has users who want it. Validate your AI SaaS idea with real demand data before you spend time on infrastructure decisions. Free, no setup required.
Validate your idea free →