
Google Fights Back: Everything That Has Changed With Gemini's Compute-Based Usage Limits


Google's Gemini just had one of its most turbulent weeks in recent memory. What started as a quiet but significant policy shift at Google I/O 2026 quickly escalated into a full-blown revolt from paying subscribers, forcing the company to respond with a rapid series of fixes, clarifications, and concessions. If you have been trying to keep track of exactly what changed, what Google fixed, and what questions are still unanswered, this article breaks it all down.

From Daily Prompts to Compute: What Google Changed and Why

For a long time, Gemini operated on a model that was simple to understand: you had a fixed number of prompts you could send per day. You could plan around that. You knew where you stood. That era is now over.

Effective May 17, 2026, Google overhauled Gemini by replacing daily usage limits with compute-based quotas, where usage is calculated in real time based on prompt complexity, features used, and conversation length. The announcement was made at Google I/O 2026 and went into effect almost immediately after the conference.

Under the new system, Google dynamically measures the exact amount of processing power your interaction consumes. A simple prompt like "Draft an email" carries a minimal compute cost and barely affects your quota. A complex prompt involving massive file data carries a high compute cost and can rapidly drain your quota.

Gemini usage is now capped within rolling five-hour windows, alongside a broader weekly usage limit. Each five-hour window resets your short-term limit, but each five-hour window you max out also counts against your weekly cap. So if you have a couple of days of heavy usage back-to-back, you can easily find yourself locked out for the rest of the week.
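To make the interaction between the two limits concrete, here is a minimal sketch in Python. It is not Google's implementation. It assumes every unit of usage counts against both the five-hour window and the weekly cap, which is a simplification, and the `QuotaTracker` name and all budget numbers are hypothetical, since Google has not published the units it meters in.

```python
from dataclasses import dataclass


@dataclass
class QuotaTracker:
    """Toy model of a short window budget nested inside a weekly budget."""
    window_budget: float      # hypothetical units per five-hour window
    weekly_budget: float      # hypothetical units per week
    window_used: float = 0.0
    weekly_used: float = 0.0

    def spend(self, cost: float) -> bool:
        """Charge a request if both budgets allow it; return whether it ran."""
        fits_window = self.window_used + cost <= self.window_budget
        fits_week = self.weekly_used + cost <= self.weekly_budget
        if not (fits_window and fits_week):
            return False
        self.window_used += cost
        self.weekly_used += cost
        return True

    def reset_window(self) -> None:
        """A new five-hour window starts; weekly usage is left untouched."""
        self.window_used = 0.0


tracker = QuotaTracker(window_budget=100, weekly_budget=250)
tracker.spend(100)        # first window maxed out
tracker.reset_window()
tracker.spend(100)        # second window maxed out
tracker.reset_window()
blocked = not tracker.spend(100)  # third full window would exceed the weekly cap
```

In this toy setup, two maxed-out windows leave only 50 units for the rest of the week, which is the lockout pattern described above: the short window keeps resetting, but the weekly total does not.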

Google has tiered the limits across its subscription plans. Users on the $8-a-month AI Plus plan get usage limits that are twice as high as the standard limits for free users. The $20-a-month AI Pro plan offers four times the standard limits, while the $200-a-month AI Ultra plan offers 20 times the standard limits.
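The tier arithmetic is easy to check with a few lines of Python. The multipliers and prices below are the ones quoted above; the base limit of 100 is a made-up placeholder, and the function names are only illustrative.

```python
# Multipliers and monthly prices as quoted in the article.
TIER_MULTIPLIER = {"Free": 1, "AI Plus": 2, "AI Pro": 4, "AI Ultra": 20}
TIER_PRICE_USD = {"Free": 0, "AI Plus": 8, "AI Pro": 20, "AI Ultra": 200}


def tier_limit(base_limit: float, tier: str) -> float:
    """Scale the free-tier limit by the plan's multiplier."""
    return base_limit * TIER_MULTIPLIER[tier]


def dollars_per_multiple(tier: str) -> float:
    """Monthly price divided by the multiplier: cost of each 1x of the free limit."""
    return TIER_PRICE_USD[tier] / TIER_MULTIPLIER[tier]


for tier in ("AI Plus", "AI Pro", "AI Ultra"):
    # 100 is a hypothetical free-tier limit in arbitrary units.
    print(tier, tier_limit(100, tier), dollars_per_multiple(tier))
```

On these figures alone, each multiple of the free limit works out to $4 on AI Plus, $5 on AI Pro, and $10 on AI Ultra. That comparison ignores everything else the plans bundle, so it says nothing about overall value, only about how the usage multipliers scale with price.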

On the surface, the logic makes sense. AI compute is genuinely expensive, especially as users lean into features like video generation, agentic coding, and long-context reasoning. Google argued that a compute-based model allows it to allocate server resources more efficiently, and that simple text prompts would barely register on the meter. The problem, as subscribers discovered almost immediately, is that the calibration was nowhere near ready for real-world use.

The Backlash: When Theory Meets Reality

The complaints started flooding in almost the moment the new limits went live. Developers, power users, and even casual subscribers found themselves hitting the five-hour cap far faster than anyone expected.

Paying customers on the AI Pro and AI Ultra plans reported that a handful of prompts or a few video generations depleted their entire quotas, locking them out of the service for up to five hours. One frustrated user posted on Reddit that a simple five-message back-and-forth burned through 50 percent of their five-hour compute limit.

The complaints took a sharper turn when video generation entered the picture. One failed avatar-video request reported by a user on X exhausted his entire five-hour usage window, turning a subscription update into a direct product problem for Google AI Pro subscribers. The particularly infuriating part? The video failed to generate successfully. The compute was consumed, the quota was gone, and there was nothing to show for it.

Users now found themselves having to think before they prompted: Should they start a fresh chat? Should they avoid video? Should they skip extended thinking and use a lighter model? Should they save the big task for later? As one commentary put it, Gemini was now asking ordinary users to think like cloud-cost managers. That is a very different product experience from the one people had signed up for.

Intense coding sessions proved especially brutal, consuming large amounts of compute and, by some accounts, burning through weekly quotas within minutes. Google AI Pro and Ultra subscribers took to developer forums to voice their frustration, and the backlash was loud enough that Google responded quickly.

Google's Response: A Series of Fixes and Clarifications

To Google's credit, the response came fast. Gemini lead and Vice President Josh Woodward took to X to publicly acknowledge the problems and roll out a set of changes. Here is a breakdown of every adjustment that was made.

Single-Prompt Quota Cap for Gemini 3.1 Pro

One of the most significant and immediate fixes was a ceiling on how much quota any single prompt can consume when using Gemini 3.1 Pro. It is a direct response to users finding that complex requests with large files rapidly drained their allowances.

According to Josh Woodward, the company is now "capping the amount of quota a single prompt can use so you get more out of the Pro model." The prompt still runs as usual, meaning you get the full output. It is just that the compute charge gets clipped so it cannot single-handedly tank your entire multi-hour allowance. This is a meaningful fix for developers working with large codebases, researchers uploading lengthy documents, or anyone regularly sending complex, context-heavy prompts.
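In effect, the charge is clipped rather than the work. A minimal Python sketch of that idea follows, assuming the cap behaves like a simple ceiling on the metered cost. The function name, the cap value, and the cost figures are all hypothetical; Google has not published the actual numbers.

```python
def charged_cost(raw_cost: float, per_prompt_cap: float) -> float:
    """Return the quota actually deducted: the metered cost, clipped at the cap."""
    return min(raw_cost, per_prompt_cap)


# Hypothetical numbers: a cap of 25 units per prompt.
small_prompt = charged_cost(raw_cost=3, per_prompt_cap=25)    # charged in full
huge_prompt = charged_cost(raw_cost=80, per_prompt_cap=25)    # clipped to the cap
```

Under these made-up numbers, the small prompt is charged its full 3 units while the large one is charged 25 instead of 80, even though both run to completion.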

Failed Requests No Longer Count Against Your Quota

This was arguably the fix that resonated most emotionally with users. The idea that a failed generation would eat into your paid quota was genuinely unfair, and Google acknowledged it.

Woodward said that system errors are on Google, not on users, and that quota will now be used only for successful completions.

He added that about 1 in 10 requests can fail due to system errors. Previously, failed attempts could still count against your quota, which understandably felt unfair. That is now being corrected: if Gemini crashes mid-response, times out, or hits any server-side error, the attempt no longer touches your limit. This is a basic fairness principle that should have been in place from the start, but better late than never.
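The difference is easy to quantify with a toy example. The sketch below compares the old and new accounting rules for ten equally priced requests, one of which fails, matching the roughly 1-in-10 failure rate Woodward cited. The per-request cost of 5 units and the function name are hypothetical.

```python
def total_charge(requests: list[tuple[float, bool]], charge_failures: bool) -> float:
    """Sum the cost of requests, optionally skipping the ones that failed.

    Each request is a (cost, succeeded) pair.
    """
    return sum(cost for cost, succeeded in requests if succeeded or charge_failures)


# Nine successful requests and one failure, each costing a hypothetical 5 units.
requests = [(5.0, True)] * 9 + [(5.0, False)]

old_rule = total_charge(requests, charge_failures=True)    # failures billed
new_rule = total_charge(requests, charge_failures=False)   # successes only
```

With these numbers the old rule deducts 50 units and the new rule 45, so the failed attempt simply disappears from the bill.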

The Omni Video Bug Is Fixed

One of the most dramatic complaints came from users generating videos with Gemini Omni, Google's multimodal world model. A bug was causing one or two Omni videos to eat up the entire quota. That bug has been fixed, and Ultra members now get double the number of Omni video generations.

This fix addresses the particularly egregious situation where a user testing short video clips could suddenly find their entire allowance depleted after just a couple of attempts. The doubling of Omni video generations for Ultra subscribers is also a tangible acknowledgment that the original limits were too restrictive.

Gemini 3.1 Flash-Lite Is Now Free

In a move that functions both as a fix and a safety net, Gemini 3.1 Flash-Lite prompts are now "free and won't count against your quota."

This effectively turns Flash-Lite into a free layer for lighter tasks. It also subtly encourages users to rely on lighter models when they do not need full reasoning power, which should help stretch the limits of higher tiers further. In practical terms, if you hit your cap mid-session, you can continue working with Flash-Lite while waiting for the five-hour window to reset. It is a meaningful quality-of-life addition that softens the wall users hit when they exhaust their quota.
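For anyone scripting their own workflow around this, the fallback logic might look something like the Python sketch below. It is purely illustrative: the model identifier strings, the `pick_model` helper, and the idea that you can estimate a prompt's cost in advance are all assumptions, not part of any published Gemini interface.

```python
PREFERRED_MODEL = "gemini-3.1-pro"        # hypothetical identifier
FREE_MODEL = "gemini-3.1-flash-lite"      # hypothetical identifier; no quota charge


def pick_model(remaining_quota: float, estimated_cost: float) -> str:
    """Use the preferred model while quota lasts, otherwise fall back to the free one."""
    if estimated_cost <= remaining_quota:
        return PREFERRED_MODEL
    return FREE_MODEL


plenty_left = pick_model(remaining_quota=10, estimated_cost=4)   # preferred model
nearly_out = pick_model(remaining_quota=2, estimated_cost=4)     # free fallback
```

The design choice here is deliberately blunt: it never refuses a request, it only downgrades the model once the estimated cost no longer fits in what is left of the window.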

More Detailed Usage Breakdowns Are Coming

One of the recurring complaints beyond the limits themselves was the lack of visibility into where the compute was going. The current usage dashboard at gemini.google.com/usage only offers a broad overview.

Because tasks like Deep Research draw heavily on compute resources, Google plans to surface richer tracking information and notifications so subscribers can better manage their budgets. The promise is that future updates will include clearer per-feature breakdowns and proactive notifications before you hit a wall, not just after.

This is important because, right now, users are essentially flying blind. You know your quota is going down, but you often do not know why. A user who just ran a Deep Research session followed by a video generation has no easy way to see which task cost what. Google's commitment to building more transparency into the system is the right call, and it cannot come soon enough.

Quota Was Also Tripled

Beyond the specific fixes, Google took broader action to ease the immediate pressure. Google notified paid users that their Gemini quota had been reset for the week and increased by 3x moving forward. Mohan later confirmed on X that the 3x usage limit for Gemini Pro models was permanent. This wide-scale increase suggests the original limits were set far too conservatively and that Google underestimated just how intensively its paid subscribers use the service.

The Unanswered Question: What Happens to Your Chats When You Cancel?

While Google has been relatively forthcoming about the fixes to the compute-based system, there is one pressing question that remains murky: what exactly happens to your chat history if you cancel your subscription?

This matters more now than it did before. With the new compute-based limits driving some users to question whether the subscription is worth it, a meaningful number of people are considering cancelling. And there are reports that when some users cancelled their plans after usage-based limits were introduced, they could no longer access their old chats. They could create new chats on the free tier, but their previous conversation history was effectively walled off behind the paywall.

This is a real concern, and it has precedent in how AI products often handle subscription changes. Unlike simple text files you own on your own device, your Gemini conversations exist inside Google's infrastructure. The access policy for that content is not something Google has been explicit about in the context of this transition.

The issue is compounded by reports that chats older than 90 days may be unrecoverable, and that once a subscription is cancelled, scheduled exports cease and chats older than 30 days may be purged from backup systems. Google does offer a data export option through Settings, Data and Privacy, but that requires users to act before cancelling, which many will not think to do.

Google should address this clearly and directly. Users who have been paying subscribers deserve to know what happens to the work they have done inside the platform. If old chats become inaccessible upon cancellation, that is something subscribers need to know before they make the decision to leave. It should not be buried in fine print or discovered only after cancelling.

The Bigger Picture: Is Compute-Based Pricing Here to Stay?

The turmoil around Gemini's new limits points to a broader tension running through the entire AI subscription industry. The first major company to begin enforcing this kind of structure was Anthropic, which rolled out five-hour windows feeding into weekly caps for Claude back in August 2025. Google has essentially adopted the same model nine months later.

The reason is not hard to understand. AI features are getting dramatically more powerful and resource-intensive. Video generation, agentic coding, deep research across massive document sets, and long-context reasoning all consume serious compute. Flat-rate subscriptions, however simple they are for users, create a situation where a small number of heavy users can consume a disproportionate amount of infrastructure capacity at everyone else's expense.

The challenge for Google and every other AI company is communicating this shift in a way that feels fair. Not everyone is thrilled. Some users feel they are getting less value now, even though Google raised usage limits by 3x permanently. That feeling is understandable. When you move from "a fixed number of prompts per day" to "a compute budget you can exhaust in an afternoon," the product feels less predictable and more constrained, even if the underlying reason is rational.

The per-prompt quota cap, free Flash-Lite access, and the planned pay-as-you-go top-up credits are all steps in the right direction. The broader Gemini overhaul from I/O 2026 brought a lot of new capabilities at once, and the limits seem to have been an afterthought. That is not a flattering look, but the rapid course correction does suggest Google is paying attention.

What You Should Do Right Now

If you are a Gemini subscriber trying to navigate this new reality, here are the most practical takeaways:

Use Flash-Lite for lightweight tasks like drafting short texts, quick factual questions, and simple summaries. It is now free and does not touch your quota. Save the heavier Gemini 3.1 Pro usage for tasks that genuinely need it.

Start fresh chats when context length is not critical. Longer conversations cost more compute. A new chat with a brief recap of what you need is often more quota-efficient than continuing a very long thread.

Keep an eye on the usage dashboard at gemini.google.com/usage, even though it is still limited. More detailed breakdowns are coming, and once they arrive, they will help you understand where your compute is actually going.

If you are considering cancelling your subscription, export your chat history first. Go to Settings, then Data and Privacy, and download your data before making any changes to your plan. Do not assume your conversations will be waiting for you if you decide to return.

Software Staff Writer, Sandra Safari serves a unique dual role at TechInKenya as both a Software Engineer and a Tech Journalist. Operating at the intersection of infrastructure engineering and media, s...
