AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices

Published May 28, 2026


This post describes work by Dzung Pham, Kleomenis Katevas, Ali Shahin Shamsabadi, and Hamed Haddadi, from Brave.

Summary

Running LLM-based agents locally protects privacy: your prompts and reasoning traces never leave your device and stay out of cloud logs. But there’s a hidden cost that rarely gets talked about. Local AI agents, regardless of the size of their LLMs, can drain the battery fast, sometimes wasting enormous amounts of LLM inference, reasoning, and tool calls on runs that never lead anywhere. This matters because it causes frustration, and in many cases even anxiety, for users. To address this, we designed AgentStop, a lightweight efficiency supervisor that monitors the agent’s LLM backend in real time, predicts when the agent is heading toward wasted computation, and terminates unpromising inference chains before they drain your battery.

AgentStop has been accepted at the 1st ACM Conference on AI and Agentic Systems (ACM CAIS 2026), comes with a fully open-source implementation, and has been awarded three reproducibility badges by the Artifact Evaluation Committee: Artifact Available, Artifact Functional, and Results Reproduced. AgentStop will be presented at the conference, between May 27th and 29th, in San Jose, California.

Local AI Agents are Necessary for Privacy

Imagine asking an AI agent to fix a bug in your codebase. If the agent is running in the cloud, your entire codebase leaves your machine and is sent to a third-party server. Local AI agents reduce this privacy risk by keeping inference and data processing on your own device, so sensitive files no longer need to be uploaded to external infrastructure. They also eliminate API costs and reduce dependence on an internet connection.

Recent advances in model efficiency, including 4-bit quantization and Mixture-of-Experts (MoE) architectures ¹, have made local deployment far more practical, enabling 30-billion-parameter models to run on consumer hardware such as a 24GB laptop. Truly capable on-device agents are no longer a distant dream. However, running those agents comes at a cost that is easy to overlook until you’re staring at a 20% battery warning.

But Local AI Agents Are Expensive to Run

especially on mobile devices where battery anxiety is already a real concern

LLM-based agents differ fundamentally from simple LLM chats in how they consume resources. Agents operate through multi-step cycles in which each step requires a new inference: reasoning, calling tools, taking actions, observing outcomes, and reasoning again. This cycle of inference, tool use, and retries means that agentic workloads consume vastly more compute than a simple LLM chat interaction, and a significant portion of that compute is spent on runs that were never going to succeed.

Figure 1: Power draw and temperature across a single coding task on an Apple M1 Max, powered by Qwen3-Coder-30B-A3B. Each power spike corresponds to a new LLM inference call; the sustained thermal load above 90°C reflects the cumulative cost of 30+ calls over roughly 10 minutes, much of it potentially wasted on a task the agent will never complete.

In our tests on a MacBook Pro M1 Max (Figure 1), a single coding task could:

Last over 10 minutes
Trigger 30+ separate LLM inference calls
Push GPU power draw past 40 watts
Sustain GPU temperatures above 90°C for extended periods

That’s not a quick query. That’s your laptop working as hard as it can, for ten minutes straight, on a task that may ultimately fail.

For context, a single failed coding task can drain roughly 3% of a laptop’s entire battery. That might sound small, but consider what it means in practice. If you’re running an agent to help debug a complex piece of software, it might fail five or ten times before finding a working solution, or give up entirely. Those failed attempts alone could cost 15-30% of your battery, before you’ve even seen a useful result.

Web-based question answering is lighter, but the same principle applies: fail ten web search tasks in a row, and you’ve burned through another 3-7% of battery for nothing.

That adds up fast, especially on mobile devices where battery anxiety is already a well-documented, real psychological concern. Research has shown that low battery levels are a meaningful source of stress for many users, a phenomenon sometimes called nomophobia ², and that this anxiety directly affects how people use and trust their devices ³.

AgentStop: Brave’s First Step Towards Building An Efficiency Supervisor for Agents

AgentStop is a lightweight efficiency supervisor that watches an agent as it works and predicts, early on, whether it’s likely to succeed. If the outlook looks bleak, it pulls the plug before the agent wastes any more energy.

The key insight is that agents often signal their own failure without realizing it. By monitoring subtle patterns in the agent’s output, AgentStop can spot a doomed run within the first few steps.

The features it watches include:

Token log-probabilities: a measure of how “confident” the model is in each word it generates. Lower confidence often correlates with a struggling agent.
Token counts per step: unusually long chains of reasoning can indicate the agent is going in circles.
Token overlap between steps: if the agent keeps repeating itself, it’s probably stuck in a loop.

These signals are already produced during normal inference, so collecting them adds virtually zero extra energy cost.
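As a rough illustration of the three signals listed above, the sketch below turns one agent step into a small feature dictionary. The function name, the feature names, and the use of Jaccard similarity for token overlap are assumptions made for this example; the post does not specify how AgentStop computes overlap, so the released code may differ.

```python
from statistics import mean


def step_features(token_logprobs, tokens, prev_tokens):
    """Summarise one agent step (hypothetical feature set).

    token_logprobs: log-probability of each generated token (non-empty).
    tokens:         tokens generated in this step.
    prev_tokens:    tokens generated in the previous step (may be empty).
    """
    current, previous = set(tokens), set(prev_tokens)
    union = current | previous
    return {
        # Lower values suggest the model was less confident.
        "mean_logprob": mean(token_logprobs),
        "min_logprob": min(token_logprobs),
        # Unusually long steps can indicate circular reasoning.
        "num_tokens": len(tokens),
        # Jaccard similarity (an assumption): 1.0 means the two steps
        # use exactly the same set of tokens, 0.0 means no shared tokens.
        "overlap_with_prev": len(current & previous) / len(union) if union else 0.0,
    }
```

All inputs here are values the LLM backend already produces while generating, which is why collecting them needs no extra inference.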

AgentStop trains a gradient-boosted decision tree (using XGBoost) on a labelled dataset of successful and failed agent runs. The model is deliberately lightweight: each prediction costs less than 0.01 mWh, so the supervisor doesn’t undo its own savings.

When deployed, a single classifier runs once after each agent step and returns a simple verdict: keep going or stop now.
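The control flow described here can be sketched as a loop that consults the classifier after every step. This is a minimal sketch, not the released implementation: `run_with_supervisor`, `predict_success`, `threshold`, and `min_steps` are hypothetical names, and `predict_success` stands in for the trained classifier as any callable that returns an estimated probability of success.

```python
def run_with_supervisor(steps, predict_success, threshold=0.5, min_steps=1):
    """Consume agent steps one at a time and stop early on a bad outlook.

    steps:           iterable yielding per-step features as the agent runs.
    predict_success: callable mapping the step history to a probability
                     of eventual success (stand-in for the classifier).
    threshold:       stop when the predicted probability falls below this
                     value (hypothetical default).
    min_steps:       number of steps to observe before stopping is allowed.
    """
    history = []
    for index, features in enumerate(steps, start=1):
        history.append(features)
        # One classifier call per agent step: keep going or stop now.
        if index >= min_steps and predict_success(history) < threshold:
            return ("stopped", index)
    return ("completed", len(history))
```

If `steps` is a generator that performs one LLM call per item, returning early means the remaining calls are never made, which is where the energy saving would come from.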

AgentStop reduces energy consumption at minimal utility cost

We evaluated AgentStop across two representative tasks:

Web-Based Question Answering

We tested on two datasets: FRAMES (824 multi-hop reasoning questions) and SimpleQA (4,326 factual questions), with agents powered by Qwen3-30B-A3B and given access to web search via the Brave Search API.

Dataset	Exit Step Number	Energy Wastage Reduction	Task Utility Drop
FRAMES	5	~22%	<2%
SimpleQA	4	~23%	<2%

On both datasets, AgentStop outperforms simpler baselines (random stopping, min log-prob thresholding, mean log-prob thresholding), particularly in the critical early steps where intervention is most valuable.
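For readers unfamiliar with the baselines named above, one plausible reading of each is sketched below. The function names, the threshold values, and the exact stopping rules are assumptions for illustration; the precise baseline definitions used in the evaluation are not given in this post.

```python
import random


def random_stop(stop_probability, rng=random):
    """Random stopping: terminate with a fixed probability at each step."""
    return rng.random() < stop_probability


def min_logprob_stop(token_logprobs, threshold):
    """Stop when the least confident token in the step falls below threshold."""
    return min(token_logprobs) < threshold


def mean_logprob_stop(token_logprobs, threshold):
    """Stop when the average token confidence in the step falls below threshold."""
    return sum(token_logprobs) / len(token_logprobs) < threshold
```

Each baseline relies on a single signal (or none), whereas the learned classifier can combine several signals across steps.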

Coding

Coding is a much harder problem. Our agent, powered by Qwen3-Coder-30B-A3B, achieves an 18.8% success rate on the 500-task SWE-Bench Verified benchmark, competitive with GPT-4o’s 21.2% in the same setting. Failed runs are expensive: a single failed coding attempt can cost nearly 3,000 mWh, roughly 3% of a 100 Wh laptop battery.
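The battery arithmetic behind that figure is easy to check. The short sketch below converts the energy of a failed run into a share of battery capacity and then scales it to several failed attempts; the helper name `battery_fraction` is made up for this example, and the numbers are the rounded ones quoted in this post.

```python
def battery_fraction(energy_mwh, battery_wh):
    """Return the share of a battery consumed by a given amount of energy.

    energy_mwh: energy used, in milliwatt-hours.
    battery_wh: battery capacity, in watt-hours (1 Wh = 1,000 mWh).
    """
    return energy_mwh / (battery_wh * 1000)


# One failed coding run: about 3,000 mWh on a 100 Wh battery.
per_failure = battery_fraction(3000, 100)

# Five to ten failed attempts in a row, as in the debugging scenario earlier.
for attempts in (5, 10):
    print(f"{attempts} failed runs: {per_failure * attempts:.0%} of the battery")
```

With these inputs, one failure comes to 3% of the battery, so five to ten failures land in the 15-30% range mentioned earlier in the post.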

Dataset	Exit Step Number	Energy Wastage Reduction	Task Utility Drop
SWE-Bench Verified	5	~19%	~3%

Notably, around 60% of total energy consumption occurs within the first 10 agent steps, meaning early intervention has an outsized impact. AgentStop achieves 0.6-0.7 AUC in classifying success vs. failure within those first 10 steps, which is where it matters most.
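For context on the AUC figure, the sketch below computes it from first principles: AUC is the probability that a randomly chosen successful run receives a higher score than a randomly chosen failed run, with ties counted as half. The function name and the toy labels and scores are made up for illustration and are not results from the evaluation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a successful run, 0 for a failed run.
    scores: predicted probability of success for each run.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one successful and one failed run")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # successful run ranked above failed run
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): two successes, two failures.
print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))  # prints 0.75
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking, so 0.6-0.7 indicates a useful but imperfect early signal.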

Across both tasks, the results are consistent: AgentStop recovers 15-20% of wasted energy with less than a 5% drop in task completion, using a supervisor that costs almost nothing to run.

As local AI agents become more capable and autonomous, efficiency will matter just as much as intelligence. Local AI agents already protect your privacy, eliminate API costs, and reduce dependence on an internet connection; now, AgentStop is an early step toward making on-device agents not only private and useful, but also energy-aware.

The code and datasets for AgentStop are available at https://github.com/brave-experiments/AgentStop.
