AI agents aren’t replacing remote workers any time soon — here’s why
The demos look slick, the promises even slicker. In slides and keynotes, computer assistants plan, click and ship your work while you sip coffee. Promoters like McKinsey call it the agentic AI advantage.
Then you put these systems on real client work, and the wheels come off. The newest empirical benchmark from researchers at Scale AI and the Center for AI Safety finds current AI agents completing only a tiny fraction of jobs at a professional standard.
Headlines say “agents are here,” but the data say otherwise. The new Remote Labor Index, a multi-domain benchmark built from 240 real freelance-type projects across 23 categories, reports an automation rate topping out at 2.5 percent across leading agents. This means that almost all deliverables would be rejected by a reasonable client. The dataset spans design, operations, business intelligence, audio-video, game development, computer-aided design, architecture and more, reflecting the work that actually shows up in remote markets, not cherry-picked lab tasks.
It’s not that AI fails everywhere; the Remote Labor Index found scattered wins in text-heavy data visualization, audio editing and simple image generation. But the failures are systematic. Reviewers cite empty or corrupt files, missing assets, low-grade visuals and inconsistencies across deliverables — the kinds of misses that doom work for clients. These aren’t close calls, either: inter-annotator agreement sits at 94.4 percent for the accept-or-reject decision.
If you need a concrete sense of difficulty, the benchmark’s human reference projects averaged 28.9 hours to complete, with a median of 11.5 hours and an average price of $632. Those are realistic project sizes. They include work like a World Happiness Report dashboard, a 2D promo for a tree services firm, 3D animations for new earbuds, an IEEE-formatted paper, an architectural concept for a container home, and a casual browser video game in the style of “Watermelon Game.” This is the right yardstick for agent claims.
When I work with companies on AI adoption, I push a simple framing: Use AI to do well-scoped tasks inside a project, but not to run the project. That rule aligns with the published evidence. The Remote Labor Index team notes pockets of success in content drafting, audio cleanup, image assets and basic data visualization, which pair nicely with human review in marketing, product and analytics teams. In my client work, this shows up as faster ad variants, cleaner query logic, quicker explainer scripts, and first-pass chart code that a developer can polish.
Contrast those gains with multi-hour, multi-file builds that require iterative verification. In METR’s HCAST findings, agents succeed on 70 to 80 percent of tasks that take humans under an hour, but on under 20 percent of tasks that take humans more than four hours. That is the difference between automating a component and having computers carry a project across the finish line.
To be sure, AI really is improving; what’s being overhyped is what that improvement means for near-term automation of whole projects.
Hype has a business model. The “agentic AI advantage” storyline promises proactive, goal-driven assistants that automate complex processes across firms. Markets respond to such bold claims, then teams inherit the risk. Advisory firm Gartner even warns that more than two out of five so-called agentic initiatives will be scrapped by 2027 due to unclear value and rising costs — a wave of “agent washing” where conventional tooling gets relabeled as autonomy.
The balanced plan is to redesign work so humans direct, verify and integrate AI outputs, then let evidence guide any increase in the scope of its work. OpenAI’s GDPval report shows that with human oversight, frontier models are approaching expert quality on carefully defined, economically valuable tasks. That supports staffing models where you automate slices of jobs but not the jobs themselves. It also matches early labor data. A recent Stanford employment analysis reports wage gains in AI-exposed roles and no broad, immediate job loss. This is consistent with a world where AI changes task-mix more than it wipes out occupations.
The near-term playbook is straightforward. Use AI to reduce cycle time on repeatable tasks. On current trend lines, more capable AI agents will arrive over the next few years, helped by tightly scaffolded workflows and better tool use. But AI is nowhere near capable of whole-project autonomy for general remote-capable work, regardless of hype from McKinsey and others.
Agentic AI is exciting, but real benchmarks matter more than glossy promises. The Remote Labor Index shows only tiny automation rates on the kinds of projects that companies actually pay for. Progress will continue, but the smart move is to treat AI as a force multiplier inside your project, while leaving humans accountable for outcomes. Leaders who adopt AI with discipline can make gains today and be ready for tomorrow, without buying into an AI bubble.
Gleb Tsipursky, Ph.D., serves as the CEO of the hybrid work consultancy Disaster Avoidance Experts and authored the best-seller “Returning to the Office and Leading Hybrid and Remote Teams.”
Copyright 2025 Nexstar Media Inc. All rights reserved. This material may not be published, broadcast, rewritten, or redistributed.