Anthropic's official Claude Opus 4.8 article card
Anthropic's official Claude Opus 4.8 article card
+ Anthropic News

Anthropic releases Claude Opus 4.8 with a reliability gain for agentic coding

Claude Opus 4.8 ships with one substantive improvement: roughly four times fewer self-introduced code flaws pass unflagged versus its predecessor. Pricing holds at 4.7 levels.

about 3 hours ago

Anthropic released Claude Opus 4.8 on May 28, 2026. It is available immediately via the API as claude-opus-4-8. The model’s one substantive improvement over its predecessor is reliability in agentic coding: Anthropic says it is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked. Pricing is unchanged from Claude 4.7 — $5 per million input tokens and $25 per million output at standard rates, $10 and $50 in fast mode.

The reliability number is the story

For most model releases, the benchmark grid is the story. For Opus 4.8, one number stands above the rest. Anthropic’s claim is specific: the model is around four times less likely than its predecessor to let flaws in code it has written pass unremarked. That is not a claim about fewer bugs entering the codebase from outside, or about reasoning quality in general. It is about self-review — whether the model catches what it introduced.

In a single-shot completion, that difference rarely matters. In an agentic loop that writes code, calls tools, checks its own output, and iterates, a model that misses its own mistakes compounds those errors across steps. The fix has to come from the agent, not from a human watching every turn. That is the workload where catching self-introduced flaws is worth more than another benchmark point.

Anthropic also cites an 84% score on Online-Mind2Web and calls Opus 4.8 the first model to break 10% all-pass on the Legal Agent Benchmark.

~4× Fewer self-introduced code flaws passing unflagged vs. its predecessor, per Anthropic Anthropic
84% Online-Mind2Web Agentic web tasks benchmark Anthropic
$5 / $25 Price per million tokens (standard) Input / output — unchanged from 4.7 Anthropic

What else ships with 4.8

Beyond the reliability improvement, Opus 4.8 arrives with two additions to claude.ai that extend how the model works in practice.

Dynamic workflows let the model spin up parallel subagents and hand work off across them — a research preview, not a generally available feature. If you are building multi-step pipelines that currently run sequentially, this is the surface worth watching: structured parallel execution changes the economics of long research or code-generation jobs in ways that sequential tool use does not.

Effort control gives users a knob over how much reasoning the model applies to a given request. Spending compute on a hard problem and saving it on a routine one is the correct behavior for high-volume agentic traffic, and wiring that control in explicitly matters more than leaving it to the model’s defaults.

The counter-case: this is an incremental release

Opus 4.8 is not a generational step. The benchmark improvements are real but targeted, and the headline reliability figure applies to a specific behavior — self-review of self-written code — not to output quality across the board. Teams whose workloads are not agentic coding loops will see little practical difference from 4.7.

The benchmark numbers also come from Anthropic’s own post. Anthropic’s workload is not your workload. The responsible move, as with any model update, is to run representative tasks from your own pipeline against both versions before treating the upgrade as a free win.

Who should care and what to test

If your team runs agentic coding pipelines — the model writes code, executes it in a loop, reviews and patches — Opus 4.8 is worth a direct comparison against 4.7. The test is straightforward: give both versions a realistic coding task with a known set of self-introduced errors and check whether the model flags them in its own review pass. That is the behavior the release is built around.

If your work leans on legal research, multi-step document reasoning, or web-based agentic tasks, the Legal Agent Benchmark and Online-Mind2Web scores are the relevant signals. Neither benchmark is a guarantee on your data, but both point to where the model has been tuned.

Teams on standard API access should see no pricing change. Fast mode rates also hold. The practical upgrade path is simple: swap the model identifier to claude-opus-4-8, run your existing eval suite, and check whether the reliability improvement shows up in your tasks before committing.

For context on where Opus 4.8 sits against the wider field, see our AI model leaderboard. For Anthropic’s broader product and research direction, see our Anthropic company profile.

Sources

The AI Feed Desk

The AI Feed Desk

Editorial desk

The AI Feed Desk tracks AI provider updates, model releases, agent tooling, and enterprise adoption, turning fast-moving announcements into source-linked context for builders and operators.

Noticed a typo, incorrect information, or translation error?

Tell us so we can fix it.

Help Improve This Article

Related Articles

Gemini 3.5 Flash beats last year's Pro on the work builders ship

Google's Gemini 3.5 Flash beats last year's 3.1 Pro on coding and agentic benchmarks at ~40% lower cost — with reasoning and 1M-context limits worth testing.

The AI Feed Desk

By The AI Feed Desk

OpenAI puts o3 and GPT-4.5 on a ChatGPT sunset clock

OpenAI will retire GPT-4.5 from ChatGPT on June 27 and OpenAI o3 on August 26, with no API change. Teams should audit model-specific workflows now.

The AI Feed Desk

By The AI Feed Desk

Anthropic raises $65B at a $965B valuation

Anthropic's Series H pairs a $65B raise with $47B run-rate revenue and gigawatt-scale compute agreements. The money is for capacity, not just research.

The AI Feed Desk

By The AI Feed Desk

11 minutes ago

NVIDIA announces RTX Spark PCs for local AI agents

RTX Spark puts 1 petaflop of AI performance and up to 128GB of unified memory into Windows PCs designed for local agents.

The AI Feed Desk

By The AI Feed Desk

in 19 minutes

Google rolls out Gemini Omni Flash for video generation

Gemini Omni Flash turns mixed inputs into video and is rolling into Gemini, Flow, YouTube Shorts, and YouTube Create before the API arrives.

The AI Feed Desk

By The AI Feed Desk

in 4 minutes