Claude Opus 4.7 Launch: SWE-bench 87.6%, Beats GPT-5.4 and Gemini (2026)

April 17, 2026

▲ Claude Opus 4.7 official launch, April 16, 2026

Claude Opus 4.7 is Anthropic's newest flagship AI model, officially released on April 16, 2026. It brings sharper coding, stronger reasoning, and the first high-resolution vision support in the Claude family (image input up to 3.75MP). For developers and teams running AI workloads, the upgrade changes both what the model can do and how much it costs to run at scale.

Why Claude Opus 4.7 matters right now

The release arrives roughly five months after Opus 4.6. Through March, OpenAI's GPT-5.4 held the lead on several coding benchmarks. Anthropic's April 16 shipment of Claude Opus 4.7 flips that ordering: the company reclaims the top slot among generally available models. An internal model called Mythos remains in limited preview and outperforms 4.7, but for builders using public APIs, Opus 4.7 is the new ceiling.

▲ SWE-bench 87.6% and GPQA 94.2% highlights

How big is the jump over Opus 4.6?

SWE-bench Verified moved from 80.8% to 87.6%, a 6.8-point gain. The harder SWE-bench Pro climbed from 53.4% to 64.3%, up 10.9 points. PhD-level GPQA Diamond reached 94.2%, and tool-use scores on MCP-Atlas jumped 14.6 points to 77.3%. Maximum image input grew from 1.15MP to 3.75MP, about 3.3x the pixel count. Anthropic also reports that Opus 4.7 solves up to 3x more of the hardest bugs than 4.6.
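To double-check the deltas quoted above, here is a minimal Python sketch that recomputes them from the raw scores. The 4.6 MCP-Atlas score is not stated in the article; the 62.7% used below is an assumption derived as 77.3 minus 14.6. The names `SCORES`, `point_gain`, and `pixel_ratio` are purely illustrative.

```python
"""Recompute the Opus 4.6 -> 4.7 deltas from the raw scores in the article."""

# (Opus 4.6, Opus 4.7) scores in percent, as reported in the article.
# Assumption: the 4.6 MCP-Atlas value is derived as 77.3 - 14.6 = 62.7.
SCORES = {
    "SWE-bench Verified": (80.8, 87.6),
    "SWE-bench Pro": (53.4, 64.3),
    "MCP-Atlas": (62.7, 77.3),
}

# Maximum image input in megapixels, (Opus 4.6, Opus 4.7).
IMAGE_MP = (1.15, 3.75)


def point_gain(before: float, after: float) -> float:
    """Absolute gain in percentage points, rounded to one decimal to hide float noise."""
    return round(after - before, 1)


def pixel_ratio(before_mp: float, after_mp: float) -> float:
    """How many times more pixels the new input limit allows."""
    return round(after_mp / before_mp, 2)


if __name__ == "__main__":
    for name, (old, new) in SCORES.items():
        print(f"{name}: {old}% -> {new}% (+{point_gain(old, new)} pts)")
    print(f"Image input: {pixel_ratio(*IMAGE_MP)}x the pixel count")
```

The pixel ratio comes out to 3.26, which the article rounds to about 3.3x. Megapixels measure total pixel count, so each side of the image grows by roughly the square root of that ratio, about 1.8x.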

How does Opus 4.7 compare with GPT-5.4 and Gemini 3.1 Pro?

On SWE-bench Pro, Claude Opus 4.7 leads at 64.3%, ahead of GPT-5.4 (57.7%) and Gemini 3.1 Pro (54.2%). GPQA Diamond is essentially a three-way tie, with Opus 4.7 slightly behind: 94.2% for Opus 4.7, 94.4% for GPT-5.4 Pro, and 94.3% for Gemini 3.1 Pro. The widest margin over GPT-5.4 shows up on MCP-Atlas tool use, where Opus 4.7 scores 77.3% versus 68.1% for GPT-5.4 and 73.9% for Gemini 3.1 Pro. For agentic workflows that call functions and external tools, 4.7 offers a meaningful edge.
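The head-to-head comparison above is easier to read as margins. This short Python sketch computes how many points Opus 4.7 leads (positive) or trails (negative) each rival, using only the scores quoted in the article. Note that the OpenAI GPQA Diamond entry is GPT-5.4 Pro rather than GPT-5.4, so that row is not strictly like-for-like. `HEAD_TO_HEAD` and `margins` are hypothetical names for illustration.

```python
"""Opus 4.7's margin over each rival on the benchmarks compared in the article."""

# Scores in percent as reported in the article. The OpenAI GPQA Diamond
# figure is for GPT-5.4 Pro, not GPT-5.4.
HEAD_TO_HEAD = {
    "SWE-bench Pro": {"Opus 4.7": 64.3, "GPT-5.4": 57.7, "Gemini 3.1 Pro": 54.2},
    "GPQA Diamond": {"Opus 4.7": 94.2, "GPT-5.4 Pro": 94.4, "Gemini 3.1 Pro": 94.3},
    "MCP-Atlas": {"Opus 4.7": 77.3, "GPT-5.4": 68.1, "Gemini 3.1 Pro": 73.9},
}


def margins(scores: dict[str, float], model: str = "Opus 4.7") -> dict[str, float]:
    """Points by which `model` leads (positive) or trails (negative) each rival."""
    ours = scores[model]
    # Round to one decimal so float noise such as 6.5999999 prints as 6.6.
    return {rival: round(ours - s, 1) for rival, s in scores.items() if rival != model}


if __name__ == "__main__":
    for bench, scores in HEAD_TO_HEAD.items():
        print(bench, margins(scores))
```

On SWE-bench Pro the lead over the runner-up (GPT-5.4) is 6.6 points. On MCP-Atlas the lead over the runner-up (Gemini 3.1 Pro) is only 3.4 points, but the gap to GPT-5.4 widens to 9.2. On GPQA Diamond, Opus 4.7 trails by 0.1 to 0.2 points.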

▲ Head-to-head vs GPT-5.4 and Gemini 3.1 Pro

Will upgrading from 4.6 to 4.7 cost more tokens?

The sticker price is unchanged: $5 per 1M input tokens and $25 per 1M output tokens, identical to 4.6. What changed is the tokenizer. Anthropic states that the same text can now map to roughly 1.0x to 1.35x as many tokens as before. For typical English workloads, expect real costs to rise about 5 to 15%. For code, structured data, or non-English text such as Korean or Japanese, the increase can reach 35%. Before a full migration, replay production traffic through 4.7 and measure token counts directly. A common compromise is to keep cost-sensitive batch work on Sonnet 4.6 and reserve Opus 4.7 for harder agentic and coding tasks.
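As a rough sketch of the budgeting math, the Python below projects a monthly bill under the new tokenizer. It assumes (this is not from Anthropic's docs) that the inflation factor applies equally to input and output tokens and that request volume stays constant. `measured_inflation` expects per-request token counts you collect yourself when replaying the same traffic through both models. The workload size and every function name are hypothetical.

```python
"""Rough cost projection for an Opus 4.6 -> 4.7 migration.

Assumptions (not from Anthropic docs): tokenizer inflation applies equally
to input and output tokens, and the workload volume stays the same.
"""

# List prices from the article, identical for Opus 4.6 and 4.7 (USD per 1M tokens).
INPUT_USD_PER_M = 5.0
OUTPUT_USD_PER_M = 25.0


def cost_usd(input_tokens: float, output_tokens: float) -> float:
    """Cost of a workload at the shared list prices."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


def measured_inflation(pairs: list[tuple[int, int]]) -> float:
    """Aggregate token inflation from replayed traffic.

    Each pair is (tokens counted on Opus 4.6, tokens counted on Opus 4.7) for
    the same request. Using the ratio of totals weights long requests more.
    """
    old_total = sum(old for old, _ in pairs)
    new_total = sum(new for _, new in pairs)
    return new_total / old_total


def projected_cost_usd(input_tokens: float, output_tokens: float, inflation: float) -> float:
    """Cost on Opus 4.7 if every token count grows by the same inflation factor."""
    return cost_usd(input_tokens * inflation, output_tokens * inflation)


if __name__ == "__main__":
    # Hypothetical monthly workload: 200M input tokens, 20M output tokens.
    baseline = cost_usd(200e6, 20e6)
    for factor in (1.05, 1.15, 1.35):
        projected = projected_cost_usd(200e6, 20e6, factor)
        print(f"{factor:.2f}x tokens -> ${projected:,.2f} (baseline ${baseline:,.2f})")
```

Because per-token prices are unchanged, the projected bill scales linearly with the inflation factor: this hypothetical $1,500 workload becomes $1,725 at 1.15x and $2,025 at 1.35x. Measuring the factor separately for English prose, code, and non-English traffic shows which workloads are worth keeping on a cheaper model.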

Key Takeaways

1. Crown reclaimed - 87.6% on SWE-bench Verified and a leading 64.3% on SWE-bench Pro put Opus 4.7 ahead of GPT-5.4 and Gemini 3.1 Pro among generally available models.

2. Agentic leap - MCP-Atlas tool use at 77.3% (+14.6 pts) makes 4.7 the strongest public choice for agent workflows.

3. Hidden cost - Same price tag, but a new tokenizer can push real bills up to +35% on code and non-English text.

Claude Opus 4.7 is not just a version bump. It resets the competitive order among generally available models and forces teams to revisit cost models that assumed an unchanged per-token price meant an unchanged bill. The migration plan you build this week will decide how much of that performance win actually reaches your bottom line.


📌 Sources: Anthropic, VentureBeat, Vellum AI, LLM-Stats, Finout (2026)


Tech News by InClicks