In the first half of 2026 the frontier-model race stopped being a two-horse contest. Between February and June, nine serious models shipped: Anthropic’s Fable 5, Google’s Gemini 3.1 Pro, OpenAI’s GPT-5.5, xAI’s Grok 4.3, Alibaba’s Qwen 3.7 Max, DeepSeek V4, Z.ai’s GLM-5.2, Moonshot’s Kimi K2.6 and MiniMax M2.7. Over the same period, the gap between the best closed model and the best open-weight one narrowed enough that a cost-conscious team in Kuala Lumpur can actually exploit it.
This is the practical comparison we give clients who ask “which model should we build on?” — grouped by cost, performance, use case, and the honest pros and cons.
The 2026 frontier at a glance
| Model | Released | Price per 1M tokens, USD (in / out) | SWE-Bench (score · variant) | Open weights | Best for |
|---|---|---|---|---|---|
| Fable 5 (Anthropic) | 9 Jun 2026 | $10 / $50 | ~80.3% · Pro | No | Hardest agentic coding, finance & research reasoning |
| Gemini 3.1 Pro (Google) | 19 Feb 2026 | $2 / $12 | 80.6% · Verified | No | All-round leader; multimodal, huge context, tops 13/16 benchmarks |
| GPT-5.5 (OpenAI) | 23 Apr 2026 | $5 / $30 | 58.6% · Pro | No | Broad general-purpose work, ecosystem & tooling |
| Grok 4.3 (xAI) | 30 Apr 2026 | $1.25 / $2.50 | ~78% · Verified (Grok 4.20; no figure published for 4.3) | No | Real-time X data, cheap agentic runs, 1M context |
| Qwen 3.7 Max (Alibaba) | 20 May 2026 | $2.50 / $7.50 | 60.6% · Pro | No | Second-best SWE-Bench Pro score here; long-context agents |
| DeepSeek V4 (DeepSeek) | 24 Apr 2026 | $0.44 / $0.87 | 80.6% · Verified | Yes (MIT) | Strongest open-weight; 1M context; very cheap |
| GLM-5.2 (Z.ai) | 2026 | $1.40 / $4.40 | Beats GPT-5.5 (long-horizon) | Yes (MIT) | Best value; self-host; long coding tasks |
| Kimi K2.6 (Moonshot) | 20 Apr 2026 | $0.60 / $2.50 | 58.6% · Pro | Yes (1T/32B) | Open-weight coding at a fraction of the cost |
| MiniMax M2.7 | 18 Mar 2026 | $0.30 / $1.20 | 56.2% · Pro | Yes (230B/10B) | Cheapest agentic workhorse; high-volume automation |
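To turn the per-million prices above into a bill, multiply each side of the traffic by its rate. The sketch below is a minimal illustration in Python using the list prices from the table. The workload (200M input and 50M output tokens a month) is an assumed figure for the example, not client data, and `monthly_cost` is a hypothetical helper name.

```python
# List prices from the comparison table, in USD per 1M tokens (input, output).
PRICES = {
    "Fable 5": (10.00, 50.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "GPT-5.5": (5.00, 30.00),
    "Grok 4.3": (1.25, 2.50),
    "Qwen 3.7 Max": (2.50, 7.50),
    "DeepSeek V4": (0.44, 0.87),
    "GLM-5.2": (1.40, 4.40),
    "Kimi K2.6": (0.60, 2.50),
    "MiniMax M2.7": (0.30, 1.20),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the model's list price."""
    price_in, price_out = PRICES[model]
    return (input_tokens / 1_000_000) * price_in + (output_tokens / 1_000_000) * price_out


if __name__ == "__main__":
    # Assumed workload: 200M input tokens and 50M output tokens per month.
    workload = (200_000_000, 50_000_000)
    # Print the models from cheapest to most expensive for this workload.
    for model in sorted(PRICES, key=lambda m: monthly_cost(m, *workload)):
        print(f"{model:<16} ${monthly_cost(model, *workload):>9,.2f}")
```

On that assumed workload, Fable 5 comes to about $4,500 a month and DeepSeek V4 to about $131.50. The ranking shifts with the input-to-output mix, so it is worth rerunning with your own token counts.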
Fable 5 — the new ceiling, at a price
Anthropic released Fable 5 on 9 June 2026 as the first “Mythos-class” model, a tier above Opus 4.8. Independent testing put it around 80.3% on SWE-Bench Pro — roughly 11 points clear of the next model — and it tops finance and document-reasoning benchmarks too. Pros: best-in-class on the hardest agentic-coding and knowledge work. Cons: at $10 / $50 per million tokens it is the most expensive option here, so reserve it for the tasks where a wrong answer is costly.
Gemini 3.1 Pro — the all-round leader
Google shipped Gemini 3.1 Pro on 19 February 2026 at $2 / $12. It scores 80.6% on SWE-Bench Verified and tops 13 of 16 major benchmarks (94.3% GPQA Diamond, 95.1% MATH). Pros: the strongest all-rounder, natively multimodal, with a very large context window and deep Google Workspace and Vertex AI integration. Cons: it is still labelled “preview” with no confirmed GA date, and its Verified score isn’t directly comparable to the harder SWE-Bench Pro figures quoted for Fable 5, GPT-5.5, Qwen 3.7 Max, Kimi K2.6 and MiniMax M2.7.
GPT-5.5 — the safe default
OpenAI shipped GPT-5.5 on 23 April 2026 at $5 / $30, roughly double GPT-5.4’s output price. It scores 58.6% on SWE-Bench Pro. Pros: the broadest ecosystem, tooling and integrations, and strong general-purpose performance. Cons: it no longer leads agentic coding — Anthropic’s top models beat it there — and the price rose sharply.
DeepSeek V4 — the open-weight heavyweight
DeepSeek released V4 on 24 April 2026 under an MIT licence — a 1.6-trillion-parameter mixture-of-experts (49B active) with a 1M-token context. V4-Pro-Max scores 80.6% on SWE-Bench Verified, the highest of any open-weight model, at just $0.44 / $0.87 (permanent pricing since 22 May 2026). Pros: frontier-class results, open weights on Hugging Face, and a price that undercuts every closed model here. Cons: you carry the hosting and governance if you self-host, and some enterprises apply extra scrutiny to China-origin models (the US CAISI/NIST evaluation is worth reading).
GLM-5.2 — the value champion
Z.ai’s GLM-5.2 is the headline for cost-sensitive teams: open weights under an MIT licence, priced around $1.40 / $4.40, and, per VentureBeat, beating GPT-5.5 on several long-horizon coding benchmarks “for one-sixth the cost.” It was trained on Huawei Ascend chips, not NVIDIA. Pros: frontier-adjacent coding at a fraction of the price, with the option to self-host. Cons: a smaller support ecosystem, and you own the ops if you self-host.
Kimi K2.6 & MiniMax M2.7 — open-weight and cheap
Moonshot’s Kimi K2.6 (20 April 2026) is a 1-trillion-parameter open-weight model with a 256K context window at $0.60 / $2.50, and it ties GPT-5.5 on SWE-Bench Pro (58.6%). MiniMax M2.7 (18 March 2026) is the budget agentic workhorse at $0.30 / $1.20 — 230B total but only 10B active parameters, so it is fast and dirt-cheap for high-volume automation. Both trade a little peak quality for enormous cost savings.
Qwen 3.7 Max & Grok 4.3 — the challengers
Alibaba’s Qwen 3.7 Max (20 May 2026, $2.50 / $7.50) posts 60.6% on SWE-Bench Pro, edging out GPT-5.5 (58.6%) though well behind Fable 5, with a 1M-token context and native extended thinking. Unlike Alibaba’s earlier open-weight releases, it is closed-weight. xAI’s Grok 4.3 (30 April 2026, $1.25 / $2.50) is the value play, with real-time access to X data and a 1M context; xAI hasn’t published a SWE-Bench figure for it, but its predecessor, Grok 4.20, scored ~78% on Verified. Pros: both undercut GPT-5.5 and Fable 5 on price. Cons: Qwen’s pricing is only mid-pack among the models here, and Grok’s coding, judging by its predecessor, sits a notch below the leaders.
So which one should you build on?
Use a tiered approach, not a single model. Route the routine 90% of calls (classification, extraction, drafting) to a cheap open-weight model such as MiniMax M2.7, Kimi K2.6 or DeepSeek V4. Send the hard 10% (multi-step agentic coding, high-stakes reasoning) to a frontier model such as Fable 5, Gemini 3.1 Pro or GPT-5.5. For Malaysian SMEs watching the ringgit, DeepSeek V4, GLM-5.2 or Kimi K2.6 gets you most of the quality at a fraction of the token bill. This is the same model-selection discipline we teach in our AI Engineering programme, which is HRDC SBL-KHAS claimable for eligible Malaysian employers.
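The tiered approach can be sketched as a small router. This is a minimal illustration, not a production design: the task labels, the `route` and `total_cost` helpers, and the choice of MiniMax M2.7 and Fable 5 as the two tiers are assumptions for the example, and the request size (2,000 tokens in, 500 out) is invented to show the arithmetic. Prices are the list prices from the comparison table.

```python
from dataclasses import dataclass

# (model name, USD per 1M input tokens, USD per 1M output tokens), from the table.
CHEAP_TIER = ("MiniMax M2.7", 0.30, 1.20)
FRONTIER_TIER = ("Fable 5", 10.00, 50.00)

# Hypothetical task labels; a real system would define its own.
HARD_TASKS = {"agentic_coding", "high_stakes_reasoning"}


@dataclass
class Request:
    task: str
    input_tokens: int
    output_tokens: int


def route(request: Request) -> tuple[str, float, float]:
    """Send hard tasks to the frontier tier and everything else to the cheap tier."""
    return FRONTIER_TIER if request.task in HARD_TASKS else CHEAP_TIER


def cost(request: Request, tier: tuple[str, float, float]) -> float:
    """USD cost of one request on the given tier."""
    _, price_in, price_out = tier
    return (request.input_tokens * price_in + request.output_tokens * price_out) / 1_000_000


def total_cost(requests: list[Request], tiered: bool = True) -> float:
    """Total USD cost, either routed by tier or with everything sent to the frontier model."""
    return sum(cost(r, route(r) if tiered else FRONTIER_TIER) for r in requests)


if __name__ == "__main__":
    # Assumed traffic: 90 routine calls and 10 hard ones, each 2,000 tokens in and 500 out.
    traffic = [Request("extraction", 2_000, 500)] * 90 + [Request("agentic_coding", 2_000, 500)] * 10
    print(f"tiered:        ${total_cost(traffic):.4f}")
    print(f"frontier only: ${total_cost(traffic, tiered=False):.4f}")
```

On this assumed mix, 100 requests cost about $0.56 when routed by tier and $4.50 when everything goes to Fable 5, roughly an eighth of the bill. In practice the hard part is deciding which requests count as hard; the static label lookup here stands in for whatever classifier or heuristic a team actually uses.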
If you’re still choosing between the big providers, our Claude vs ChatGPT vs Gemini and reasoning-models guides go deeper on the trade-offs.