Anthropic: Claude Opus 4.1

Claude-Opus-4.1

Claude Opus 4.1 is the latest upgrade to Anthropic’s flagship model, delivering stronger performance in coding, reasoning, and agentic workflows. It reaches 74.5% on SWE-bench Verified and introduces major improvements in multi-file code refactoring, debugging accuracy, and fine-grained reasoning.

With extended thinking support up to 64K tokens, Opus 4.1 is well-suited for research, data analysis, and complex tool-assisted reasoning tasks, making it a powerful choice for advanced development and problem-solving.
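
As a sketch of how extended thinking is invoked in practice: the Anthropic Messages API accepts a per-request thinking budget. A minimal example, assuming the `anthropic` Python SDK; the model alias and token budgets here are illustrative and should be checked against Anthropic's documentation:

```python
# Minimal sketch: extended thinking with the Anthropic Messages API.
# Assumes the `anthropic` Python SDK; model alias and budgets are illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-1",      # assumed alias for Claude Opus 4.1
    max_tokens=16000,             # must exceed the thinking budget
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,   # can scale toward the 64K ceiling noted above
    },
    messages=[{"role": "user", "content": "Plan a multi-file refactor of a parser module."}],
)

# Thinking and the final answer arrive as separate content blocks.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200], "...")
    elif block.type == "text":
        print(block.text)
```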

Creator: Anthropic
Release Date: August 2025
License: Proprietary
Context Window: 200,000 tokens
Image Input Support: Yes
Open Source (Weights): No
Input Cost: $15/M tokens
Output Cost: $75/M tokens

Qwen: Qwen3 30B A3B 2507

Qwen3-30B-A3B-2507

Qwen3 30B A3B 2507 Instruct is a 30.5B-parameter Mixture-of-Experts language model from the Qwen series that activates 3.3B parameters per forward pass. Operating exclusively in non-thinking mode, it is optimized for high-quality instruction following, multilingual comprehension, and agentic tool use. Further post-training on instruction data gives it strong results across benchmarks in reasoning (AIME, ZebraLogic), coding (MultiPL-E, LiveCodeBench), and alignment (IFEval, WritingBench). Compared to its non-instruct base variant, it performs significantly better on open-ended and subjective tasks while maintaining robust factual accuracy and coding capability.
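
For readers who want to try the open weights locally, here is a minimal inference sketch with Hugging Face transformers; the repo ID and generation settings are assumptions to verify against the model card:

```python
# Minimal local-inference sketch for Qwen3-30B-A3B-Instruct-2507.
# The Hugging Face repo ID below is assumed; verify against the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-30B-A3B-Instruct-2507"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize the trade-offs of MoE inference."}]
# The 2507 Instruct variant runs in non-thinking mode only, so no
# thinking-related template flag is needed here.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```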

Creator: Alibaba
Release Date: July 2025
License: Apache 2.0
Context Window: 262,144 tokens
Image Input Support: No
Open Source (Weights): Yes
Parameters: 30.5B total, 3.3B active at inference time

Performance

| Benchmark | Deepseek-V3-0324 | GPT-4o-0327 | Gemini-2.5-Flash (Non-Thinking) | Qwen3-235B-A22B (Non-Thinking) | Qwen3-30B-A3B (Non-Thinking) | Qwen3-30B-A3B-Instruct-2507 |
|---|---|---|---|---|---|---|
| **Knowledge** | | | | | | |
| MMLU-Pro | 81.2 | 79.8 | 81.1 | 75.2 | 69.1 | 78.4 |
| MMLU-Redux | 90.4 | 91.3 | 90.6 | 89.2 | 84.1 | 89.3 |
| GPQA | 68.4 | 66.9 | 78.3 | 62.9 | 54.8 | 70.4 |
| SuperGPQA | 57.3 | 51.0 | 54.6 | 48.2 | 42.2 | 53.4 |
| **Reasoning** | | | | | | |
| AIME25 | 46.6 | 26.7 | 61.6 | 24.7 | 21.6 | 61.3 |
| HMMT25 | 27.5 | 7.9 | 45.8 | 10.0 | 12.0 | 43.0 |
| ZebraLogic | 83.4 | 52.6 | 57.9 | 37.7 | 33.2 | 90.0 |
| LiveBench 20241125 | 66.9 | 63.7 | 69.1 | 62.5 | 59.4 | 69.0 |
| **Coding** | | | | | | |
| LiveCodeBench v6 (25.02-25.05) | 45.2 | 35.8 | 40.1 | 32.9 | 29.0 | 43.2 |
| MultiPL-E | 82.2 | 82.7 | 77.7 | 79.3 | 74.6 | 83.8 |
| Aider-Polyglot | 55.1 | 45.3 | 44.0 | 59.6 | 24.4 | 35.6 |
| **Alignment** | | | | | | |
| IFEval | 82.3 | 83.9 | 84.3 | 83.2 | 83.7 | 84.7 |
| Arena-Hard v2* | 45.6 | 61.9 | 58.3 | 52.0 | 24.8 | 69.0 |
| Creative Writing v3 | 81.6 | 84.9 | 84.6 | 80.4 | 68.1 | 86.0 |
| WritingBench | 74.5 | 75.5 | 80.5 | 77.0 | 72.2 | 85.5 |
| **Agent** | | | | | | |
| BFCL-v3 | 64.7 | 66.5 | 66.1 | 68.0 | 58.6 | 65.1 |
| TAU1-Retail | 49.6 | 60.3# | 65.2 | 65.2 | 38.3 | 59.1 |
| TAU1-Airline | 32.0 | 42.8# | 48.0 | 32.0 | 18.0 | 40.0 |
| TAU2-Retail | 71.1 | 66.7# | 64.3 | 64.9 | 31.6 | 57.0 |
| TAU2-Airline | 36.0 | 42.0# | 42.5 | 36.0 | 18.0 | 38.0 |
| TAU2-Telecom | 34.0 | 29.8# | 16.9 | 24.6 | 18.4 | 12.3 |
| **Multilingualism** | | | | | | |
| MultiIF | 66.5 | 70.4 | 69.4 | 70.2 | 70.8 | 67.9 |
| MMLU-ProX | 75.8 | 76.2 | 78.3 | 73.2 | 65.1 | 72.0 |
| INCLUDE | 80.1 | 82.1 | 83.8 | 75.6 | 67.8 | 71.9 |
| PolyMATH | 32.2 | 25.5 | 41.9 | 27.0 | 23.3 | 43.1 |

(* and # markers are carried over from Qwen's published evaluation tables.)

Qwen: Qwen3 235B A22B 2507

Qwen3-235B-A22B-2507

Qwen3 235B A22B 2507 Instruct is a multilingual, instruction-tuned Mixture-of-Experts model built on the Qwen3-235B architecture, activating 22B parameters per forward pass. It is optimized for versatile text generation tasks, including instruction following, logical reasoning, mathematics, coding, and tool use. The model supports a native 262K context window but does not include “thinking mode” (<think> blocks).

Compared to its base variant, this version offers substantial improvements in knowledge coverage, long-context reasoning, coding benchmarks, and open-ended alignment. It demonstrates particularly strong performance in multilingual understanding, mathematical reasoning (AIME, HMMT), and evaluation benchmarks such as Arena-Hard and WritingBench.
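
Since the model is positioned for tool use, here is a hedged sketch of function calling through an OpenAI-compatible endpoint (for example, one served by vLLM); the base URL, served model name, and the `get_weather` tool schema are illustrative assumptions:

```python
# Hedged sketch: tool calling against an OpenAI-compatible server (e.g. vLLM)
# hosting Qwen3-235B-A22B-Instruct-2507. URL and model name are assumed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)
# In non-thinking mode the reply contains no <think> blocks; any tool
# invocations appear under message.tool_calls.
print(resp.choices[0].message.tool_calls)
```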

Creator: Alibaba
Release Date: July 2025
License: Apache 2.0
Context Window: 262,144 tokens
Image Input Support: No
Open Source (Weights): Yes
Parameters: 235B total, 22.0B active at inference time

Performance Benchmarks

| Benchmark | Deepseek-V3-0324 | GPT-4o-0327 | Claude Opus 4 (Non-thinking) | Kimi K2 | Qwen3-235B-A22B (Non-thinking) | Qwen3-235B-A22B-Instruct-2507 |
|---|---|---|---|---|---|---|
| **Knowledge** | | | | | | |
| MMLU-Pro | 81.2 | 79.8 | 86.6 | 81.1 | 75.2 | 83.0 |
| MMLU-Redux | 90.4 | 91.3 | 94.2 | 92.7 | 89.2 | 93.1 |
| GPQA | 68.4 | 66.9 | 74.9 | 75.1 | 62.9 | 77.5 |
| SuperGPQA | 57.3 | 51.0 | 56.5 | 57.2 | 48.2 | 62.6 |
| SimpleQA | 27.2 | 40.3 | 22.8 | 31.0 | 12.2 | 54.3 |
| CSimpleQA | 71.1 | 60.2 | 68.0 | 74.5 | 60.8 | 84.3 |
| **Reasoning** | | | | | | |
| AIME25 | 46.6 | 26.7 | 33.9 | 49.5 | 24.7 | 70.3 |
| HMMT25 | 27.5 | 7.9 | 15.9 | 38.8 | 10.0 | 55.4 |
| ARC-AGI | 9.0 | 8.8 | 30.3 | 13.3 | 4.3 | 41.8 |
| ZebraLogic | 83.4 | 52.6 | – | 89.0 | 37.7 | 95.0 |
| LiveBench 20241125 | 66.9 | 63.7 | 74.6 | 76.4 | 62.5 | 75.4 |
| **Coding** | | | | | | |
| LiveCodeBench v6 (25.02-25.05) | 45.2 | 35.8 | 44.6 | 48.9 | 32.9 | 51.8 |
| MultiPL-E | 82.2 | 82.7 | 88.5 | 85.7 | 79.3 | 87.9 |
| Aider-Polyglot | 55.1 | 45.3 | 70.7 | 59.0 | 59.6 | 57.3 |
| **Alignment** | | | | | | |
| IFEval | 82.3 | 83.9 | 87.4 | 89.8 | 83.2 | 88.7 |
| Arena-Hard v2* | 45.6 | 61.9 | 51.5 | 66.1 | 52.0 | 79.2 |
| Creative Writing v3 | 81.6 | 84.9 | 83.8 | 88.1 | 80.4 | 87.5 |
| WritingBench | 74.5 | 75.5 | 79.2 | 86.2 | 77.0 | 85.2 |
| **Agent** | | | | | | |
| BFCL-v3 | 64.7 | 66.5 | 60.1 | 65.2 | 68.0 | 70.9 |
| TAU1-Retail | 49.6 | 60.3# | 81.4 | 70.7 | 65.2 | 71.3 |
| TAU1-Airline | 32.0 | 42.8# | 59.6 | 53.5 | 32.0 | 44.0 |
| TAU2-Retail | 71.1 | 66.7# | 75.5 | 70.6 | 64.9 | 74.6 |
| TAU2-Airline | 36.0 | 42.0# | 55.5 | 56.5 | 36.0 | 50.0 |
| TAU2-Telecom | 34.0 | 29.8# | 45.2 | 65.8 | 24.6 | 32.5 |
| **Multilingualism** | | | | | | |
| MultiIF | 66.5 | 70.4 | – | 76.2 | 70.2 | 77.5 |
| MMLU-ProX | 75.8 | 76.2 | – | 74.5 | 73.2 | 79.4 |
| INCLUDE | 80.1 | 82.1 | – | 76.9 | 75.6 | 79.5 |
| PolyMATH | 32.2 | 25.5 | 30.0 | 44.8 | 27.0 | 50.2 |

(– marks scores not reported for that model; * and # markers are carried over from Qwen's published evaluation tables.)

xAI: Grok 4

Grok-4

Grok 4 is xAI’s latest reasoning model, featuring a 256K context window with support for parallel tool calling, structured outputs, and multimodal inputs (text and images). Its reasoning trace is not exposed to users, reasoning cannot be disabled, and reasoning depth cannot be configured. Pricing moves to a higher tier once a request exceeds 128K total tokens.
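
xAI serves Grok 4 through an OpenAI-compatible API; a minimal multimodal sketch follows, assuming the https://api.x.ai/v1 endpoint and the grok-4 model name, both to be verified in xAI's docs:

```python
# Minimal sketch: text+image input to Grok 4 via xAI's OpenAI-compatible API.
# Endpoint and model name are assumptions; check xAI's documentation.
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.x.ai/v1", api_key=os.environ["XAI_API_KEY"])

resp = client.chat.completions.create(
    model="grok-4",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What architecture is sketched here?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png"}},
        ],
    }],
    # Note: no reasoning-effort parameter is passed, since per the description
    # above Grok 4's reasoning depth is not user-configurable.
)
print(resp.choices[0].message.content)
```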

Creator: xAI
Release Date: July 2025
License: Proprietary
Context Window: 256,000 tokens
Image Input Support: Yes
Open Source (Weights): No
Input Cost: $3/M tokens
Output Cost: $15/M tokens

OpenAI: gpt-oss-120b

OpenAI-gpt-oss-120b

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) model from OpenAI, built for advanced reasoning, agentic behavior, and versatile production use cases. It activates 5.1B parameters per forward pass and is optimized to run efficiently on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool capabilities such as function calling, web browsing, and structured output generation.
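
A hedged local-inference sketch with transformers follows; the repo ID openai/gpt-oss-120b and the "Reasoning: high" system-prompt convention for setting reasoning depth are assumptions to verify against the model card:

```python
# Hedged sketch: running gpt-oss-120b with transformers. The repo ID and
# the "Reasoning: high" system-prompt convention are assumptions.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-120b",  # assumed Hugging Face repo ID
    torch_dtype="auto",
    device_map="auto",            # reportedly fits one H100 via MXFP4 weights
)

messages = [
    # Reasoning depth is reportedly configured via the system prompt.
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Derive the time complexity of heapsort."},
]
result = generator(messages, max_new_tokens=1024)
# Chat pipelines return the full message list; the last entry is the reply.
print(result[0]["generated_text"][-1]["content"])
```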

Creator: OpenAI
Release Date: August 2025
License: Apache 2.0
Context Window: 131,072 tokens
Image Input Support: No
Open Source (Weights): Yes
Parameters: 117B total, 5.1B active at inference time

Z.AI: GLM 4.5 Air

GLM-4.5-Air

GLM 4.5 Air is the lightweight variant of the GLM-4.5 flagship family, purpose-built for agent-focused applications. It retains the Mixture-of-Experts (MoE) architecture but with a more compact parameter size for efficiency. Like its larger counterpart, it supports hybrid inference modes—offering a “thinking mode” for advanced reasoning and tool use, and a “non-thinking mode” for fast, real-time interactions. Users can easily control reasoning behavior through a simple boolean toggle.
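
That boolean toggle might look like the following when the model is served through a vLLM OpenAI-compatible endpoint; the chat_template_kwargs pass-through and the enable_thinking flag name are assumptions modeled on how similar hybrid-reasoning models expose the switch, so verify them against the GLM-4.5 docs:

```python
# Hedged sketch: toggling GLM-4.5-Air's thinking mode via a vLLM
# OpenAI-compatible server. The enable_thinking kwarg name is an
# assumption; confirm it against the model's chat template.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="zai-org/GLM-4.5-Air",
    messages=[{"role": "user", "content": "Route this ticket to the right team."}],
    extra_body={
        # False = fast non-thinking replies; True = deliberate reasoning.
        "chat_template_kwargs": {"enable_thinking": False}
    },
)
print(resp.choices[0].message.content)
```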

Creator: Z.AI
Release Date: July 2025
License: MIT
Context Window: 131,072 tokens
Image Input Support: No
Open Source (Weights): Yes

Anthropic: Claude Sonnet 4

Claude-Sonnet-4

Claude Sonnet 4 builds on the strengths of Sonnet 3.7, delivering major improvements in coding and reasoning with greater precision and controllability. It achieves state-of-the-art results on SWE-bench (72.7%), striking an effective balance between advanced capability and computational efficiency.

Key upgrades include better autonomous codebase navigation, lower error rates in agent-driven workflows, and stronger reliability in handling complex instructions. Optimized for real-world use, Sonnet 4 offers advanced reasoning power while remaining efficient and responsive across a wide range of coding, software development, and general-purpose tasks.
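
Given the 1,000,000-token window listed below, long-context requests reportedly require an opt-in beta header; a hedged sketch follows, with the header value and model ID to be confirmed in Anthropic's docs:

```python
# Hedged sketch: a long-context request to Claude Sonnet 4. The beta header
# value and model ID below are assumptions to confirm in Anthropic's docs.
import anthropic

client = anthropic.Anthropic()

with open("large_codebase_dump.txt") as f:  # hypothetical long input file
    corpus = f.read()

response = client.messages.create(
    model="claude-sonnet-4-20250514",       # assumed model ID
    max_tokens=4096,
    messages=[{"role": "user", "content": f"{corpus}\n\nMap the module dependencies."}],
    extra_headers={"anthropic-beta": "context-1m-2025-08-07"},  # assumed opt-in
)
print(response.content[0].text)
```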

Creator: Anthropic
Release Date: May 2025
License: Proprietary
Context Window: 1,000,000 tokens
Image Input Support: Yes
Open Source (Weights): No
Input Cost: $3/M tokens
Output Cost: $15/M tokens