Qwen3-235B-A22B-Instruct-2507

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned Mixture-of-Experts (MoE) model built on the Qwen3-235B architecture, activating 22B parameters per forward pass. It is optimized for versatile text-generation tasks, including instruction following, logical reasoning, mathematics, coding, and tool use. The model natively supports a 262,144-token (262K) context window and operates in non-thinking mode only: it does not emit "thinking mode" (<think>) blocks.
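Because the model follows the standard chat-message format, it can be served behind any OpenAI-compatible endpoint (for example, vLLM or SGLang). The sketch below shows how a chat-completion request payload for such a server might be assembled; the `build_request` helper and server choice are illustrative, not part of the official release.

```python
import json

# Native context window of Qwen3-235B-A22B-Instruct-2507, in tokens.
MAX_CONTEXT = 262_144

def build_request(messages, max_tokens=1024):
    """Assemble a chat-completion payload for an OpenAI-compatible server
    hosting the model. Hypothetical helper for illustration only."""
    return {
        "model": "Qwen/Qwen3-235B-A22B-Instruct-2507",
        "messages": messages,
        "max_tokens": max_tokens,
    }

payload = build_request([
    {"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."}
])
print(json.dumps(payload, indent=2))
```

Note that no `enable_thinking` flag is needed: this checkpoint responds directly, without a reasoning block.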

Compared to the earlier Qwen3-235B-A22B non-thinking variant, this release offers substantial improvements in knowledge coverage, long-context reasoning, coding benchmarks, and open-ended alignment. It is particularly strong in multilingual understanding, mathematical reasoning (AIME, HMMT), and open-ended evaluations such as Arena-Hard and WritingBench.


Creator: Alibaba
Release Date: July 2025
License: Apache 2.0
Context Window: 262,144 tokens
Image Input Support: No
Open Source (Weights): Yes
Parameters: 235B total, 22B active at inference time
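The "A22B" in the model name refers to the activated parameter count. A quick back-of-the-envelope calculation from the figures listed above shows how sparse each forward pass is (plain arithmetic, not an official figure):

```python
# MoE sparsity implied by the listed parameter counts.
TOTAL_PARAMS = 235_000_000_000   # 235B total parameters
ACTIVE_PARAMS = 22_000_000_000   # 22B activated per forward pass

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active per token: {active_fraction:.1%}")  # ~9.4% of weights
```

In other words, each token is processed by roughly one-tenth of the total weights, which is what lets a 235B-parameter model run with the inference cost of a much smaller dense model.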

Performance Benchmarks

| Benchmark | Deepseek-V3-0324 | GPT-4o-0327 | Claude Opus 4 (Non-thinking) | Kimi K2 | Qwen3-235B-A22B (Non-thinking) | Qwen3-235B-A22B-Instruct-2507 |
|---|---|---|---|---|---|---|
| **Knowledge** | | | | | | |
| MMLU-Pro | 81.2 | 79.8 | 86.6 | 81.1 | 75.2 | 83.0 |
| MMLU-Redux | 90.4 | 91.3 | 94.2 | 92.7 | 89.2 | 93.1 |
| GPQA | 68.4 | 66.9 | 74.9 | 75.1 | 62.9 | 77.5 |
| SuperGPQA | 57.3 | 51.0 | 56.5 | 57.2 | 48.2 | 62.6 |
| SimpleQA | 27.2 | 40.3 | 22.8 | 31.0 | 12.2 | 54.3 |
| CSimpleQA | 71.1 | 60.2 | 68.0 | 74.5 | 60.8 | 84.3 |
| **Reasoning** | | | | | | |
| AIME25 | 46.6 | 26.7 | 33.9 | 49.5 | 24.7 | 70.3 |
| HMMT25 | 27.5 | 7.9 | 15.9 | 38.8 | 10.0 | 55.4 |
| ARC-AGI | 9.0 | 8.8 | 30.3 | 13.3 | 4.3 | 41.8 |
| ZebraLogic | 83.4 | 52.6 | – | 89.0 | 37.7 | 95.0 |
| LiveBench 20241125 | 66.9 | 63.7 | 74.6 | 76.4 | 62.5 | 75.4 |
| **Coding** | | | | | | |
| LiveCodeBench v6 (25.02–25.05) | 45.2 | 35.8 | 44.6 | 48.9 | 32.9 | 51.8 |
| MultiPL-E | 82.2 | 82.7 | 88.5 | 85.7 | 79.3 | 87.9 |
| Aider-Polyglot | 55.1 | 45.3 | 70.7 | 59.0 | 59.6 | 57.3 |
| **Alignment** | | | | | | |
| IFEval | 82.3 | 83.9 | 87.4 | 89.8 | 83.2 | 88.7 |
| Arena-Hard v2* | 45.6 | 61.9 | 51.5 | 66.1 | 52.0 | 79.2 |
| Creative Writing v3 | 81.6 | 84.9 | 83.8 | 88.1 | 80.4 | 87.5 |
| WritingBench | 74.5 | 75.5 | 79.2 | 86.2 | 77.0 | 85.2 |
| **Agent** | | | | | | |
| BFCL-v3 | 64.7 | 66.5 | 60.1 | 65.2 | 68.0 | 70.9 |
| TAU1-Retail | 49.6 | 60.3# | 81.4 | 70.7 | 65.2 | 71.3 |
| TAU1-Airline | 32.0 | 42.8# | 59.6 | 53.5 | 32.0 | 44.0 |
| TAU2-Retail | 71.1 | 66.7# | 75.5 | 70.6 | 64.9 | 74.6 |
| TAU2-Airline | 36.0 | 42.0# | 55.5 | 56.5 | 36.0 | 50.0 |
| TAU2-Telecom | 34.0 | 29.8# | 45.2 | 65.8 | 24.6 | 32.5 |
| **Multilingualism** | | | | | | |
| MultiIF | 66.5 | 70.4 | – | 76.2 | 70.2 | 77.5 |
| MMLU-ProX | 75.8 | 76.2 | – | 74.5 | 73.2 | 79.4 |
| INCLUDE | 80.1 | 82.1 | – | 76.9 | 75.6 | 79.5 |
| PolyMATH | 32.2 | 25.5 | 30.0 | 44.8 | 27.0 | 50.2 |

"–" denotes a score that was not reported.
