Qwen3-235B-A22B-Instruct-2507

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned Mixture-of-Experts (MoE) model built on the Qwen3-235B architecture, activating 22B parameters per forward pass. It is optimized for versatile text-generation tasks, including instruction following, logical reasoning, mathematics, coding, and tool use. The model natively supports a 262,144-token (262K) context window and operates in non-thinking mode only: it does not emit "thinking mode" (<think>) blocks.
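Because the model follows the standard chat-message format, it can be served behind any OpenAI-compatible endpoint (for example, vLLM or SGLang). The sketch below shows how a chat-completion request payload for such a server might be assembled; the `build_request` helper and server choice are illustrative, not part of the official release.

```python
import json

# Native context window of Qwen3-235B-A22B-Instruct-2507, in tokens.
MAX_CONTEXT = 262_144

def build_request(messages, max_tokens=1024):
    """Assemble a chat-completion payload for an OpenAI-compatible server
    hosting the model. Hypothetical helper for illustration only."""
    return {
        "model": "Qwen/Qwen3-235B-A22B-Instruct-2507",
        "messages": messages,
        "max_tokens": max_tokens,
    }

payload = build_request([
    {"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."}
])
print(json.dumps(payload, indent=2))
```

Note that no `enable_thinking` flag is needed: this checkpoint responds directly, without a reasoning block.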

Compared to the earlier Qwen3-235B-A22B non-thinking variant, this release offers substantial improvements in knowledge coverage, long-context reasoning, coding benchmarks, and open-ended alignment. It is particularly strong in multilingual understanding, mathematical reasoning (AIME, HMMT), and open-ended evaluations such as Arena-Hard and WritingBench.


Creator: Alibaba
Release Date: July 2025
License: Apache 2.0
Context Window: 262,144 tokens
Image Input Support: No
Open Source (Weights): Yes
Parameters: 235B total, 22B active at inference time
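The "A22B" in the model name refers to the activated parameter count. A quick back-of-the-envelope calculation from the figures listed above shows how sparse each forward pass is (plain arithmetic, not an official figure):

```python
# MoE sparsity implied by the listed parameter counts.
TOTAL_PARAMS = 235_000_000_000   # 235B total parameters
ACTIVE_PARAMS = 22_000_000_000   # 22B activated per forward pass

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active per token: {active_fraction:.1%}")  # ~9.4% of weights
```

In other words, each token is processed by roughly one-tenth of the total weights, which is what lets a 235B-parameter model run with the inference cost of a much smaller dense model.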

Performance Benchmarks

| Benchmark | Deepseek-V3-0324 | GPT-4o-0327 | Claude Opus 4 (Non-thinking) | Kimi K2 | Qwen3-235B-A22B (Non-thinking) | Qwen3-235B-A22B-Instruct-2507 |
|---|---|---|---|---|---|---|
| **Knowledge** | | | | | | |
| MMLU-Pro | 81.2 | 79.8 | 86.6 | 81.1 | 75.2 | 83.0 |
| MMLU-Redux | 90.4 | 91.3 | 94.2 | 92.7 | 89.2 | 93.1 |
| GPQA | 68.4 | 66.9 | 74.9 | 75.1 | 62.9 | 77.5 |
| SuperGPQA | 57.3 | 51.0 | 56.5 | 57.2 | 48.2 | 62.6 |
| SimpleQA | 27.2 | 40.3 | 22.8 | 31.0 | 12.2 | 54.3 |
| CSimpleQA | 71.1 | 60.2 | 68.0 | 74.5 | 60.8 | 84.3 |
| **Reasoning** | | | | | | |
| AIME25 | 46.6 | 26.7 | 33.9 | 49.5 | 24.7 | 70.3 |
| HMMT25 | 27.5 | 7.9 | 15.9 | 38.8 | 10.0 | 55.4 |
| ARC-AGI | 9.0 | 8.8 | 30.3 | 13.3 | 4.3 | 41.8 |
| ZebraLogic | 83.4 | 52.6 | – | 89.0 | 37.7 | 95.0 |
| LiveBench 20241125 | 66.9 | 63.7 | 74.6 | 76.4 | 62.5 | 75.4 |
| **Coding** | | | | | | |
| LiveCodeBench v6 (25.02–25.05) | 45.2 | 35.8 | 44.6 | 48.9 | 32.9 | 51.8 |
| MultiPL-E | 82.2 | 82.7 | 88.5 | 85.7 | 79.3 | 87.9 |
| Aider-Polyglot | 55.1 | 45.3 | 70.7 | 59.0 | 59.6 | 57.3 |
| **Alignment** | | | | | | |
| IFEval | 82.3 | 83.9 | 87.4 | 89.8 | 83.2 | 88.7 |
| Arena-Hard v2* | 45.6 | 61.9 | 51.5 | 66.1 | 52.0 | 79.2 |
| Creative Writing v3 | 81.6 | 84.9 | 83.8 | 88.1 | 80.4 | 87.5 |
| WritingBench | 74.5 | 75.5 | 79.2 | 86.2 | 77.0 | 85.2 |
| **Agent** | | | | | | |
| BFCL-v3 | 64.7 | 66.5 | 60.1 | 65.2 | 68.0 | 70.9 |
| TAU1-Retail | 49.6 | 60.3# | 81.4 | 70.7 | 65.2 | 71.3 |
| TAU1-Airline | 32.0 | 42.8# | 59.6 | 53.5 | 32.0 | 44.0 |
| TAU2-Retail | 71.1 | 66.7# | 75.5 | 70.6 | 64.9 | 74.6 |
| TAU2-Airline | 36.0 | 42.0# | 55.5 | 56.5 | 36.0 | 50.0 |
| TAU2-Telecom | 34.0 | 29.8# | 45.2 | 65.8 | 24.6 | 32.5 |
| **Multilingualism** | | | | | | |
| MultiIF | 66.5 | 70.4 | – | 76.2 | 70.2 | 77.5 |
| MMLU-ProX | 75.8 | 76.2 | – | 74.5 | 73.2 | 79.4 |
| INCLUDE | 80.1 | 82.1 | – | 76.9 | 75.6 | 79.5 |
| PolyMATH | 32.2 | 25.5 | 30.0 | 44.8 | 27.0 | 50.2 |

"–" denotes a score that was not reported.
