Claude Opus 4.1 is the latest upgrade to Anthropic’s flagship model, delivering stronger performance in coding, reasoning, and agentic workflows. It reaches 74.5% on SWE-bench Verified and introduces major improvements in multi-file code refactoring, debugging accuracy, and fine-grained reasoning.
With extended thinking support up to 64K tokens, Opus 4.1 is well-suited for research, data analysis, and complex tool-assisted reasoning tasks, making it a powerful choice for advanced development and problem-solving.
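Below is a minimal sketch of requesting extended thinking through the Anthropic Messages API, assuming the `claude-opus-4-1` model alias; the thinking budget shown is illustrative and counts toward `max_tokens`.

```python
# Minimal sketch: extended thinking with the Anthropic Messages API.
# Assumes the "claude-opus-4-1" alias and ANTHROPIC_API_KEY in the environment;
# the 16K-token thinking budget is illustrative (the model supports up to 64K)
# and must stay below max_tokens.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-1",  # assumed alias; substitute your exact model ID
    max_tokens=20000,
    thinking={"type": "enabled", "budget_tokens": 16000},
    messages=[{"role": "user", "content": "Outline a refactor plan for a multi-module Python project."}],
)

# The response interleaves "thinking" and "text" content blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)
```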
Qwen3 30B A3B 2507 Instruct is a 30.5B-parameter Mixture-of-Experts language model from the Qwen series, activating 3.3B parameters per forward pass. Operating in non-thinking mode, it is optimized for high-quality instruction following, multilingual comprehension, and agentic tool use. Further post-trained on instruction data, it delivers strong results across benchmarks in reasoning (AIME, ZebraLogic), coding (MultiPL-E, LiveCodeBench), and alignment (IFEval, WritingBench). Compared to its base non-instruct variant, it performs significantly better on open-ended and subjective tasks while maintaining robust factual accuracy and coding capabilities.
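A minimal local-inference sketch with Hugging Face transformers, assuming the `Qwen/Qwen3-30B-A3B-Instruct-2507` checkpoint and enough GPU memory for the MoE weights; quantized variants would follow the same pattern.

```python
# Minimal sketch: local inference with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-30B-A3B-Instruct-2507"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize the trade-offs of MoE inference in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Non-thinking model: the chat template emits a plain assistant turn, no <think> block.
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```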
Qwen3 235B A22B 2507 Instruct is a multilingual, instruction-tuned Mixture-of-Experts model built on the Qwen3-235B architecture, activating 22B parameters per forward pass. It is optimized for versatile text generation tasks, including instruction following, logical reasoning, mathematics, coding, and tool use. The model supports a native 262K context window but does not include “thinking mode” (<think> blocks).
Compared to its base variant, this version offers substantial improvements in knowledge coverage, long-context reasoning, coding benchmarks, and open-ended alignment. It demonstrates particularly strong performance in multilingual understanding, mathematical reasoning (AIME, HMMT), and evaluation benchmarks such as Arena-Hard and WritingBench.
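A minimal sketch of querying a hosted deployment through the OpenAI-compatible chat API; the endpoint URL and model identifier below are placeholders rather than a specific provider's values.

```python
# Minimal sketch: calling a hosted Qwen3-235B-A22B-Instruct-2507 deployment via the
# OpenAI-compatible chat API. Base URL and model name are assumptions; substitute
# whatever your provider documents.
from openai import OpenAI

client = OpenAI(base_url="https://example-provider/v1", api_key="YOUR_KEY")  # hypothetical endpoint

document = "<paste a long report here>"  # large inputs are fine: the native window is 262K tokens

resp = client.chat.completions.create(
    model="qwen3-235b-a22b-instruct-2507",  # assumed model identifier
    messages=[
        {"role": "system", "content": "You are a careful analyst."},
        {"role": "user", "content": f"Summarize the key findings:\n\n{document}"},
    ],
)
print(resp.choices[0].message.content)
```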
Grok 4 is xAI’s latest reasoning model, featuring a 256K context window with support for parallel tool calling, structured outputs, and multimodal inputs (text and images). Unlike some models, it does not expose its reasoning process, does not allow reasoning to be disabled, and does not let users specify reasoning depth. Pricing tiers adjust once a request exceeds 128K total tokens.
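A minimal sketch of parallel tool calling against xAI's OpenAI-compatible endpoint; the model name and the `get_weather` tool are illustrative assumptions.

```python
# Minimal sketch: parallel tool calling in the OpenAI-style chat-completions convention.
# The base_url, model name, and tool schema are assumptions; check xAI's docs for exact values.
from openai import OpenAI

client = OpenAI(base_url="https://api.x.ai/v1", api_key="YOUR_XAI_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Return current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="grok-4",  # assumed model name
    messages=[{"role": "user", "content": "Compare the weather in Oslo and Madrid."}],
    tools=tools,
)

# With parallel tool calling, one response may contain several tool_calls to execute.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```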
gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) model from OpenAI, built for advanced reasoning, agentic behavior, and versatile production use cases. It activates 5.1B parameters per forward pass and is optimized to run efficiently on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool capabilities such as function calling, web browsing, and structured output generation.
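A minimal sketch, assuming gpt-oss-120b is served behind a local OpenAI-compatible endpoint (for example via vLLM) and that reasoning depth is requested by stating it in the system prompt; the exact mechanism depends on the serving stack.

```python
# Minimal sketch: selecting a reasoning level for a locally served gpt-oss-120b.
# The base_url, model identifier, and the "Reasoning: high" system-prompt convention
# are assumptions; consult your serving stack's documentation for the exact mechanism.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local server

resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # assumed model identifier
    messages=[
        {"role": "system", "content": "Reasoning: high"},  # assumed way to request deeper reasoning
        {"role": "user", "content": "Prove that the sum of two even integers is even."},
    ],
)
print(resp.choices[0].message.content)
```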
GLM 4.5 Air is the lightweight variant of the GLM-4.5 flagship family, purpose-built for agent-focused applications. It retains the Mixture-of-Experts (MoE) architecture but with a more compact parameter size for efficiency. Like its larger counterpart, it supports hybrid inference modes—offering a “thinking mode” for advanced reasoning and tool use, and a “non-thinking mode” for fast, real-time interactions. Users can easily control reasoning behavior through a simple boolean toggle.
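A minimal sketch of flipping that toggle when the model sits behind an OpenAI-compatible server; the `enable_thinking` chat-template flag and the model identifier are assumptions and may differ by provider.

```python
# Minimal sketch: toggling GLM 4.5 Air between thinking and non-thinking modes through an
# OpenAI-compatible endpoint. The parameter name (enable_thinking via chat_template_kwargs)
# and the model id are assumptions; the exact toggle depends on the serving stack or provider.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local server

def ask(prompt: str, thinking: bool) -> str:
    resp = client.chat.completions.create(
        model="zai-org/GLM-4.5-Air",  # assumed model identifier
        messages=[{"role": "user", "content": prompt}],
        extra_body={"chat_template_kwargs": {"enable_thinking": thinking}},  # assumed toggle name
    )
    return resp.choices[0].message.content

print(ask("Plan a three-step agent workflow for triaging bug reports.", thinking=True))
print(ask("Say hello in French.", thinking=False))
```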
Claude Sonnet 4 builds on the strengths of Sonnet 3.7, delivering major improvements in coding and reasoning with greater precision and controllability. It achieves state-of-the-art results on SWE-bench (72.7%), striking an effective balance between advanced capability and computational efficiency.
Key upgrades include better autonomous codebase navigation, lower error rates in agent-driven workflows, and stronger reliability in handling complex instructions. Optimized for real-world use, Sonnet 4 offers advanced reasoning power while remaining efficient and responsive across a wide range of coding, software development, and general-purpose tasks.
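A minimal sketch of a single tool-use turn with the Anthropic Messages API, the kind of call an agent loop would repeat; the model alias and the `run_tests` tool are illustrative assumptions.

```python
# Minimal sketch: one tool-use turn with Claude Sonnet 4 via the Anthropic Messages API.
# The model alias and the tool schema are assumptions for illustration.
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "run_tests",  # hypothetical tool
    "description": "Run the project's test suite and return the failures.",
    "input_schema": {"type": "object", "properties": {}, "required": []},
}]

response = client.messages.create(
    model="claude-sonnet-4-0",  # assumed alias; substitute your exact model ID
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Fix the failing tests in this repository."}],
)

# The model may answer directly or request a tool; an agent loop would execute tool_use
# blocks and feed the results back as tool_result content.
for block in response.content:
    print(block.type)
```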