Mistral: Mistral Small 3.2 24B

Mistral-Small-3.2-24B

Mistral Small 3.2 24B Instruct 2506 is the latest 24B-parameter release from Mistral, designed to enhance instruction following, minimize repetition, and improve function calling. Building on the 3.1 version, this update delivers higher accuracy on WildBench and Arena Hard, reduces infinite generation issues, and strengthens performance in tool use and structured output tasks.

The model supports both image and text inputs, enabling structured outputs, advanced function/tool calling, and reliable multimodal interaction. It achieves strong results across coding (HumanEval+, MBPP), STEM benchmarks (MMLU, MATH, GPQA), and vision tasks (ChartQA, DocVQA), making it a versatile choice for a wide range of applications.
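The function/tool-calling support described above is typically exercised through an OpenAI-compatible chat request. The sketch below assembles such a request body; the model ID, the `get_weather` tool, and the endpoint conventions are illustrative assumptions, not Mistral's documented specifics.

```python
# Sketch of a tool-calling request payload in the common OpenAI-compatible
# chat format. The model ID and the get_weather tool are hypothetical,
# shown only to illustrate the payload shape.
MODEL_ID = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"  # assumed ID

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Return current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

def build_request(user_message: str) -> dict:
    """Assemble the JSON body a client would POST to a chat endpoint."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide when to call a tool
    }
```

When the model decides to call the tool, the response carries a structured `tool_calls` entry with JSON arguments matching the declared schema, which is what makes the structured-output behavior reliable.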


Creator Mistral AI
Release Date June, 2025
License Apache 2.0
Context Window 131,072 tokens
Image Input Support Yes
Open Source (Weights) Yes
Parameters 24B

Z.AI: GLM 4.5 Air

GLM-4.5-Air

GLM 4.5 Air is the lightweight variant of the GLM-4.5 flagship family, purpose-built for agent-focused applications. It retains the Mixture-of-Experts (MoE) architecture but with a more compact parameter size for efficiency. Like its larger counterpart, it supports hybrid inference modes—offering a “thinking mode” for advanced reasoning and tool use, and a “non-thinking mode” for fast, real-time interactions. Users can easily control reasoning behavior through a simple boolean toggle.
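The boolean toggle between thinking and non-thinking modes maps onto a field in the chat request. A minimal sketch, assuming the `thinking` parameter name and `enabled`/`disabled` values used by Z.AI-style chat APIs (treat the exact field names and model ID as assumptions):

```python
# Sketch of toggling GLM 4.5 Air's hybrid inference modes via the request
# payload. The "thinking" field and its enabled/disabled values are assumed
# from Z.AI-style chat API conventions; verify against the official docs.
def build_glm_request(prompt: str, thinking: bool) -> dict:
    return {
        "model": "glm-4.5-air",  # assumed model ID
        "messages": [{"role": "user", "content": prompt}],
        # The simple boolean toggle mapped onto the API's switch:
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }
```

With `thinking=True` the model emits intermediate reasoning before answering (slower, better for tool use); with `thinking=False` it responds directly for low-latency interactions.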


Creator Z.AI
Release Date July, 2025
License MIT
Context Window 131,072 tokens
Image Input Support No
Open Source (Weights) Yes

Qwen: Qwen3 30B A3B 2507

Qwen3-30B-A3B-2507

Qwen3 30B A3B 2507 Instruct is a 30.5B-parameter Mixture-of-Experts language model from the Qwen series, with 3.3B active parameters per inference. Operating in non-thinking mode, it is optimized for high-quality instruction following, multilingual comprehension, and agentic tool use. Trained further on instruction data, it delivers strong results across benchmarks in reasoning (AIME, ZebraLogic), coding (MultiPL-E, LiveCodeBench), and alignment (IFEval, WritingBench). Compared to its base non-instruct variant, it performs significantly better on open-ended and subjective tasks while maintaining robust factual accuracy and coding capabilities.


Creator Alibaba
Release Date July, 2025
License Apache 2.0
Context Window 262,144 tokens
Image Input Support No
Open Source (Weights) Yes
Parameters 30.5B, 3.3B active at inference time

Performance

| Benchmark | Deepseek-V3-0324 | GPT-4o-0327 | Gemini-2.5-Flash (Non-Thinking) | Qwen3-235B-A22B (Non-Thinking) | Qwen3-30B-A3B (Non-Thinking) | Qwen3-30B-A3B-Instruct-2507 |
|---|---|---|---|---|---|---|
| **Knowledge** | | | | | | |
| MMLU-Pro | 81.2 | 79.8 | 81.1 | 75.2 | 69.1 | 78.4 |
| MMLU-Redux | 90.4 | 91.3 | 90.6 | 89.2 | 84.1 | 89.3 |
| GPQA | 68.4 | 66.9 | 78.3 | 62.9 | 54.8 | 70.4 |
| SuperGPQA | 57.3 | 51.0 | 54.6 | 48.2 | 42.2 | 53.4 |
| **Reasoning** | | | | | | |
| AIME25 | 46.6 | 26.7 | 61.6 | 24.7 | 21.6 | 61.3 |
| HMMT25 | 27.5 | 7.9 | 45.8 | 10.0 | 12.0 | 43.0 |
| ZebraLogic | 83.4 | 52.6 | 57.9 | 37.7 | 33.2 | 90.0 |
| LiveBench 20241125 | 66.9 | 63.7 | 69.1 | 62.5 | 59.4 | 69.0 |
| **Coding** | | | | | | |
| LiveCodeBench v6 (25.02-25.05) | 45.2 | 35.8 | 40.1 | 32.9 | 29.0 | 43.2 |
| MultiPL-E | 82.2 | 82.7 | 77.7 | 79.3 | 74.6 | 83.8 |
| Aider-Polyglot | 55.1 | 45.3 | 44.0 | 59.6 | 24.4 | 35.6 |
| **Alignment** | | | | | | |
| IFEval | 82.3 | 83.9 | 84.3 | 83.2 | 83.7 | 84.7 |
| Arena-Hard v2* | 45.6 | 61.9 | 58.3 | 52.0 | 24.8 | 69.0 |
| Creative Writing v3 | 81.6 | 84.9 | 84.6 | 80.4 | 68.1 | 86.0 |
| WritingBench | 74.5 | 75.5 | 80.5 | 77.0 | 72.2 | 85.5 |
| **Agent** | | | | | | |
| BFCL-v3 | 64.7 | 66.5 | 66.1 | 68.0 | 58.6 | 65.1 |
| TAU1-Retail | 49.6 | 60.3# | 65.2 | 65.2 | 38.3 | 59.1 |
| TAU1-Airline | 32.0 | 42.8# | 48.0 | 32.0 | 18.0 | 40.0 |
| TAU2-Retail | 71.1 | 66.7# | 64.3 | 64.9 | 31.6 | 57.0 |
| TAU2-Airline | 36.0 | 42.0# | 42.5 | 36.0 | 18.0 | 38.0 |
| TAU2-Telecom | 34.0 | 29.8# | 16.9 | 24.6 | 18.4 | 12.3 |
| **Multilingualism** | | | | | | |
| MultiIF | 66.5 | 70.4 | 69.4 | 70.2 | 70.8 | 67.9 |
| MMLU-ProX | 75.8 | 76.2 | 78.3 | 73.2 | 65.1 | 72.0 |
| INCLUDE | 80.1 | 82.1 | 83.8 | 75.6 | 67.8 | 71.9 |
| PolyMATH | 32.2 | 25.5 | 41.9 | 27.0 | 23.3 | 43.1 |

Google: Gemma 3n E4B

Gemma-3n-E4B

Gemma 3n E4B-it is a highly efficient AI model optimized for mobile and low-resource devices, including phones, laptops, and tablets. It supports multimodal inputs—text, images, and audio—enabling a wide range of tasks such as text generation, speech recognition, translation, and image analysis. Powered by innovations like Per-Layer Embedding (PLE) caching and the MatFormer architecture, Gemma 3n intelligently manages memory and computation by selectively activating parameters, greatly reducing runtime resource demands.

Trained across 140+ languages and equipped with a flexible 32K token context window, the model adapts its parameter usage based on the task or device, ensuring both efficiency and versatility. This makes Gemma 3n ideal for privacy-focused, offline-capable applications and on-device AI solutions.


Creator Google
Release Date June, 2025
License Gemma License
Context Window 32,000 tokens
Image Input Support Yes
Open Source (Weights) Yes
Parameters 8.4B, 4.0B active at inference time

Qwen: Qwen3 Coder 480B A35B

Qwen3-Coder-480B-A35B

Qwen3 Coder 480B A35B Instruct is a Mixture-of-Experts (MoE) model from the Qwen team, designed specifically for advanced code generation. It excels at agentic coding tasks such as function calling, tool use, and long-context reasoning across large repositories. The model contains 480B total parameters, with 35B active per forward pass (8 of 160 experts). Alibaba’s endpoint pricing depends on context length, with higher rates applying once inputs exceed 128K tokens.
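The context-length-tiered pricing works as a threshold switch: one input rate up to 128K tokens, a higher rate beyond. The sketch below illustrates only the mechanism; the dollar rates are placeholders, not Alibaba's actual prices.

```python
# Illustration of tiered pricing by context length, as described above.
# The threshold (128K input tokens) comes from the text; the two rates
# below are HYPOTHETICAL placeholders, not published prices.
TIER_THRESHOLD = 128_000   # tokens; higher rate applies beyond this
BASE_RATE = 1.00           # $/M input tokens (placeholder)
LONG_CONTEXT_RATE = 3.00   # $/M input tokens (placeholder)

def input_cost(tokens: int) -> float:
    """Bill the whole request at the higher rate once input exceeds the threshold."""
    rate = LONG_CONTEXT_RATE if tokens > TIER_THRESHOLD else BASE_RATE
    return tokens / 1_000_000 * rate
```

Under these placeholder rates, a 100K-token request bills at the base tier while a 200K-token request bills entirely at the long-context tier.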


Creator Alibaba
Release Date July, 2025
License Apache 2.0
Context Window 262,144 tokens
Image Input Support No
Open Source (Weights) Yes
Parameters 480B, 35B active at inference time

OpenAI: GPT-5

OpenAI-GPT-5

GPT-5 is OpenAI’s most advanced AI model, designed with significant upgrades in reasoning, coding, and overall user experience. It excels at handling complex tasks that demand step-by-step logic, precise instruction following, and reliable accuracy in critical use cases. The model introduces support for test-time routing and enhanced prompt interpretation, allowing it to adapt to user intent—for example, when asked to “think carefully” about a problem. Key improvements include fewer hallucinations, reduced bias toward agreement, and stronger performance across coding, writing, and health-related applications.


Creator OpenAI
Release Date August, 2025
License Proprietary
Context Window 400,000 tokens
Image Input Support Yes
Open Source (Weights) No
Input Cost $1.25/M tokens
Output Cost $10/M tokens
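The listed rates make per-request cost a simple weighted sum of input and output token counts. A quick arithmetic sketch using the prices above ($1.25/M input, $10/M output; the function itself is just illustration):

```python
# Per-request cost from the listed GPT-5 rates (see spec above).
INPUT_RATE = 1.25    # $ per 1M input tokens
OUTPUT_RATE = 10.00  # $ per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token rates."""
    return (input_tokens / 1_000_000 * INPUT_RATE
            + output_tokens / 1_000_000 * OUTPUT_RATE)

# A 50K-token prompt with a 2K-token reply:
# 0.050 * 1.25 + 0.002 * 10.00 = 0.0625 + 0.02 = $0.0825
```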

Google: Gemini 2.5 Flash

Gemini-2.5-Flash

Gemini 2.5 Flash is Google’s cutting-edge workhorse model, engineered for advanced reasoning, coding, mathematics, and scientific applications. It features built-in “thinking” capabilities that allow it to deliver more accurate responses and handle complex contexts with greater nuance.


Creator Google
Release Date May, 2025
License Proprietary
Context Window 1,048,576 tokens
Image Input Support Yes
Open Source (Weights) No
Input Cost $0.30/M tokens
Output Cost $2.50/M tokens