Month: January 2026
Mistral: Mistral Small 3.2 24B

Mistral Small 3.2 24B Instruct 2506 is the latest 24B-parameter release from Mistral, designed to enhance instruction following, minimize repetition, and improve function calling. Building on the 3.1 version, this update delivers higher accuracy on WildBench and Arena Hard, reduces infinite generation issues, and strengthens performance in tool use and structured output tasks.
The model supports both image and text inputs, enabling structured outputs, advanced function/tool calling, and reliable multimodal interaction. It achieves strong results across coding (HumanEval+, MBPP), STEM benchmarks (MMLU, MATH, GPQA), and vision tasks (ChartQA, DocVQA), making it a versatile choice for a wide range of applications.
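Since many providers serve this model behind OpenAI-style chat APIs, its function-calling support is easy to exercise from standard tooling. A minimal sketch, assuming a hypothetical OpenAI-compatible endpoint and a provider-specific model ID (both placeholders):

```python
# Hypothetical sketch: function calling with Mistral Small 3.2 24B through an
# OpenAI-compatible endpoint. The base_url and model ID are placeholders for
# whichever provider hosts the weights.
from openai import OpenAI

client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="mistralai/mistral-small-3.2-24b-instruct-2506",  # provider-specific ID
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# When the model opts to call the tool, the structured call arrives here.
print(response.choices[0].message.tool_calls)
```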
| Creator | Mistral AI |
| Release Date | June, 2025 |
| License | Apache 2.0 |
| Context Window | 131,072 |
| Image Input Support | Yes |
| Open Source (Weights) | Yes |
| Parameters | 24B |
| Model Weights | Click here |
Z.AI: GLM 4.5 Air

GLM 4.5 Air is the lightweight variant of the GLM-4.5 flagship family, purpose-built for agent-focused applications. It retains the Mixture-of-Experts (MoE) architecture but with a more compact parameter size for efficiency. Like its larger counterpart, it supports hybrid inference modes—offering a “thinking mode” for advanced reasoning and tool use, and a “non-thinking mode” for fast, real-time interactions. Users can easily control reasoning behavior through a simple boolean toggle.
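A minimal sketch of that toggle, assuming an OpenAI-compatible endpoint; the exact name and shape of the `thinking` field are assumptions to verify against the provider's documentation:

```python
# Hypothetical sketch of GLM 4.5 Air's hybrid inference toggle via an
# OpenAI-compatible endpoint. Endpoint, model ID, and the "thinking" request
# field are assumptions, not confirmed API surface.
from openai import OpenAI

client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="z-ai/glm-4.5-air",  # provider-specific ID
    messages=[{"role": "user", "content": "Plan a 3-step web-scraping agent."}],
    # Assumed toggle: disable the reasoning pass for low-latency replies;
    # switch to "enabled" for the deliberate "thinking mode".
    extra_body={"thinking": {"type": "disabled"}},
)
print(response.choices[0].message.content)
```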
| Creator | Z.AI |
| Release Date | July, 2025 |
| License | MIT |
| Context Window | 131,072 |
| Image Input Support | No |
| Open Source (Weights) | Yes |
| Model Weights | Click here |
Qwen: Qwen3 30B A3B 2507

Qwen3 30B A3B 2507 Instruct is a 30.5B-parameter Mixture-of-Experts language model from the Qwen series, with 3.3B parameters active per forward pass. Operating exclusively in non-thinking mode, it is optimized for high-quality instruction following, multilingual comprehension, and agentic tool use. Further post-trained on instruction data, it delivers strong results across benchmarks in reasoning (AIME, ZebraLogic), coding (MultiPL-E, LiveCodeBench), and alignment (IFEval, WritingBench). Compared to its non-instruct base variant, it performs significantly better on open-ended and subjective tasks while maintaining robust factual accuracy and coding capability.
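The open weights run with standard Hugging Face tooling. A minimal local-inference sketch, assuming the official `Qwen/Qwen3-30B-A3B-Instruct-2507` checkpoint and enough GPU memory to hold the full 30.5B weights (only ~3.3B are active per token, but all must be loaded):

```python
# Minimal sketch: running Qwen3-30B-A3B-Instruct-2507 with Hugging Face
# transformers. The prompt is arbitrary; adjust max_new_tokens to taste.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-30B-A3B-Instruct-2507"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the rules of Zebra puzzles."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```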
| Creator | Alibaba |
| Release Date | July, 2025 |
| License | Apache 2.0 |
| Context Window | 262,144 |
| Image Input Support | No |
| Open Source (Weights) | Yes |
| Parameters | 30.5B, 3.3B active at inference time |
| Model Weights | Click here |
Performance
| Benchmark | Deepseek-V3-0324 | GPT-4o-0327 | Gemini-2.5-Flash Non-Thinking | Qwen3-235B-A22B Non-Thinking | Qwen3-30B-A3B Non-Thinking | Qwen3-30B-A3B-Instruct-2507 |
|---|---|---|---|---|---|---|
| **Knowledge** | | | | | | |
| MMLU-Pro | 81.2 | 79.8 | 81.1 | 75.2 | 69.1 | 78.4 |
| MMLU-Redux | 90.4 | 91.3 | 90.6 | 89.2 | 84.1 | 89.3 |
| GPQA | 68.4 | 66.9 | 78.3 | 62.9 | 54.8 | 70.4 |
| SuperGPQA | 57.3 | 51.0 | 54.6 | 48.2 | 42.2 | 53.4 |
| **Reasoning** | | | | | | |
| AIME25 | 46.6 | 26.7 | 61.6 | 24.7 | 21.6 | 61.3 |
| HMMT25 | 27.5 | 7.9 | 45.8 | 10.0 | 12.0 | 43.0 |
| ZebraLogic | 83.4 | 52.6 | 57.9 | 37.7 | 33.2 | 90.0 |
| LiveBench 20241125 | 66.9 | 63.7 | 69.1 | 62.5 | 59.4 | 69.0 |
| **Coding** | | | | | | |
| LiveCodeBench v6 (25.02-25.05) | 45.2 | 35.8 | 40.1 | 32.9 | 29.0 | 43.2 |
| MultiPL-E | 82.2 | 82.7 | 77.7 | 79.3 | 74.6 | 83.8 |
| Aider-Polyglot | 55.1 | 45.3 | 44.0 | 59.6 | 24.4 | 35.6 |
| **Alignment** | | | | | | |
| IFEval | 82.3 | 83.9 | 84.3 | 83.2 | 83.7 | 84.7 |
| Arena-Hard v2* | 45.6 | 61.9 | 58.3 | 52.0 | 24.8 | 69.0 |
| Creative Writing v3 | 81.6 | 84.9 | 84.6 | 80.4 | 68.1 | 86.0 |
| WritingBench | 74.5 | 75.5 | 80.5 | 77.0 | 72.2 | 85.5 |
| **Agent** | | | | | | |
| BFCL-v3 | 64.7 | 66.5 | 66.1 | 68.0 | 58.6 | 65.1 |
| TAU1-Retail | 49.6 | 60.3# | 65.2 | 65.2 | 38.3 | 59.1 |
| TAU1-Airline | 32.0 | 42.8# | 48.0 | 32.0 | 18.0 | 40.0 |
| TAU2-Retail | 71.1 | 66.7# | 64.3 | 64.9 | 31.6 | 57.0 |
| TAU2-Airline | 36.0 | 42.0# | 42.5 | 36.0 | 18.0 | 38.0 |
| TAU2-Telecom | 34.0 | 29.8# | 16.9 | 24.6 | 18.4 | 12.3 |
| **Multilingualism** | | | | | | |
| MultiIF | 66.5 | 70.4 | 69.4 | 70.2 | 70.8 | 67.9 |
| MMLU-ProX | 75.8 | 76.2 | 78.3 | 73.2 | 65.1 | 72.0 |
| INCLUDE | 80.1 | 82.1 | 83.8 | 75.6 | 67.8 | 71.9 |
| PolyMATH | 32.2 | 25.5 | 41.9 | 27.0 | 23.3 | 43.1 |
Google: Gemma 3n E4B

Gemma 3n E4B-it is a highly efficient AI model optimized for mobile and low-resource devices, including phones, laptops, and tablets. It supports multimodal inputs—text, images, and audio—enabling a wide range of tasks such as text generation, speech recognition, translation, and image analysis. Powered by innovations like Per-Layer Embedding (PLE) caching and the MatFormer architecture, Gemma 3n intelligently manages memory and computation by selectively activating parameters, greatly reducing runtime resource demands.
Trained across 140+ languages and equipped with a flexible 32K token context window, the model adapts its parameter usage based on the task or device, ensuring both efficiency and versatility. This makes Gemma 3n ideal for privacy-focused, offline-capable applications and on-device AI solutions.
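A minimal multimodal sketch via the Hugging Face pipeline API, assuming your transformers release supports the `google/gemma-3n-E4B-it` checkpoint; the image URL is a placeholder:

```python
# Minimal sketch of multimodal inference with Gemma 3n E4B-it through the
# Hugging Face pipeline API. Checkpoint support and the exact output shape
# depend on your transformers version.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text", model="google/gemma-3n-E4B-it", device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/chart.png"},  # placeholder
        {"type": "text", "text": "Describe what this chart shows."},
    ],
}]

out = pipe(text=messages, max_new_tokens=128)
# With chat-style input, the result carries the full conversation; the last
# message is the model's reply.
print(out[0]["generated_text"][-1]["content"])
```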
| Creator | Google |
| Release Date | June, 2025 |
| License | Gemma License |
| Context Window | 32,000 |
| Image Input Support | Yes |
| Open Source (Weights) | Yes |
| Parameters | 8.4B, 4.0B active at inference time |
| Model Weights | Click here |
Qwen: Qwen3 Coder 480B A35B

Qwen3 Coder 480B A35B Instruct is a Mixture-of-Experts (MoE) model from the Qwen team, designed specifically for advanced code generation. It excels at agentic coding tasks such as function calling, tool use, and long-context reasoning across large repositories. The model contains 480B total parameters, with 35B active per forward pass (8 of 160 experts). Alibaba’s endpoint pricing depends on context length, with higher rates applying once inputs exceed 128K tokens.
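To make the tiered pricing concrete, here is an illustrative sketch. The per-token rates below are placeholders, not Alibaba's actual prices; only the mechanism (a higher rate once the input crosses 128K tokens) comes from the description above:

```python
# Illustrative two-tier, context-length-based pricing. Rates are made-up
# placeholders; substitute the provider's published numbers.
def estimate_input_cost(input_tokens: int,
                        base_rate_per_m: float = 1.00,      # placeholder $/M
                        long_ctx_rate_per_m: float = 3.00   # placeholder $/M
                        ) -> float:
    """Return input cost in dollars under a two-tier context-length scheme."""
    rate = long_ctx_rate_per_m if input_tokens > 128_000 else base_rate_per_m
    return input_tokens / 1_000_000 * rate

print(estimate_input_cost(32_000))   # billed at the base tier
print(estimate_input_cost(200_000))  # billed at the long-context tier
```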
| Creator | Alibaba |
| Release Date | July, 2025 |
| License | Apache 2.0 |
| Context Window | 262,144 |
| Image Input Support | No |
| Open Source (Weights) | Yes |
| Parameters | 480B, 35.0B active at inference time |
| Model Weights | Click here |
OpenAI: GPT-5

GPT-5 is OpenAI’s most advanced AI model, designed with significant upgrades in reasoning, coding, and overall user experience. It excels at handling complex tasks that demand step-by-step logic, precise instruction following, and reliable accuracy in critical use cases. The model introduces support for test-time routing and enhanced prompt interpretation, allowing it to adapt to user intent—for example, when asked to “think carefully” about a problem. Key improvements include fewer hallucinations, reduced bias toward agreement, and stronger performance across coding, writing, and health-related applications.
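A minimal sketch of asking the model to "think carefully" via the OpenAI Responses API with a reasoning-effort hint; treat the exact field names as something to verify against the current API docs:

```python
# Sketch: requesting more deliberate reasoning from GPT-5 via the OpenAI
# Responses API. Field names reflect the API as publicly documented; verify
# against current docs before relying on them.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "high"},  # trade latency for deeper reasoning
    input="A bat and ball cost $1.10 total; the bat costs $1.00 more than "
          "the ball. Think carefully: what does the ball cost?",
)
print(response.output_text)
```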
| Creator | OpenAI |
| Release Date | August, 2025 |
| License | Proprietary |
| Context Window | 400,000 |
| Image Input Support | Yes |
| Open Source (Weights) | No |
| Input Cost | $1.25/M tokens |
| Output Cost | $10/M tokens |
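At these rates, a quick back-of-the-envelope estimate for a single request:

```python
# Worked cost example using the table's rates ($1.25/M input, $10/M output)
# for a 50K-token prompt with a 2K-token reply.
input_cost = 50_000 / 1_000_000 * 1.25    # $0.0625
output_cost = 2_000 / 1_000_000 * 10.00   # $0.02
print(f"${input_cost + output_cost:.4f}")  # $0.0825
```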
Google: Gemini 2.5 Flash

Gemini 2.5 Flash is Google’s cutting-edge workhorse model, engineered for advanced reasoning, coding, mathematics, and scientific applications. It features built-in “thinking” capabilities that allow it to deliver more accurate responses and handle complex contexts with greater nuance.
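A minimal sketch of controlling that "thinking" behavior with the `google-genai` Python SDK; the `thinking_budget` knob caps reasoning tokens, and the exact field names should be checked against Google's current docs:

```python
# Sketch: capping Gemini 2.5 Flash's reasoning tokens with the google-genai
# SDK. Setting the budget to 0 disables thinking entirely on this model.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Prove that the sum of two odd integers is even.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=1024)
    ),
)
print(response.text)
```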
| Creator | Google |
| Release Date | May, 2025 |
| License | Proprietary |
| Context Window | 1,048,576 |
| Image Input Support | Yes |
| Open Source (Weights) | No |
| Input Cost | $0.30/M tokens |
| Output Cost | $2.50/M tokens |