MiniMax-01 integrates MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding, combining multimodal strengths in a single model family. It has 456B total parameters, of which 45.9B are activated per token, and supports context lengths of up to 4 million tokens.
The text component uses a hybrid architecture that blends Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). The vision component follows a “ViT-MLP-LLM” framework, trained on top of the text model to enable advanced multimodal reasoning.
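To make the hybrid layer arrangement concrete, the sketch below builds the per-layer plan the text model is reported to use: one softmax-attention block after every seven Lightning (linear) attention blocks, each followed by an MoE feed-forward. This is a minimal illustration, not the official implementation; the 80-layer depth and the 1-in-8 softmax ratio are taken from the MiniMax-01 technical report, and all module internals are omitted.

```python
# Structural sketch of the reported hybrid stack (illustrative only):
# softmax attention on every 8th layer, Lightning (linear) attention elsewhere,
# with an MoE feed-forward in every block.
from dataclasses import dataclass


@dataclass
class LayerSpec:
    index: int
    attention: str  # "lightning" (linear attention) or "softmax"
    ffn: str        # MoE feed-forward in every block


def build_hybrid_stack(num_layers: int = 80, softmax_every: int = 8) -> list[LayerSpec]:
    """Return the per-layer plan: softmax attention on every `softmax_every`-th
    layer, Lightning attention on the rest, MoE FFN throughout."""
    stack = []
    for i in range(num_layers):
        attn = "softmax" if (i + 1) % softmax_every == 0 else "lightning"
        stack.append(LayerSpec(index=i, attention=attn, ffn="moe"))
    return stack


if __name__ == "__main__":
    stack = build_hybrid_stack()
    n_softmax = sum(s.attention == "softmax" for s in stack)
    print(f"{len(stack)} layers, {n_softmax} softmax-attention layers")  # 80 layers, 10 softmax
```

The mostly-linear attention stack is what allows the model to scale to million-token contexts, while the periodic softmax layers preserve full-attention retrieval quality.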
Creator: MiniMax
Release Date: January 2025
License: MiniMax Model License Agreement
Context Window: 1,000,192 tokens
Image Input Support: No
Open Source (Weights): Yes
Parameters: 456B total, 45.9B active at inference time
Model Weights: Click here
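Since the weights are openly released, a minimal loading sketch with Hugging Face transformers is shown below. The repository id `MiniMaxAI/MiniMax-Text-01`, the `trust_remote_code` requirement, and the loading flags are assumptions rather than instructions from this page; follow the linked weights page for the authoritative setup.

```python
# Hypothetical loading sketch (assumes the weights are published on the
# Hugging Face Hub under "MiniMaxAI/MiniMax-Text-01" with custom modeling code).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-Text-01"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # hybrid-attention/MoE layers ship as custom code
    device_map="auto",       # shard the 456B-parameter checkpoint across devices
    torch_dtype="auto",
)

prompt = "Summarize the MiniMax-01 architecture in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```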
Performance Benchmarks
Core Academic Benchmarks
| Tasks | GPT-4o (11-20) | Claude-3.5-Sonnet (10-22) | Gemini-1.5-Pro (002) | Gemini-2.0-Flash (exp) | Qwen2.5-72B-Inst. | DeepSeek-V3 | Llama-3.1-405B-Inst. | MiniMax-Text-01 |
|---|---|---|---|---|---|---|---|---|
| **General** | | | | | | | | |
| MMLU* | 85.7 | 88.3 | 86.8 | 86.5 | 86.1 | 88.5 | 88.6 | 88.5 |
| MMLU-Pro* | 74.4 | 78.0 | 75.8 | 76.4 | 71.1 | 75.9 | 73.3 | 75.7 |
| SimpleQA | 39.0 | 28.1 | 23.4 | 26.6 | 10.3 | 24.9 | 23.2 | 23.7 |
| C-SimpleQA | 64.6 | 56.8 | 59.4 | 63.3 | 52.2 | 64.8 | 54.7 | 67.4 |
| IFEval (avg) | 84.1 | 90.1 | 89.4 | 88.4 | 87.2 | 87.3 | 86.4 | 89.1 |
| Arena-Hard | 92.4 | 87.6 | 85.3 | 72.7 | 81.2 | 91.4 | 63.5 | 89.1 |
| **Reasoning** | | | | | | | | |
| GPQA* (diamond) | 46.0 | 65.0 | 59.1 | 62.1 | 49.0 | 59.1 | 50.7 | 54.4 |
| DROP* (F1) | 89.2 | 88.8 | 89.2 | 89.3 | 85.0 | 91.0 | 92.5 | 87.8 |
| **Mathematics** | | | | | | | | |
| GSM8k* | 95.6 | 96.9 | 95.2 | 95.4 | 95.8 | 96.7 | 96.7 | 94.8 |
| MATH* | 76.6 | 74.1 | 84.6 | 83.9 | 81.8 | 84.6 | 73.8 | 77.4 |
| **Coding** | | | | | | | | |
| MBPP+ | 76.2 | 75.1 | 75.4 | 75.9 | 77.0 | 78.8 | 73.0 | 71.7 |
| HumanEval | 90.2 | 93.7 | 86.6 | 89.6 | 86.6 | 92.1 | 89.0 | 86.9 |
Ruler
| Model | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M |
|---|---|---|---|---|---|---|---|---|---|
| GPT-4o (11-20) | 0.970 | 0.921 | 0.890 | 0.888 | 0.884 | – | – | – | – |
| Claude-3.5-Sonnet (10-22) | 0.965 | 0.960 | 0.957 | 0.950 | 0.952 | 0.938 | – | – | – |
| Gemini-1.5-Pro (002) | 0.962 | 0.960 | 0.960 | 0.958 | 0.938 | 0.917 | 0.916 | 0.861 | 0.850 |
| Gemini-2.0-Flash (exp) | 0.960 | 0.960 | 0.951 | 0.957 | 0.937 | 0.860 | 0.797 | 0.709 | – |
| MiniMax-Text-01 | 0.963 | 0.961 | 0.953 | 0.954 | 0.943 | 0.947 | 0.945 | 0.928 | 0.910 |
LongBench V2
| Model | overall | easy | hard | short | medium | long |
|---|---|---|---|---|---|---|
| Human | 53.7 | 100.0 | 25.1 | 47.2 | 59.1 | 53.7 |
| **w/ CoT** | | | | | | |
| GPT-4o (11-20) | 51.4 | 54.2 | 49.7 | 59.6 | 48.6 | 43.5 |
| Claude-3.5-Sonnet (10-22) | 46.7 | 55.2 | 41.5 | 53.9 | 41.9 | 44.4 |
| DeepSeek-V3 | – | – | – | – | – | – |
| Qwen2.5-72B-Inst. | 43.5 | 47.9 | 40.8 | 48.9 | 40.9 | 39.8 |
| MiniMax-Text-01 | 56.5 | 66.1 | 50.5 | 61.7 | 56.7 | 47.2 |
| **w/o CoT** | | | | | | |
| GPT-4o (11-20) | 50.1 | 57.4 | 45.6 | 53.3 | 52.4 | 40.2 |
| Claude-3.5-Sonnet (10-22) | 41.0 | 46.9 | 37.3 | 46.1 | 38.6 | 37.0 |
| DeepSeek-V3 | 48.7 | – | – | – | – | – |
| Qwen2.5-72B-Inst. | 42.1 | 42.7 | 41.8 | 45.6 | 38.1 | 44.4 |
| MiniMax-Text-01 | 52.9 | 60.9 | 47.9 | 58.9 | 52.6 | 43.5 |
MTOB
| Model | no context | half book | full book | Δ half book | Δ full book |
|---|---|---|---|---|---|
| **eng → kalam (ChrF)** | | | | | |
| GPT-4o (11-20) | 9.90 | 54.30 | – | 44.40 | – |
| Claude-3.5-Sonnet (10-22) | 20.22 | 53.62 | 55.65 | 33.39 | 35.42 |
| Gemini-1.5-Pro (002) | 16.79 | 53.68 | 57.90 | 36.89 | 41.11 |
| Gemini-2.0-Flash (exp) | 12.20 | 49.50 | 53.30 | 37.30 | 41.10 |
| Qwen-Long | 16.55 | 48.48 | 45.94 | 31.92 | 29.39 |
| MiniMax-Text-01 | 6.0 | 51.74 | 51.60 | 45.7 | 45.6 |
| **kalam → eng (BLEURT)** | | | | | |
| GPT-4o (11-20) | 33.20 | 58.30 | – | 25.10 | – |
| Claude-3.5-Sonnet (10-22) | 31.42 | 59.70 | 62.30 | 28.28 | 30.88 |
| Gemini-1.5-Pro (002) | 32.02 | 61.52 | 63.09 | 29.50 | 31.07 |
| Gemini-2.0-Flash (exp) | 33.80 | 57.50 | 57.00 | 23.70 | 23.20 |
| Qwen-Long | 30.13 | 53.14 | 32.15 | 23.01 | 2.02 |
| MiniMax-Text-01 | 33.65 | 57.10 | 58.00 | 23.45 | 24.35 |