Train a 64M ChatBot from zero.
2 hours. ¥3. One 3090.
That's it.
💰 Ultra-Cheap
One 3090. Zero to ChatBot in 2 hours, costing just ¥3.
📦 Full Stack
Complete pipeline: Tokenizer → Pretrain → SFT → LoRA → DPO → PPO/GRPO/CISPO → Agentic RL
🔬 Latest RL
PPO, GRPO, CISPO + Agentic RL + YaRN length extrapolation. Native PyTorch.
📖 Learn by Reading
Pure PyTorch. No black boxes. Understand every line of code.
🔌 Plug & Play
Compatible with vLLM, ollama, llama.cpp, SGLang, transformers.
⚡ OpenAI API
Drop-in replacement for FastGPT, Open-WebUI, Dify. Tool Calling & Adaptive Thinking.
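The YaRN length extrapolation mentioned above scales RoPE frequencies non-uniformly by wavelength. Below is a minimal sketch in plain PyTorch, assuming textbook YaRN hyperparameters (beta_fast=32, beta_slow=1, 4x scaling over a 2048-token original context); MiniMind's actual configuration may differ:

```python
import math
import torch

def yarn_scaled_inv_freq(dim=64, base=10000.0, scale=4.0,
                         orig_ctx=2048, beta_fast=32, beta_slow=1):
    """YaRN-style non-uniform RoPE frequency scaling (illustrative sketch).

    High-frequency dimensions (short wavelengths) keep their original
    frequencies; low-frequency dimensions are interpolated by `scale`;
    a linear ramp blends the two regimes in between.
    """
    half = dim // 2
    inv_freq = 1.0 / (base ** (torch.arange(half).float() * 2 / dim))

    # Dimension index at which a position rotates `n_rot` full turns
    # within the original context window.
    def correction_dim(n_rot):
        return (dim * math.log(orig_ctx / (n_rot * 2 * math.pi))
                / (2 * math.log(base)))

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), half - 1)
    # ramp: 0 below `low` (keep original freq), 1 above `high` (interpolate)
    ramp = torch.clamp((torch.arange(half).float() - low) / max(high - low, 1),
                       0.0, 1.0)
    return inv_freq * (1 - ramp) + (inv_freq / scale) * ramp
```

The full method also rescales attention logits by a temperature factor at long context; the sketch above covers only the frequency blend.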
| Model | Parameters | Hidden Dim | Layers | Memory |
|---|---|---|---|---|
| MiniMind-3 | 64M | 768 | 8 | ~0.5 GB |
| MiniMind-3-MoE | 198M / A64M | 768 | 8 | ~1.0 GB |
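As a sanity check on the table, the dense parameter count can be estimated from the listed shapes. The vocabulary and MLP intermediate sizes below are illustrative assumptions (not read from the repo), counting full multi-head attention, a SwiGLU MLP, and embeddings tied to the LM head:

```python
def estimate_dense_params(vocab=6400, hidden=768, layers=8, intermediate=2048):
    """Rough parameter estimate for the dense model in the table above.

    vocab/intermediate are assumptions; norms and biases are ignored.
    """
    embed = vocab * hidden           # token embedding (tied with lm_head)
    attn = 4 * hidden * hidden       # q, k, v, o projections
    mlp = 3 * hidden * intermediate  # gate, up, down projections (SwiGLU)
    return embed + layers * (attn + mlp)

print(f"{estimate_dense_params() / 1e6:.1f}M")  # lands in the ~60M range
```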
- 🔥 Release minimind-3 / minimind-3-moe: structure, tokenizer, training & inference fully updated
- Architecture aligned with Qwen3 / Qwen3-MoE: Dense ~64M, MoE ~198M/A64M
- New native Agentic RL script (train_agent.py): multi-turn Tool-Use with GRPO/CISPO
- RLAIF / Agentic RL rollout engine decoupled for flexible inference backends
- serve_openai_api.py & web_demo.py: reasoning_content / tool_calls / open_thinking
- Tokenizer updated (BPE + ByteLevel) with tool call & thinking tokens
- LoRA weight merge & export via scripts/convert_model.py
- README & architecture diagrams major update
- 🔥 RLAIF algorithms: PPO, GRPO, SPO (native PyTorch)
- Checkpoint resume training: auto-save & cross-GPU recovery
- RLAIF dataset: rlaif-mini.jsonl; simplified DPO dataset with added Chinese data
- YaRN algorithm for RoPE length extrapolation
- Adaptive Thinking in reasoning models
- Tool Calling & Reasoning tags support
- SwanLab integration (WandB alternative for China)
- Model parameter renaming (align with Transformers)
- Generate method refactor (GenerationMixin)
- ✅ llama.cpp, vLLM, ollama support
- Vocab update: <s></s> → <|im_start|><|im_end|>
- Code structure standardization
- Complete codebase rewrite
- MiniMind2 series: 26M, 104M, 145M models
- JSONL data format (no preprocessing needed)
- LoRA from scratch (no peft dependency)
- DPO native PyTorch implementation
- White-box model distillation
- DeepSeek-R1 distillation models
- High-quality pretraining data (pretrain in ~2 hours on a single 3090)
- Vision multimodal support
- MiniMind-V project launch
- Pretrain dataset preprocessing update
- Text integrity preservation
- Code cleanup
- MiniMind-V1-MoE model release
- Standardized minimind_tokenizer
- Removed mistral_tokenizer variants
- MiniMind-V1 (108M) release
- 3 epochs pretrain + 10 epochs SFT
- ModelScope online demo
- 🚀 First open-source release
- MiniMind project launched
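The GRPO used in the RLAIF and Agentic RL scripts above replaces a learned value network with group-relative advantages: each completion is scored against the other completions sampled for the same prompt. A minimal sketch of that normalization (function name and shapes are illustrative, not MiniMind's actual API):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages in the style of GRPO (illustrative sketch).

    `rewards` holds one scalar reward per rollout, shaped (groups, group_size):
    each row is a group of completions sampled for the same prompt. Each
    completion's advantage is its reward normalized by its own group's mean
    and std, so no critic model is needed.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: two prompts, four sampled completions each.
# A group with identical rewards (row 2) yields zero advantage everywhere.
r = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                  [2.0, 2.0, 2.0, 2.0]])
adv = grpo_advantages(r)
```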
🎯 Try It Online
📦 Get the Code
💡 Why MiniMind?
- Train from zero, not just finetune
- Pure PyTorch—no magic, no black boxes
- Understand by building, not by reading docs
- Works on your laptop—no cloud GPUs needed
- RLAIF algorithms: PPO, GRPO, CISPO + Agentic RL
- Tool Calling & Adaptive Thinking built-in
- OpenAI API compatible—plug into any UI
- Vision support via MiniMind-V
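Because the server speaks the OpenAI chat-completions protocol, any compatible client or UI can drive it. Below is a sketch of the request body such a client would POST to `/v1/chat/completions`, including a tool definition; the model name and tool schema here are illustrative assumptions, not part of the repo:

```python
import json

# Minimal OpenAI-style chat request with one tool, as an OpenAI-compatible
# UI (e.g. Open-WebUI) would send it. The `get_weather` tool is a made-up
# example used only to show the Tool Calling request shape.
request = {
    "model": "minimind",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in Beijing?"},
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "stream": False,
}
body = json.dumps(request)  # POST this to the server's /v1/chat/completions
```

If the model decides to call the tool, the response carries the call in the assistant message's `tool_calls` field rather than in `content`.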
💭 "Building a Lego plane beats flying first class."