
Train a 67M vision-language model from scratch.
1 hour. ¥1.3. One RTX 3090.
See the World.

| Parameters | Training Time | Cost | Size vs GPT-3 |
| --- | --- | --- | --- |
| 67M | 1h | ¥1.3 | 1/2600 |

✨ What You Get

📸 Vision Multimodal

Full vision-language capabilities on personal GPUs.

📦 Complete VLM Stack

SigLIP2 encoder → Projector → LLM, trained via Pretrain → SFT

💰 Ultra-Affordable

Single 3090 GPU, minimal compute resources needed.

📖 Pure PyTorch

Transparent implementation. Learn from reading the code.

🚀 Production Ready

OpenAI API compatible, ready for real-world deployment.

🔌 Extensible

Multi-image support, fine-tuning, and more capabilities.
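Because the server speaks the OpenAI chat-completions format, a request with an image can be built like any other OpenAI vision call. The sketch below constructs such a payload; the model id `minimind-3v` and the base64 `image_url` encoding are assumptions for illustration, not the project's documented API surface.

```python
import base64
import json

def build_vision_request(image_bytes: bytes, prompt: str,
                         model: str = "minimind-3v") -> dict:
    """Construct an OpenAI-style chat-completions payload with one image.

    The model id and data-URL image encoding are assumptions; check the
    repository's WebUI/server code for the exact values it accepts.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image is sent inline as a base64 data URL.
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }

payload = build_vision_request(b"\xff\xd8\xff", "Describe this image.")
print(json.dumps(payload)[:40])
```

Such a payload can then be POSTed to the server's `/v1/chat/completions` endpoint with any OpenAI-compatible client.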

📦 Models
| Model | Parameters | Inference Memory | Release |
| --- | --- | --- | --- |
| minimind-3v-moe | 202M-A67M | 1.0 GB | 2026.04.01 |
| minimind-3v | 67M | 0.5 GB | 2026.04.01 |
| MiniMind2-V | 104M | 1.1 GB | 2025.02.20 |
| MiniMind2-Small-V | 26M | 0.6 GB | 2025.02.20 |
📰 What's New
🎉 2026-04-01 (Latest)
  • Added minimind-3v (67M) and minimind-3v-moe (201M-A67M) models
  • Unified 768+8 architecture supporting both dense and MoE modes
  • Switched the visual encoder from CLIP to SigLIP2 (siglip2-base-p16-ve)
  • Replaced the QFormer with MLP projection + reshape compression
  • Dataset format updated to Parquet with mixed data sources
  • Updated tokenizer with the image placeholder token <|image_pad|>
  • New WebUI with dynamic model directory scanning and dropdown model switching
  • Refactored model code, unifying LLM/VLM under the Transformers format
  • Training scripts support DDP multi-GPU, bfloat16 mixed precision, and torch.compile acceleration
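The "MLP projection + reshape compression" change above can be sketched as follows: adjacent visual tokens are concatenated via a reshape (reducing the token count), then a small MLP maps the wider tokens into the LLM's embedding space. The 768-dim input matches SigLIP2-base patch embeddings; the compression factor, hidden size, and class name are assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class MLPProjector(nn.Module):
    """Hypothetical projector: reshape-compress visual tokens, then MLP.

    Dimensions are illustrative: 768-dim SigLIP2-base patch tokens,
    4x token compression, 768-dim LLM embedding space.
    """

    def __init__(self, vision_dim: int = 768, llm_dim: int = 768,
                 compress: int = 4):
        super().__init__()
        self.compress = compress
        self.mlp = nn.Sequential(
            nn.Linear(vision_dim * compress, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, vision_dim); num_tokens must divide evenly.
        b, n, d = x.shape
        # Merge every `compress` consecutive tokens into one wider token.
        x = x.reshape(b, n // self.compress, d * self.compress)
        return self.mlp(x)

proj = MLPProjector()
tokens = torch.randn(1, 196, 768)   # 14x14 patch grid from a 224px image
out = proj(tokens)
print(out.shape)                    # torch.Size([1, 49, 768])
```

Compared with a QFormer, this keeps the vision-to-language bridge a fixed, parameter-light mapping with no cross-attention, which suits a 67M-parameter budget.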
🛠️ 2025-10-24
  • Bug fix: mismatched model weights
  • Adapted to the latest minimind updates
  • Refactored training and evaluation scripts
  • Added complete checkpoint resume support
🔥 2025-04-27
  • Compatibility updates
  • Adapted to new features in the minimind repository
  • Standardized parts of the code
🔥 2025-02-20
  • MiniMind2-V updated alongside MiniMind2
  • Significant reduction of redundant code
  • Major simplification of the model structure
  • Updated dataset format with new SFT datasets
  • Better performance than the previous VLM!
🎬 2024-10-05 (First Release)
  • MiniMind-V released on schedule
  • First open-source vision-language model release
🎮 Inside MiniMind-V
(MiniMind-V demo video · VLM structure diagram · VLM MoE structure diagram)

💡 Why MiniMind-V?

💭 "Seeing the world through a smaller lens, yet just as sharp."