Wan 2.6
By Alibaba — Open-source, high-quality, multi-modal video generation
Up to 15 Seconds
Generate 5s, 10s, or 15s clips at 24fps. Longer durations for extended scenes and storytelling.
Up to 1080p
HD output at 720p or Full HD at 1080p. Sharp, detailed frames suitable for professional use.
Multi-Modal
Text-to-Video, Image-to-Video, and Reference-to-Video with character consistency and lip sync.
What Is Wan 2.6?
Wan 2.6 is a state-of-the-art open-source video generation model created by Alibaba's Qwen-Wan research team, released in December 2025. Built on a 14-billion parameter Diffusion Transformer (DiT) architecture with Mixture-of-Experts (MoE), it activates only around 20% of its parameters during each generation step, delivering high-quality output with remarkable computational efficiency.
The model was trained on an unprecedented dataset of 1.5 billion videos and 10 billion images. It employs a 3D VAE with 4x16x16 compression for efficient video encoding and a T5-XXL text encoder for deep prompt understanding in both English and Chinese. This combination enables Wan 2.6 to generate videos up to 1080p at 24fps with strong temporal coherence and detailed visual fidelity.
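The practical effect of the 4x16x16 compression can be sketched with simple arithmetic. The sketch below assumes the factors apply to time, height, and width respectively and that edges are padded up to a multiple of each factor; the model's exact patching rules are not specified here.

```python
# Sketch: latent-grid size under the reported 3D VAE compression of
# 4x in time and 16x16 in space. Padding behavior is an assumption;
# this only illustrates the compression ratio, not the real encoder.

def latent_shape(frames: int, height: int, width: int,
                 ct: int = 4, cs: int = 16) -> tuple[int, int, int]:
    """Return (latent_frames, latent_h, latent_w), rounding up for padding."""
    ceil = lambda a, b: -(-a // b)  # integer ceiling division
    return ceil(frames, ct), ceil(height, cs), ceil(width, cs)

# A 5-second clip at 24 fps in 720p (1280x720):
t, h, w = latent_shape(frames=5 * 24, height=720, width=1280)
print(t, h, w)  # 30 latent frames on a 45x80 spatial grid

# Positions the diffusion transformer must model, vs. raw pixels:
ratio = (5 * 24 * 720 * 1280) / (t * h * w)
print(int(ratio))  # 1024, i.e. 4 * 16 * 16
```

This roughly 1000-fold reduction in positions is what makes attention over a whole multi-second clip tractable.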
What sets Wan 2.6 apart is that it is fully open source: both model weights and training code are publicly available. Researchers, independent creators, and businesses can run it locally on consumer GPUs with as little as 12GB VRAM, making it one of the most accessible high-quality video generation models available today. Its Reference-to-Video (R2V) capability and native phoneme-level lip sync further push the boundaries of what open-source models can achieve.
Key Features
Wan 2.6 combines cutting-edge architecture with practical features for real-world video creation.
Fully Open Source
Model weights and code are publicly released. Run locally on consumer GPUs with 12GB VRAM for complete data privacy and control.
MoE Efficiency
14B parameter DiT with Mixture-of-Experts activates only ~20% of parameters per step, achieving top-tier quality with lower compute requirements.
Reference-to-Video
Upload a character reference to maintain consistent appearance and voice across generated clips. Perfect for multi-shot storytelling.
Native Lip Sync
Phoneme-level lip synchronization built directly into the model. Characters speak with accurately matched mouth movements in Chinese and English.
Bilingual Prompts
Deep understanding of both Chinese and English prompts via T5-XXL text encoder, trained on massive bilingual datasets for nuanced interpretation.
3D VAE Compression
Advanced 3D VAE with 4x16x16 spatiotemporal compression ensures temporal coherence and high visual fidelity across every frame.
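The MoE efficiency claim above can be made concrete with back-of-envelope numbers. This sketch only estimates the weights touched per step at fp16 precision; how the full checkpoint is held in 12GB VRAM (offloading, quantization) is not specified by the source and is left out here.

```python
# Back-of-envelope: with ~20% of a 14B-parameter model active per step,
# only the routed experts' weights participate in each forward pass.
# This is a per-step compute/bandwidth estimate, not a claim about
# total storage for the full checkpoint.

TOTAL_PARAMS = 14e9       # 14B-parameter DiT (per the specs table)
ACTIVE_FRACTION = 0.20    # ~20% activated via MoE routing
BYTES_FP16 = 2            # bytes per parameter at half precision

active_params = TOTAL_PARAMS * ACTIVE_FRACTION
print(f"active params per step: {active_params / 1e9:.1f}B")        # 2.8B
print(f"fp16 weights per step:  {active_params * BYTES_FP16 / 1e9:.1f} GB")  # 5.6 GB
print(f"full fp16 checkpoint:   {TOTAL_PARAMS * BYTES_FP16 / 1e9:.1f} GB")   # 28.0 GB
```

The gap between 5.6 GB touched per step and a 28 GB full checkpoint is the headroom that makes consumer-GPU deployment plausible.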
Technical Specs
A detailed look at the architecture and capabilities powering Wan 2.6.
| Specification | Details |
|---|---|
| Developer | Alibaba Qwen-Wan Team |
| Release Date | December 2025 |
| Architecture | 14B parameter Diffusion Transformer (DiT) with Mixture-of-Experts |
| Active Parameters | ~20% active during generation (MoE routing) |
| Video Encoder | 3D VAE with 4x16x16 spatiotemporal compression |
| Text Encoder | T5-XXL (bilingual Chinese/English) |
| Training Data | 1.5 billion videos + 10 billion images |
| Max Resolution | 1080p (1920x1080) at 24fps |
| Max Duration | Up to 15 seconds per generation |
| Minimum VRAM | 12GB (consumer GPU compatible) |
| License | Open source (weights + code publicly available) |
| Generation Modes | Text-to-Video, Image-to-Video, Reference-to-Video (R2V) |
Use Cases
From independent creators to enterprise teams, Wan 2.6 powers a wide range of video workflows.
Independent Filmmaking
Create short films, music videos, and narrative content with consistent characters using Reference-to-Video and multi-shot storytelling capabilities.
Research & Academia
Open-source weights enable full reproducibility and fine-tuning. Ideal for video generation research, ablation studies, and academic benchmarking.
Local & Private Deployment
Run entirely on your own infrastructure for sensitive content. No data leaves your servers, making it perfect for healthcare, legal, and enterprise use cases.
Chinese-Market Content
Superior Chinese prompt comprehension makes Wan 2.6 a top choice for creating video content for Douyin, WeChat, Bilibili, and other Chinese platforms.
Educational Content
Generate explainer videos, animated tutorials, and educational materials with lip-synced narration in both Chinese and English.
Social Media & Marketing
Rapidly prototype video ads, social media content, and product demos. Generate multiple variations from text prompts to find the perfect creative direction.
Start Creating with Wan 2.6
Generate high-quality AI videos with the most accessible open-source model available. No setup required — just type your prompt and go.
Generate Video Now