Wan 2.6
By Alibaba — Open-source, high-quality, multi-modal video generation
Up to 15 Seconds
Generate 5s, 10s, or 15s clips at 24fps. Longer durations for extended scenes and storytelling.
Up to 1080p
HD output at 720p or Full HD at 1080p. Sharp, detailed frames suitable for professional use.
Multi-Modal
Text-to-Video, Image-to-Video, and Reference-to-Video with character consistency and lip sync.
What Is Wan 2.6?
Wan 2.6 is a state-of-the-art open-source video generation model created by Alibaba's Qwen-Wan research team, released in December 2025. Built on a 14-billion parameter Diffusion Transformer (DiT) architecture with Mixture-of-Experts (MoE), it activates only around 20% of its parameters during each generation step, delivering high-quality output with remarkable computational efficiency.
The model was trained on an unprecedented dataset of 1.5 billion videos and 10 billion images. It employs a 3D VAE with 4x16x16 compression for efficient video encoding and a T5-XXL text encoder for deep prompt understanding in both English and Chinese. This combination enables Wan 2.6 to generate videos up to 1080p at 24fps with strong temporal coherence and detailed visual fidelity.
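The practical effect of the 4x16x16 compression can be sketched with simple arithmetic. The sketch below assumes the factors apply to time, height, and width respectively and that edges are padded up to a multiple of each factor; the model's exact patching rules are not specified here.

```python
# Sketch: latent-grid size under the reported 3D VAE compression of
# 4x in time and 16x16 in space. Padding behavior is an assumption;
# this only illustrates the compression ratio, not the real encoder.

def latent_shape(frames: int, height: int, width: int,
                 ct: int = 4, cs: int = 16) -> tuple[int, int, int]:
    """Return (latent_frames, latent_h, latent_w), rounding up for padding."""
    ceil = lambda a, b: -(-a // b)  # integer ceiling division
    return ceil(frames, ct), ceil(height, cs), ceil(width, cs)

# A 5-second clip at 24 fps in 720p (1280x720):
t, h, w = latent_shape(frames=5 * 24, height=720, width=1280)
print(t, h, w)  # 30 latent frames on a 45x80 spatial grid

# Positions the diffusion transformer must model, vs. raw pixels:
ratio = (5 * 24 * 720 * 1280) / (t * h * w)
print(int(ratio))  # 1024, i.e. 4 * 16 * 16
```

This roughly 1000-fold reduction in positions is what makes attention over a whole multi-second clip tractable.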
What sets Wan 2.6 apart is that it is fully open source: both model weights and training code are publicly available. Researchers, independent creators, and businesses can run it locally on consumer GPUs with as little as 12GB VRAM, making it one of the most accessible high-quality video generation models available today. Its Reference-to-Video (R2V) capability and native phoneme-level lip sync further push the boundaries of what open-source models can achieve.
Key Features
Wan 2.6 combines cutting-edge architecture with practical features for real-world video creation.
Fully Open Source
Model weights and code are publicly released. Run locally on consumer GPUs with 12GB VRAM for complete data privacy and control.
MoE Efficiency
14B parameter DiT with Mixture-of-Experts activates only ~20% of parameters per step, achieving top-tier quality with lower compute requirements.
Reference-to-Video
Upload a character reference to maintain consistent appearance and voice across generated clips. Perfect for multi-shot storytelling.
Native Lip Sync
Phoneme-level lip synchronization built directly into the model. Characters speak with accurately matched mouth movements in Chinese and English.
Bilingual Prompts
Deep understanding of both Chinese and English prompts via T5-XXL text encoder, trained on massive bilingual datasets for nuanced interpretation.
3D VAE Compression
Advanced 3D VAE with 4x16x16 spatiotemporal compression ensures temporal coherence and high visual fidelity across every frame.
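The MoE efficiency claim above can be made concrete with back-of-envelope numbers. This sketch only estimates the weights touched per step at fp16 precision; how the full checkpoint is held in 12GB VRAM (offloading, quantization) is not specified by the source and is left out here.

```python
# Back-of-envelope: with ~20% of a 14B-parameter model active per step,
# only the routed experts' weights participate in each forward pass.
# This is a per-step compute/bandwidth estimate, not a claim about
# total storage for the full checkpoint.

TOTAL_PARAMS = 14e9       # 14B-parameter DiT (per the specs table)
ACTIVE_FRACTION = 0.20    # ~20% activated via MoE routing
BYTES_FP16 = 2            # bytes per parameter at half precision

active_params = TOTAL_PARAMS * ACTIVE_FRACTION
print(f"active params per step: {active_params / 1e9:.1f}B")        # 2.8B
print(f"fp16 weights per step:  {active_params * BYTES_FP16 / 1e9:.1f} GB")  # 5.6 GB
print(f"full fp16 checkpoint:   {TOTAL_PARAMS * BYTES_FP16 / 1e9:.1f} GB")   # 28.0 GB
```

The gap between 5.6 GB touched per step and a 28 GB full checkpoint is the headroom that makes consumer-GPU deployment plausible.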
Technical Specs
A detailed look at the architecture and capabilities powering Wan 2.6.
| Specification | Details |
|---|---|
| Developer | Alibaba Qwen-Wan Team |
| Release Date | December 2025 |
| Architecture | 14B parameter Diffusion Transformer (DiT) with Mixture-of-Experts |
| Active Parameters | ~20% active during generation (MoE routing) |
| Video Encoder | 3D VAE with 4x16x16 spatiotemporal compression |
| Text Encoder | T5-XXL (bilingual Chinese/English) |
| Training Data | 1.5 billion videos + 10 billion images |
| Max Resolution | 1080p (1920x1080) at 24fps |
| Max Duration | Up to 15 seconds per generation |
| Minimum VRAM | 12GB (consumer GPU compatible) |
| License | Open source (weights + code publicly available) |
| Generation Modes | Text-to-Video, Image-to-Video, Reference-to-Video (R2V) |
Use Cases
From independent creators to enterprise teams, Wan 2.6 powers a wide range of video workflows.
Independent Filmmaking
Create short films, music videos, and narrative content with consistent characters using Reference-to-Video and multi-shot storytelling capabilities.
Research & Academia
Open-source weights enable full reproducibility and fine-tuning. Ideal for video generation research, ablation studies, and academic benchmarking.
Local & Private Deployment
Run entirely on your own infrastructure for sensitive content. No data leaves your servers, making it perfect for healthcare, legal, and enterprise use cases.
Chinese-Market Content
Superior Chinese prompt comprehension makes Wan 2.6 a top choice for creating video content for Douyin, WeChat, Bilibili, and other Chinese platforms.
Educational Content
Generate explainer videos, animated tutorials, and educational materials with lip-synced narration in both Chinese and English.
Social Media & Marketing
Rapidly prototype video ads, social media content, and product demos. Generate multiple variations from text prompts to find the perfect creative direction.
Start Creating with Wan 2.6
Generate high-quality AI videos with the most accessible open-source model available. No setup required — just type your prompt and go.
Generate Video Now