Wan 3.1

Performance Optimization and Technological Advancements Based on Wan 2.1

Wan 3.1 is the next-generation Wan AI video generation model, built on Wan 2.1. It inherits the capabilities of Wan 2.1 and adds performance optimizations and technical improvements in several areas, further improving the efficiency, quality, and diversity of video generation.

Key Technical Improvements

1. Optimization of Spatiotemporal VAE

Wan 3.1 further optimizes the spatiotemporal variational autoencoder (VAE). By improving the 3D causal structure, it achieves higher compression efficiency and significantly reduces memory usage while maintaining temporal consistency. This lets Wan Video handle high-resolution video more efficiently, supporting encoding and decoding of 1080P video of unlimited length.
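To illustrate why a causal structure allows video of any length to be processed with bounded memory, here is a minimal sketch. It is not Wan's actual VAE: it uses a one-dimensional causal convolution over scalar "frames", and the function names, kernel, and chunk size are all hypothetical. The point is only that processing the video chunk by chunk, while caching the last few frames, gives the same result as processing it all at once.

```python
from typing import List, Sequence


def causal_conv(frames: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Causal temporal convolution: output t depends only on frames <= t."""
    k = len(kernel)
    # Left-pad with zeros so that no future frame is ever read.
    padded = [0.0] * (k - 1) + list(frames)
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(frames))
    ]


def causal_conv_chunked(
    frames: Sequence[float], kernel: Sequence[float], chunk_size: int
) -> List[float]:
    """Same convolution, but only one chunk plus a small cache is held at a time."""
    k = len(kernel)
    cache = [0.0] * (k - 1)  # the last k-1 frames seen so far (zeros at the start)
    out: List[float] = []
    for start in range(0, len(frames), chunk_size):
        chunk = list(frames[start:start + chunk_size])
        window = cache + chunk
        out.extend(
            sum(kernel[j] * window[t + j] for j in range(k))
            for t in range(len(chunk))
        )
        # Keep only the trailing k-1 frames for the next chunk.
        cache = window[len(window) - (k - 1):]
    return out


if __name__ == "__main__":
    video = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
    smoothing = [0.25, 0.5, 0.25]
    assert causal_conv_chunked(video, smoothing, 3) == causal_conv(video, smoothing)
```

Because each output frame looks only backward in time, the cache size depends on the kernel width and not on the video length, which is the property that makes long videos tractable in this simplified setting.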

2. Diffusion Transformer Improvements

Wan 3.1 introduces a dynamic computation allocation mechanism that automatically adjusts computing resources based on input complexity, significantly improving inference efficiency. In addition, enhanced attention mechanisms let Wan 3.1 better capture complex dynamics, producing more natural and smoother video.

3. Real-time Generation

Wan 3.1 introduces a streaming generation mechanism that supports continuous generation of videos of unlimited length. This mechanism achieves efficient real-time generation through a "denoising queue": new frames are added to the queue as old frames are removed.
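One way to picture a denoising queue is sketched below. This is an illustration under stated assumptions, not Wan's implementation: each frame is a single float standing in for a latent, and `Slot`, `stream_frames`, `new_noise`, and `denoise_step` are hypothetical names. It assumes that a frame enters the queue as noise, that every frame in the queue receives one denoising step per tick, and that a frame leaves once it has received all its steps.

```python
import itertools
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Iterator, Tuple


@dataclass
class Slot:
    index: int       # position of the frame in the output stream
    latent: float    # stand-in for a latent tensor
    steps_left: int  # denoising steps this frame still needs


def stream_frames(
    num_steps: int,
    new_noise: Callable[[int], float],
    denoise_step: Callable[[float, int], float],
) -> Iterator[Tuple[int, float]]:
    """Yield finished frames forever; the queue never holds more than num_steps frames."""
    queue: Deque[Slot] = deque()
    for index in itertools.count():
        # A new, fully noisy frame joins the back of the queue.
        queue.append(Slot(index, new_noise(index), num_steps))
        # Every queued frame advances by one denoising step.
        for slot in queue:
            slot.latent = denoise_step(slot.latent, slot.steps_left)
            slot.steps_left -= 1
        # The oldest frame leaves as soon as it is fully denoised.
        if queue[0].steps_left == 0:
            done = queue.popleft()
            yield done.index, done.latent


if __name__ == "__main__":
    # Toy denoiser: halve the value at each step.
    frames = stream_frames(3, lambda i: 8.0, lambda x, s: x / 2)
    for index, latent in itertools.islice(frames, 4):
        print(index, latent)
```

In this sketch, after a warm-up of `num_steps - 1` ticks the generator emits one finished frame per tick, and memory use stays constant no matter how long the stream runs.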

4. Quantization Optimization

Through FP8 GEMM operations and mixed 8-bit optimization, Wan 3.1 significantly reduces memory consumption while improving performance. These optimizations let Wan AI models run more efficiently on consumer-grade devices.
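The general idea behind an 8-bit matrix multiply (GEMM) can be sketched as follows. Note the substitution: this example uses symmetric int8 quantization with one scale per matrix, not FP8, because the Python standard library has no FP8 type. It is a toy illustration of the technique, not Wan's kernels, and `quantize_int8` and `matmul_int8` are hypothetical names.

```python
from typing import List, Tuple

Matrix = List[List[float]]
IntMatrix = List[List[int]]


def quantize_int8(m: Matrix) -> Tuple[IntMatrix, float]:
    """Map floats to integers in [-127, 127] using one scale for the whole matrix."""
    max_abs = max(abs(v) for row in m for v in row)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in m]
    return q, scale


def matmul_int8(a_q: IntMatrix, a_scale: float, b_q: IntMatrix, b_scale: float) -> Matrix:
    """Multiply in integer arithmetic, then rescale the result back to floats."""
    cols = list(zip(*b_q))
    return [
        [sum(x * y for x, y in zip(row, col)) * a_scale * b_scale for col in cols]
        for row in a_q
    ]


if __name__ == "__main__":
    a = [[1.0, -2.0], [0.5, 4.0]]
    b = [[2.0, 0.0], [1.0, -1.0]]
    a_q, a_scale = quantize_int8(a)
    b_q, b_scale = quantize_int8(b)
    # Close to the exact product [[0, 2], [5, -4]], up to quantization error.
    print(matmul_int8(a_q, a_scale, b_q, b_scale))
```

Each stored value takes 8 bits in place of 16 or 32, which is where the memory saving comes from; the cost is a small rounding error in the result.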

Enhanced Capabilities

Prompt Alignment

Enhanced prompt alignment through cross-attention, with support for multilingual text effects in both Chinese and English.
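For readers unfamiliar with cross-attention, the following is a minimal sketch of standard scaled dot-product cross-attention, in which video tokens act as queries and prompt tokens supply the keys and values. It shows the generic technique only. The tiny vectors and the function names are illustrative assumptions and say nothing about how Wan 3.1 is actually configured.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> List[Vector]:
    """Each query (video token) gathers a weighted mix of values (prompt tokens)."""
    dim = len(keys[0])
    out: List[Vector] = []
    for q in queries:
        # Similarity between this video token and every prompt token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ])
    return out


if __name__ == "__main__":
    prompt_keys = [[10.0, 0.0], [0.0, 10.0]]
    prompt_values = [[1.0, 0.0], [0.0, 1.0]]
    video_queries = [[10.0, 0.0]]  # strongly matches the first prompt token
    print(cross_attention(video_queries, prompt_keys, prompt_values))
```

A query that matches one prompt token closely pulls almost all of its output from that token's value, which is the sense in which cross-attention ties generated content to the prompt.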

Audio Generation

Stronger temporal consistency in audio generation, producing more natural audio without noticeable noise.

Consumer Device Support

Optimized 1.3B-parameter lightweight model that runs efficiently on consumer GPUs, generating a 5-second 480P video in just 4 minutes.

Multimodal Fusion

Supports multiple input signals including text, images, and videos. Suitable for advertising, education, and e-commerce.

Summary

Wan 3.1 builds on Wan 2.1 with comprehensive performance optimizations and technical improvements, significantly enhancing the efficiency, quality, and versatility of video generation. From efficient generation on consumer-grade devices to improved real-time video and audio generation, Wan 3.1 brings new advances to the field of video generation. As the technology continues to improve, Wan 3.1 is expected to become a powerful tool for content creators and industry users.