Alibaba has raised the stakes in the global AI race by open-sourcing Wan 2.1, a text-to-video AI system that industry observers describe as “unbelievable” in its quality and capabilities. The move is another significant marker of China’s growing influence in frontier AI technologies, coming amid a wave of impressive AI innovations from Chinese companies including DeepSeek and ByteDance.
These developments are backed by massive investment. Alibaba recently announced plans to commit approximately $53 billion to cloud computing and AI infrastructure over the next three years—exceeding its total AI and cloud spending over the past decade. During a recent earnings call, Alibaba CEO Eddie Wu described AI as a “once-in-a-generation” opportunity, with Artificial General Intelligence (AGI) as the company’s primary long-term objective.
“A Big Deal” for AI Innovation
Feedback on Wan 2.1 has been overwhelmingly positive, with observers praising the system’s ability to create realistic physics simulations, complex motion, and cinematic-quality visuals. What makes Wan 2.1 particularly notable is that it currently tops the VBench leaderboard for video generation performance, outperforming many closed-source commercial alternatives.
By making this technology freely available to developers worldwide, Alibaba has positioned itself as a direct challenger to OpenAI’s Sora video generation tool, but with a distinctly different approach to market development. While Western AI companies have generally kept their most advanced models behind API access and paywalls, Chinese companies like Alibaba are increasingly open-sourcing sophisticated AI technology, potentially accelerating global adoption.

Social media commentator Amrit Raj emphasised the significance of Alibaba’s decision to make Wan 2.1 open-source:
Unlike many other AI video models, Wan 2.1 is open-source, meaning anyone—developers, creators, and researchers—can access, modify, and integrate it into their projects. And that’s a BIG DEAL because open-source models fuel faster innovation and community-driven improvements.
According to Raj, Wan 2.1 outperforms OpenAI’s Sora in several critical areas: “It’s been benchmarked on VBench, where it outperformed even OpenAI’s Sora in motion smoothness, subject consistency, and multi-object interaction.” These performance advantages, combined with the model’s accessibility, may significantly influence the competitive landscape for AI video generation.
The technical architecture gives Wan 2.1 key advantages. “Wan 2.1 uses a 3D Causal Variational Autoencoder (VAE) that enables faster video generation—2.5× quicker than some competitors—and smoother motion with fewer glitches,” Raj noted. The model also employs a “Diffusion Transformer Framework” that “uses a T5 encoder to create hyper-realistic visuals, handling complex movements and detailed environments with ease.”
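In broad strokes, that description maps onto a familiar three-stage pipeline: a text encoder conditions a latent denoiser, and a video decoder turns the denoised latents into frames. The sketch below is only a toy illustration of that structure using stand-in PyTorch modules; the class, layer sizes, and denoising loop are illustrative assumptions and bear no relation to Alibaba’s actual implementation.

```python
# Toy sketch of the pipeline Raj describes: a T5-style text encoder conditions a
# diffusion transformer, and a causal 3D VAE-style decoder turns latents into video.
# All module choices and sizes are illustrative assumptions, not Wan 2.1's code.
import torch
import torch.nn as nn

class TextToVideoSketch(nn.Module):
    def __init__(self, text_dim=512, latent_dim=16, frames=16, height=8, width=8):
        super().__init__()
        self.frames, self.h, self.w, self.latent_dim = frames, height, width, latent_dim
        # Stand-in for the T5 text encoder (in practice: a pretrained T5 model).
        self.text_encoder = nn.Sequential(
            nn.Embedding(32000, text_dim),
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(text_dim, 8, batch_first=True), num_layers=2))
        self.text_to_latent = nn.Linear(text_dim, latent_dim)
        # Stand-in for the diffusion transformer that denoises video latents while
        # cross-attending to the text embedding.
        self.denoiser = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(latent_dim, 4, batch_first=True), num_layers=2)
        # Stand-in for the 3D causal VAE decoder: latent grid -> RGB frames.
        self.vae_decoder = nn.ConvTranspose3d(latent_dim, 3, kernel_size=4, stride=4)

    @torch.no_grad()
    def forward(self, token_ids, steps=4):
        text = self.text_to_latent(self.text_encoder(token_ids))   # (B, L, latent_dim)
        latents = torch.randn(token_ids.size(0), self.frames * self.h * self.w, self.latent_dim)
        for _ in range(steps):                                      # toy denoising loop
            latents = latents - 0.1 * self.denoiser(latents, memory=text)
        latents = latents.transpose(1, 2).reshape(-1, self.latent_dim, self.frames, self.h, self.w)
        return self.vae_decoder(latents)                            # (B, 3, T*4, H*4, W*4)

model = TextToVideoSketch().eval()
video = model(torch.randint(0, 32000, (1, 12)))
print(video.shape)  # torch.Size([1, 3, 64, 32, 32])
```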
Breakthrough Technology Made Freely Available
According to Alibaba’s Wednesday announcement, the company has released four models from its Tongyi Wanxiang (Wan) video foundation model family. These include two 14-billion-parameter versions for high-quality generation and a lightweight 1.3-billion-parameter model designed to run on consumer-grade hardware.
The system’s capabilities are impressive by any standard. Wan 2.1 excels at creating realistic videos featuring extensive body movements, complex rotations, dynamic scene transitions, and fluid camera motions. It can generate videos that accurately simulate real-world physics and realistic object interactions, while also supporting “cinematic quality” output with rich textures and stylised effects.
Unlike many other advanced AI models, Wan 2.1 is designed with accessibility in mind. The lightweight T2V-1.3B model can generate a 5-second 480P video in about 4 minutes on a consumer-grade RTX 4090 GPU, making sophisticated video generation available to a much wider audience than previous systems allowed.
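For developers who want to experiment locally, a run might look something like the sketch below, assuming the 1.3B checkpoint is published in a Hugging Face diffusers-compatible repository; the repository id, resolution arguments, and output attributes are assumptions rather than documented usage.

```python
# Hypothetical local-generation sketch using Hugging Face diffusers.
# The repo id, call arguments, and output handling below are assumptions,
# not confirmed by Alibaba's announcement.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",   # assumed repository id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")  # a single consumer GPU such as an RTX 4090

result = pipe(
    prompt="A cat surfing a wave at sunset, cinematic lighting",
    height=480, width=832,     # 480P output, per the announcement
    num_frames=81,             # roughly 5 seconds at ~16 fps
    num_inference_steps=30,
)
export_to_video(result.frames[0], "wan_demo.mp4", fps=16)
```

Alibaba quotes roughly four minutes for a 5-second 480P clip on an RTX 4090; actual timings will vary with step count, resolution, and frame count.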
Alibaba Embodies China’s Growing AI Ambitions
Alibaba’s move comes amid increasing evidence of China’s determination to be at the forefront of AI innovation. In recent months, ByteDance unveiled its impressive OmniHuman-1 AI for realistic human animation, while DeepSeek emerged as a serious challenger to OpenAI’s language models. Chinese AR device maker Rokid has also recently demonstrated practical AI glasses that integrate with Alibaba’s Qwen large language models.
“The AI era presents a clear and massive demand for infrastructure. We will aggressively invest in AI infrastructure,” Wu stated, signalling the company’s strategic prioritisation of AI development.
Implications for the Future of AI
Industry experts suggest that open-source models like Wan 2.1 could significantly reshape the AI landscape. Kevin O’Donovan, a technology advisor focused on industrial AI, noted in a recent interview that businesses are increasingly looking for specific, practical applications rather than theoretical possibilities:
I think there’s a bit of fatigue setting in with a lot of people… I think where people are finding real value is when someone can go ‘here is a specific use case, here’s a specific challenge, and here’s how we fixed it.’
The democratisation of powerful AI tools through open-sourcing may be a key factor in determining market share going forward. By making Wan 2.1 freely available, Alibaba potentially sacrifices immediate revenue but could gain wider adoption and ecosystem influence.
As video generation becomes more accessible, businesses that can demonstrate practical applications and integrate these tools into existing workflows may find competitive advantages. Rather than focusing on hypothetical capabilities, successful AI implementations will likely centre on solving specific business problems.
This points to a potential shift in the AI market where the ability to apply technology to practical use cases may become more valuable than proprietary technology itself—potentially reshaping not just who leads in AI development, but how that leadership is defined in an increasingly competitive global landscape.