In summary
- Tencent launched Hunyuan Video, a free and open source video generator.
- The model requires 60GB of GPU memory to run locally, but several cloud services began offering access to the model.
- Initial tests showed that Hunyuan matched the quality of commercial competitors, although its understanding of English was limited.
While OpenAI continues to generate buzz with Sora after months of delays, Tencent quietly launched a model that is already showing results comparable to existing top-tier video generators.
Tencent has unveiled Hunyuan Video, a free and open source AI video generator, strategically timed during OpenAI’s 12-day announcement campaign, which is widely anticipated to include the debut of Sora, its long-awaited video tool.
“Introducing Hunyuan Video, a novel open source video base model that exhibits video generation performance comparable to, if not superior to, leading closed source models,” Tencent said in its official announcement.
The Shenzhen, China-based tech giant claims its model “outperforms” Runway Gen-3, Luma 1.6, and “three high-performance Chinese video generative models” based on professional human evaluation results.
The timing couldn’t be more opportune.
Before introducing its video generator, Tencent released a similarly named image generator, HunyuanDiT, somewhere between the SDXL and Flux eras of open source image generation.
HunyuanDiT delivered excellent results and improved bilingual text comprehension, but was never widely adopted. Tencent rounded out the Hunyuan family with a series of large language models (LLMs).
Hunyuan Video uses a decoder-only Multimodal Large Language Model as its text encoder instead of the usual combination of CLIP and T5-XXL found in other AI video tools and image generators.
Tencent says this helps the model follow instructions better, capture image details more accurately, and learn new tasks on the fly without additional training. Additionally, its causal attention setup gets a boost from a special token refiner that helps it understand prompts more thoroughly than traditional models.
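To see why a decoder-only encoder might need help, it helps to compare the two attention patterns. In a causal setup, each token can only look at the tokens before it, so early words in a prompt never "see" later ones; a bidirectional layer lets every token see the whole prompt. The toy sketch below is only an illustration of that difference, not Tencent's token refiner: it replaces real softmax attention with a plain average, and the token values are made-up numbers.

```python
def attention_mask(n, causal):
    """Return an n x n mask; mask[i][j] is True if token i may attend to token j."""
    return [[(j <= i) if causal else True for j in range(n)] for i in range(n)]


def masked_mean(values, mask):
    """Toy stand-in for attention: each token's output is the plain average of the
    values it is allowed to see (real attention uses learned, softmax-weighted sums)."""
    out = []
    for row in mask:
        visible = [v for v, ok in zip(values, row) if ok]
        out.append(sum(visible) / len(visible))
    return out


# Hypothetical per-token values for the prompt "a man walking his dog".
tokens = ["a", "man", "walking", "his", "dog"]
values = [0.0, 1.0, 2.0, 3.0, 4.0]

causal = masked_mean(values, attention_mask(len(tokens), causal=True))
bidirectional = masked_mean(values, attention_mask(len(tokens), causal=False))

print("causal:       ", causal)         # early tokens ignore everything after them
print("bidirectional:", bidirectional)  # every token reflects the full prompt
```

Under the causal mask the first token's output depends on "a" alone, while the bidirectional pass mixes in "dog" for every position, which is the kind of gap an extra refining layer can close.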
Hunyuan Video also rewrites prompts to make them richer and improve the quality of its generations. For example, a prompt that simply says “A man walking his dog” can be enhanced with details such as the scene setting, lighting conditions, quality artifacts, and breed, among other elements.
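As a rough illustration of what prompt enrichment produces, here is a minimal template-based sketch. It is hypothetical: the `PromptDetails` fields and `rewrite_prompt` function are invented for this example, and Tencent's own rewriter is presumably a language model rather than a fixed template like this one.

```python
from dataclasses import dataclass, field


@dataclass
class PromptDetails:
    """Hypothetical fields a rewriter might fill in; the names are illustrative only."""
    subject_details: str = ""
    scene: str = ""
    lighting: str = ""
    quality: str = ""
    extras: list[str] = field(default_factory=list)


def rewrite_prompt(prompt: str, details: PromptDetails) -> str:
    """Append whichever details are present to the user's prompt, in a fixed order."""
    parts = [prompt.rstrip(".")]
    for value in (details.subject_details, details.scene, details.lighting,
                  details.quality, *details.extras):
        if value:  # skip fields the rewriter left empty
            parts.append(value)
    return ", ".join(parts) + "."


rich = rewrite_prompt(
    "A man walking his dog",
    PromptDetails(
        subject_details="a golden retriever on a red leash",
        scene="a tree-lined park path in autumn",
        lighting="soft late-afternoon sunlight",
        quality="photorealistic, 4K, sharp focus",
    ),
)
print(rich)
```

The bare four-word prompt becomes a single sentence that names the breed, setting, lighting, and quality cues a video model can act on.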
Free for the masses
Like Meta’s Llama 3, Hunyuan is free to use and monetize until you reach 100 million users, a threshold most developers won’t have to worry about anytime soon.
The drawback? You’ll need a powerful computer with at least 60GB of GPU memory to run its 13-billion-parameter model locally: think Nvidia H800 or H20 cards. That’s more VRAM than most gaming PCs have in total.
For those who don’t have a supercomputer on hand, cloud services are already stepping in.
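A back-of-the-envelope calculation shows why the requirement is so steep. Assuming the weights are stored at 16-bit precision (2 bytes per parameter), the 13 billion parameters alone take about 26 GB, already more than the 24 GB on a consumer RTX 4090. The rest of the 60 GB is presumably consumed by the activations for the video being generated and by other components such as the text encoder; the sketch below only counts the weights.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 13e9  # 13 billion parameters, per the article

# Common storage precisions; activations and other model components are NOT included.
for name, nbytes in [("fp32", 4), ("bf16/fp16", 2), ("fp8", 1)]:
    print(f"{name:>9}: {weight_memory_gb(PARAMS, nbytes):.0f} GB")
```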
FAL.ai, a generative media platform designed for developers, has integrated Hunyuan, charging $0.50 per video. Other cloud providers, including Replicate and GoEnhance, have also started offering access to the model. The official Hunyuan Video site offers 150 credits for $10, with each video generation costing a minimum of 15 credits.
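Converting the official credit pricing into dollars makes the comparison with FAL.ai direct. This sketch assumes FAL.ai's $0.50 is a flat per-video price and that the $10 pack of 150 credits is the one being used; longer or higher-resolution generations that cost more than 15 credits would raise the official figure further.

```python
from decimal import Decimal

FAL_PRICE_PER_VIDEO = Decimal("0.50")  # per the article
OFFICIAL_PACK_PRICE = Decimal("10")    # $10 buys...
OFFICIAL_PACK_CREDITS = 150            # ...150 credits
MIN_CREDITS_PER_VIDEO = 15             # cheapest possible generation


def official_cost_per_video(credits_used: int) -> Decimal:
    """Dollar cost of one generation on the official site, given its credit cost."""
    return OFFICIAL_PACK_PRICE * credits_used / OFFICIAL_PACK_CREDITS


cheapest = official_cost_per_video(MIN_CREDITS_PER_VIDEO)
videos_per_pack = OFFICIAL_PACK_CREDITS // MIN_CREDITS_PER_VIDEO

print(f"Official site: at least ${cheapest:.2f} per video, at most {videos_per_pack} videos per pack")
print(f"FAL.ai: ${FAL_PRICE_PER_VIDEO:.2f} per video")
print(f"Official site costs at least {cheapest / FAL_PRICE_PER_VIDEO}x as much per video")
```

In other words, $10 on the official site buys at most 10 videos, a floor of $1.00 each, or at least twice FAL.ai's per-video price.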
And of course, users can run the model on a rented GPU using services like Runpod or Vast.ai.
Initial tests show that Hunyuan matches the quality of commercial heavyweights like Luma Labs Dream Machine or Kling AI. The videos take about 15 minutes to generate, producing photorealistic sequences with natural movement of humans and animals.
The tests reveal one current weakness: the model’s understanding of English prompts is not as sharp as that of its competitors. However, because the model is open source, developers can now modify and improve it.
Tencent says its text encoder achieves alignment rates of up to 68.5% — meaning how close the output is to what users are asking for — while maintaining visual quality scores of 96.4% based on its internal tests.
The complete source code and pre-trained weights are available for download on GitHub and Hugging Face.
Edited by Sebastian Sinclair
Crypto Keynote USA