Stability AI is Testing Generative Video which Generates up to 25 Frames from a Still Image

By Naorem Mohen | Published December 4, 2023 | 3 min read

Stability AI, the developer behind the generative art model Stable Diffusion, is testing generative video. The company has developed the ability to animate its generative art and has introduced a research preview of a new offering named Stable Video Diffusion, which enables users to generate videos from a single image. Stability AI said that this AI video model is a substantial step in its mission to develop models that cater to all users.

The newly launched tool comes in the form of two image-to-video models, which generate 14 and 25 frames respectively, at frame rates ranging from 3 to 30 frames per second and a resolution of 576 × 1024.
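As a rough illustration of what those figures imply, the length of a clip is its frame count divided by the rate at which the frames are played back. The short Python sketch below works this out for 14-frame and 25-frame clips at both ends of the quoted frame-rate range. The function name `clip_duration_seconds` is made up for this example and is not part of any Stability AI tool.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Playback length in seconds of num_frames frames shown at fps."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


# Frame counts and frame-rate limits quoted in the article.
for frames in (14, 25):
    for fps in (3, 30):
        seconds = clip_duration_seconds(frames, fps)
        print(f"{frames} frames at {fps} fps: {seconds:.2f} s")
```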

The generative video tool can be adapted to synthesize multiple views of a subject from a single image by fine-tuning it on multi-view datasets. According to the company, external evaluations at the time of release showed that these models outperform leading closed models in user preference studies.

This comparison was made with text-to-video platforms such as Runway and Pika Labs. Currently, Stable Video Diffusion is only available for research purposes and not for real-world or commercial applications.

However, interested users can join a waitlist for access to a forthcoming web experience that features a text-to-video interface. The new generative video tool, Stable Video Diffusion, is expected to demonstrate its potential uses in various sectors, including advertising, education, and entertainment.

Stability AI’s Stable Video Diffusion is built on diffusion models. During training, a process called Forward Diffusion gradually adds noise to training images until they become noise images. The Reverse Diffusion process then uses the trained model to remove that noise step by step and generate an image from it. Unlike diffusion models that operate directly on pixels, Stable Diffusion works in a lower-resolution latent space rather than the pixel space of the image, which makes it more efficient on consumer-grade graphics cards.
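To make the forward step concrete, here is a minimal sketch of forward diffusion in plain Python. It follows the textbook formulation, in which an image is scaled down and mixed with Gaussian noise according to a noise schedule. The linear schedule, the 1,000 steps, the function names, and the four-value toy "image" are all assumptions chosen for illustration. They are not taken from Stability AI's code and may differ from what Stable Video Diffusion actually uses.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """Running product of (1 - beta): how much original signal survives."""
    kept, out = 1.0, []
    for beta in betas:
        kept *= 1.0 - beta
        out.append(kept)
    return out


def forward_diffuse(pixels, t, alpha_bars, rng):
    """Noise an image directly to step t without simulating earlier steps."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


rng = random.Random(0)
alpha_bars = cumulative_signal(linear_beta_schedule(1000))
image = [0.5, -0.25, 1.0, 0.0]  # toy "image": four pixel values in [-1, 1]

slightly_noisy = forward_diffuse(image, 10, alpha_bars, rng)
almost_pure_noise = forward_diffuse(image, 999, alpha_bars, rng)
print("signal kept at step 10: ", alpha_bars[10])
print("signal kept at step 999:", alpha_bars[999])
```

Early steps leave the image almost intact, while by the final step nearly all of the original signal is gone. Reverse diffusion is the learned part: a trained network estimates the noise so that it can be removed one step at a time, which this sketch does not attempt.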

Stable Diffusion uses a latent diffusion model to generate AI images from text, compressing each image into the latent space before the diffusion process runs. The video model is trained on a large dataset of videos and fine-tuned on a smaller set. Although Stable Video Diffusion is currently available only for research purposes, it has shown potential in generating high-quality images and videos.
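A quick calculation shows why working in latent space is cheaper. The sketch below assumes the autoencoder settings of the original Stable Diffusion image model, namely an 8× reduction in height and width and 4 latent channels. Whether Stable Video Diffusion uses exactly these values is an assumption here, and `latent_shape` is a hypothetical helper written for this example.

```python
def latent_shape(height, width, downsample=8, channels=4):
    """Shape (channels, height, width) of the latent for one frame.

    downsample=8 and channels=4 are assumed values, borrowed from the
    original Stable Diffusion image autoencoder.
    """
    if height % downsample or width % downsample:
        raise ValueError("height and width must be multiples of downsample")
    return (channels, height // downsample, width // downsample)


# One RGB frame at the 576 x 1024 resolution quoted in the article.
pixel_values = 3 * 576 * 1024
c, h, w = latent_shape(576, 1024)
latent_values = c * h * w

print("latent shape:", (c, h, w))
print("values per frame in pixel space: ", pixel_values)
print("values per frame in latent space:", latent_values)
print("reduction factor:", pixel_values // latent_values)
```

Under these assumptions each frame shrinks from about 1.77 million values to under 37,000, which is why the denoising network can run on far less memory than a pixel-space model would need.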

The video samples provided appear to be of relatively high quality, comparable to other generative systems. However, the company acknowledges some limitations: the generated videos are relatively short (less than 4 seconds), lack perfect photorealism, are unable to perform camera motion except for slow pans, have no text control, cannot generate legible text, and may not properly generate people and faces.

The tool was trained on a dataset comprising millions of videos and subsequently fine-tuned on a smaller set. Stability AI has stated that the videos used for this purpose were publicly available for research.

About the author

Naorem Mohen

Editor

Naorem Mohen is a journalist, writer and Editor of Signpost News. He writes on Manipur and Northeast India, focusing on governance, society, education, conflict, culture and regional affairs. He is also the author of "In the Lap of Koubru", a book reflecting his engagement with the people, history, culture and identity during 2023-2025. Follow him on X at @laimacha.

Covers: Geo-politics, Artificial Intelligence, Education, Cryptocurrencies.
