After Microsoft’s Copilot AI gained the ability to generate audio clips from text prompts, Google introduced VideoPoet, a large language model (LLM) that pushes the boundaries of video creation by producing 10-second clips with fewer visual artifacts than earlier models. The model supports a range of video creation tasks, including text-to-video, image-to-video, video stylization, colorization, and video-to-audio.
It generates 10-second video clips from text prompts and can also animate still images.
Unlike its predecessors, VideoPoet stands out for producing complete videos with substantial motion. It demonstrates this by generating ten-second clips, outpacing competitors such as Runway's Gen-2. Notably, VideoPoet does not depend on topic-specific data to produce videos, which sets it apart from models that require detailed input to achieve good results.
This versatility comes from its design as a large multimodal model, an approach that could become mainstream in video generation.
VideoPoet breaks from the dominant trend in video generation models, which predominantly rely on diffusion-based approaches. Instead, it leverages large language models, integrating different video generation tasks within a single LLM and eliminating the need for separately trained components for each function.
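To make the single-model idea more concrete, here is a minimal, purely illustrative Python sketch. Every name in it (TokenSequence, tokenize_text, generate_video_tokens, and so on) is hypothetical: Google has not published a VideoPoet API, and the placeholders below only mimic the general pattern described above, in which text, images, and video are mapped into one shared token vocabulary and a single autoregressive model serves every task.

```python
# Conceptual sketch only: all classes and functions here are hypothetical
# illustrations of the "one LLM, many video tasks" idea described above;
# they are not the VideoPoet API, which has not been released publicly.

from dataclasses import dataclass
from typing import List


@dataclass
class TokenSequence:
    """Discrete tokens in a shared vocabulary spanning text, image, video, and audio."""
    tokens: List[int]


def tokenize_text(prompt: str) -> TokenSequence:
    # Placeholder: a real system would use a learned text tokenizer.
    return TokenSequence([ord(c) % 1024 for c in prompt])


def tokenize_image(pixels: List[int]) -> TokenSequence:
    # Placeholder: a real system would use a learned visual tokenizer.
    return TokenSequence([p % 1024 for p in pixels])


def generate_video_tokens(condition: TokenSequence, num_tokens: int) -> TokenSequence:
    # Placeholder for autoregressive decoding by a single transformer:
    # the same model weights would serve text-to-video, image-to-video,
    # stylization, etc., with the task signalled only by the conditioning tokens.
    seed = sum(condition.tokens) or 1
    return TokenSequence([(seed * (i + 1)) % 1024 for i in range(num_tokens)])


def detokenize_video(tokens: TokenSequence) -> str:
    # Placeholder: a real decoder would render token IDs back into video frames.
    return f"<video reconstructed from {len(tokens.tokens)} tokens>"


if __name__ == "__main__":
    # Text-to-video and image-to-video share one generation path; only the
    # conditioning tokens differ, which is the core of the single-LLM design.
    text_cond = tokenize_text("a dog surfing a wave at sunset")
    image_cond = tokenize_image([12, 240, 18, 99, 7])

    for task, cond in [("text-to-video", text_cond), ("image-to-video", image_cond)]:
        clip = detokenize_video(generate_video_tokens(cond, num_tokens=16))
        print(task, "->", clip)
```

The design choice this sketch gestures at is that switching tasks does not require swapping models, only changing what is placed in the conditioning sequence.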
The resulting videos can vary in length, action, and style depending on the input text. In addition, VideoPoet can animate input images according to the provided prompts, demonstrating its adaptability to different kinds of input.
The release of VideoPoet adds a new dimension to AI video generation, hinting at what lies ahead in 2024.