Google has enhanced its artificial intelligence-powered image and video generation capabilities. The American tech giant has introduced its second-generation video generation model, Veo 2, alongside an upgrade to its existing Imagen 3 image-generation model, which now produces brighter, better-composed images. The company has also unveiled Whisk, a new experimental tool that lets users stylise and remix images for unique outputs.
Google Veo 2 model
Google said the Veo 2 model is designed to better understand real-world physics, human movement, and expression, enabling it to generate more realistic videos with finer detail. The company claims the model can interpret complex requests, including instructions for genre, lens type, and cinematic effects. The new model can generate video at resolutions up to 4K, with clips extending to several minutes in length.
Veo 2 is integrated into Google Labs' video generation tool, VideoFX. Users can visit Google Labs and join the waitlist for access to the new features. Google also plans to expand Veo 2 to YouTube Shorts and other products next year.
Imagen 3
Google has also enhanced its Imagen 3 image-generation model, which can now render a wider variety of art styles with greater accuracy, from photorealism and impressionism to abstract art and anime. The update also improves the model's ability to follow input prompts more closely and to produce images with richer detail and texture.
Like Veo 2, the updated Imagen 3 model will be available in Google Labs, in the image-generation tool called ImageFX.
Whisk
Google’s latest experimental tool, Whisk, pairs Imagen 3’s image generation with Gemini’s visual understanding and description capabilities. Whisk lets users input or create images according to their preferences and remix them for unique outputs. When a user inputs an image, Gemini automatically writes a detailed caption, which is then fed into Imagen 3, allowing the model to generate new images in different styles based on both the input image and its description.