Google has unveiled Whisk, an experimental generative AI tool under its Labs program, designed to simplify image creation by using images as prompts instead of lengthy text descriptions. This tool allows users to drag and drop images to define subjects, scenes, and styles, offering a more visual and intuitive alternative to traditional image generators. Built for "rapid visual exploration," Whisk focuses on creative experimentation rather than pixel-perfect edits.
Whisk combines Google's latest image generation model, Imagen 3, with the Gemini language model. Gemini automatically writes detailed captions for the input images, and those captions then serve as prompts for Imagen 3. The resulting visuals capture the essence of the inputs rather than reproducing them exactly.
Because the tool works from key characteristics extracted from an image, its results may differ from the source in attributes such as height, weight, hairstyle, or skin tone. Recognising that precision may be crucial for some projects, Google lets users view and edit the underlying prompts at any time.
Whisk is currently available in the United States to users enrolled in the Google Labs program and is aimed at artists, designers, and creatives looking for new ways to explore ideas quickly. Early testers have described it as a tool for generating multiple visual options rather than a traditional image editor. Users can download their favourite results and continue experimenting.
Whisk is part of Google's broader push into generative AI and follows tools such as Veo 2 for video generation. Google Labs serves as a platform for trying out early-stage technologies and gathering user feedback to shape future products.