Google’s Whisk AI uses images as prompts

Google has another AI tool to add to the pile. Whisk is an image generator from Google Labs that lets you use an existing image as a prompt. However, it captures only the “essence” of the original image rather than recreating it with new details, so it is better suited to brainstorming and quick visualizations than to editing the source image.

The company describes Whisk as “a new type of creative tool”. The input screen starts with a bare-bones interface with inputs for style and subject. This introductory interface lets you choose from only three predefined styles: sticker, enamel pin, and plush toy. I suspect Google found that these three styles suit the kind of rough-outline output the experimental tool handles best in its current form.

As you can see in the image above, it produced a solid image of a Wilford Brimley plushie. (Google’s rules prohibit images of celebrities, but Wilford slipped through the gates, Quaker Oats in tow, without alerting security.)

Whisk also includes a more advanced editor (found by clicking “Start from scratch” on the home screen). In this mode, you can use text or a source image in three categories: subject, scene, and style. There is also an input line for adding more text as finishing touches. However, in its current form, the advanced controls didn’t produce results that matched what I was going for.

For example, take a look at my attempt to generate the late Mr. Brimley in a lightbox scene, in the style of a plush walrus image I found online:

[Image: Google’s new AI tool Whisk uses images as prompts]

Whisk spit out what looks like an actor resembling Wilford Brimley eating oatmeal inside a lightbox frame. As far as I can tell, this guy is not a plushie. So it’s understandable why Google recommends using the tool for “quick visual exploration” rather than for production-ready content.

Google admits that Whisk only uses “a few key characteristics” of your source image. “For example, the generated object may have a different height, weight, hairstyle, or skin tone,” the company warns.

To understand why, just look at Google’s description of how Whisk works under the hood. It uses the Gemini language model to write a detailed caption of the source image you upload. It then feeds that caption into the Imagen 3 image generator. The result is an image based on what Gemini says about your image, not on the original image itself.
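
To make that flow concrete, here is a rough sketch of a caption-then-generate pipeline in Python. It is not Google's implementation, and it does not call Gemini or Imagen 3: the function names, the prompt format, and the toy stand-ins for the captioner and generator are all assumptions made up for illustration. The point it shows is that the generator only ever receives text, so any detail the caption leaves out cannot reach the final image.

```python
from typing import Callable, Dict

# Hypothetical type aliases: a captioner maps image bytes to text,
# a generator maps a text prompt to image bytes.
Captioner = Callable[[bytes], str]
Generator = Callable[[str], bytes]


def build_prompt(captions: Dict[str, str], extra_text: str = "") -> str:
    """Join per-role captions (and optional extra text) into one text prompt."""
    parts = [f"{role}: {text}" for role, text in captions.items()]
    if extra_text:
        parts.append(f"details: {extra_text}")
    return "; ".join(parts)


def caption_then_generate(
    images: Dict[str, bytes],
    captioner: Captioner,
    generator: Generator,
    extra_text: str = "",
) -> bytes:
    """Caption every source image, then generate from the captions alone."""
    # Step 1: each source image is reduced to a text description.
    captions = {role: captioner(data) for role, data in images.items()}
    # Step 2: the generator sees only the combined text, never the pixels.
    prompt = build_prompt(captions, extra_text)
    return generator(prompt)


# Toy stand-ins (assumptions, not real model calls).
def fake_captioner(data: bytes) -> str:
    # A deliberately lossy description: it keeps only the size.
    return f"an image of {len(data)} bytes"


def fake_generator(prompt: str) -> bytes:
    # Echo the prompt back so the data flow is easy to inspect.
    return prompt.encode("utf-8")
```

With these stand-ins, two different source images of the same size produce identical captions and therefore identical output, which mirrors in miniature why a generated subject can end up with a different hairstyle or skin tone than the original.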

Whisk is only available in the US, at least for now. You can try it on the Google Labs project website.
