Gemini 2.0 Flash supports response generation in multiple modalities, including text, speech, and images.
Text generation
Gemini 2.0 Flash supports text generation using the Google Cloud console, REST API, and supported SDKs. For more information, see our text generation guide.
Speech generation (early access/allowlist)
Gemini 2.0 supports a new multimodal generation capability: text to speech.
Using the text-to-speech capability, you can prompt the model to generate high
quality audio output that sounds like a human voice (say "hi everyone"
), and
you can further refine the output by steering the voice.
Image generation (early access/allowlist)
Gemini 2.0 supports the ability to output text with in-line images. This lets you use Gemini to conversationally edit images or generate multimodal outputs (for example, a blog post with text and images in a single turn). Previously this would have required stringing together multiple models.
Image generation is available as a private experimental release. It supports the following modalities and capabilities:
- Text to image
- Example prompt: "Generate an image of the Eiffel tower with fireworks in the background."
- Text to image(s) and text (interleaved)
- Example prompt: "Generate an illustrated recipe for a paella. Create images to go alongside the text as you generate the recipe."
- Image(s) and text to image(s) and text (interleaved)
- Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? can you update the image?"
- Image editing (text and image to image)
- Example prompt: "Edit this image to make it look like a cartoon"
- Example prompt: [image of a cat] + [image of a pillow] + "Create a cross stitch of my cat on this pillow."
- Multi-turn image editing (chat)
- Example prompts: [upload an image of a blue car.] "Turn this car into a convertible." "Now change the color to yellow."
- Watermarking
- All generated images include a SynthID watermark.
Limitations:
- Generation of people and editing of uploaded images of people are not allowed.
- For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN.
- Image generation does not support audio or video inputs.
- Image generation may not always trigger:
- The model may output text only. Try asking for image outputs explicitly (e.g. "generate an image", "provide images as you go along", "update the image").
- The model may stop generating partway through. Try again or try a different prompt.