Gemini 2.0 Flash Thinking

Gemini 2.0 Flash Thinking is an experimental model that's trained to generate the "thinking process" the model goes through as part of its response. As a result, Gemini 2.0 Flash Thinking demonstrates stronger reasoning in its responses than the base Gemini 2.0 Flash model.

Use Flash Thinking

Flash Thinking is available as an experimental model in Vertex AI. To use the latest Flash Thinking model, select the gemini-2.0-flash-thinking-exp-01-21 model in the Model drop-down menu.

Thoughts

The model's thinking process is returned as the first element of the content.parts list in the generated response. For example, the following code renders only the model's thinking process:

from google import genai
from IPython.display import Markdown

# Create a client; credentials and project settings are read from the environment.
client = genai.Client()

response = client.models.generate_content(
    model='gemini-2.0-flash-thinking-exp-01-21',
    contents='Solve 3*x^3-5*x=1',
    config={'thinking_config': {'include_thoughts': True}}
)

# The thinking process is the first part; the final answer follows it.
Markdown(response.candidates[0].content.parts[0].text)
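Rather than relying on position alone, you can separate thought parts from the final answer. The sketch below uses a lightweight stand-in for the SDK's response parts so it runs without credentials; the `Part` dataclass and `split_thoughts` helper are illustrative assumptions, modeled on the boolean `thought` field that the SDK uses to mark thinking output.

```python
# Hypothetical sketch: splitting "thought" parts from the final answer.
# Part is a stand-in for the SDK's content part; the real SDK marks
# thinking output with a boolean `thought` attribute on each part.
from dataclasses import dataclass

@dataclass
class Part:
    text: str
    thought: bool = False

def split_thoughts(parts):
    """Return (thinking_text, answer_text) from a list of content parts."""
    thoughts = [p.text for p in parts if p.thought]
    answers = [p.text for p in parts if not p.thought]
    return '\n'.join(thoughts), '\n'.join(answers)

parts = [
    Part('First, rearrange the equation and test candidate roots...', thought=True),
    Part('One real root is approximately x = 1.37.'),
]
thinking, answer = split_thoughts(parts)
```

With a real response, the same helper would be called on response.candidates[0].content.parts.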

You can see more examples of how to use Flash Thinking in our Colab notebook.

Limitations

Flash Thinking is an experimental model and has the following limitations:

  • 1M token input limit
  • Text and image input only
  • 64k token output limit
  • Text only output
  • No built-in tool usage like Search or code execution
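The input limit above can be guarded against on the client side. This is a rough sketch only, using the common ~4-characters-per-token heuristic as an assumption; for an exact count, the SDK's server-side token-counting call should be used instead.

```python
# Rough client-side guard for the 1M-token input limit.
# CHARS_PER_TOKEN is a heuristic, not an exact tokenizer.
INPUT_TOKEN_LIMIT = 1_000_000
CHARS_PER_TOKEN = 4  # rough approximation for English text

def fits_input_limit(prompt: str) -> bool:
    """Return True if the estimated token count is within the input limit."""
    estimated_tokens = len(prompt) / CHARS_PER_TOKEN
    return estimated_tokens <= INPUT_TOKEN_LIMIT

fits_input_limit('Solve 3*x^3-5*x=1')  # a short prompt is well under the limit
```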

What's next?

Try Flash Thinking for yourself with our Colab notebook, or open the Vertex AI console and prompt the model there.