Use Gemini to identify key moments in YouTube videos

This code sample demonstrates how to use Gemini to identify key moments in YouTube videos. It sends a text prompt and a YouTube video URL to the model in a single request, and the model replies with text that lists the key moments along with their timestamps.

Code sample

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import vertexai
from vertexai.generative_models import GenerativeModel, Part

# TODO(developer): Replace with your Google Cloud project ID.
PROJECT_ID = "your-project-id"
vertexai.init(project=PROJECT_ID, location="us-central1")

model = GenerativeModel("gemini-1.5-flash-002")

contents = [
    # Text prompt
    "Identify the key moments of this video.",
    # YouTube video of Paris 2024 Olympics
    Part.from_uri("https://www.youtube.com/watch?v=6F5gZWcpNU4", "video/mp4"),
]

# Send the text prompt and the video together in one multimodal request.
response = model.generate_content(contents)
print(response.text)
# Example response
#    This video is a fast-paced, exciting montage of athletes competing in and celebrating their victories in the 2024 Summer Olympics in Paris, France. Key moments include:
#    - [00:00:01] The Olympic rings are shown with laser lights and fireworks in the opening ceremonies.
#    - [00:00:02–00:00:08] Various shots of the games’ venues are shown, including aerial views of skateboarding and volleyball venues, a view of the track and field stadium, and a shot of the Palace of Versailles.
#    - [00:00:09–00:01:16] A fast-paced montage shows highlights from various Olympic competitions.
#    - [00:01:17–00:01:29] The video switches to show athletes celebrating victories, both tears of joy and tears of sadness are shown.
#    - [00:01:30–00:02:26] The montage then continues to showcase sporting events, including cycling, kayaking, swimming, track and field, gymnastics, surfing, basketball, and ping-pong.
#    - [00:02:27–00:04:03] More athletes celebrate their wins.
#    - [00:04:04–00:04:55] More Olympic sports are shown, followed by more celebrations.
#    - [00:04:56] Olympic medals are shown.
#    - [00:04:57] An aerial shot of the Eiffel Tower lit up with the Olympic rings is shown at night.
#    - [00:04:58–00:05:05] The video ends with a black screen and the words, “Sport. And More Than Sport.” written beneath the Olympic rings.
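Because the model returns free-form text, code that wants to act on the timestamps (for example, to build links that jump to each moment) has to parse them. The following is a minimal sketch, not part of the Vertex AI SDK. It assumes the model formats each key moment as a bullet line such as "- [00:00:02–00:00:08] Description", as in the example response above. Real responses may use a different layout, so treat the regular expression as an assumption to adjust. The names parse_key_moments, to_seconds, youtube_link, and KeyMoment are hypothetical. The link helper relies on YouTube's t=<seconds>s query parameter, which starts playback at a given offset.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical parser for Gemini's free-form "key moments" text. It assumes each
# moment is a bullet line such as "- [00:00:02–00:00:08] Description", matching
# the example response; adjust the pattern if your responses differ.
_TIMESTAMP = r"\d{1,2}(?::\d{2}){1,2}"  # MM:SS or HH:MM:SS
_MOMENT_LINE = re.compile(
    rf"^\s*[-*]\s*\[({_TIMESTAMP})(?:\s*[–-]\s*({_TIMESTAMP}))?\]\s*(.+)$"
)


@dataclass
class KeyMoment:
    start_s: int  # start offset in seconds
    end_s: Optional[int]  # end offset in seconds, or None for a single timestamp
    description: str


def to_seconds(timestamp: str) -> int:
    """Convert "HH:MM:SS" or "MM:SS" to a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_key_moments(text: str) -> List[KeyMoment]:
    """Extract timestamped bullet lines; other lines (such as the summary) are skipped."""
    moments = []
    for line in text.splitlines():
        match = _MOMENT_LINE.match(line)
        if match is None:
            continue
        start, end, description = match.groups()
        moments.append(
            KeyMoment(
                start_s=to_seconds(start),
                end_s=to_seconds(end) if end else None,
                description=description.strip(),
            )
        )
    return moments


def youtube_link(video_url: str, start_s: int) -> str:
    """Append YouTube's t=<seconds>s parameter so playback starts at the moment."""
    separator = "&" if "?" in video_url else "?"
    return f"{video_url}{separator}t={start_s}s"


if __name__ == "__main__":
    VIDEO_URL = "https://www.youtube.com/watch?v=6F5gZWcpNU4"
    # Shortened excerpt of the example response; in the sample, pass response.text instead.
    SAMPLE_RESPONSE = """Key moments include:
- [00:00:01] The Olympic rings are shown with laser lights and fireworks in the opening ceremonies.
- [00:00:02–00:00:08] Various shots of the games' venues are shown.
- [00:04:58–00:05:05] The video ends with a black screen."""

    for moment in parse_key_moments(SAMPLE_RESPONSE):
        print(youtube_link(VIDEO_URL, moment.start_s), moment.description)
```

In the sample above, you would call parse_key_moments(response.text). Asking for this exact bullet format in the prompt can make parsing more reliable, but the model is not guaranteed to follow it, so check for an empty result before using the list.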

What's next

To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.