Starting April 29, 2025, the Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of these models, including new projects. For details, see Model versions and lifecycle.
# Transcribe an audio file with a multimodal AI model
This sample shows you how to use an audio file to generate a podcast transcript with timestamps.
Explore further
---------------

For detailed documentation that includes this code sample, see the following:

- [Audio understanding (speech only)](/vertex-ai/generative-ai/docs/multimodal/audio-understanding)
Code sample
-----------
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Sulit dipahami","hardToUnderstand","thumb-down"],["Informasi atau kode contoh salah","incorrectInformationOrSampleCode","thumb-down"],["Informasi/contoh yang saya butuhkan tidak ada","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],[],[],[],null,["# Transcript an audio file with Multimodal AI Model\n\nThis sample shows you how to use an audio file to generate a podcast transcript with timestamps.\n\nExplore further\n---------------\n\n\nFor detailed documentation that includes this code sample, see the following:\n\n- [Audio understanding (speech only)](/vertex-ai/generative-ai/docs/multimodal/audio-understanding)\n\nCode sample\n-----------\n\n### Go\n\n\nBefore trying this sample, follow the Go setup instructions in the\n[Vertex AI quickstart using\nclient libraries](/vertex-ai/docs/start/client-libraries).\n\n\nFor more information, see the\n[Vertex AI Go API\nreference documentation](/go/docs/reference/cloud.google.com/go/aiplatform/latest/apiv1).\n\n\nTo authenticate to Vertex AI, set up Application Default Credentials.\nFor more information, see\n\n[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).\n\n import (\n \t\"context\"\n \t\"fmt\"\n \t\"io\"\n\n \tgenai \"google.golang.org/genai\"\n )\n\n // generateAudioTranscript shows how to generate an audio transcript.\n func generateAudioTranscript(w io.Writer) error {\n \tctx := context.Background()\n\n \tclient, err := genai.NewClient(ctx, &genai.ClientConfig{\n \t\tHTTPOptions: genai.HTTPOptions{APIVersion: \"v1\"},\n \t})\n \tif err != nil {\n \t\treturn fmt.Errorf(\"failed to create genai client: %w\", err)\n \t}\n\n \tmodelName := \"gemini-2.5-flash\"\n \tcontents := []*genai.Content{\n \t\t{Parts: []*genai.Part{\n \t\t\t{Text: `Transcribe the interview, in the format of timecode, speaker, caption.\n Use speaker A, speaker B, etc. 
### Python

Before trying this sample, follow the Python setup instructions in the
[Vertex AI quickstart using client libraries](/vertex-ai/docs/start/client-libraries).
For more information, see the
[Vertex AI Python API reference documentation](/python/docs/reference/aiplatform/latest).

To authenticate to Vertex AI, set up Application Default Credentials. For more
information, see
[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

```python
from google import genai
from google.genai.types import GenerateContentConfig, HttpOptions, Part

client = genai.Client(http_options=HttpOptions(api_version="v1"))
prompt = """
Transcribe the interview, in the format of timecode, speaker, caption.
Use speaker A, speaker B, etc. to identify speakers.
"""
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        prompt,
        Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/audio/pixel.mp3",
            mime_type="audio/mpeg",
        ),
    ],
    # Required to enable timestamp understanding for audio-only files
    config=GenerateContentConfig(audio_timestamp=True),
)
print(response.text)
# Example response:
# [00:00:00] **Speaker A:** your devices are getting better over time. And so ...
# [00:00:14] **Speaker B:** Welcome to the Made by Google podcast where we meet ...
# [00:00:20] **Speaker B:** Here's your host, Rasheed Finch.
# [00:00:23] **Speaker C:** Today we're talking to Aisha Sharif and DeCarlos Love. ...
# ...
```

What's next
-----------

To search and filter code samples for other Google Cloud products, see the
[Google Cloud sample browser](/docs/samples?product=googlegenaisdk).
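Both samples read the audio from Cloud Storage. If your audio is a small local file, you can send the bytes inline instead of a `gs://` URI. The following is a minimal Go sketch, assuming a hypothetical local file `sample.mp3` and the SDK's `InlineData`/`Blob` part fields; inline payloads count against the request size limit, so Cloud Storage remains the better choice for long recordings.

```go
// Sketch: pass local audio bytes inline instead of a Cloud Storage URI.
// "sample.mp3" is a hypothetical path used for illustration only.
data, err := os.ReadFile("sample.mp3") // requires "os" in the import block
if err != nil {
	return fmt.Errorf("failed to read audio file: %w", err)
}

contents := []*genai.Content{
	{
		Role: "user",
		Parts: []*genai.Part{
			{Text: "Transcribe the interview, in the format of timecode, speaker, caption."},
			{InlineData: &genai.Blob{
				Data:     data,
				MIMEType: "audio/mpeg",
			}},
		},
	},
}
```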