This page was translated by the Cloud Translation API.

Audio understanding (audio only)

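You can add audio to Gemini requests to perform tasks that involve understanding the contents of that audio. This page shows you how to add audio to your requests to Gemini in Vertex AI by using the Google Cloud console and the Vertex AI API.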

Supported models

The following table lists the models that support audio understanding:

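Model	Audio modality details	Try the model
Gemini 2.0 Flash `gemini-2.0-flash-001`	Maximum audio length per prompt: approximately 8.4 hours, or up to 1 million tokens. Maximum number of audio files per prompt: 1. Audio understanding: summarization, transcription, and translation.	Try Gemini 2.0 Flash
Gemini 1.5 Flash `gemini-1.5-flash`	Maximum audio length per prompt: approximately 8.4 hours, or up to 1 million tokens. Maximum number of audio files per prompt: 1. Audio understanding: summarization, transcription, and translation.	Try Gemini 1.5 Flash
Gemini 1.5 Pro `gemini-1.5-pro`	Maximum audio length per prompt: approximately 8.4 hours, or up to 1 million tokens. Maximum number of audio files per prompt: 1. Audio understanding: summarization, transcription, and translation.	Try Gemini 1.5 Pro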
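As a rough sanity check on these limits, the following sketch estimates how many prompt tokens an audio file uses. It assumes a rate of 32 tokens per second of audio, a figure from the Gemini API documentation that this page does not state, so verify it for your model; the helper names are hypothetical. At that rate, 8.4 hours of audio comes to about 967,680 tokens, which is roughly consistent with the 1 million token limit in the table.

```python
# Assumption: each second of audio is represented as about 32 tokens.
# This rate is not stated on this page; check it against current model documentation.
TOKENS_PER_AUDIO_SECOND = 32

# Per-prompt token limit from the table above.
MAX_PROMPT_TOKENS = 1_000_000


def estimate_audio_tokens(duration_seconds: float) -> int:
    """Estimate the number of prompt tokens used by an audio clip of the given length."""
    return round(duration_seconds * TOKENS_PER_AUDIO_SECOND)


def fits_in_prompt(duration_seconds: float, other_tokens: int = 0) -> bool:
    """Return True if the audio plus any other prompt tokens (such as the text) stays within the limit."""
    return estimate_audio_tokens(duration_seconds) + other_tokens <= MAX_PROMPT_TOKENS


# Example: a one-minute clip and an 8.4-hour recording.
one_minute = estimate_audio_tokens(60)
long_recording = estimate_audio_tokens(8.4 * 3600)
```

The estimate only covers the audio itself; the text instructions and any other parts of the prompt count against the same limit, which is why `fits_in_prompt` accepts `other_tokens`.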

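For the list of languages supported by Gemini models, see the Google models page in Model information. To learn more about how to design multimodal prompts, see Design multimodal prompts. If you're looking for a way to use Gemini directly from your mobile and web apps, see the Vertex AI in Firebase SDKs for Android, Swift, web, and Flutter apps.

Add audio to a request

You can add audio files to your requests to Gemini.

Single audio

The following shows you how to use an audio file to summarize a podcast.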

Gen AI SDK for Python

Learn how to install or update the Google Gen AI SDK for Python.
For more information, see the Gen AI SDK for Python API reference documentation or the python-genai GitHub repository.
Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

from google import genai
from google.genai.types import HttpOptions, Part

client = genai.Client(http_options=HttpOptions(api_version="v1"))
prompt = """
Provide a concise summary of the main points in the audio file.
"""
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=[
        prompt,
        Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/audio/pixel.mp3",
            mime_type="audio/mpeg",
        ),
    ],
)
print(response.text)
# Example response:
# Here's a summary of the main points from the audio file:

# The Made by Google podcast discusses the Pixel feature drops with product managers Aisha Sheriff and De Carlos Love.  The key idea is that devices should improve over time, with a connected experience across phones, watches, earbuds, and tablets.

Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

Streaming and non-streaming responses

You can choose whether the model generates a streaming or a non-streaming response. With a streaming response, each piece of the response is returned as soon as its output tokens are generated. With a non-streaming response, the complete response is returned only after all output tokens have been generated.

For a streaming response, use the stream parameter in generate_content.

  response = model.generate_content(contents=[...], stream=True)

For a non-streaming response, remove the parameter or set it to False.
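The following is a minimal sketch of consuming a streamed response. It assumes, as in the Vertex AI SDK for Python, that `generate_content(..., stream=True)` returns an iterable of partial responses that each expose a `.text` attribute. The `collect_stream` helper and the fake chunks are hypothetical and exist only so that the logic can run without calling the API.

```python
from types import SimpleNamespace
from typing import Callable, Iterable, Optional


def collect_stream(chunks: Iterable, on_chunk: Optional[Callable[[str], None]] = None) -> str:
    """Concatenate the text of streamed response chunks in arrival order.

    `chunks` is assumed to be what generate_content(..., stream=True) returns:
    an iterable of partial responses, each with a `.text` attribute.
    """
    parts = []
    for chunk in chunks:
        text = chunk.text
        if on_chunk is not None:
            on_chunk(text)  # for example, print(text, end="") to show progress
        parts.append(text)
    return "".join(parts)


# Offline stand-in for a real stream, so the function can run without the API.
fake_stream = [
    SimpleNamespace(text="Here's a summary: "),
    SimpleNamespace(text="the podcast covers Pixel feature drops."),
]
full_text = collect_stream(fake_stream)
```

With a real model, you would pass the streamed response directly, for example `collect_stream(model.generate_content(contents, stream=True), on_chunk=print)`.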

Sample code


import vertexai
from vertexai.generative_models import GenerativeModel, Part

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"

vertexai.init(project=PROJECT_ID, location="us-central1")

model = GenerativeModel("gemini-1.5-flash-002")

prompt = """
Please provide a summary for the audio.
Provide chapter titles, be concise and short, no need to provide chapter summaries.
Do not make up any information that is not part of the audio and do not be verbose.
"""

audio_file_uri = "gs://cloud-samples-data/generative-ai/audio/pixel.mp3"
audio_file = Part.from_uri(audio_file_uri, mime_type="audio/mpeg")

contents = [audio_file, prompt]

response = model.generate_content(contents)
print(response.text)
# Example response:
# **Made By Google Podcast Summary**
# **Chapter Titles:**
# * Introduction
# * Transformative Pixel Features
# ...

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart. For more information, see the Vertex AI Java SDK for Gemini reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

Streaming and non-streaming responses

For a streaming response, use the generateContentStream method.

  public ResponseStream<GenerateContentResponse> generateContentStream(Content content)

For a non-streaming response, use the generateContent method.

  public GenerateContentResponse generateContent(Content content)

Sample code

import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.api.GenerateContentResponse;
import com.google.cloud.vertexai.generativeai.ContentMaker;
import com.google.cloud.vertexai.generativeai.GenerativeModel;
import com.google.cloud.vertexai.generativeai.PartMaker;
import com.google.cloud.vertexai.generativeai.ResponseHandler;
import java.io.IOException;

public class AudioInputSummarization {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-google-cloud-project-id";
    String location = "us-central1";
    String modelName = "gemini-1.5-flash-001";

    summarizeAudio(projectId, location, modelName);
  }

  // Analyzes the given audio input.
  public static String summarizeAudio(String projectId, String location, String modelName)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs
    // to be created once, and can be reused for multiple requests.
    try (VertexAI vertexAI = new VertexAI(projectId, location)) {
      String audioUri = "gs://cloud-samples-data/generative-ai/audio/pixel.mp3";

      GenerativeModel model = new GenerativeModel(modelName, vertexAI);
      GenerateContentResponse response = model.generateContent(
          ContentMaker.fromMultiModalData(
              "Please provide a summary for the audio.\n"
                  + "Provide chapter titles with timestamps, be concise and short, "
                  + "no need to provide chapter summaries.\n"
                  + "Do not make up any information that is not part of the audio "
                  + "and do not be verbose.",
              PartMaker.fromMimeTypeAndData("audio/mp3", audioUri)
          ));

      String output = ResponseHandler.getText(response);
      System.out.println(output);

      return output;
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Generative AI quickstart using the Node.js SDK. For more information, see the Node.js SDK for Gemini reference documentation.

Streaming and non-streaming responses

For a streaming response, use the generateContentStream method.

  const streamingResp = await generativeModel.generateContentStream(request);

For a non-streaming response, use the generateContent method.

  const resp = await generativeModel.generateContent(request);

Sample code

const {VertexAI} = require('@google-cloud/vertexai');

/**
 * TODO(developer): Update these variables before running the sample.
 */
async function summarize_audio(projectId = 'PROJECT_ID') {
  const vertexAI = new VertexAI({project: projectId, location: 'us-central1'});

  const generativeModel = vertexAI.getGenerativeModel({
    model: 'gemini-1.5-flash-001',
  });

  const filePart = {
    file_data: {
      file_uri: 'gs://cloud-samples-data/generative-ai/audio/pixel.mp3',
      mime_type: 'audio/mpeg',
    },
  };
  const textPart = {
    text: `
    Please provide a summary for the audio.
    Provide chapter titles with timestamps, be concise and short, no need to provide chapter summaries.
    Do not make up any information that is not part of the audio and do not be verbose.`,
  };

  const request = {
    contents: [{role: 'user', parts: [filePart, textPart]}],
  };

  const resp = await generativeModel.generateContent(request);
  const contentResponse = await resp.response;
  console.log(JSON.stringify(contentResponse));
}

Go

Before trying this sample, follow the Go setup instructions in the Vertex AI quickstart. For more information, see the Vertex AI Go SDK for Gemini reference documentation.

Streaming and non-streaming responses

For a streaming response, use the GenerateContentStream method.

  iter := model.GenerateContentStream(ctx, genai.Text("Tell me a story about a lumberjack and his giant ox. Keep it very short."))

For a non-streaming response, use the GenerateContent method.

  resp, err := model.GenerateContent(ctx, genai.Text("What is the average size of a swallow?"))

Sample code

import (
	"context"
	"errors"
	"fmt"
	"io"
	"mime"
	"path/filepath"

	"cloud.google.com/go/vertexai/genai"
)

// summarizeAudio shows how to send an audio asset and a text question to a model, writing the response to the
// provided io.Writer.
func summarizeAudio(w io.Writer, projectID, location, modelName string) error {
	// location := "us-central1"
	// modelName := "gemini-1.5-flash-001"
	ctx := context.Background()

	client, err := genai.NewClient(ctx, projectID, location)
	if err != nil {
		return fmt.Errorf("unable to create client: %w", err)
	}
	defer client.Close()

	model := client.GenerativeModel(modelName)
	model.SetTemperature(0.4)

	// Given an audio file URL, prepare audio file as genai.Part
	part := genai.FileData{
		MIMEType: mime.TypeByExtension(filepath.Ext("pixel.mp3")),
		FileURI:  "gs://cloud-samples-data/generative-ai/audio/pixel.mp3",
	}

	res, err := model.GenerateContent(ctx, part, genai.Text(`
		Please provide a summary for the audio.
		Provide chapter titles with timestamps, be concise and short, no need to provide chapter summaries.
		Do not make up any information that is not part of the audio and do not be verbose.
	`,
	))
	if err != nil {
		return fmt.Errorf("unable to generate contents: %w", err)
	}

	if len(res.Candidates) == 0 ||
		len(res.Candidates[0].Content.Parts) == 0 {
		return errors.New("empty response from model")
	}

	fmt.Fprintf(w, "generated summary:\n%s\n", res.Candidates[0].Content.Parts[0])
	return nil
}

C#

Before trying this sample, follow the C# setup instructions in the Vertex AI quickstart. For more information, see the Vertex AI C# reference documentation.

Streaming and non-streaming responses

For a streaming response, use the StreamGenerateContent method.

  public virtual PredictionServiceClient.StreamGenerateContentStream StreamGenerateContent(GenerateContentRequest request)

For a non-streaming response, use the GenerateContentAsync method.

  public virtual Task<GenerateContentResponse> GenerateContentAsync(GenerateContentRequest request)

For more information about how the server streams responses, see Streaming RPCs.

Sample code


using Google.Cloud.AIPlatform.V1;
using System;
using System.Threading.Tasks;

public class AudioInputSummarization
{
    public async Task<string> SummarizeAudio(
        string projectId = "your-project-id",
        string location = "us-central1",
        string publisher = "google",
        string model = "gemini-1.5-flash-001")
    {
        var predictionServiceClient = new PredictionServiceClientBuilder
        {
            Endpoint = $"{location}-aiplatform.googleapis.com"
        }.Build();

        string prompt = @"Please provide a summary for the audio.
Provide chapter titles with timestamps, be concise and short, no need to provide chapter summaries.
Do not make up any information that is not part of the audio and do not be verbose.";

        var generateContentRequest = new GenerateContentRequest
        {
            Model = $"projects/{projectId}/locations/{location}/publishers/{publisher}/models/{model}",
            Contents =
            {
                new Content
                {
                    Role = "USER",
                    Parts =
                    {
                        new Part { Text = prompt },
                        new Part { FileData = new() { MimeType = "audio/mp3", FileUri = "gs://cloud-samples-data/generative-ai/audio/pixel.mp3" } }
                    }
                }
            }
        };

        GenerateContentResponse response = await predictionServiceClient.GenerateContentAsync(generateContentRequest);

        string responseText = response.Candidates[0].Content.Parts[0].Text;
        Console.WriteLine(responseText);

        return responseText;
    }
}

REST

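After you set up your environment, you can use REST to test a prompt. The following sample sends a request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

LOCATION: The region to process the request. Enter a supported region. For the full list of supported regions, see Available locations.
Partial list of available regions: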
- us-central1
- us-west4
- northamerica-northeast1
- us-east4
- us-west1
- asia-northeast3
- asia-southeast1
- asia-northeast1
PROJECT_ID: Your project ID.
FILE_URI: The URI or URL of the file to include in the prompt. Acceptable values include the following:
- Cloud Storage bucket URI: The object must either be publicly readable or reside in the same Google Cloud project that's sending the request. For gemini-1.5-pro and gemini-1.5-flash, the size limit is 2 GB. For gemini-1.0-pro-vision, the size limit is 20 MB.
- HTTP URL: The file URL must be publicly readable. You can specify one video file, one audio file, and up to 10 image files per request. Audio files, video files, and documents can't exceed 15 MB.
- YouTube video URL: The YouTube video must be either owned by the account that you used to sign in to the Google Cloud console or be public. Only one YouTube video URL is supported per request.
When you specify a fileURI, you must also specify the media type (mimeType) of the file. If VPC Service Controls is enabled, specifying a media file URL for fileURI is not supported.

If you don't have an audio file in Cloud Storage, you can use the publicly available file gs://cloud-samples-data/generative-ai/audio/pixel.mp3, which has a MIME type of audio/mp3. To listen to this audio, open the sample MP3 file.
MIME_TYPE: The media type of the file specified in the data or fileUri fields. Acceptable values include the following:
- application/pdf
- audio/mpeg
- audio/mp3
- audio/wav
- image/png
- image/jpeg
- image/webp
- text/plain
- video/mov
- video/mpeg
- video/mp4
- video/mpg
- video/avi
- video/wmv
- video/mpegps
- video/flv
TEXT: The text instructions to include in the prompt. For example: Please provide a summary for the audio. Provide chapter titles, be concise and short, no need to provide chapter summaries. Do not make up any information that is not part of the audio and do not be verbose.
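If you generate request.json from a script instead of editing it by hand, a sketch like the following fills in FILE_URI, MIME_TYPE, and TEXT and produces a body with the same shape as the one used by the curl and PowerShell commands. The `build_audio_request` helper is hypothetical, and restricting MIME_TYPE to the audio entries of the list above is an assumption made for this audio-only page.

```python
import json

# Audio MIME types taken from the MIME_TYPE list above.
AUDIO_MIME_TYPES = {"audio/mpeg", "audio/mp3", "audio/wav"}


def build_audio_request(file_uri: str, mime_type: str, text: str) -> dict:
    """Return a request body shaped like request.json with the placeholders filled in."""
    if mime_type not in AUDIO_MIME_TYPES:
        raise ValueError(f"not an audio MIME type from the list above: {mime_type}")
    return {
        "contents": {
            "role": "USER",
            "parts": [
                # The file part: a fileUri always needs a matching mimeType.
                {"fileData": {"fileUri": file_uri, "mimeType": mime_type}},
                # The text instructions.
                {"text": text},
            ],
        }
    }


body = build_audio_request(
    file_uri="gs://cloud-samples-data/generative-ai/audio/pixel.mp3",
    mime_type="audio/mp3",
    text="Please provide a summary for the audio. Provide chapter titles, be concise and short.",
)
# json.dumps escapes quotes and newlines inside TEXT, which is easy to get wrong by hand.
request_json = json.dumps(body, indent=2)
```

Writing `request_json` to a file named request.json gives you the input that the commands below expect.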

To send your request, choose one of these options:

curl

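Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory: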

cat > request.json << 'EOF'
{
  "contents": {
    "role": "USER",
    "parts": [
      {
        "fileData": {
          "fileUri": "FILE_URI",
          "mimeType": "MIME_TYPE"
        }
      },
      {
        "text": "TEXT"
      }
    ]
  }
}
EOF

Then execute the following command to send your REST request:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-1.5-flash:generateContent"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

@'
{
  "contents": {
    "role": "USER",
    "parts": [
      {
        "fileData": {
          "fileUri": "FILE_URI",
          "mimeType": "MIME_TYPE"
        }
      },
      {
        "text": "TEXT"
      }
    ]
  }
}
'@  | Out-File -FilePath request.json -Encoding utf8

Then execute the following command to send your REST request:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-1.5-flash:generateContent" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Response

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "## Made By Google Podcast - Pixel Feature Drops \n\n**Chapter 1: Transformative Pixel Features**\n\n**Chapter 2: Importance of Feature Drops**\n\n**Chapter 3: January's Feature Drop Highlights**\n\n**Chapter 4: March's Feature Drop Highlights for Pixel Watch**\n\n**Chapter 5: March's Feature Drop Highlights for Pixel Phones**\n\n**Chapter 6: Feature Drop Expansion to Other Devices**\n\n**Chapter 7: Deciding Which Features to Include in Feature Drops**\n\n**Chapter 8: Importance of User Feedback**\n\n**Chapter 9: When to Expect March's Feature Drop**\n\n**Chapter 10: Stand-Out Features from Past Feature Drops** \n"
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.05470151,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.07864238
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.027742893,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.050051305
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.08678674,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.06108711
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.11899801,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.14706452
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 18883,
    "candidatesTokenCount": 150,
    "totalTokenCount": 19033
  }
}
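If you call the endpoint from a script, a sketch like the following pulls the generated text and token counts out of a response shaped like the one above. The `read_response` helper is hypothetical, and it assumes the first candidate has content; in some cases, such as a blocked response, a candidate may lack content, which this sketch does not handle.

```python
import json

# A trimmed copy of the response above (safety ratings omitted).
raw = """
{
  "candidates": [
    {
      "content": {"role": "model", "parts": [{"text": "## Made By Google Podcast - Pixel Feature Drops"}]},
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {"promptTokenCount": 18883, "candidatesTokenCount": 150, "totalTokenCount": 19033}
}
"""


def read_response(resp: dict) -> dict:
    """Extract the text, finish reason, and token counts from a generateContent response."""
    candidate = resp["candidates"][0]
    # A candidate can contain several parts; join the text parts in order.
    text = "".join(part.get("text", "") for part in candidate["content"]["parts"])
    usage = resp.get("usageMetadata", {})
    return {
        "text": text,
        "finish_reason": candidate.get("finishReason"),
        "prompt_tokens": usage.get("promptTokenCount"),
        "output_tokens": usage.get("candidatesTokenCount"),
        "total_tokens": usage.get("totalTokenCount"),
    }


summary = read_response(json.loads(raw))
```

In this response, the prompt tokens (mostly the audio) and the output tokens add up to the total: 18,883 + 150 = 19,033.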

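Note the following in the URL for this sample:

Use the generateContent method to request that the response is returned after it's fully generated. To reduce the latency that users perceive, use the streamGenerateContent method to stream the response as it's being generated.
The multimodal model ID is located at the end of the URL, before the method (for example, gemini-1.5-flash or gemini-1.0-pro-vision). This sample might support other models as well.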
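The URL structure described above can be captured in a small helper. This is a sketch that reproduces the URL from the curl and PowerShell commands; `model_endpoint` and the project ID `my-project` are hypothetical, and the only variation it supports is the switch between generateContent and streamGenerateContent.

```python
URL_TEMPLATE = (
    "https://{location}-aiplatform.googleapis.com/v1/projects/{project}"
    "/locations/{location}/publishers/google/models/{model}:{method}"
)


def model_endpoint(project_id: str, location: str, model: str = "gemini-1.5-flash", stream: bool = False) -> str:
    """Build the publisher model REST URL used in the samples above.

    The model ID sits just before the method; stream=True selects
    streamGenerateContent instead of generateContent.
    """
    method = "streamGenerateContent" if stream else "generateContent"
    return URL_TEMPLATE.format(location=location, project=project_id, model=model, method=method)


url = model_endpoint("my-project", "us-central1")
stream_url = model_endpoint("my-project", "us-central1", stream=True)
```

Note that LOCATION appears twice in the URL, once in the hostname and once in the resource path, and both must name the same region.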

Console

To send a multimodal prompt by using the Google Cloud console, do the following:

In the Vertex AI section of the Google Cloud console, go to the Vertex AI Studio page.

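Click [Open freeform].
Optional: Configure the model and parameters:
- Model: Select a model.
- Region: Select the region that you want to use.
- Temperature: Use the slider or textbox to enter a value for temperature.
  
  Temperature is used for sampling during response generation, which occurs when topP and topK are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 means that the highest-probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible.
  If the model returns a response that's too generic or too short, or the model gives a fallback response, try increasing the temperature.
- Output token limit: Use the slider or textbox to enter a value for the maximum output limit.
  
  This is the maximum number of tokens that can be generated in the response. A token is approximately four characters, and 100 tokens correspond to roughly 60 to 80 words.
  Specify a lower value for shorter responses and a higher value for potentially longer responses.
- Add stop sequence: Optional. Enter a stop sequence, which is a series of characters that can include spaces. If the model encounters a stop sequence, response generation stops. The stop sequence isn't included in the response. You can add up to five stop sequences.
Optional: To configure advanced parameters, click [Advanced] and configure them as follows: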
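- Top-K: Use the slider or textbox to enter a value for top-K (not supported for Gemini 1.5).
  Top-K changes how the model selects tokens for output. A top-K of 1 means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.
  For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P, and the final token is selected using temperature sampling.
  
  Specify a lower value for less random responses and a higher value for more random responses.
- Top-P: Use the slider or textbox to enter a value for top-P. Tokens are selected from most probable to least probable until the sum of their probabilities equals the top-P value. For the least variable results, set top-P to 0.
- Max responses: Use the slider or textbox to enter the number of responses to generate.
- Streaming responses: Enable this to print responses as they're generated.
- Safety filter threshold: Select the threshold for how likely you are to see responses that could be harmful.
- Enable grounding: Grounding isn't supported for multimodal prompts.
Click [Insert media], and select a source for your file.
Upload
Select the file that you want to upload and click [Open].

URL
Enter the URL of the file that you want to use and click [Insert].

Cloud Storage
Select the bucket, then select the file in the bucket that you want to import, and click [Select].
Google Drive
1. The first time you select this option, choose an account and consent to Vertex AI Studio accessing it. You can upload multiple files with a total size of up to 10 MB. A single file can't exceed 7 MB.
2. Click the file that you want to add.
3. Click [Select].
  
  The file thumbnail appears in the [Prompt] pane, along with the total number of tokens. If your prompt data exceeds the token limit, the tokens are truncated and aren't included in processing your data.
Enter your text prompt in the [Prompt] pane.
Optional: To view [Token ID to text] and [Token IDs], click the [tokens count] in the [Prompt] pane.
Note: Media tokens aren't supported.
Click [Submit].
Optional: To save your prompt to [My prompts], click [Save].
Optional: To get the Python code or a curl command for your prompt, click [Get code].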
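To make the interaction between top-K, top-P, and temperature concrete, here is a toy sketch over a hand-written table of scores. It is not how Vertex AI implements decoding. It follows the order described above (top-K filtering, then top-P filtering, then a temperature-weighted draw), but computing the top-P cutoff on temperature-scaled probabilities is an assumption, and the token scores are invented.

```python
import math
import random
from typing import Dict, Optional


def sample_next_token(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    top_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one token from toy scores using top-K, then top-P, then temperature sampling."""
    rng = rng or random.Random()
    # Rank candidate tokens from most to least likely.
    ranked = sorted(logits.items(), key=lambda item: item[1], reverse=True)
    # Top-K: keep only the K most likely tokens (K=1 is greedy decoding).
    if top_k is not None:
        ranked = ranked[:top_k]
    # Temperature 0 always picks the most likely remaining token.
    if temperature == 0:
        return ranked[0][0]
    # Softmax over temperature-scaled scores (subtract the max for numerical stability).
    scaled = [score / temperature for _, score in ranked]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-P: keep the most likely tokens until their probabilities add up to top_p.
    # With top_p=0, only the single most likely token survives.
    kept_tokens, kept_probs, cumulative = [], [], 0.0
    for (token, _), p in zip(ranked, probs):
        kept_tokens.append(token)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break
    # Draw from the surviving tokens in proportion to their probabilities.
    return rng.choices(kept_tokens, weights=kept_probs, k=1)[0]


toy_logits = {"pixel": 2.0, "phone": 1.0, "watch": 0.1}
```

Setting any one of `temperature=0`, `top_k=1`, or `top_p=0` makes this sketch deterministic, which mirrors the guidance above for less variable results.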

Audio transcription

The following shows you how to use an audio file to transcribe an interview. To enable timestamp understanding for audio-only files, enable the audioTimestamp parameter in GenerationConfig.

Gen AI SDK for Python

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True

from google import genai
from google.genai.types import GenerateContentConfig, HttpOptions, Part

client = genai.Client(http_options=HttpOptions(api_version="v1"))
prompt = """
Transcribe the interview, in the format of timecode, speaker, caption.
Use speaker A, speaker B, etc. to identify speakers.
"""
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=[
        prompt,
        Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/audio/pixel.mp3",
            mime_type="audio/mpeg",
        ),
    ],
    # Required to enable timestamp understanding for audio-only files
    config=GenerateContentConfig(audio_timestamp=True),
)
print(response.text)
# Example response:
# [00:00:00] **Speaker A:** your devices are getting better over time. And so ...
# [00:00:14] **Speaker B:** Welcome to the Made by Google podcast where we meet ...
# [00:00:20] **Speaker B:** Here's your host, Rasheed Finch.
# [00:00:23] **Speaker C:** Today we're talking to Aisha Sharif and DeCarlos Love. ...
# ...
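Because the prompt asks for lines in the format timecode, speaker, caption, the output can be turned into structured data. The following sketch assumes lines shaped like the example responses on this page, such as `[00:00:14] **Speaker B:** ...` or `[00:00:16] Speaker B: ...`. The model isn't guaranteed to follow that format exactly, so lines that don't match are skipped; `parse_transcript` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# [hh:mm:ss], optional bold markers, "Speaker X:", then the caption.
LINE_RE = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2})\]\s*(?:\*\*)?(Speaker [A-Z]+):(?:\*\*)?\s*(.*)$"
)


def parse_transcript(text: str) -> List[Tuple[int, str, str]]:
    """Return (offset_seconds, speaker, caption) tuples for lines that match the expected format."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that don't follow the requested format
        hours, minutes, seconds, speaker, caption = match.groups()
        offset = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        entries.append((offset, speaker, caption))
    return entries


sample = (
    "[00:00:14] **Speaker B:** Welcome to the Made by Google podcast\n"
    "[00:01:33] Speaker A: Amazing.\n"
    "This line has no timecode."
)
entries = parse_transcript(sample)
```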

Vertex AI SDK for Python

Streaming and non-streaming responses

For a streaming response, use the stream parameter in generate_content.

  response = model.generate_content(contents=[...], stream=True)

For a non-streaming response, remove the parameter or set it to False.

Sample code


import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig, Part

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"

vertexai.init(project=PROJECT_ID, location="us-central1")

model = GenerativeModel("gemini-1.5-flash-002")

prompt = """
Can you transcribe this interview, in the format of timecode, speaker, caption.
Use speaker A, speaker B, etc. to identify speakers.
"""

audio_file_uri = "gs://cloud-samples-data/generative-ai/audio/pixel.mp3"
audio_file = Part.from_uri(audio_file_uri, mime_type="audio/mpeg")

contents = [audio_file, prompt]

response = model.generate_content(contents, generation_config=GenerationConfig(audio_timestamp=True))

print(response.text)
# Example response:
# [00:00:00] Speaker A: Your devices are getting better over time...
# [00:00:16] Speaker B: Welcome to the Made by Google podcast, ...
# [00:01:00] Speaker A: So many features. I am a singer. ...
# [00:01:33] Speaker B: Amazing. DeCarlos, same question to you, ...

Java

Streaming and non-streaming responses

For a streaming response, use the generateContentStream method.

  public ResponseStream<GenerateContentResponse> generateContentStream(Content content)

For a non-streaming response, use the generateContent method.

  public GenerateContentResponse generateContent(Content content)

Sample code

import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.api.GenerateContentResponse;
import com.google.cloud.vertexai.generativeai.ContentMaker;
import com.google.cloud.vertexai.generativeai.GenerativeModel;
import com.google.cloud.vertexai.generativeai.PartMaker;
import com.google.cloud.vertexai.generativeai.ResponseHandler;
import java.io.IOException;

public class AudioInputTranscription {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-google-cloud-project-id";
    String location = "us-central1";
    String modelName = "gemini-1.5-flash-001";

    transcribeAudio(projectId, location, modelName);
  }

  // Analyzes the given audio input.
  public static String transcribeAudio(String projectId, String location, String modelName)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs
    // to be created once, and can be reused for multiple requests.
    try (VertexAI vertexAI = new VertexAI(projectId, location)) {
      String audioUri = "gs://cloud-samples-data/generative-ai/audio/pixel.mp3";

      GenerativeModel model = new GenerativeModel(modelName, vertexAI);
      GenerateContentResponse response = model.generateContent(
          ContentMaker.fromMultiModalData(
              "Can you transcribe this interview, in the format of timecode, speaker, caption.\n"
                  + "Use speaker A, speaker B, etc. to identify speakers.",
              PartMaker.fromMimeTypeAndData("audio/mp3", audioUri)
          ));

      String output = ResponseHandler.getText(response);
      System.out.println(output);

      return output;
    }
  }
}

Node.js

Streaming and non-streaming responses

For a streaming response, use the generateContentStream method.

  const streamingResp = await generativeModel.generateContentStream(request);

For a non-streaming response, use the generateContent method.

  const resp = await generativeModel.generateContent(request);

Sample code

const {VertexAI} = require('@google-cloud/vertexai');

/**
 * TODO(developer): Update these variables before running the sample.
 */
async function transcript_audio(projectId = 'PROJECT_ID') {
  const vertexAI = new VertexAI({project: projectId, location: 'us-central1'});

  const generativeModel = vertexAI.getGenerativeModel({
    model: 'gemini-1.5-flash-001',
  });

  const filePart = {
    file_data: {
      file_uri: 'gs://cloud-samples-data/generative-ai/audio/pixel.mp3',
      mime_type: 'audio/mpeg',
    },
  };
  const textPart = {
    text: `
    Can you transcribe this interview, in the format of timecode, speaker, caption?
    Use speaker A, speaker B, etc. to identify speakers.`,
  };

  const request = {
    contents: [{role: 'user', parts: [filePart, textPart]}],
  };

  const resp = await generativeModel.generateContent(request);
  const contentResponse = await resp.response;
  console.log(JSON.stringify(contentResponse));
}

Go

Streaming and non-streaming responses

For a streaming response, use the GenerateContentStream method.

  iter := model.GenerateContentStream(ctx, genai.Text("Tell me a story about a lumberjack and his giant ox. Keep it very short."))

For a non-streaming response, use the GenerateContent method.

  resp, err := model.GenerateContent(ctx, genai.Text("What is the average size of a swallow?"))

Sample code

import (
	"context"
	"errors"
	"fmt"
	"io"
	"mime"
	"path/filepath"

	"cloud.google.com/go/vertexai/genai"
)

// transcribeAudio generates a response into w
func transcribeAudio(w io.Writer, projectID, location, modelName string) error {
	// location := "us-central1"
	// modelName := "gemini-1.5-flash-001"

	ctx := context.Background()

	client, err := genai.NewClient(ctx, projectID, location)
	if err != nil {
		return fmt.Errorf("unable to create client: %w", err)
	}
	defer client.Close()

	model := client.GenerativeModel(modelName)

	// Optional: set an explicit temperature
	model.SetTemperature(0.4)

	// Given an audio file URL, prepare audio file as genai.Part
	audio := genai.FileData{
		MIMEType: mime.TypeByExtension(filepath.Ext("pixel.mp3")),
		FileURI:  "gs://cloud-samples-data/generative-ai/audio/pixel.mp3",
	}

	res, err := model.GenerateContent(ctx, audio, genai.Text(`
			Can you transcribe this interview, in the format of timecode, speaker, caption.
			Use speaker A, speaker B, etc. to identify speakers.
	`))
	if err != nil {
		return fmt.Errorf("unable to generate contents: %w", err)
	}

	if len(res.Candidates) == 0 ||
		len(res.Candidates[0].Content.Parts) == 0 {
		return errors.New("empty response from model")
	}

	fmt.Fprintf(w, "generated transcript:\n%s\n", res.Candidates[0].Content.Parts[0])
	return nil
}

C#

Streaming and non-streaming responses

For a streaming response, use the StreamGenerateContent method.

  public virtual PredictionServiceClient.StreamGenerateContentStream StreamGenerateContent(GenerateContentRequest request)

For a non-streaming response, use the GenerateContentAsync method.

  public virtual Task<GenerateContentResponse> GenerateContentAsync(GenerateContentRequest request)

For more information about how the server streams responses, see Streaming RPCs.

Sample code


using Google.Cloud.AIPlatform.V1;
using System;
using System.Threading.Tasks;

public class AudioInputTranscription
{
    public async Task<string> TranscribeAudio(
        string projectId = "your-project-id",
        string location = "us-central1",
        string publisher = "google",
        string model = "gemini-1.5-flash-001")
    {

        var predictionServiceClient = new PredictionServiceClientBuilder
        {
            Endpoint = $"{location}-aiplatform.googleapis.com"
        }.Build();

        string prompt = @"Can you transcribe this interview, in the format of timecode, speaker, caption.
Use speaker A, speaker B, etc. to identify speakers.";

        var generateContentRequest = new GenerateContentRequest
        {
            Model = $"projects/{projectId}/locations/{location}/publishers/{publisher}/models/{model}",
            Contents =
            {
                new Content
                {
                    Role = "USER",
                    Parts =
                    {
                        new Part { Text = prompt },
                        new Part { FileData = new() { MimeType = "audio/mp3", FileUri = "gs://cloud-samples-data/generative-ai/audio/pixel.mp3" } }
                    }
                }
            }
        };

        GenerateContentResponse response = await predictionServiceClient.GenerateContentAsync(generateContentRequest);

        string responseText = response.Candidates[0].Content.Parts[0].Text;
        Console.WriteLine(responseText);

        return responseText;
    }
}

REST

Before using any of the request data, make the following replacements:

LOCATION: The region to process the request. Enter a supported region. For the full list of supported regions, see Available locations.
Partial list of available regions:
- us-central1
- us-west4
- northamerica-northeast1
- us-east4
- us-west1
- asia-northeast3
- asia-southeast1
- asia-northeast1
PROJECT_ID: Your project ID.
FILE_URI: The URI or URL of the file to include in the prompt. Acceptable values include the following:
- Cloud Storage bucket URI: The object must either be publicly readable or reside in the same Google Cloud project that's sending the request. For gemini-1.5-pro and gemini-1.5-flash, the size limit is 2 GB. For gemini-1.0-pro-vision, the size limit is 20 MB.
- HTTP URL: The file URL must be publicly readable. You can specify one video file, one audio file, and up to 10 image files per request. Audio files, video files, and documents can't exceed 15 MB.
- YouTube video URL: The YouTube video must be either owned by the account that you used to sign in to the Google Cloud console or be public. Only one YouTube video URL is supported per request.
When you specify a fileURI, you must also specify the media type (mimeType) of the file. If VPC Service Controls is enabled, specifying a media file URL for fileURI is not supported.

If you don't have an audio file in Cloud Storage, you can use the publicly available file gs://cloud-samples-data/generative-ai/audio/pixel.mp3, which has a MIME type of audio/mp3. To listen to this audio, open the sample MP3 file.
MIME_TYPE: The media type of the file specified in the data or fileUri fields. Acceptable values include the following:
- application/pdf
- audio/mpeg
- audio/mp3
- audio/wav
- image/png
- image/jpeg
- image/webp
- text/plain
- video/mov
- video/mpeg
- video/mp4
- video/mpg
- video/avi
- video/wmv
- video/mpegps
- video/flv
TEXT: The text instructions to include in the prompt. For example: Can you transcribe this interview, in the format of timecode, speaker, caption. Use speaker A, speaker B, etc. to identify speakers.
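Compared with the summary request, the transcription request adds a generationConfig object that turns on audioTimestamp. A minimal sketch that builds this body with the placeholders filled in follows; `build_transcription_request` is hypothetical.

```python
import json


def build_transcription_request(file_uri: str, mime_type: str, text: str) -> dict:
    """Return a transcription request body with timestamp understanding enabled."""
    return {
        "contents": {
            "role": "USER",
            "parts": [
                {"fileData": {"fileUri": file_uri, "mimeType": mime_type}},
                {"text": text},
            ],
        },
        # audioTimestamp enables timestamp understanding for audio-only files.
        # The enclosing field is spelled generationConfig.
        "generationConfig": {"audioTimestamp": True},
    }


body = build_transcription_request(
    "gs://cloud-samples-data/generative-ai/audio/pixel.mp3",
    "audio/mp3",
    "Can you transcribe this interview, in the format of timecode, speaker, caption. "
    "Use speaker A, speaker B, etc. to identify speakers.",
)
request_json = json.dumps(body, indent=2)  # Python True is serialized as JSON true
```

Generating the body this way avoids misspelling field names, which the server may not flag.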

To send your request, choose one of these options:

curl

cat > request.json << 'EOF'
{
  "contents": {
    "role": "USER",
    "parts": [
      {
        "fileData": {
          "fileUri": "FILE_URI",
          "mimeType": "MIME_TYPE"
        }
      },
      {
        "text": "TEXT"
      }
    ]
  },
  "generationConfig": {
    "audioTimestamp": true
  }
}
EOF

Then execute the following command to send your REST request:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-1.5-flash:generateContent"

PowerShell

@'
{
  "contents": {
    "role": "USER",
    "parts": [
      {
        "fileData": {
          "fileUri": "FILE_URI",
          "mimeType": "MIME_TYPE"
        }
      },
      {
        "text": "TEXT"
      }
    ]
  },
  "generationConfig": {
    "audioTimestamp": true
  }
}
'@  | Out-File -FilePath request.json -Encoding utf8

Then execute the following command to send your REST request:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-1.5-flash:generateContent" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Response

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "0:00 Speaker A: Your devices are getting better over time, and so we think
              about it across the entire portfolio from phones to watch to buds to tablet. We get
              really excited about how we can tell a joint narrative across everything.
              0:18 Speaker B: Welcome to the Made By Google Podcast, where we meet the people who
              work on the Google products you love. Here's your host, Rasheed.
              0:33 Speaker B: Today we're talking to Aisha and DeCarlos. They're both
              Product Managers for various Pixel devices and work on something that all the Pixel
              owners love. The Pixel feature drops. This is the Made By Google Podcast. Aisha, which
              feature on your Pixel phone has been most transformative in your own life?
              0:56 Speaker A: So many features. I am a singer, so I actually think recorder
              transcription has been incredible because before I would record songs I'd just like,
              freestyle them, record them, type them up. But now with transcription it works so well
              even deciphering lyrics that are jumbled. I think that's huge.
              ...
              Subscribe now wherever you get your podcasts to be the first to listen."
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.043609526,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.06255973
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.022328783,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.04426588
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.07107367,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.049405243
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probabilityScore": 0.10484337,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severityScore": 0.13128456
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 18871,
    "candidatesTokenCount": 2921,
    "totalTokenCount": 21792
  }
}
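To consume a response like the one above, a program typically joins the text parts of the first candidate and reads `finishReason` and `usageMetadata`. A minimal sketch; `read_response` is a hypothetical helper, and it assumes the body has already been decoded from JSON (for example with `json.loads`).

```python
from typing import Any, Dict, Tuple


def read_response(response: Dict[str, Any]) -> Tuple[str, str, Dict[str, int]]:
    """Return (text, finishReason, usageMetadata) for the first candidate."""
    candidates = response.get("candidates") or []
    if not candidates:
        raise ValueError("response contains no candidates")
    first = candidates[0]
    parts = first.get("content", {}).get("parts", [])
    # A candidate can hold several parts; only the text parts are joined here.
    text = "".join(part.get("text", "") for part in parts)
    return text, first.get("finishReason", ""), response.get("usageMetadata", {})


if __name__ == "__main__":
    sample = {
        "candidates": [{
            "content": {"role": "model", "parts": [{"text": "0:00 Speaker A: ..."}]},
            "finishReason": "STOP",
        }],
        "usageMetadata": {"promptTokenCount": 18871, "candidatesTokenCount": 2921, "totalTokenCount": 21792},
    }
    text, reason, usage = read_response(sample)
    print(reason, usage["totalTokenCount"])
```

In the sample response, `totalTokenCount` (21792) is the sum of `promptTokenCount` (18871) and `candidatesTokenCount` (2921).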

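Note the following in the URL for this sample:

Use the generateContent method to request that the response is returned after it's fully generated. To reduce the latency perceived by users, use the streamGenerateContent method to stream the response as it's being generated.
The multimodal model ID is located at the end of the URL, before the method (for example, gemini-1.5-flash or gemini-1.0-pro-vision). This sample might support other models as well.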
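The two points above amount to a simple rule for building the endpoint. A sketch; `endpoint_url` is a hypothetical helper, and PROJECT_ID and LOCATION are the same placeholders used in the curl and PowerShell commands.

```python
def endpoint_url(project_id: str, location: str, model: str, stream: bool = False) -> str:
    """Build the Vertex AI REST URL used by the curl and PowerShell samples."""
    # generateContent returns the full response at once;
    # streamGenerateContent streams it while it is generated.
    method = "streamGenerateContent" if stream else "generateContent"
    # The model ID sits at the end of the path, right before ":method".
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"publishers/google/models/{model}:{method}"
    )


if __name__ == "__main__":
    print(endpoint_url("PROJECT_ID", "LOCATION", "gemini-1.5-flash"))
```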

Console

To send a multimodal prompt by using the Google Cloud console, do the following:

In the Vertex AI section of the Google Cloud console, go to the Vertex AI Studio page.

Click Open freeform.
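Optional: Configure the model and parameters:
- Model: Select a model.
- Region: Select the region that you want to use.
- Temperature: Use the slider or textbox to enter a value for temperature.
  
  The temperature is used for sampling during response generation, which occurs when topP and topK are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 means that the highest probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible.
  If the model returns a response that's too generic or too short, or the model gives a fallback response, try increasing the temperature.
- Output token limit: Use the slider or textbox to enter a value for the max output limit.
  
  The maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words.
  Specify a lower value for shorter responses and a higher value for potentially longer responses.
- Add stop sequence: Optional. Enter a stop sequence, which is a series of characters that includes spaces. If the model encounters a stop sequence, the response generation stops. The stop sequence isn't included in the response, and you can add up to five stop sequences.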
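Optional: To configure advanced parameters, click Advanced and configure as follows:
- Top-K: Use the slider or textbox to enter a value for top-K (not supported for Gemini 1.5).
  Top-K changes how the model selects tokens for output. A top-K of 1 means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.
  For each token selection step, the top-K tokens with the highest probabilities are sampled. Then tokens are further filtered based on top-P, with the final token selected using temperature sampling.
  
  Specify a lower value for less random responses and a higher value for more random responses.
- Top-P: Use the slider or textbox to enter a value for top-P. Tokens are selected from most probable to least probable until the sum of their probabilities equals the value of top-P. For the least variable results, set top-P to 0.
- Max responses: Use the slider or textbox to enter the number of responses to generate.
- Streaming responses: Enable to print responses as they're generated.
- Safety filter threshold: Select the threshold of how likely you are to see responses that could be harmful.
- Enable grounding: Grounding isn't supported for multimodal prompts.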
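Click Insert Media, and select a source for your file.
Upload
Select the file that you want to upload and click Open.

URL
Enter the URL of the file that you want to use and click Insert.

Cloud Storage
Select the bucket, then select the file in the bucket that you want to import, and click Select.
Google Drive
1. The first time you select this option, choose an account and give consent for Vertex AI Studio to access your account. You can upload multiple files with a total size of up to 10 MB. A single file can't exceed 7 MB.
2. Click the file that you want to add.
3. Click Select.
  
  The file thumbnail displays in the Prompt pane, along with the total number of tokens. If your prompt data exceeds the token limit, the excess tokens are truncated and aren't included in processing your data.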
Enter your text prompt in the Prompt pane.
Optional: To view Token ID to text and Token IDs, click the tokens count in the Prompt pane.
Note: Media tokens aren't supported.
Click Submit.
Optional: To save your prompt to My prompts, click Save.
Optional: To get the Python code or a curl command for your prompt, click Get code.
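The Temperature, Top-K, and Top-P descriptions in the steps above describe a three-stage token selection: filter by top-K, filter by top-P, then sample with temperature. The following Python sketch illustrates that description on a toy set of scores; it is an illustration under stated assumptions, not the Gemini implementation. `sample_next_token` and the toy logits are hypothetical, top-P is assumed to be computed on the distribution left after top-K, a temperature of 0 is treated as greedy selection, and at least one token always survives the top-P cut.

```python
import math
import random
from typing import Dict, List, Optional, Tuple


def _softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(
    logits: Dict[str, float],
    temperature: float,
    top_k: Optional[int],
    top_p: float,
    rng: random.Random,
) -> str:
    if not logits:
        raise ValueError("logits must not be empty")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Rank tokens from most to least probable.
    ranked: List[Tuple[str, float]] = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    # Stage 1, top-K: keep only the K most probable tokens.
    if top_k is not None:
        ranked = ranked[:top_k]
    # Stage 2, top-P: keep tokens, most probable first, until their
    # cumulative probability reaches top_p.
    probs = _softmax([score for _, score in ranked])
    kept: List[Tuple[str, float]] = []
    cumulative = 0.0
    for (token, score), p in zip(ranked, probs):
        kept.append((token, score))
        cumulative += p
        if cumulative >= top_p:
            break
    # Stage 3, temperature: 0 always picks the most probable remaining token.
    if temperature == 0:
        return kept[0][0]
    weights = _softmax([score / temperature for _, score in kept])
    return rng.choices([token for token, _ in kept], weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = {"a": 2.0, "b": 1.0, "c": 0.5}
    print(sample_next_token(toy_logits, 0.7, None, 0.9, random.Random(0)))
```

A top-K of 1, a top-P of 0, or a temperature of 0 each reduce the sketch to greedy decoding, which mirrors the "least variable" settings described in the steps.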

Set optional model parameters

Each model has a set of optional parameters that you can set. For more information, see Content generation parameters.
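When you call the REST API instead of the console, these settings go into the request's `generationConfig` object. A minimal sketch; `build_generation_config` is a hypothetical helper that uses the REST field spellings `temperature`, `maxOutputTokens`, `topK`, `topP`, `candidateCount`, `stopSequences`, and `audioTimestamp`. Only the five-stop-sequence limit and the top-P range are taken from this page; check Content generation parameters for each model's authoritative ranges.

```python
from typing import Any, Dict, List, Optional


def build_generation_config(
    temperature: Optional[float] = None,
    max_output_tokens: Optional[int] = None,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    candidate_count: Optional[int] = None,
    stop_sequences: Optional[List[str]] = None,
    audio_timestamp: Optional[bool] = None,
) -> Dict[str, Any]:
    """Return a generationConfig dict containing only the parameters that were set."""
    if temperature is not None and temperature < 0:
        raise ValueError("temperature must not be negative")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p is a cumulative probability, so it must be in [0, 1]")
    if stop_sequences is not None and len(stop_sequences) > 5:
        raise ValueError("you can add up to five stop sequences")
    fields = {
        "temperature": temperature,
        "maxOutputTokens": max_output_tokens,
        "topK": top_k,  # the console notes top-K isn't supported for Gemini 1.5
        "topP": top_p,
        "candidateCount": candidate_count,
        "stopSequences": stop_sequences,
        "audioTimestamp": audio_timestamp,
    }
    # Omit unset parameters so the model defaults apply.
    return {name: value for name, value in fields.items() if value is not None}


if __name__ == "__main__":
    print(build_generation_config(temperature=0.2, max_output_tokens=256, audio_timestamp=True))
```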

Audio requirements

The Gemini multimodal models (Gemini 2.0 Flash, Gemini 1.5 Flash, and Gemini 1.5 Pro) support the following audio MIME types:

Audio format	MIME type
AAC	`audio/aac`
FLAC	`audio/flac`
MP3	`audio/mp3`
MPA	`audio/m4a`
MPEG	`audio/mpeg`
MPGA	`audio/mpga`
MP4	`audio/mp4`
OPUS	`audio/opus`
PCM	`audio/pcm`
WAV	`audio/wav`
WEBM	`audio/webm`

You can include at most one audio file in a prompt request.
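The MIME type list and the one-audio-file rule can be checked locally before sending a request. A sketch; `validate_audio_parts` is a hypothetical helper that inspects the `fileData` and `inlineData` parts of a request body like the request.json shown earlier.

```python
from typing import Any, Dict, List

# Audio MIME types from the table above.
SUPPORTED_AUDIO_MIME_TYPES = frozenset({
    "audio/aac", "audio/flac", "audio/mp3", "audio/m4a", "audio/mpeg", "audio/mpga",
    "audio/mp4", "audio/opus", "audio/pcm", "audio/wav", "audio/webm",
})


def validate_audio_parts(parts: List[Dict[str, Any]]) -> None:
    """Raise ValueError if the parts break the audio requirements on this page."""
    audio_types = []
    for part in parts:
        # Both file references and inline data carry a mimeType.
        blob = part.get("fileData") or part.get("inlineData")
        if blob is None:
            continue
        mime_type = blob.get("mimeType", "")
        if mime_type.startswith("audio/"):
            audio_types.append(mime_type)
    unsupported = [m for m in audio_types if m not in SUPPORTED_AUDIO_MIME_TYPES]
    if unsupported:
        raise ValueError(f"unsupported audio MIME type(s): {unsupported}")
    if len(audio_types) > 1:
        raise ValueError(f"at most one audio file per prompt request, got {len(audio_types)}")
```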

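Limitations

While Gemini multimodal models are powerful in many multimodal use cases, it's important to understand their limitations:

Non-speech sound recognition: The models that support audio might make mistakes recognizing sound that's not speech.
Audio-only timestamps: To accurately generate timestamps for audio-only files, you must configure the audio_timestamp parameter in generation_config.
Transcription punctuation: If you use Gemini 1.5 Flash, the model might return transcriptions that don't include punctuation.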
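With `audio_timestamp` enabled and a prompt like the TEXT example earlier, the model returns captions such as `0:18 Speaker B: Welcome to ...`. The exact layout depends on the prompt and isn't guaranteed, so the following Python sketch is only one way to turn such text into structured captions; `Caption` and `parse_transcript` are hypothetical names, and lines that don't start with a timecode are assumed to continue the previous caption.

```python
import re
from typing import List, NamedTuple


class Caption(NamedTuple):
    seconds: int
    speaker: str
    text: str


# "M:SS" or "H:MM:SS", then "Speaker X:", as requested by the example prompt.
LINE_RE = re.compile(r"^\s*(?P<tc>\d+(?::\d{2}){1,2})\s+(?P<speaker>Speaker [A-Z]):\s*(?P<text>.*)$")


def parse_transcript(text: str) -> List[Caption]:
    captions: List[Caption] = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            # Fold each timecode field into a total number of seconds.
            seconds = 0
            for field in match.group("tc").split(":"):
                seconds = seconds * 60 + int(field)
            captions.append(Caption(seconds, match.group("speaker"), match.group("text").strip()))
        elif captions and line.strip():
            # Continuation line: append it to the previous caption's text.
            prev = captions[-1]
            captions[-1] = prev._replace(text=f"{prev.text} {line.strip()}")
    return captions
```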

What's next

Start building with Gemini multimodal models: new customers get $300 in free Google Cloud credits to explore what they can do with Gemini.
Learn how to send chat prompt requests.
Learn about responsible AI best practices and Vertex AI's safety filters.
