Diese Seite wurde von der Cloud Translation API übersetzt.

Transkription von Audiotracks abrufen

Die Video Intelligence API transkribiert Sprache in Text aus unterstützten Videodateien. Es gibt zwei unterstützte Modelle: "Standard" und "Video".

Sprachtranskription für ein Video anfordern

REST

Prozessanfrage senden

Im Folgenden wird gezeigt, wie eine POST-Anfrage an die Methode videos:annotate gesendet wird. Im Beispiel wird das Zugriffstoken für ein Dienstkonto verwendet, das mit der Google Cloud CLI für das Projekt eingerichtet wurde. Anleitungen zur Installation des Google Cloud CLI, zur Einrichtung eines Projekts mit einem Dienstkonto und zur Anforderung eines Zugriffstokens finden Sie in der Kurzanleitung zu Video Intelligence.

Ersetzen Sie diese Werte in den folgenden Anfragedaten:

INPUT_URI: Ein Cloud Storage-Bucket, der die Datei enthält, die Sie annotieren möchten, einschließlich des Dateinamens. Muss mit gs:// beginnen.
Beispiel: "inputUri": "gs://cloud-videointelligence-demo/assistant.mp4",
LANGUAGE_CODE: [Optional] Siehe unterstützte Sprachen
PROJECT_NUMBER: Die numerische Kennung für Ihr Google Cloud -Projekt

HTTP-Methode und URL:

POST https://videointelligence.googleapis.com/v1/videos:annotate

JSON-Text anfordern:


{
"inputUri": "INPUT_URI",
  "features": ["SPEECH_TRANSCRIPTION"],
  "videoContext": {
    "speechTranscriptionConfig": {
      "languageCode": "LANGUAGE_CODE",
      "enableAutomaticPunctuation": true,
      "filterProfanity": true
    }
  }
}

Wenn Sie die Anfrage senden möchten, maximieren Sie eine der folgenden Optionen:

curl (Linux, macOS oder Cloud Shell)

Hinweis: Der folgende Befehl setzt voraus, dass Sie sich mit Ihrem Nutzerkonto bei der gcloud-Befehlszeile angemeldet haben. Dazu haben Sie gcloud init oder gcloud auth login ausgeführt oder die Cloud Shell genutzt, die Sie automatisch bei der gcloud-Befehlszeile anmeldet. Um herauszufinden, welches Konto gerade aktiv ist, führen Sie gcloud auth list aus.

Speichern Sie den Anfragetext in einer Datei mit dem Namen request.json und führen Sie den folgenden Befehl aus:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://videointelligence.googleapis.com/v1/videos:annotate"

PowerShell (Windows)

Hinweis: Der folgende Befehl setzt voraus, dass Sie sich mit Ihrem Nutzerkonto bei der gcloud-Befehlszeile angemeldet haben. Dazu führen Sie gcloud init oder gcloud auth login aus. Um herauszufinden, welches Konto gerade aktiv ist, führen Sie gcloud auth list aus.

Speichern Sie den Anfragetext in einer Datei mit dem Namen request.json und führen Sie den folgenden Befehl aus:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content

Sie sollten eine JSON-Antwort ähnlich wie diese erhalten:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

Wenn die Anfrage erfolgreich ist, gibt Video Intelligence das name für Ihren Vorgang zurück. Das Beispiel oben zeigt eine solche Antwort, wobei project-number die Nummer Ihres Projekts und operation-id die ID des lang andauernden Vorgangs ist, der für die Anfrage erstellt wurde.

Ergebnisse abrufen

Senden Sie eine GET mit dem vom Aufruf an videos:annotate zurückgegebenen Vorgangsnamen, wie im folgenden Beispiel gezeigt, um die Ergebnisse Ihrer Anfrage zu erhalten.

Ersetzen Sie diese Werte in den folgenden Anfragedaten:

OPERATION_NAME: Der von der Video Intelligence API zurückgegebene Name des Vorgangs. Der Vorgangsname hat das Format projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID.

PROJECT_NUMBER: Die numerische Kennung für Ihr Google Cloud -Projekt

HTTP-Methode und URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

Wenn Sie die Anfrage senden möchten, maximieren Sie eine der folgenden Optionen:

curl (Linux, macOS oder Cloud Shell)

Führen Sie folgenden Befehl aus:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     "https://videointelligence.googleapis.com/v1/OPERATION_NAME"

PowerShell (Windows)

Führen Sie folgenden Befehl aus:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://videointelligence.googleapis.com/v1/OPERATION_NAME" | Select-Object -Expand Content

Sie sollten eine JSON-Antwort ähnlich wie diese erhalten:

Antwort

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoProgress",
    "annotationProgress": [{
      "inputUri": "/bucket-name-123/sample-video-short.mp4",
      "progressPercent": 100,
      "startTime": "2018-04-09T15:19:38.919779Z",
      "updateTime": "2018-04-09T15:21:17.652470Z"
    }]
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoResponse",
    "annotationResults": [
    {
          "speechTranscriptions": [
      {
            "alternatives": [
          {
            "transcript": "and laughing going to talk about is the video intelligence API how many of you saw it at the keynote yesterday ",
            "confidence": 0.8442509,
            "words": [
          {
            "startTime": "0.200s",
            "endTime": "0.800s",
            "word": "and"
          },
          {
            "startTime": "0.800s",
            "endTime": "1.100s",
            "word": "laughing"
          },
          {
            "startTime": "1.100s",
            "endTime": "1.200s",
            "word": "going"
          },
    ...

Annotationsergebnisse herunterladen

Kopieren Sie die Annotation aus der Quelle in den Ziel-Bucket (siehe Dateien und Objekte kopieren)

gcloud storage cp gcs_uri gs://my-bucket

Hinweis: Wenn der Nutzer den Ausgabe-gcs-URI vom Nutzer bereitstellt, wird die Annotation in diesem gcs-uri gespeichert.

Go

Richten Sie zur Authentifizierung bei Video Intelligence die Standardanmeldedaten für Anwendungen ein. Weitere Informationen finden Sie unter Authentifizierung für eine lokale Entwicklungsumgebung einrichten.


func speechTranscriptionURI(w io.Writer, file string) error {
	ctx := context.Background()
	client, err := video.NewClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close()

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		Features: []videopb.Feature{
			videopb.Feature_SPEECH_TRANSCRIPTION,
		},
		VideoContext: &videopb.VideoContext{
			SpeechTranscriptionConfig: &videopb.SpeechTranscriptionConfig{
				LanguageCode:               "en-US",
				EnableAutomaticPunctuation: true,
			},
		},
		InputUri: file,
	})
	if err != nil {
		return err
	}
	resp, err := op.Wait(ctx)
	if err != nil {
		return err
	}

	// A single video was processed. Get the first result.
	result := resp.AnnotationResults[0]

	for _, transcription := range result.SpeechTranscriptions {
		// The number of alternatives for each transcription is limited by
		// SpeechTranscriptionConfig.MaxAlternatives.
		// Each alternative is a different possible transcription
		// and has its own confidence score.
		for _, alternative := range transcription.GetAlternatives() {
			fmt.Fprintf(w, "Alternative level information:\n")
			fmt.Fprintf(w, "\tTranscript: %v\n", alternative.GetTranscript())
			fmt.Fprintf(w, "\tConfidence: %v\n", alternative.GetConfidence())

			fmt.Fprintf(w, "Word level information:\n")
			for _, wordInfo := range alternative.GetWords() {
				startTime := wordInfo.GetStartTime()
				endTime := wordInfo.GetEndTime()
				fmt.Fprintf(w, "\t%4.1f - %4.1f: %v (speaker %v)\n",
					float64(startTime.GetSeconds())+float64(startTime.GetNanos())*1e-9, // start as seconds
					float64(endTime.GetSeconds())+float64(endTime.GetNanos())*1e-9,     // end as seconds
					wordInfo.GetWord(),
					wordInfo.GetSpeakerTag())
			}
		}
	}

	return nil
}

Java

// Instantiate a com.google.cloud.videointelligence.v1.VideoIntelligenceServiceClient
try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
  // Set the language code
  SpeechTranscriptionConfig config =
      SpeechTranscriptionConfig.newBuilder()
          .setLanguageCode("en-US")
          .setEnableAutomaticPunctuation(true)
          .build();

  // Set the video context with the above configuration
  VideoContext context = VideoContext.newBuilder().setSpeechTranscriptionConfig(config).build();

  // Create the request
  AnnotateVideoRequest request =
      AnnotateVideoRequest.newBuilder()
          .setInputUri(gcsUri)
          .addFeatures(Feature.SPEECH_TRANSCRIPTION)
          .setVideoContext(context)
          .build();

  // asynchronously perform speech transcription on videos
  OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> response =
      client.annotateVideoAsync(request);

  System.out.println("Waiting for operation to complete...");
  // Display the results
  for (VideoAnnotationResults results :
      response.get(600, TimeUnit.SECONDS).getAnnotationResultsList()) {
    for (SpeechTranscription speechTranscription : results.getSpeechTranscriptionsList()) {
      try {
        // Print the transcription
        if (speechTranscription.getAlternativesCount() > 0) {
          SpeechRecognitionAlternative alternative = speechTranscription.getAlternatives(0);

          System.out.printf("Transcript: %s\n", alternative.getTranscript());
          System.out.printf("Confidence: %.2f\n", alternative.getConfidence());

          System.out.println("Word level information:");
          for (WordInfo wordInfo : alternative.getWordsList()) {
            double startTime =
                wordInfo.getStartTime().getSeconds() + wordInfo.getStartTime().getNanos() / 1e9;
            double endTime =
                wordInfo.getEndTime().getSeconds() + wordInfo.getEndTime().getNanos() / 1e9;
            System.out.printf(
                "\t%4.2fs - %4.2fs: %s\n", startTime, endTime, wordInfo.getWord());
          }
        } else {
          System.out.println("No transcription found");
        }
      } catch (IndexOutOfBoundsException ioe) {
        System.out.println("Could not retrieve frame: " + ioe.getMessage());
      }
    }
  }
}

Node.js

// Imports the Google Cloud Video Intelligence library
const videoIntelligence = require('@google-cloud/video-intelligence');

// Creates a client
const client = new videoIntelligence.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const gcsUri = 'GCS URI of video to analyze, e.g. gs://my-bucket/my-video.mp4';

async function analyzeVideoTranscript() {
  const videoContext = {
    speechTranscriptionConfig: {
      languageCode: 'en-US',
      enableAutomaticPunctuation: true,
    },
  };

  const request = {
    inputUri: gcsUri,
    features: ['SPEECH_TRANSCRIPTION'],
    videoContext: videoContext,
  };

  const [operation] = await client.annotateVideo(request);
  console.log('Waiting for operation to complete...');
  const [operationResult] = await operation.promise();
  // There is only one annotation_result since only
  // one video is processed.
  const annotationResults = operationResult.annotationResults[0];

  for (const speechTranscription of annotationResults.speechTranscriptions) {
    // The number of alternatives for each transcription is limited by
    // SpeechTranscriptionConfig.max_alternatives.
    // Each alternative is a different possible transcription
    // and has its own confidence score.
    for (const alternative of speechTranscription.alternatives) {
      console.log('Alternative level information:');
      console.log(`Transcript: ${alternative.transcript}`);
      console.log(`Confidence: ${alternative.confidence}`);

      console.log('Word level information:');
      for (const wordInfo of alternative.words) {
        const word = wordInfo.word;
        const start_time =
          wordInfo.startTime.seconds + wordInfo.startTime.nanos * 1e-9;
        const end_time =
          wordInfo.endTime.seconds + wordInfo.endTime.nanos * 1e-9;
        console.log('\t' + start_time + 's - ' + end_time + 's: ' + word);
      }
    }
  }
}

analyzeVideoTranscript();

Python

"""Transcribe speech from a video stored on GCS."""
from google.cloud import videointelligence

video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.SPEECH_TRANSCRIPTION]

config = videointelligence.SpeechTranscriptionConfig(
    language_code="en-US", enable_automatic_punctuation=True
)
video_context = videointelligence.VideoContext(speech_transcription_config=config)

operation = video_client.annotate_video(
    request={
        "features": features,
        "input_uri": path,
        "video_context": video_context,
    }
)

print("\nProcessing video for speech transcription.")

result = operation.result(timeout=600)

# There is only one annotation_result since only
# one video is processed.
annotation_results = result.annotation_results[0]
for speech_transcription in annotation_results.speech_transcriptions:
    # The number of alternatives for each transcription is limited by
    # SpeechTranscriptionConfig.max_alternatives.
    # Each alternative is a different possible transcription
    # and has its own confidence score.
    for alternative in speech_transcription.alternatives:
        print("Alternative level information:")

        print("Transcript: {}".format(alternative.transcript))
        print("Confidence: {}\n".format(alternative.confidence))

        print("Word level information:")
        for word_info in alternative.words:
            word = word_info.word
            start_time = word_info.start_time
            end_time = word_info.end_time
            print(
                "\t{}s - {}s: {}".format(
                    start_time.seconds + start_time.microseconds * 1e-6,
                    end_time.seconds + end_time.microseconds * 1e-6,
                    word,
                )
            )

Weitere Sprachen

C#: Folgen Sie der Anleitung zur Einrichtung von C# auf der Seite der Clientbibliotheken und rufen Sie dann die Video Intelligence-Referenzdokumentation für .NET auf.

PHP: Folgen Sie der Anleitung zur Einrichtung von PHP auf der Seite der Clientbibliotheken und rufen Sie dann die Video Intelligence-Referenzdokumentation für PHP auf.

Ruby: Folgen Sie der Anleitung zur Einrichtung von Ruby auf der Seite der Clientbibliotheken und rufen Sie dann die Video Intelligence-Referenzdokumentation für Ruby auf.

Transkription von Audiotracks abrufen Mit Sammlungen den Überblick behalten Sie können Inhalte basierend auf Ihren Einstellungen speichern und kategorisieren.

Sprachtranskription für ein Video anfordern

REST

Prozessanfrage senden

curl (Linux, macOS oder Cloud Shell)

PowerShell (Windows)

Ergebnisse abrufen

curl (Linux, macOS oder Cloud Shell)

PowerShell (Windows)

Antwort

Annotationsergebnisse herunterladen

Go

Java

Node.js

Python

Weitere Sprachen

Transkription von Audiotracks abrufen