Transcribe a local file with recognition metadata (beta)
Transcribe a local audio file and include recognition metadata in the request.
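As a rough illustration of where the metadata sits, the following sketch builds a JSON-style request body using only the Python standard library. The field names mirror the ones used in the Node.js sample on this page; the helper name `build_recognize_request` is hypothetical and is not part of any client library. It does not call the API, it only shows the assumed shape of the request.

```python
import base64
import json


def build_recognize_request(audio_bytes: bytes) -> dict:
    """Return a JSON-serializable recognize request body (illustrative only)."""
    # Recognition metadata: enum-like fields are given by name, the device
    # name is a free-form string, and the NAICS code is a 6-digit integer.
    metadata = {
        "interactionType": "DISCUSSION",
        "microphoneDistance": "NEARFIELD",
        "recordingDeviceType": "SMARTPHONE",
        "recordingDeviceName": "Pixel 2 XL",
        "industryNaicsCodeOfAudio": 519190,
    }

    # The metadata is attached to the recognition config, not to the audio.
    config = {
        "encoding": "LINEAR16",
        "sampleRateHertz": 8000,
        "languageCode": "en-US",
        "metadata": metadata,
    }

    # Inline audio content is sent base64-encoded.
    audio = {"content": base64.b64encode(audio_bytes).decode("ascii")}

    return {"config": config, "audio": audio}
```

Calling `json.dumps(build_recognize_request(data), indent=2)` on the bytes of a local file prints the body, which can be handy for checking that the metadata ends up under `config`.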
Code sample
### Java

To learn how to install and use the client library for Speech-to-Text, see
[Speech-to-Text client libraries](/speech-to-text/docs/client-libraries).

For more information, see the
[Speech-to-Text Java API reference documentation](/java/docs/reference/google-cloud-speech/latest/overview).

To authenticate to Speech-to-Text, set up Application Default Credentials.
For more information, see
[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

    import com.google.cloud.speech.v1p1beta1.RecognitionAudio;
    import com.google.cloud.speech.v1p1beta1.RecognitionConfig;
    import com.google.cloud.speech.v1p1beta1.RecognitionConfig.AudioEncoding;
    import com.google.cloud.speech.v1p1beta1.RecognitionMetadata;
    import com.google.cloud.speech.v1p1beta1.RecognitionMetadata.InteractionType;
    import com.google.cloud.speech.v1p1beta1.RecognitionMetadata.MicrophoneDistance;
    import com.google.cloud.speech.v1p1beta1.RecognitionMetadata.RecordingDeviceType;
    import com.google.cloud.speech.v1p1beta1.RecognizeResponse;
    import com.google.cloud.speech.v1p1beta1.SpeechClient;
    import com.google.cloud.speech.v1p1beta1.SpeechRecognitionAlternative;
    import com.google.cloud.speech.v1p1beta1.SpeechRecognitionResult;
    import com.google.protobuf.ByteString;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    /**
     * Transcribe the given audio file and include recognition metadata in the request.
     *
     * @param fileName the path to an audio file.
     */
    public static void transcribeFileWithMetadata(String fileName) throws Exception {
      // Read the contents of the local audio file
      Path path = Paths.get(fileName);
      byte[] content = Files.readAllBytes(path);

      try (SpeechClient speechClient = SpeechClient.create()) {
        // Wrap the audio bytes in a RecognitionAudio object
        RecognitionAudio recognitionAudio =
            RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();

        // Construct a recognition metadata object.
        // Most metadata fields are specified as enums nested in RecognitionMetadata
        RecognitionMetadata metadata =
            RecognitionMetadata.newBuilder()
                .setInteractionType(InteractionType.DISCUSSION)
                .setMicrophoneDistance(MicrophoneDistance.NEARFIELD)
                .setRecordingDeviceType(RecordingDeviceType.SMARTPHONE)
                .setRecordingDeviceName("Pixel 2 XL") // Some metadata fields are free form strings
                // And some are integers, for instance the 6 digit NAICS code
                // https://www.naics.com/search/
                .setIndustryNaicsCodeOfAudio(519190)
                .build();

        // Configure the request and attach the recognition metadata
        RecognitionConfig config =
            RecognitionConfig.newBuilder()
                .setEncoding(AudioEncoding.LINEAR16)
                .setLanguageCode("en-US")
                .setSampleRateHertz(8000)
                .setMetadata(metadata) // Add the metadata to the config
                .build();

        // Perform the transcription request
        RecognizeResponse recognizeResponse = speechClient.recognize(config, recognitionAudio);

        // Print out the results
        for (SpeechRecognitionResult result : recognizeResponse.getResultsList()) {
          // There can be several alternative transcripts for a given chunk of speech. Just use the
          // first (most likely) one here.
          SpeechRecognitionAlternative alternative = result.getAlternatives(0);
          System.out.format("Transcript: %s\n\n", alternative.getTranscript());
        }
      }
    }

### Node.js

To learn how to install and use the client library for Speech-to-Text, see
[Speech-to-Text client libraries](/speech-to-text/docs/client-libraries).

For more information, see the
[Speech-to-Text Node.js API reference documentation](/nodejs/docs/reference/speech/latest).

To authenticate to Speech-to-Text, set up Application Default Credentials.
For more information, see
[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

    // Imports the Google Cloud client library for Beta API
    /**
     * TODO(developer): Update client library import to use new
     * version of API when desired features become available
     */
    const speech = require('@google-cloud/speech').v1p1beta1;
    const fs = require('fs');

    // Creates a client
    const client = new speech.SpeechClient();

    async function syncRecognizeWithMetaData() {
      /**
       * TODO(developer): Uncomment the following lines before running the sample.
       */
      // const filename = 'Local path to audio file, e.g. /path/to/audio.raw';
      // const encoding = 'Encoding of the audio file, e.g. LINEAR16';
      // const sampleRateHertz = 16000;
      // const languageCode = 'BCP-47 language code, e.g. en-US';

      // Recognition metadata: enum fields are given by name, the device name is
      // a free form string, and the NAICS code is a 6 digit integer
      const recognitionMetadata = {
        interactionType: 'DISCUSSION',
        microphoneDistance: 'NEARFIELD',
        recordingDeviceType: 'SMARTPHONE',
        recordingDeviceName: 'Pixel 2 XL',
        industryNaicsCodeOfAudio: 519190,
      };

      // Add the metadata to the recognition config
      const config = {
        encoding: encoding,
        sampleRateHertz: sampleRateHertz,
        languageCode: languageCode,
        metadata: recognitionMetadata,
      };

      // Read the local audio file and send its contents base64-encoded
      const audio = {
        content: fs.readFileSync(filename).toString('base64'),
      };

      const request = {
        config: config,
        audio: audio,
      };

      // Detects speech in the audio file
      const [response] = await client.recognize(request);
      response.results.forEach(result => {
        // Use the first (most likely) alternative of each result
        const alternative = result.alternatives[0];
        console.log(alternative.transcript);
      });
    }

    syncRecognizeWithMetaData();

### Python

To learn how to install and use the client library for Speech-to-Text, see
[Speech-to-Text client libraries](/speech-to-text/docs/client-libraries).

For more information, see the
[Speech-to-Text Python API reference documentation](/python/docs/reference/speech/latest).

To authenticate to Speech-to-Text, set up Application Default Credentials.
For more information, see
[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

    from google.cloud import speech_v1p1beta1 as speech


    def transcribe_file_with_metadata():
        """Transcribe a local audio file, sending recognition metadata with the request."""
        client = speech.SpeechClient()

        speech_file = "resources/commercial_mono.wav"

        with open(speech_file, "rb") as audio_file:
            content = audio_file.read()

        # Here we construct a recognition metadata object.
        # Most metadata fields are specified as enums nested under
        # speech.RecognitionMetadata
        metadata = speech.RecognitionMetadata()
        metadata.interaction_type = speech.RecognitionMetadata.InteractionType.DISCUSSION
        metadata.microphone_distance = (
            speech.RecognitionMetadata.MicrophoneDistance.NEARFIELD
        )
        metadata.recording_device_type = (
            speech.RecognitionMetadata.RecordingDeviceType.SMARTPHONE
        )

        # Some metadata fields are free form strings
        metadata.recording_device_name = "Pixel 2 XL"
        # And some are integers, for instance the 6 digit NAICS code
        # https://www.naics.com/search/
        metadata.industry_naics_code_of_audio = 519190

        audio = speech.RecognitionAudio(content=content)
        config = speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=8000,
            language_code="en-US",
            # Add this in the request to send metadata.
            metadata=metadata,
        )

        response = client.recognize(config=config, audio=audio)

        for i, result in enumerate(response.results):
            # Use the first (most likely) alternative of each result
            alternative = result.alternatives[0]
            print("-" * 20)
            print(f"First alternative of result {i}")
            print(f"Transcript: {alternative.transcript}")

        return response.results

What's next
-----------

To search and filter code samples for other Google Cloud products, see the
[Google Cloud sample browser](/docs/samples?product=speech).