# Transcribe a local file with recognition metadata (beta)
Transcribe a local audio file, including recognition metadata in the response.
Code sample
-----------
[[["Facile à comprendre","easyToUnderstand","thumb-up"],["J'ai pu résoudre mon problème","solvedMyProblem","thumb-up"],["Autre","otherUp","thumb-up"]],[["Difficile à comprendre","hardToUnderstand","thumb-down"],["Informations ou exemple de code incorrects","incorrectInformationOrSampleCode","thumb-down"],["Il n'y a pas l'information/les exemples dont j'ai besoin","missingTheInformationSamplesINeed","thumb-down"],["Problème de traduction","translationIssue","thumb-down"],["Autre","otherDown","thumb-down"]],[],[],[],null,["# Transcribe a local file with recognition metadata (beta)\n\nTranscribe a local audio file, including recognition metadata in the response.\n\nCode sample\n-----------\n\n### Java\n\n\nTo learn how to install and use the client library for Speech-to-Text, see\n[Speech-to-Text client libraries](/speech-to-text/docs/client-libraries).\n\n\nFor more information, see the\n[Speech-to-Text Java API\nreference documentation](/java/docs/reference/google-cloud-speech/latest/overview).\n\n\nTo authenticate to Speech-to-Text, set up Application Default Credentials.\nFor more information, see\n\n[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).\n\n /**\n * Transcribe the given audio file and include recognition metadata in the request.\n *\n * @param fileName the path to an audio file.\n */\n public static void transcribeFileWithMetadata(String fileName) throws Exception {\n Path path = Paths.get(fileName);\n byte[] content = Files.readAllBytes(path);\n\n try (SpeechClient speechClient = SpeechClient.create()) {\n // Get the contents of the local audio file\n RecognitionAudio recognitionAudio =\n RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();\n\n // Construct a recognition metadata object.\n // Most metadata fields are specified as enums that can be found\n // in speech.enums.RecognitionMetadata\n RecognitionMetadata metadata =\n RecognitionMetadata.newBuilder()\n .setInteractionType(InteractionType.DISCUSSION)\n .setMicrophoneDistance(MicrophoneDistance.NEARFIELD)\n .setRecordingDeviceType(RecordingDeviceType.SMARTPHONE)\n .setRecordingDeviceName(\"Pixel 2 XL\") // Some metadata fields are free form strings\n // And some are integers, for instance the 6 digit NAICS code\n // https://www.naics.com/search/\n .setIndustryNaicsCodeOfAudio(519190)\n .build();\n\n // Configure request to enable enhanced models\n RecognitionConfig config =\n RecognitionConfig.newBuilder()\n .setEncoding(AudioEncoding.LINEAR16)\n .setLanguageCode(\"en-US\")\n .setSampleRateHertz(8000)\n .setMetadata(metadata) // Add the metadata to the config\n .build();\n\n // Perform the transcription request\n RecognizeResponse recognizeResponse = speechClient.recognize(config, recognitionAudio);\n\n // Print out the results\n for (SpeechRecognitionResult result : recognizeResponse.getResultsList()) {\n // There can be several alternative transcripts for a given chunk of speech. 
### Node.js

To learn how to install and use the client library for Speech-to-Text, see [Speech-to-Text client libraries](/speech-to-text/docs/client-libraries).

For more information, see the [Speech-to-Text Node.js API reference documentation](/nodejs/docs/reference/speech/latest).

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see [Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

```javascript
// Imports the Google Cloud client library for the beta API
/**
 * TODO(developer): Update the client library import to use the new
 * version of the API when the desired features become available.
 */
const speech = require('@google-cloud/speech').v1p1beta1;
const fs = require('fs');

// Creates a client
const client = new speech.SpeechClient();

async function syncRecognizeWithMetaData() {
  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const filename = 'Local path to audio file, e.g. /path/to/audio.raw';
  // const encoding = 'Encoding of the audio file, e.g. LINEAR16';
  // const sampleRateHertz = 16000;
  // const languageCode = 'BCP-47 language code, e.g. en-US';

  const recognitionMetadata = {
    interactionType: 'DISCUSSION',
    microphoneDistance: 'NEARFIELD',
    recordingDeviceType: 'SMARTPHONE',
    recordingDeviceName: 'Pixel 2 XL',
    industryNaicsCodeOfAudio: 519190,
  };

  const config = {
    encoding: encoding,
    sampleRateHertz: sampleRateHertz,
    languageCode: languageCode,
    metadata: recognitionMetadata,
  };

  const audio = {
    content: fs.readFileSync(filename).toString('base64'),
  };

  const request = {
    config: config,
    audio: audio,
  };

  // Detects speech in the audio file
  const [response] = await client.recognize(request);
  response.results.forEach(result => {
    const alternative = result.alternatives[0];
    console.log(alternative.transcript);
  });
}

syncRecognizeWithMetaData();
```

### Python

To learn how to install and use the client library for Speech-to-Text, see [Speech-to-Text client libraries](/speech-to-text/docs/client-libraries).

For more information, see the [Speech-to-Text Python API reference documentation](/python/docs/reference/speech/latest).

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see [Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

```python
from google.cloud import speech_v1p1beta1 as speech


def transcribe_file_with_metadata():
    """Transcribe a local audio file, sending recognition metadata with the request."""
    client = speech.SpeechClient()

    speech_file = "resources/commercial_mono.wav"

    with open(speech_file, "rb") as audio_file:
        content = audio_file.read()

    # Here we construct a recognition metadata object.
    # Most metadata fields are specified as enums that can be found
    # in speech.RecognitionMetadata
    metadata = speech.RecognitionMetadata()
    metadata.interaction_type = speech.RecognitionMetadata.InteractionType.DISCUSSION
    metadata.microphone_distance = (
        speech.RecognitionMetadata.MicrophoneDistance.NEARFIELD
    )
    metadata.recording_device_type = (
        speech.RecognitionMetadata.RecordingDeviceType.SMARTPHONE
    )

    # Some metadata fields are free-form strings
    metadata.recording_device_name = "Pixel 2 XL"
    # And some are integers, for instance the six-digit NAICS code
    # https://www.naics.com/search/
    metadata.industry_naics_code_of_audio = 519190

    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=8000,
        language_code="en-US",
        # Add this in the request to send metadata.
        metadata=metadata,
    )

    response = client.recognize(config=config, audio=audio)

    for i, result in enumerate(response.results):
        alternative = result.alternatives[0]
        print("-" * 20)
        print(f"First alternative of result {i}")
        print(f"Transcript: {alternative.transcript}")

    return response.results
```
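If you want to call the Python sample from other code, the minimal sketch below shows one way to do it. It assumes the sample above is saved as a module named `transcribe_metadata` (an illustrative name, not part of the official sample) and that Application Default Credentials are configured; `confidence` is the score the API returns for the top alternative of each final result.

```python
# Usage sketch. Assumes the sample above lives in a module named
# `transcribe_metadata` (hypothetical name) and that Application Default
# Credentials are set up in the environment.
from transcribe_metadata import transcribe_file_with_metadata

# Run the transcription; the function returns the list of recognition results.
results = transcribe_file_with_metadata()

# Each result holds one or more alternatives, ordered by likelihood.
for result in results:
    top = result.alternatives[0]
    print(f"confidence={top.confidence:.2f} transcript={top.transcript!r}")
```

Because the function returns `response.results`, callers can post-process the transcripts without re-running recognition.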
What's next
-----------

To search and filter code samples for other Google Cloud products, see the [Google Cloud sample browser](/docs/samples?product=speech).