# Transcribe a local file with recognition metadata (beta)
Transcribe a local audio file and attach recognition metadata (interaction type, microphone distance, recording device, and industry code) to the recognition request.
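As a rough orientation, the sketch below assembles the same request shape that the Node.js sample on this page builds: a config that carries the metadata, plus base64-encoded audio content. The field names and values are copied from that sample. The helper `build_recognize_request` is hypothetical, is not part of any client library, and sends nothing to the service.

```python
import base64
import json


def build_recognize_request(audio_bytes: bytes) -> dict:
    """Hypothetical helper: assemble a recognize request body as a plain dict."""
    # Descriptive fields about the recording, as used in the samples on this page.
    metadata = {
        "interactionType": "DISCUSSION",
        "microphoneDistance": "NEARFIELD",
        "recordingDeviceType": "SMARTPHONE",
        "recordingDeviceName": "Pixel 2 XL",  # free-form string
        "industryNaicsCodeOfAudio": 519190,  # 6-digit NAICS code
    }
    # The metadata travels inside the recognition config.
    config = {
        "encoding": "LINEAR16",
        "sampleRateHertz": 8000,
        "languageCode": "en-US",
        "metadata": metadata,
    }
    # Audio bytes are base64-encoded, as in the Node.js sample.
    audio = {"content": base64.b64encode(audio_bytes).decode("ascii")}
    return {"config": config, "audio": audio}


if __name__ == "__main__":
    # Placeholder bytes stand in for real audio data.
    request = build_recognize_request(b"\x00\x01" * 4)
    print(json.dumps(request, indent=2))
```

Printing the dictionary makes it easy to see where each metadata field sits before switching to a client library call.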
Code sample
-----------

### Java

To learn how to install and use the client library for Speech-to-Text, see
[Speech-to-Text client libraries](/speech-to-text/docs/client-libraries).

For more information, see the
[Speech-to-Text Java API reference documentation](/java/docs/reference/google-cloud-speech/latest/overview).

To authenticate to Speech-to-Text, set up Application Default Credentials.
For more information, see
[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

    import com.google.cloud.speech.v1p1beta1.RecognitionAudio;
    import com.google.cloud.speech.v1p1beta1.RecognitionConfig;
    import com.google.cloud.speech.v1p1beta1.RecognitionConfig.AudioEncoding;
    import com.google.cloud.speech.v1p1beta1.RecognitionMetadata;
    import com.google.cloud.speech.v1p1beta1.RecognitionMetadata.InteractionType;
    import com.google.cloud.speech.v1p1beta1.RecognitionMetadata.MicrophoneDistance;
    import com.google.cloud.speech.v1p1beta1.RecognitionMetadata.RecordingDeviceType;
    import com.google.cloud.speech.v1p1beta1.RecognizeResponse;
    import com.google.cloud.speech.v1p1beta1.SpeechClient;
    import com.google.cloud.speech.v1p1beta1.SpeechRecognitionAlternative;
    import com.google.cloud.speech.v1p1beta1.SpeechRecognitionResult;
    import com.google.protobuf.ByteString;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    /**
     * Transcribe the given audio file and include recognition metadata in the request.
     *
     * @param fileName the path to an audio file.
     */
    public static void transcribeFileWithMetadata(String fileName) throws Exception {
      Path path = Paths.get(fileName);
      byte[] content = Files.readAllBytes(path);

      try (SpeechClient speechClient = SpeechClient.create()) {
        // Get the contents of the local audio file
        RecognitionAudio recognitionAudio =
            RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();

        // Construct a recognition metadata object.
        // Most metadata fields are specified as enums nested in RecognitionMetadata.
        RecognitionMetadata metadata =
            RecognitionMetadata.newBuilder()
                .setInteractionType(InteractionType.DISCUSSION)
                .setMicrophoneDistance(MicrophoneDistance.NEARFIELD)
                .setRecordingDeviceType(RecordingDeviceType.SMARTPHONE)
                .setRecordingDeviceName("Pixel 2 XL") // Some metadata fields are free form strings
                // And some are integers, for instance the 6 digit NAICS code
                // https://www.naics.com/search/
                .setIndustryNaicsCodeOfAudio(519190)
                .build();

        // Build the recognition config and attach the metadata
        RecognitionConfig config =
            RecognitionConfig.newBuilder()
                .setEncoding(AudioEncoding.LINEAR16)
                .setLanguageCode("en-US")
                .setSampleRateHertz(8000)
                .setMetadata(metadata) // Add the metadata to the config
                .build();

        // Perform the transcription request
        RecognizeResponse recognizeResponse = speechClient.recognize(config, recognitionAudio);

        // Print out the results
        for (SpeechRecognitionResult result : recognizeResponse.getResultsList()) {
          // There can be several alternative transcripts for a given chunk of speech. Just use the
          // first (most likely) one here.
          SpeechRecognitionAlternative alternative = result.getAlternatives(0);
          System.out.format("Transcript: %s\n\n", alternative.getTranscript());
        }
      }
    }

### Node.js

To learn how to install and use the client library for Speech-to-Text, see
[Speech-to-Text client libraries](/speech-to-text/docs/client-libraries).

For more information, see the
[Speech-to-Text Node.js API reference documentation](/nodejs/docs/reference/speech/latest).

To authenticate to Speech-to-Text, set up Application Default Credentials.
For more information, see
[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

    // Imports the Google Cloud client library for Beta API
    /**
     * TODO(developer): Update client library import to use new
     * version of API when desired features become available
     */
    const speech = require('@google-cloud/speech').v1p1beta1;
    const fs = require('fs');

    // Creates a client
    const client = new speech.SpeechClient();

    async function syncRecognizeWithMetaData() {
      /**
       * TODO(developer): Uncomment the following lines before running the sample.
       */
      // const filename = 'Local path to audio file, e.g. /path/to/audio.raw';
      // const encoding = 'Encoding of the audio file, e.g. LINEAR16';
      // const sampleRateHertz = 16000;
      // const languageCode = 'BCP-47 language code, e.g. en-US';

      const recognitionMetadata = {
        interactionType: 'DISCUSSION',
        microphoneDistance: 'NEARFIELD',
        recordingDeviceType: 'SMARTPHONE',
        recordingDeviceName: 'Pixel 2 XL',
        industryNaicsCodeOfAudio: 519190,
      };

      const config = {
        encoding: encoding,
        sampleRateHertz: sampleRateHertz,
        languageCode: languageCode,
        metadata: recognitionMetadata,
      };

      const audio = {
        content: fs.readFileSync(filename).toString('base64'),
      };

      const request = {
        config: config,
        audio: audio,
      };

      // Detects speech in the audio file
      const [response] = await client.recognize(request);
      response.results.forEach(result => {
        const alternative = result.alternatives[0];
        console.log(alternative.transcript);
      });
    }

    // Run the sample and report any error
    syncRecognizeWithMetaData().catch(console.error);

### Python

To learn how to install and use the client library for Speech-to-Text, see
[Speech-to-Text client libraries](/speech-to-text/docs/client-libraries).

For more information, see the
[Speech-to-Text Python API reference documentation](/python/docs/reference/speech/latest).

To authenticate to Speech-to-Text, set up Application Default Credentials.
For more information, see
[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

    from google.cloud import speech_v1p1beta1 as speech


    def transcribe_file_with_metadata():
        """Transcribe a local audio file and send recognition metadata with the request."""
        client = speech.SpeechClient()

        speech_file = "resources/commercial_mono.wav"

        with open(speech_file, "rb") as audio_file:
            content = audio_file.read()

        # Here we construct a recognition metadata object.
        # Most metadata fields are specified as enums nested in
        # speech.RecognitionMetadata
        metadata = speech.RecognitionMetadata()
        metadata.interaction_type = speech.RecognitionMetadata.InteractionType.DISCUSSION
        metadata.microphone_distance = (
            speech.RecognitionMetadata.MicrophoneDistance.NEARFIELD
        )
        metadata.recording_device_type = (
            speech.RecognitionMetadata.RecordingDeviceType.SMARTPHONE
        )

        # Some metadata fields are free form strings
        metadata.recording_device_name = "Pixel 2 XL"
        # And some are integers, for instance the 6 digit NAICS code
        # https://www.naics.com/search/
        metadata.industry_naics_code_of_audio = 519190

        audio = speech.RecognitionAudio(content=content)
        config = speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=8000,
            language_code="en-US",
            # Add this in the request to send metadata.
            metadata=metadata,
        )

        response = client.recognize(config=config, audio=audio)

        for i, result in enumerate(response.results):
            alternative = result.alternatives[0]
            print("-" * 20)
            print(f"First alternative of result {i}")
            print(f"Transcript: {alternative.transcript}")

        return response.results

What's next
-----------

To search and filter code samples for other Google Cloud products, see the
[Google Cloud sample browser](/docs/samples?product=speech).
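The Java and Python samples on this page hard-code `LINEAR16` and a sample rate of 8000 Hz, so it can be worth confirming that a local WAV file actually has that format before sending it. The following is a minimal sketch using only Python's standard `wave` module. The helper names `describe_wav` and `matches_sample_config` are hypothetical, and the mono check is an assumption based on the sample file name `commercial_mono.wav`.

```python
import os
import tempfile
import wave


def describe_wav(path: str) -> dict:
    """Read basic format fields from a WAV header using the standard library."""
    with wave.open(path, "rb") as wav_file:
        return {
            "sample_rate_hertz": wav_file.getframerate(),
            "channels": wav_file.getnchannels(),
            "bits_per_sample": wav_file.getsampwidth() * 8,
        }


def matches_sample_config(info: dict, sample_rate_hertz: int = 8000) -> bool:
    """Return True if the file looks like 16-bit mono PCM at the expected rate."""
    return (
        info["sample_rate_hertz"] == sample_rate_hertz
        and info["channels"] == 1
        and info["bits_per_sample"] == 16
    )


if __name__ == "__main__":
    # Write one second of silence as a stand-in for a real recording.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
    with wave.open(demo_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        out.setframerate(8000)
        out.writeframes(b"\x00\x00" * 8000)

    info = describe_wav(demo_path)
    print(info, matches_sample_config(info))
```

If the check fails, either resample the audio or adjust the sample rate in the recognition config to match the file.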