[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],[],[],[],null,["# Transcribe a local file with recognition metadata (beta)\n\nTranscribe a local audio file, including recognition metadata in the response.\n\nCode sample\n-----------\n\n### Java\n\n\nTo learn how to install and use the client library for Speech-to-Text, see\n[Speech-to-Text client libraries](/speech-to-text/docs/client-libraries).\n\n\nFor more information, see the\n[Speech-to-Text Java API\nreference documentation](/java/docs/reference/google-cloud-speech/latest/overview).\n\n\nTo authenticate to Speech-to-Text, set up Application Default Credentials.\nFor more information, see\n\n[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).\n\n /**\n * Transcribe the given audio file and include recognition metadata in the request.\n *\n * @param fileName the path to an audio file.\n */\n public static void transcribeFileWithMetadata(String fileName) throws Exception {\n Path path = Paths.get(fileName);\n byte[] content = Files.readAllBytes(path);\n\n try (SpeechClient speechClient = SpeechClient.create()) {\n // Get the contents of the local audio file\n RecognitionAudio recognitionAudio =\n RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();\n\n // Construct a recognition metadata object.\n // Most metadata fields are specified as enums that can be found\n // in speech.enums.RecognitionMetadata\n RecognitionMetadata metadata =\n RecognitionMetadata.newBuilder()\n .setInteractionType(InteractionType.DISCUSSION)\n .setMicrophoneDistance(MicrophoneDistance.NEARFIELD)\n .setRecordingDeviceType(RecordingDeviceType.SMARTPHONE)\n .setRecordingDeviceName(\"Pixel 2 XL\") // Some metadata fields are free form strings\n // And some are integers, for instance the 6 digit NAICS code\n // https://www.naics.com/search/\n .setIndustryNaicsCodeOfAudio(519190)\n .build();\n\n // Configure request to enable enhanced models\n RecognitionConfig config =\n RecognitionConfig.newBuilder()\n .setEncoding(AudioEncoding.LINEAR16)\n .setLanguageCode(\"en-US\")\n .setSampleRateHertz(8000)\n .setMetadata(metadata) // Add the metadata to the config\n .build();\n\n // Perform the transcription request\n RecognizeResponse recognizeResponse = speechClient.recognize(config, recognitionAudio);\n\n // Print out the results\n for (SpeechRecognitionResult result : recognizeResponse.getResultsList()) {\n // There can be several alternative transcripts for a given chunk of speech. 
### Node.js

To learn how to install and use the client library for Speech-to-Text, see
[Speech-to-Text client libraries](/speech-to-text/docs/client-libraries).

For more information, see the
[Speech-to-Text Node.js API reference documentation](/nodejs/docs/reference/speech/latest).

To authenticate to Speech-to-Text, set up Application Default Credentials.
For more information, see
[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

    // Imports the Google Cloud client library for Beta API
    /**
     * TODO(developer): Update client library import to use new
     * version of API when desired features become available
     */
    const speech = require('@google-cloud/speech').v1p1beta1;
    const fs = require('fs');

    // Creates a client
    const client = new speech.SpeechClient();

    async function syncRecognizeWithMetaData() {
      /**
       * TODO(developer): Uncomment the following lines before running the sample.
       */
      // const filename = 'Local path to audio file, e.g. /path/to/audio.raw';
      // const encoding = 'Encoding of the audio file, e.g. LINEAR16';
      // const sampleRateHertz = 16000;
      // const languageCode = 'BCP-47 language code, e.g. en-US';

      const recognitionMetadata = {
        interactionType: 'DISCUSSION',
        microphoneDistance: 'NEARFIELD',
        recordingDeviceType: 'SMARTPHONE',
        recordingDeviceName: 'Pixel 2 XL',
        industryNaicsCodeOfAudio: 519190,
      };

      const config = {
        encoding: encoding,
        sampleRateHertz: sampleRateHertz,
        languageCode: languageCode,
        metadata: recognitionMetadata,
      };

      const audio = {
        content: fs.readFileSync(filename).toString('base64'),
      };

      const request = {
        config: config,
        audio: audio,
      };

      // Detects speech in the audio file
      const [response] = await client.recognize(request);
      response.results.forEach(result => {
        const alternative = result.alternatives[0];
        console.log(alternative.transcript);
      });
    }

### Python

To learn how to install and use the client library for Speech-to-Text, see
[Speech-to-Text client libraries](/speech-to-text/docs/client-libraries).

For more information, see the
[Speech-to-Text Python API reference documentation](/python/docs/reference/speech/latest).

To authenticate to Speech-to-Text, set up Application Default Credentials.
For more information, see
[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

    from google.cloud import speech_v1p1beta1 as speech


    def transcribe_file_with_metadata():
        client = speech.SpeechClient()

        speech_file = "resources/commercial_mono.wav"

        with open(speech_file, "rb") as audio_file:
            content = audio_file.read()

        # Here we construct a recognition metadata object.
        # Most metadata fields are specified as enums that can be found
        # in speech.enums.RecognitionMetadata
        metadata = speech.RecognitionMetadata()
        metadata.interaction_type = speech.RecognitionMetadata.InteractionType.DISCUSSION
        metadata.microphone_distance = (
            speech.RecognitionMetadata.MicrophoneDistance.NEARFIELD
        )
        metadata.recording_device_type = (
            speech.RecognitionMetadata.RecordingDeviceType.SMARTPHONE
        )

        # Some metadata fields are free form strings
        metadata.recording_device_name = "Pixel 2 XL"
        # And some are integers, for instance the 6 digit NAICS code
        # https://www.naics.com/search/
        metadata.industry_naics_code_of_audio = 519190

        audio = speech.RecognitionAudio(content=content)
        config = speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=8000,
            language_code="en-US",
            # Add this in the request to send metadata.
            metadata=metadata,
        )

        response = client.recognize(config=config, audio=audio)

        for i, result in enumerate(response.results):
            alternative = result.alternatives[0]
            print("-" * 20)
            print(f"First alternative of result {i}")
            print(f"Transcript: {alternative.transcript}")

        return response.results
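Each `SpeechRecognitionAlternative` in the response also carries a `confidence` score. A minimal sketch of inspecting it, reusing the Python helper above; the formatting is illustrative and not part of the original sample:

    results = transcribe_file_with_metadata()
    for result in results:
        alternative = result.alternatives[0]
        # confidence is a float between 0 and 1; higher values mean the service
        # is more certain about this transcript.
        print(f"Transcript: {alternative.transcript}")
        print(f"Confidence: {alternative.confidence:.2f}")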
What's next
-----------

To search and filter code samples for other Google Cloud products, see the
[Google Cloud sample browser](/docs/samples?product=speech).