[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["難以理解","hardToUnderstand","thumb-down"],["資訊或程式碼範例有誤","incorrectInformationOrSampleCode","thumb-down"],["缺少我需要的資訊/範例","missingTheInformationSamplesINeed","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-09-04 (世界標準時間)。"],[[["\u003cp\u003e\u003ccode\u003eML.GENERATE_TEXT\u003c/code\u003e is best suited for transcribing audio clips of 10 minutes or less and can also perform natural language processing (NLP) tasks, offering a lower cost when using the \u003ccode\u003egemini-1.5-flash\u003c/code\u003e model.\u003c/p\u003e\n"],["\u003cp\u003e\u003ccode\u003eML.TRANSCRIBE\u003c/code\u003e is preferred for transcribing audio clips longer than 10 minutes and supports a wider array of languages compared to \u003ccode\u003eML.GENERATE_TEXT\u003c/code\u003e.\u003c/p\u003e\n"],["\u003cp\u003e\u003ccode\u003eML.GENERATE_TEXT\u003c/code\u003e supports supervised tuning for certain models, whereas \u003ccode\u003eML.TRANSCRIBE\u003c/code\u003e does not offer this capability.\u003c/p\u003e\n"],["\u003cp\u003e\u003ccode\u003eML.GENERATE_TEXT\u003c/code\u003e has token limits for input and output, while \u003ccode\u003eML.TRANSCRIBE\u003c/code\u003e has no token limit but is limited to 480 minutes per individual audio clip.\u003c/p\u003e\n"],["\u003cp\u003e\u003ccode\u003eML.TRANSCRIBE\u003c/code\u003e has a much higher query per minute limit than the \u003ccode\u003egemini-1.5-pro\u003c/code\u003e model in the \u003ccode\u003eML.GENERATE_TEXT\u003c/code\u003e function, whereas the \u003ccode\u003egemini-1.5-flash\u003c/code\u003e model is higher.\u003c/p\u003e\n"]]],[],null,["# Choose a transcription function\n===============================\n\nThis document provides a comparison of the transcription functions\navailable in BigQuery ML, which are\n[`ML.GENERATE_TEXT`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-generate-text)\nand\n[`ML.TRANSCRIBE`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-process-document).\n\nYou can use the information in this document to help you decide which function\nto use in cases where the functions have overlapping capabilities.\n\nAt a high level, the difference between these functions is as follows:\n\n- `ML.GENERATE_TEXT` is a good choice for transcription of audio clips that are\n 10 minutes or shorter, and you can also use it to perform natural language\n processing (NLP) tasks. Audio transcription with `ML.GENERATE_TEXT` is less\n expensive than with `ML.TRANSCRIBE` when you use the `gemini-1.5-flash` model.\n\n- `ML.TRANSCRIBE` is a good choice for performing transcription on audio\n clips that are longer than 10 minutes. It also supports a wider range of\n languages than `ML.GENERATE_TEXT`.\n\nSupported models\n----------------\n\nSupported models are as follows:\n\n- `ML.GENERATE_TEXT`: you can use a subset of the Vertex AI [Gemini](/vertex-ai/generative-ai/docs/learn/models#gemini-models) models to generate text. For more information on supported models, see the [`ML.GENERATE_TEXT` syntax](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-generate-text#syntax).\n- `ML.TRANSCRIBE`: you use the default model of the [Speech-to-Text API](/speech-to-text). 
Supported models
----------------

Supported models are as follows:

- `ML.GENERATE_TEXT`: you can use a subset of the Vertex AI
  [Gemini](/vertex-ai/generative-ai/docs/learn/models#gemini-models) models to
  generate text. For more information on supported models, see the
  [`ML.GENERATE_TEXT` syntax](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-generate-text#syntax).
- `ML.TRANSCRIBE`: you use the default model of the
  [Speech-to-Text API](/speech-to-text). Using the Speech-to-Text API gives you
  access to transcription with the
  [Chirp speech model](/speech-to-text/v2/docs/chirp-model).

Supported tasks
---------------

Supported tasks are as follows:

- `ML.GENERATE_TEXT`: you can perform audio transcription and natural language
  processing (NLP) tasks.
- `ML.TRANSCRIBE`: you can perform audio transcription.

Pricing
-------

Pricing is as follows:

- `ML.GENERATE_TEXT`: for pricing of the Vertex AI models that you use with
  this function, see [Vertex AI pricing](/vertex-ai/generative-ai/pricing).
  Supervised tuning of supported models is charged at dollars per node hour.
  For more information, see
  [Vertex AI custom training pricing](/vertex-ai/pricing#custom-trained_models).
- `ML.TRANSCRIBE`: for pricing of the Cloud AI service that you use with this
  function, see [Speech-to-Text API pricing](/speech-to-text/pricing).

Supervised tuning
-----------------

Supervised tuning support is as follows:

- `ML.GENERATE_TEXT`:
  [supervised tuning](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-remote-model#supervised_tuning)
  is supported for some models.
- `ML.TRANSCRIBE`: supervised tuning isn't supported.

Queries per minute (QPM) limit
------------------------------

QPM limits are as follows:

- `ML.GENERATE_TEXT`: 60 QPM in the default `us-central1` region for
  `gemini-1.5-pro` models, and 200 QPM in the default `us-central1` region for
  `gemini-1.5-flash` models. For more information, see
  [Generative AI on Vertex AI quotas](/vertex-ai/generative-ai/docs/quotas).
- `ML.TRANSCRIBE`: 900 QPM per project. For more information, see
  [Quotas and limits](/speech-to-text/quotas).

To increase your quota, see
[Request a quota adjustment](/docs/quotas/help/request_increase).

Token limit
-----------

Token limits are as follows:

- `ML.GENERATE_TEXT`: 700K input tokens, and 8,192 output tokens. The output
  token limit means that `ML.GENERATE_TEXT` can transcribe approximately
  39 minutes of audio in an individual clip; at typical speaking rates, roughly
  that much speech produces a transcript that fills the output token budget.
- `ML.TRANSCRIBE`: no token limit. However, this function does have a limit of
  480 minutes for an individual audio clip.

Supported languages
-------------------

Supported languages are as follows:

- `ML.GENERATE_TEXT`: supports the same languages as
  [Gemini](/vertex-ai/generative-ai/docs/learn/models#languages-gemini).
- `ML.TRANSCRIBE`: supports all of the
  [Speech-to-Text supported languages](/speech-to-text/docs/speech-to-text-supported-languages).

Region availability
-------------------

Region availability is as follows:

- `ML.GENERATE_TEXT`: available in all Generative AI for Vertex AI
  [regions](/vertex-ai/generative-ai/docs/learn/locations#available-regions).
- `ML.TRANSCRIBE`: available in the `EU` and `US`
  [multi-regions](/bigquery/docs/locations#multi-regions) for all speech
  recognizers.
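Both functions run against a remote model that you create over a BigQuery
Cloud resource connection in a supported location. As a closing sketch (the
connection and model names are hypothetical, and the `ENDPOINT` value assumes
a Gemini 1.5 Flash version that is available to your project):

```sql
-- Sketch: remote model for ML.GENERATE_TEXT, backed by a Gemini endpoint.
CREATE OR REPLACE MODEL `mydataset.gemini_model`
  REMOTE WITH CONNECTION `us.my_connection`   -- hypothetical connection name
  OPTIONS (ENDPOINT = 'gemini-1.5-flash-002');

-- Sketch: remote model for ML.TRANSCRIBE, backed by the Speech-to-Text API.
CREATE OR REPLACE MODEL `mydataset.speech_model`
  REMOTE WITH CONNECTION `us.my_connection`
  OPTIONS (REMOTE_SERVICE_TYPE = 'CLOUD_AI_SPEECH_TO_TEXT_V2');
```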