# AudioInput

Last updated 2025-06-27 UTC.

Represents the natural language speech audio to be processed.

The JSON representation is an object with two required fields:

- `config`: object (`InputAudioConfig`). Required. Specifies how the speech audio should be processed.
- `audio`: string (bytes format). Required. The natural language speech audio to be processed, as a base64-encoded string. A single request can contain up to 2 minutes of speech audio data. For virtual agent interactions, the transcribed text cannot exceed 256 bytes.
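The following Python sketch shows one way a client might assemble an `AudioInput` body and check it against the documented limits. It is not an official client. The helper names (`build_audio_input`, `max_audio_bytes`, `transcript_within_limit`) are hypothetical. Converting the 2-minute limit to a byte count assumes uncompressed PCM (for example 16-bit mono), and counting transcript bytes assumes UTF-8. The `config` keys in the usage block are illustrative `InputAudioConfig` fields, so check them against that type's reference.

```python
import base64
import json

# Limits stated in the AudioInput reference.
MAX_AUDIO_SECONDS = 120      # "up to 2 minutes of speech audio data"
MAX_TRANSCRIPT_BYTES = 256   # transcript limit for virtual agent interactions


def max_audio_bytes(sample_rate_hz: int, bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Raw byte count of 2 minutes of audio.

    Assumption: uncompressed PCM. Compressed encodings do not map
    byte count to duration this way.
    """
    return MAX_AUDIO_SECONDS * sample_rate_hz * bytes_per_sample * channels


def build_audio_input(raw_audio: bytes, config: dict) -> dict:
    """Return a dict shaped like the AudioInput JSON representation.

    `config` stands in for an InputAudioConfig object. The caller supplies
    its contents, and this function does not validate them.
    """
    return {
        "config": config,
        # `audio` is a bytes field, so JSON carries it as base64 text.
        "audio": base64.b64encode(raw_audio).decode("ascii"),
    }


def transcript_within_limit(text: str) -> bool:
    """Check the 256-byte limit, counting encoded bytes (UTF-8 assumed), not characters."""
    return len(text.encode("utf-8")) <= MAX_TRANSCRIPT_BYTES


if __name__ == "__main__":
    # Hypothetical InputAudioConfig contents; verify names against its reference.
    config = {
        "audioEncoding": "AUDIO_ENCODING_LINEAR_16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
    }
    pcm = b"\x00\x00" * 8000  # 0.5 s of 16-bit mono silence at 16 kHz
    assert len(pcm) <= max_audio_bytes(16000)
    body = build_audio_input(pcm, config)
    print(json.dumps(body)[:60] + "...")
```

The transcript check counts bytes rather than characters because that is how the limit is stated. In UTF-8, 128 copies of "é" already reach 256 bytes. The transcript itself comes from the service, so this helper is only useful for reasoning about how much speech fits within the limit.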