The natural language speech audio to be processed. A single
request can contain up to 1 minute of speech audio data. The
transcribed text cannot contain more than 256 bytes.
For non-streaming audio detect intent, both ``config`` and
``audio`` must be provided. For streaming audio detect
intent, ``config`` must be provided in the first request and
``audio`` must be provided in all following requests.
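
The ordering rule above can be illustrated with a minimal sketch. It
models requests as plain dictionaries and does not use the client
library; the helper names ``non_streaming_request`` and
``streaming_requests`` and the sample config values are hypothetical
and chosen only for illustration.

```python
from typing import Dict, Iterable, Iterator, Union

# Hypothetical stand-in for a request message: a plain dict.
Payload = Dict[str, Union[dict, bytes]]


def non_streaming_request(config: dict, audio: bytes) -> Payload:
    """Non-streaming: a single request carries both fields."""
    return {"config": config, "audio": audio}


def streaming_requests(config: dict, chunks: Iterable[bytes]) -> Iterator[Payload]:
    """Streaming: config alone first, then one audio-only request per chunk."""
    yield {"config": config}
    for chunk in chunks:
        yield {"audio": chunk}


# Placeholder values, not taken from any real recording or agent.
example_config = {"sample_rate_hertz": 16000}
stream = list(streaming_requests(example_config, [b"\x00\x01", b"\x02\x03"]))
single = non_streaming_request(example_config, b"\x00\x01\x02\x03")
```

Here ``stream`` holds one config-only request followed by two
audio-only requests, while ``single`` carries both fields at once.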