Recognize speech by using medical models

Speech-to-Text offers two medical models in addition to its standard and enhanced speech recognition models. The medical models are tailored to recognize words that are common in medical settings, such as diagnoses, medications, symptoms, treatments, and conditions. If your audio contains this kind of speech, using these models can improve your transcription results.

There are two medical models, each tailored to specific use cases:

  • medical_conversation: for conversations between a medical provider—for example, a doctor or nurse—and a patient. Use this model when both a provider and a patient are speaking. Words uttered by each speaker are automatically detected and labeled in the returned transcript.
  • medical_dictation: for dictated notes spoken by a single medical provider—for example, a doctor dictating notes about a patient's blood test results.

Use medical models only with the following Speech-to-Text features. Features omitted from this list can't be used with either medical model.

The medical conversation model supports the following features:

and requires that the following features be enabled:

The medical dictation model supports the following features:

and requires that the following features be enabled:

Send a transcription request

REST

The following code sample uses the medical_conversation model to transcribe an audio file in a public Cloud Storage bucket.

Before using any of the request data, make the following replacements:

  • LANGUAGE_CODE: the BCP-47 code of the language spoken in your audio clip. Medical models are only available for en-US.
  • ENCODING: the encoding of the audio you want to transcribe. If you are using the public audio sample, the encoding is LINEAR16.
  • PROJECT_ID: the alphanumeric ID of your Google Cloud project.

HTTP method and URL:

POST https://speech.googleapis.com/v1/speech:recognize

Request JSON body:

{
  "config": {
    "languageCode": "LANGUAGE_CODE",
    "encoding": "ENCODING",
    "model": "medical_conversation"
  },
  "audio": {
    "uri": "gs://cloud-samples-data/speech/medical_conversation_2.wav"
  }
}
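The following Python sketch builds the same request body and shows one way it could be sent. It is a minimal illustration, not an official client: the function names build_request_body and recognize are hypothetical, the access token is assumed to come from your own authentication flow, and passing PROJECT_ID in the x-goog-user-project header is an assumption about how the project is supplied.

```python
import json
import urllib.request

# Model names and the endpoint are taken from this page.
MEDICAL_MODELS = {"medical_conversation", "medical_dictation"}
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_request_body(model, audio_uri, encoding="LINEAR16", language_code="en-US"):
    """Return the JSON body for a medical-model recognize request."""
    if model not in MEDICAL_MODELS:
        raise ValueError(f"unknown medical model: {model!r}")
    if language_code != "en-US":
        # This page states that medical models are only available for en-US.
        raise ValueError("medical models are only available for en-US")
    return {
        "config": {
            "languageCode": language_code,
            "encoding": encoding,
            "model": model,
        },
        "audio": {"uri": audio_uri},
    }


def recognize(body, access_token, project_id):
    """Send the request (hypothetical helper; needs network access and credentials)."""
    request = urllib.request.Request(
        RECOGNIZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "x-goog-user-project": project_id,  # assumption: project passed as a header
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_request_body(
    "medical_conversation",
    "gs://cloud-samples-data/speech/medical_conversation_2.wav",
)
print(json.dumps(body, indent=2))
```

Validating the model name and language code before sending keeps obvious configuration mistakes from reaching the API.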

You should receive a JSON response similar to the following:

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "Um-hum . Yeah. Hello , good morning . Good
          morning . So , tell me what's going on . Uh , sure , so , um , I
          woke up probably three or four days ago , which , uh , wheezing and short of breath .
          Okay , any cough or chest pain ? I cough infrequently , but no ,
          uh , chest pain . Have you been exposed to anyone with covid ?
          Uh , no , and I also took a test , which was negative . Uh , is it getting
          worse , or better ? Uh , it has been getting a lot worse"
        }
      ]
    },
    {
      "alternatives": [
        {
          "transcript": "Okay . Was there something that triggered this exposure to cold , for
          example ? Um , I had a gone hiking , and I got caught in the rain the day
          before this all started ."
        }
      ]
    }
  ]
}
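A response like the one above splits the audio into several results, each with one or more alternatives. The following Python sketch shows one way to stitch the pieces into a single transcript. The function name full_transcript is hypothetical, and taking the first alternative of each result is an assumption made for brevity.

```python
def full_transcript(response):
    """Join the first alternative of every result into one string."""
    pieces = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # Assumption: the first alternative is the one you want to keep.
            pieces.append(alternatives[0].get("transcript", "").strip())
    return " ".join(piece for piece in pieces if piece)


# Shortened stand-in for a parsed JSON response.
sample_response = {
    "results": [
        {"alternatives": [{"transcript": "Hello , good morning ."}]},
        {"alternatives": [{"transcript": "Okay . Was there something that triggered this ?"}]},
    ]
}
print(full_transcript(sample_response))
```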

Spoken punctuation

The medical dictation model supports spoken punctuation for medical notes. This feature is always enabled. Spoken punctuation is delineated by brackets in the speech transcription. For example, your returned transcription might look similar to the following:

Patient could be showing signs of trauma [question mark] They said they were [quote] having elevated heart rate [unquote].

Speech-to-Text supports the following spoken punctuation:

  • period
  • comma
  • colon
  • caps
  • slash
  • dash
  • hyphen
  • question mark
  • semicolon
  • quote
  • unquote
  • end quote
  • open parenthesis
  • close parenthesis
  • end parenthesis
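If you want the bracketed tokens rendered as actual punctuation marks, you can post-process the transcript yourself. The following Python sketch is one possible approach, not part of Speech-to-Text. The function name render_spoken_punctuation, the symbol chosen for each token, and the spacing rules are all assumptions. Tokens that it doesn't map, such as [caps], are left unchanged.

```python
import re

# Assumed mapping from spoken punctuation tokens to symbols.
PUNCTUATION = {
    "period": ".", "comma": ",", "colon": ":", "semicolon": ";",
    "question mark": "?", "slash": "/", "dash": "-", "hyphen": "-",
    "quote": '"', "unquote": '"', "end quote": '"',
    "open parenthesis": "(", "close parenthesis": ")", "end parenthesis": ")",
}
# Tokens that attach to the preceding word, and tokens that attach to the next word.
ATTACH_LEFT = {"period", "comma", "colon", "semicolon", "question mark",
               "unquote", "end quote", "close parenthesis", "end parenthesis"}
ATTACH_RIGHT = {"quote", "open parenthesis"}

_TOKEN = re.compile(r"\s*\[([^\]]+)\]\s*")


def render_spoken_punctuation(transcript):
    """Replace bracketed punctuation tokens with symbols (illustrative only)."""
    def substitute(match):
        name = match.group(1).strip().lower()
        if name not in PUNCTUATION:
            return match.group(0)  # leave unknown tokens untouched
        symbol = PUNCTUATION[name]
        if name in ATTACH_LEFT:
            return symbol + " "
        if name in ATTACH_RIGHT:
            return " " + symbol
        return symbol  # slash, dash, hyphen: joined to both neighbors

    text = _TOKEN.sub(substitute, transcript)
    text = re.sub(r"\s+([.,;:?!])", r"\1", text)  # no space before closing marks
    text = re.sub(r" {2,}", " ", text)
    return text.strip()


example = ("Patient could be showing signs of trauma [question mark] They said "
           "they were [quote] having elevated heart rate [unquote].")
print(render_spoken_punctuation(example))
```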

Formatting commands

The medical dictation model supports spoken commands for formatting notes. This feature is always enabled. The spoken commands will be delineated by brackets in the speech transcription. For example, your returned transcription might look similar to the following:

[new line] Patient says they are experiencing fever [next point].

Speech-to-Text supports the following spoken commands:

  • next point
  • next number
  • next paragraph
  • caps
  • capitalization
  • new line
  • next item
  • next problem
  • next problem number
  • next row
  • next section
  • number next
  • scratch
  • scratch that
  • end dictation
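Because the commands arrive as bracketed tokens inside the transcript, an application usually has to separate them from the dictated text before acting on them. The following Python sketch shows one way to do that. The function name split_commands and the token kinds "command", "other", and "text" are hypothetical. The sketch doesn't assume what each command should do, because that depends on your note format.

```python
import re

# Spoken commands listed on this page.
COMMANDS = {
    "next point", "next number", "next paragraph", "caps", "capitalization",
    "new line", "next item", "next problem", "next problem number", "next row",
    "next section", "number next", "scratch", "scratch that", "end dictation",
}


def split_commands(transcript):
    """Split a transcript into (kind, value) pairs, in order of appearance."""
    tokens = []
    # With one capture group, re.split returns text at even indexes
    # and bracket contents at odd indexes.
    parts = re.split(r"\[([^\]]+)\]", transcript)
    for index, part in enumerate(parts):
        if index % 2 == 1:
            name = part.strip().lower()
            kind = "command" if name in COMMANDS else "other"
            tokens.append((kind, name))
        else:
            text = part.strip()
            if text:
                tokens.append(("text", text))
    return tokens


note = "[new line] Patient says they are experiencing fever [next point]."
for kind, value in split_commands(note):
    print(kind, repr(value))
```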

Spoken headings

The medical dictation model supports spoken headings for dictated notes. This feature is enabled by default, and cannot be disabled. The headings will be delineated by brackets in the transcription and will be capitalized. For example, your returned transcription might look similar to the following:

[CURRENT MEDICATIONS] Patient is currently taking no medications.

Speech-to-Text supports the following spoken headings:

  • CHIEF COMPLAINT
  • CURRENT MEDICATIONS
  • DISCHARGE MEDICATIONS
  • DISCHARGE PLAN
  • FAMILY HISTORY
  • FINDINGS
  • HISTORY OF PRESENT ILLNESS
  • INDICATIONS
  • LABS
  • PAST SURGICAL HISTORY
  • PHYSICAL EXAM
  • REVIEW OF SYSTEMS
  • RADIOLOGY
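Bracketed headings make it straightforward to break a dictated note into sections. The following Python sketch is one way to do so. The function name split_sections and the list-of-pairs return shape are hypothetical choices, and only the headings listed above are recognized.

```python
import re

# Spoken headings listed on this page.
HEADINGS = (
    "CHIEF COMPLAINT", "CURRENT MEDICATIONS", "DISCHARGE MEDICATIONS",
    "DISCHARGE PLAN", "FAMILY HISTORY", "FINDINGS",
    "HISTORY OF PRESENT ILLNESS", "INDICATIONS", "LABS",
    "PAST SURGICAL HISTORY", "PHYSICAL EXAM", "REVIEW OF SYSTEMS", "RADIOLOGY",
)
_HEADING = re.compile(r"\[(" + "|".join(re.escape(h) for h in HEADINGS) + r")\]")


def split_sections(transcript):
    """Return (heading, text) pairs; heading is None for text before the first heading."""
    matches = list(_HEADING.finditer(transcript))
    sections = []
    preamble = transcript[:matches[0].start()] if matches else transcript
    if preamble.strip():
        sections.append((None, preamble.strip()))
    for index, match in enumerate(matches):
        # A section runs until the next heading or the end of the transcript.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(transcript)
        sections.append((match.group(1), transcript[match.end():end].strip()))
    return sections


note = ("[CHIEF COMPLAINT] Fever for two days. "
        "[CURRENT MEDICATIONS] Patient is currently taking no medications.")
for heading, text in split_sections(note):
    print(heading, "->", text)
```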