RxNorm

How to query public data sets using BigQuery

BigQuery is a fully managed data warehouse and analytics platform. Public datasets are available for you to analyze using SQL queries. You can access BigQuery public data sets using the web UI the command-line tool, or by making calls to the BigQuery REST API using a variety of client libraries such as Java, .NET, or Python.

To get started using a BigQuery public dataset, create or select a project. The first terabyte of data processed per month is free, so you can start querying public datasets without enabling billing. If you intend to go beyond the free tier, you should also enable billing.

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform project.

    Go to the Manage resources page

  3. Enable billing for your project.

    Enable billing

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, Enable the BigQuery API.

    Enable the API

Dataset overview

RxNorm was created by the U.S. National Library of Medicine (NLM) to provide a normalized naming system for clinical drugs, defined as the combination of {ingredient + strength + dose form}. In addition to the naming system, the RxNorm dataset also provides structured information such as brand names, ingredients, drug classes, and so on, for each clinical drug. Typical uses of RxNorm include navigating between names and codes among different drug vocabularies and using information in RxNorm to assist with health information exchange/medication reconciliation, e-prescribing, drug analytics, formulary development, and other functions.

You can start exploring this data:

Dataset content

This public dataset includes multiple data files originally released in RxNorm Rich Release Format (RXNRRF) that are loaded into Bigquery tables. The data is updated and archived on a monthly basis.

The following tables are included in the RxNorm dataset:

  • RXNCONSO contains concept and source information
  • RXNREL contains information regarding relationships between entities
  • RXNSAT contains attribute information
  • RXNSTY contains semantic information
  • RXNSAB contains source info
  • RXNCUI contains retired rxcui codes
  • RXNATOMARCHIVE contains archived data
  • RXNCUICHANGES contains concept changes

Even though the RxNorm data provides researchers and informaticists the ability to get structured information such as brand names, ingredients, and so on from the data, the mapping (of more than 200 paths) between RxNorm entities can be challenging. National Library of Medicine provides RxNav, a web interface and a series of APIs to assist researchers. To map entities, you follow these default paths:

Start to End Path Start to End Path
BN => IN BN => IN IN => BN IN => BN
BN => PIN BN => PIN IN => PIN IN => PIN
BN => SBD BN => SBD IN => SCDC IN => SCDC
BN => SBDC BN => SBDC IN => SCDF IN => SCDF
BN => SBDF BN => SBDF IN => SCDG IN => SCDG
... ... ... ...

The healthcare team at Google has replicated the mapping between RxNorm entities to assist researchers interested in doing full table joins using RxNorm. This dataset includes a single, customized table, rxn_all_pathways, that has undergone significant pre-processing with the goal of replicating all of the RxNav pathways. For examples on using the rxn_all_pathways table, see Sample queries.

Dataset sources

RxNorm uses many commercial and non-commercial sources to build the vocabularies. Due to licensing restrictions, the RxNorm public dataset provides only the following publicly available data sources:

  • SAB = RxNorm
  • Vaccines Administered (CVX)
  • Medical Subject Headings (MeSH)
  • CMS Formulary Reference (MTHCMSFRF)
  • FDA Structured Product Labels (MTHSPL)
  • Veteran’s Health Administration National Drug File- Reference Terminology (NDFRT) and related files (NDFRT_FDASPL, NDFRT_FMTSME)
  • Veterans Health Administration National Drug File (VANDF)

For more information on RxNorm table definitions, see Data Files - RxNorm Rich Release Format (RXNRRF).

Sample queries

Here are some examples of SQL queries you can run on this data in BigQuery.

These samples use BigQuery’s support for standard SQL. Use the #standardSQL tag to let BigQuery know you want to use standard SQL. For more information about the #standardSQL prefix, see Setting a query prefix.

What are the RXCUI codes for the ingredients of a list of drugs?

This query maps the drugs listed in the query to their ingredients and returns the RXCUI codes for the ingredients.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  *
FROM
  `bigquery-public-data.nlm_rxnorm.rxn_all_pathways_current`
WHERE
  SOURCE_NAME IN ('Vancomycin 100 MG/ML',
    'Ceftriaxone 250 MG Injection',
    'Metronidazole Pill',
    '1 ML Morphine Sulfate 15 MG/ML Cartridge',
    '{14 (Estrogens, Conjugated (USP) 0.625 MG / medroxyprogesterone acetate 5 MG Oral Tablet) / 14 (Estrogens, Conjugated (USP) 0.625 MG Oral Tablet) } Pack [Premphase 28 Day]',
    'Warfarin Sodium 4 MG Oral Tablet',
    'Diphenhydramine',
    'Metformin / pioglitazone',
    'Ibuprofen 800 MG Extended Release Oral Tablet',
    'Valium' )
  AND TARGET_TTY = 'IN'
ORDER BY
  SOURCE_RXCUI

Command-line

bq query --use_legacy_sql=false '
SELECT
  *
FROM
  `bigquery-public-data.nlm_rxnorm.rxn_all_pathways_current`
WHERE
  SOURCE_NAME IN ('Vancomycin 100 MG/ML',
    'Ceftriaxone 250 MG Injection',
    'Metronidazole Pill',
    '1 ML Morphine Sulfate 15 MG/ML Cartridge',
    '{14 (Estrogens, Conjugated (USP) 0.625 MG / medroxyprogesterone acetate 5 MG Oral Tablet) / 14 (Estrogens, Conjugated (USP) 0.625 MG Oral Tablet) } Pack [Premphase 28 Day]',
    'Warfarin Sodium 4 MG Oral Tablet',
    'Diphenhydramine',
    'Metformin / pioglitazone',
    'Ibuprofen 800 MG Extended Release Oral Tablet',
    'Valium' )
  AND TARGET_TTY = 'IN'
ORDER BY
  SOURCE_RXCUI'

The results are shown here:

+---------------+-------------+--------------------------------------+---------------+------------+------------------------+
| SOURCE_RXCUI  | SOURCE_TTY  | SOURCE_NAME                          | TARGET_RXCUI  | TARGET_TTY | TARGET_NAME            |
+---------------+-------------+--------------------------------------+---------------+------------+------------------------+
| 1000487       | BPCK        | {14 (Estrogens, Conjugated           | 4099          | IN         | Estrogens, Conjugated  |
|                               (USP) 0.625 MG / medroxyprogesterone                                (USP)                  |
|                               acetate 5 MG Oral Tablet) / 14                                                             |
|                               (Estrogens, Conjugated (USP)                                                               |
|                               0.625 MG Oral Tablet) }                                                                    |
|                               Pack [Premphase 28 Day]                                                                    |
| 1000487       | BPCK        | {14 (Estrogens, Conjugated (USP)     | 6691          | IN         | Medroxyprogesterone    |
|                               0.625 MG / medroxyprogesterone                                                             |
|                               acetate 5 MG Oral Tablet) / 14                                                             |
|                               (Estrogens, Conjugated (USP) 0.625                                                         |
|                               MG Oral Tablet) } Pack [Premphase 28                                                       |
|                               Day]                                                                                       |
| 1164084       | SCDG        | Metronidazole Pill                   | 6922          | IN         | Metronidazole          |
| 1733080       | SCD         | 1 ML Morphine Sulfate 15 MG/ML       | 7052          | IN         | Morphine               |
|                               Cartridge                                                                                  |
| 202472        | BN          | Valium                               | 3322          | IN         | Diazepam               |
| 250418        | SCD         | Ibuprofen 800 MG Extended Release    | 5640          | IN         | Ibuprofen              |
|                               Oral Tablet                          |               |            |                        |
| 309092        | SCD         | Ceftriaxone 250 MG Injection         | 2193          | IN         | Ceftriaxone            |
| 342893        | SCDC        | Vancomycin 100 MG/ML                 | 11124         | IN         | Vancomycin             |
| 3498          | IN          | Diphenhydramine                      | 3498          | IN         | Diphenhydramine        |
| 607999        | MIN         | Metformin / pioglitazone             | 33738         | IN         | pioglitazone           |
| 607999        | MIN         | Metformin / pioglitazone             | 6809          | IN         | Metformin              |
| 855324        | SCD         | Warfarin Sodium 4 MG Oral Tablet     | 11289         | IN         | Warfarin               |
+---------------+-------------+--------------------------------------+---------------+------------+------------------------+

Which ingredients have the most variety of dose forms?

This query returns the largest number of dose forms (tablets, oral powders, spray, and so on) for the listed ingredients in RxNorm.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  SOURCE_NAME AS IN_NAME,
  COUNT (*) AS DF_COUNT
FROM
  `bigquery-public-data-staging.rxnorm.rxn_all_pathways`
WHERE
  SOURCE_TTY='IN'
  AND TARGET_TTY='DF'
GROUP BY
  IN_NAME
ORDER BY
  DF_COUNT DESC
LIMIT
  10

Command-line

bq query --use_legacy_sql=false '
SELECT
  SOURCE_NAME AS IN_NAME,
  COUNT (*) AS DF_COUNT
FROM
  `bigquery-public-data-staging.rxnorm.rxn_all_pathways`
WHERE
  SOURCE_TTY='IN'
  AND TARGET_TTY='DF'
GROUP BY
  IN_NAME
ORDER BY
  DF_COUNT DESC
LIMIT
  10'

The results are shown here:

+---------------------+-------------+
| IN_NAME             | DF_COUNT    |
+---------------------+-------------+
| Phenylephrine       | 31          |
| Hydrocortisone      | 30          |
| Benzocaine          | 29          |
| Lidocaine           | 28          |
| Menthol             | 27          |
| Diphenhydramine     | 25          |
| Sodium Chloride     | 23          |
| Glycerin            | 23          |
| Sodium Bicarbonate  | 20          |
| Salicylic Acid      | 19          |
+---------------------+-------------+

In what dose forms is the drug phenylephrine found?

The results of the prior query indicate Phenylephrine has the largest number of dose forms. This query returns all the dose forms for phenylephrine.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  *
FROM
  `bigquery-public-data.nlm_rxnorm.rxn_all_pathways_current`
WHERE
  SOURCE_NAME='Phenylephrine'
  AND target_TTY = 'DF'

Command-line

bq query --use_legacy_sql=false '
SELECT
  *
FROM
  `bigquery-public-data.nlm_rxnorm.rxn_all_pathways_current`
WHERE
  SOURCE_NAME='Phenylephrine'
  AND target_TTY = 'DF''

The results are shown here:

+---------------+-------------+---------------+-------------+-------------+--------------------------------+
| SOURCE_RXCUI  | SOURCE_TTY  | SOURCE_NAME   |TARGET_RXCUI | TARGET_TTY  | TARGET_NAME                    |
+---------------+-------------+---------------+-------------+-------------+--------------------------------+
| 8163          | IN          | Phenylephrine | 244286      | DF          | Ophthalmic Irrigation Solution |
| 8163          | IN          | Phenylephrine | 316946      | DF          | Extended Release Suspension    |
| 8163          | IN          | Phenylephrine | 346165      | DF          | Topical Spray                  |
| 8163          | IN          | Phenylephrine | 316975      | DF          | Rectal Cream                   |
| 8163          | IN          | Phenylephrine | 346170      | DF          | Rectal Gel                     |
| 8163          | IN          | Phenylephrine | 316949      | DF          | Injectable Solution            |
| 8163          | IN          | Phenylephrine | 346289      | DF          | Oral Powder                    |
| 8163          | IN          | Phenylephrine | 316965      | DF          | Oral Capsule                   |
| 8163          | IN          | Phenylephrine | 1535727     | DF          | Effervescent Oral Tablet       |
| 8163          | IN          | Phenylephrine | 126542      | DF          | Nasal Spray                    |
| ...           |             |               |             |             |                                |
+---------------+-------------+---------------+-------------+-------------+--------------------------------+

What are the ingredients of the drug labeled with the generic code number 072718?

If you have a UMLS license, you can:

  • Create a crosswalk between a proprietary, licensed data source such as FDB (SAB = NDDF)
  • Combine your table with the RxNorm public dataset to retrieve metadata about a drug!

First, load the rxnconso table into one of your own BigQuery datasets. You can then join the rxnconso table with rxn_all_pathways to analyze the data.

For example, to find the ingredients of the drug labeled "Generic Code Number 072718," enter the following query. In the query, replace [Project_ID] with your project ID, replace [DATASET] with the name of the dataset containing your rxnconso table, and replace [RXCONSO_TABLE] with the name of your rxnconso table.

Web UI

#standardSQL
SELECT
  r2.source_rxcui,
  r2.source_name,
  r2.source_tty,
  r2.target_rxcui,
  r2.target_name,
  r2.target_tty
FROM (
  SELECT
    rxcui
  FROM
    `[PROJECT_ID].[DATASET].[RXCONSO_TABLE]` as r1
  WHERE
    CODE='072718'
  AND
    SAB='NDDF'
  GROUP BY rxcui) as r1
INNER JOIN
  `bigquery-public-data.nlm_rxnorm.rxn_all_pathways_current` as r2
ON
  r1.rxcui = r2.source_rxcui
WHERE
  r2.target_tty='IN'

Command-line

bq query --use_legacy_sql=false '
SELECT
  r2.source_rxcui,
  r2.source_name,
  r2.source_tty,
  r2.target_rxcui,
  r2.target_name,
  r2.target_tty
FROM (
  SELECT
    rxcui
  FROM
    `[PROJECT_ID].[DATASET].[RXCONSO_TABLE]` as r1
  WHERE
    CODE='072718'
  AND
    SAB='NDDF'
  GROUP BY rxcui) as r1
INNER JOIN
  `bigquery-public-data.nlm_rxnorm.rxn_all_pathways_current` as r2
ON
  r1.rxcui = r2.source_rxcui
WHERE
  r2.target_tty='IN''

About the data

Dataset Source: Unified Medical Language System RxNorm

Category: Healthcare

Use: The dataset is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset. This dataset uses publicly available data from the U.S. National Library of Medicine (NLM), National Institutes of Health, Department of Health and Human Services; NLM is not responsible for the dataset, does not endorse or recommend this or any other dataset.

Update Frequency: Monthly

View in BigQuery: Go to RxNorm

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...