医疗保险数据

如何使用 BigQuery 查询公开数据集

BigQuery 是一种完全托管的数据仓库和分析平台。它可以提供公开数据集,供您使用 SQL 查询进行分析。要访问 BigQuery 公开数据集,您可以使用网页界面命令行工具,或者通过各种客户端库(如 Java.NETPython)来调用 BigQuery REST API

目前,BigQuery 公开数据集存储在 US 多区域位置。查询公开数据集时,可在命令行中提供 --location=US 标记,在 BigQuery 网页界面上选择 US 作为处理位置,或者在使用 API 时,在作业资源jobReference 部分中指定 location 属性。公开数据集存储在 US,因此您无法将公开数据查询结果写入另一个区域的表中,并且无法将公开数据集内的表与另一个区域中的表连接在一起。

要开始使用 BigQuery 公开数据集,请创建或选择一个项目。我们每月免费处理 1 TB 数据,因此您无需启用结算功能即可开始查询公开数据集。如果您打算处理的数据量超出免费层级范围,则还应启用结算功能。

  1. 登录您的 Google 帐号。

    如果您还没有 Google 帐号,请注册新帐号

  2. Select or create a Google Cloud Platform project.

    Go to the Manage resources page

  3. 确保您的项目已启用结算功能。

    了解如何启用结算功能

  4. 系统会自动为新项目启用 BigQuery。要在现有的项目中启用 BigQuery,请 启用BigQuery API。

    启用 API

数据集概览

此公开数据集由美国医疗保险和医疗补助服务中心 (Centers for Medicare & Medicaid Services) 创建。这些数据总结了向医疗保险受益人提供的医疗程序、服务及处方药的使用和支付情况(按具体住院和门诊患者、医生和其他供应商划分)。该数据集包括以下数据:

  • 常见的住院和门诊服务
  • 所有医生和其他供应商提供的医疗程序和服务
  • 所有纳入《联邦医疗保险制度 D 部分》范围内的处方药。

提供商确定他们提供给患者的物品、服务和医疗程序需要收取的费用,提供商会针对相应的商品、服务或医疗程序收取相应的费用。

您可以开始在 BigQuery 控制台中浏览这些数据:

转到“医疗保险”数据集

查询示例

以下是一些示例,展示了您可以在 BigQuery 中针对这些数据运行的 SQL 查询。

这些示例使用了 BigQuery 对标准 SQL 的支持。请使用 #standardSQL 标记告知 BigQuery 您要使用标准 SQL。要详细了解 #standardSQL 前缀,请参阅设置查询前缀

各州开具的处方药总数有多少?

以下查询会统计各州的医疗索赔总数。输出表中显示了排名前五位的州的医疗索赔总数。

网页界面

在网页界面中打开以下查询

#standardSQL
SELECT
  nppes_provider_state AS state,
  ROUND(SUM(total_claim_count) / 1e6) AS total_claim_count_millions
FROM
  `bigquery-public-data.medicare.part_d_prescriber_2014`
GROUP BY
  state
ORDER BY
  total_claim_count_millions DESC
LIMIT
  5;

命令行

bq query --use_legacy_sql=false '
SELECT
  nppes_provider_state AS state,
  ROUND(SUM(total_claim_count) / 1e6) AS total_claim_count_millions
FROM
  `bigquery-public-data.medicare.part_d_prescriber_2014`
GROUP BY
  state
ORDER BY
  total_claim_count_millions DESC
LIMIT
  5;'

结果如下所示:

+-------+----------------------------+
| state | total_claim_count_millions |
+-------+----------------------------+
|  CA   | 116.0                      |
|  FL   | 91.0                       |
|  NY   | 80.0                       |
|  TX   | 76.0                       |
|  PA   | 63.0                       |
+-------+----------------------------+

各州中最常用的处方药是什么?

此查询基于前一查询而构建,用于查找各州医疗索赔数量最高的药物。

网页界面

在网页界面中打开以下查询

#standardSQL
SELECT
  A.state,
  drug_name,
  total_claim_count,
  day_supply,
  ROUND(total_cost_millions) AS total_cost_millions
FROM (
  SELECT
    generic_name AS drug_name,
    nppes_provider_state AS state,
    ROUND(SUM(total_claim_count)) AS total_claim_count,
    ROUND(SUM(total_day_supply)) AS day_supply,
    ROUND(SUM(total_drug_cost)) / 1e6 AS total_cost_millions
  FROM
    `bigquery-public-data.medicare.part_d_prescriber_2014`
  GROUP BY
    state,
    drug_name) A
INNER JOIN (
  SELECT
    state,
    MAX(total_claim_count) AS max_total_claim_count
  FROM (
    SELECT
      nppes_provider_state AS state,
      ROUND(SUM(total_claim_count)) AS total_claim_count
    FROM
      `bigquery-public-data.medicare.part_d_prescriber_2014`
    GROUP BY
      state,
      generic_name)
  GROUP BY
    state) B
ON
  A.state = B.state
  AND A.total_claim_count = B.max_total_claim_count
ORDER BY
  A.total_claim_count DESC
LIMIT
  5;

命令行

bq query --use_legacy_sql=false '
SELECT
  A.state,
  drug_name,
  total_claim_count,
  day_supply,
  ROUND(total_cost_millions) AS total_cost_millions
FROM (
  SELECT
    generic_name AS drug_name,
    nppes_provider_state AS state,
    ROUND(SUM(total_claim_count)) AS total_claim_count,
    ROUND(SUM(total_day_supply)) AS day_supply,
    ROUND(SUM(total_drug_cost)) / 1e6 AS total_cost_millions
  FROM
    `bigquery-public-data.medicare.part_d_prescriber_2014`
  GROUP BY
    state,
    drug_name) A
INNER JOIN (
  SELECT
    state,
    MAX(total_claim_count) AS max_total_claim_count
  FROM (
    SELECT
      nppes_provider_state AS state,
      ROUND(SUM(total_claim_count)) AS total_claim_count
    FROM
      `bigquery-public-data.medicare.part_d_prescriber_2014`
    GROUP BY
      state,
      generic_name)
  GROUP BY
    state) B
ON
  A.state = B.state
  AND A.total_claim_count = B.max_total_claim_count
ORDER BY
  A.total_claim_count DESC
LIMIT
  5;'

输出中显示了排名前五位的州,以及相应的药物名称、总费用和医疗索赔总数。

+-------+---------------------------+-------------------+--------------+---------------------+
| state |         drug_name         | total_claim_count |  day_supply  | total_cost_millions |
+-------+---------------------------+-------------------+--------------+---------------------+
| CA    | LEVOTHYROXINE SODIUM      |         3845087.0 |  2.1174738E8 |                78.0 |
| FL    | LEVOTHYROXINE SODIUM      |         2982612.0 |  1.6338837E8 |                64.0 |
| TX    | HYDROCODONE/ACETAMINOPHEN |         2834059.0 |  6.0410516E7 |                63.0 |
| NY    | AMLODIPINE BESYLATE       |         2609896.0 | 1.23226481E8 |                21.0 |
| PA    | LEVOTHYROXINE SODIUM      |         2353845.0 | 1.09168537E8 |                44.0 |
+-------+---------------------------+-------------------+--------------+---------------------+

数据集的分层随机抽样

以下查询演示了如何对数据集进行分层,并从每个分层中抽取随机样本。在本例中,provider_stateprovider_city 的叉积定义了分层列表。在按随机数排序后,我们为每个分层分配一个 RANK() 序号。您可以通过修改过滤条件来指定每个分层中的样本数量:WHERE samples_per_strata <= 10

网页界面

在网页界面中打开以下查询

#standardSQL
SELECT
  *
FROM (
  SELECT
    provider_state,
    provider_city,
    ROUND(average_covered_charges,2),
    ROUND(average_total_payments,2),
    ROUND(average_medicare_payments,2),
    RANK() OVER(PARTITION BY provider_state, provider_city ORDER BY random_number) AS samples_per_strata
  FROM (
    SELECT
      provider_state,
      provider_city,
      average_covered_charges,
      average_total_payments,
      average_medicare_payments,
      RAND() AS random_number
    FROM
     `bigquery-public-data.medicare.inpatient_charges_2014`) )
WHERE
  samples_per_strata <= 10
LIMIT
  5;

命令行

bq query --use_legacy_sql=false '
SELECT
  *
FROM (
  SELECT
    provider_state,
    provider_city,
    ROUND(average_covered_charges,2),
    ROUND(average_total_payments,2),
    ROUND(average_medicare_payments,2),
    RANK() OVER(PARTITION BY provider_state, provider_city ORDER BY random_number) AS samples_per_strata
  FROM (
    SELECT
      provider_state,
      provider_city,
      average_covered_charges,
      average_total_payments,
      average_medicare_payments,
      RAND() AS random_number
    FROM
     `bigquery-public-data.medicare.inpatient_charges_2014`) )
WHERE
  samples_per_strata <= 10
LIMIT
  5;'

结果如下所示:

+----------------+---------------+-----------+----------+----------+--------------------+
| provider_state | provider_city |    f0_    |   f1_    |   f2_    | samples_per_strata |
+----------------+---------------+-----------+----------+----------+--------------------+
| AL             | GADSDEN       | 100478.64 |  7359.86 |  6324.43 |                  3 |
| AL             | GADSDEN       |  57587.79 |  6003.93 |  5309.07 |                  1 |
| AL             | GADSDEN       |  47132.23 |   4039.9 |  2634.94 |                  5 |
| AL             | GADSDEN       |  55738.92 |  6098.92 |  5264.46 |                  2 |
| AL             | GADSDEN       |  582430.0 | 59699.62 | 58705.15 |                  4 |
+----------------+---------------+-----------+----------+----------+--------------------+

每个城市和州的住院治疗和门诊治疗的平均费用是多少?

为了查明 2014 年每个城市和州中各提供商的住院和门诊平均费用,我们需要两个表:2014 年住院表 (inpatient_charges_2014) 和 2014 年门诊表 (outpatient_charges_2014)。

该查询包括以下步骤:

  1. 通过 provider_id 聚合各表中的费用。为此,您需要获取提供商完成的各医疗程序的费用的加权平均值。

  2. provider_idprovider_cityprovider_state 上建立连接。

  3. 按住院和门诊医疗程序的平均费用之和排序。

网页界面

在网页界面中打开以下查询

#standardSQL
SELECT
  OP.provider_state AS State,
  OP.provider_city AS City,
  OP.provider_id AS Provider_ID,
  ROUND(OP.average_OP_cost) AS Average_OP_Cost,
  ROUND(IP.average_IP_cost) AS Average_IP_Cost,
  ROUND(OP.average_OP_cost + IP.average_IP_cost) AS Combined_Average_Cost
FROM (
  SELECT
    provider_state,
    provider_city,
    provider_id,
    SUM(average_total_payments*outpatient_services)/SUM(outpatient_services) AS average_OP_cost
  FROM
    `bigquery-public-data.medicare.outpatient_charges_2014`
  GROUP BY
    provider_state,
    provider_city,
    provider_id ) AS OP
INNER JOIN (
  SELECT
    provider_state,
    provider_city,
    provider_id,
    SUM(average_medicare_payments*total_discharges)/SUM(total_discharges) AS average_IP_cost
  FROM
    `bigquery-public-data.medicare.inpatient_charges_2014`
  GROUP BY
    provider_state,
    provider_city,
    provider_id ) AS IP
ON
  OP.provider_id = IP.provider_id
  AND OP.provider_state = IP.provider_state
  AND OP.provider_city = IP.provider_city
ORDER BY
  combined_average_cost DESC
LIMIT
  10;

命令行

bq query --use_legacy_sql=false '
SELECT
  OP.provider_state AS State,
  OP.provider_city AS City,
  OP.provider_id AS Provider_ID,
  ROUND(OP.average_OP_cost) AS Average_OP_Cost,
  ROUND(IP.average_IP_cost) AS Average_IP_Cost,
  ROUND(OP.average_OP_cost + IP.average_IP_cost) AS Combined_Average_Cost
FROM (
  SELECT
    provider_state,
    provider_city,
    provider_id,
    SUM(average_total_payments*outpatient_services)/SUM(outpatient_services) AS average_OP_cost
  FROM
    `bigquery-public-data.medicare.outpatient_charges_2014`
  GROUP BY
    provider_state,
    provider_city,
    provider_id ) AS OP
INNER JOIN (
  SELECT
    provider_state,
    provider_city,
    provider_id,
    SUM(average_medicare_payments*total_discharges)/SUM(total_discharges) AS average_IP_cost
  FROM
    `bigquery-public-data.medicare.inpatient_charges_2014`
  GROUP BY
    provider_state,
    provider_city,
    provider_id ) AS IP
ON
  OP.provider_id = IP.provider_id
  AND OP.provider_state = IP.provider_state
  AND OP.provider_city = IP.provider_city
ORDER BY
  combined_average_cost DESC
LIMIT
  10;'

结果如下所示:

+-------+---------------+-------------+-----------------+-----------------+-----------------------+
| State |     City      | Provider_ID | Average_OP_Cost | Average_IP_Cost | Combined_Average_Cost |
+-------+---------------+-------------+-----------------+-----------------+-----------------------+
| IN    | MISHAWAKA     |      150177 |           399.0 |        102521.0 |              102920.0 |
| MI    | WARREN        |      230264 |           104.0 |         88620.0 |               88724.0 |
| TX    | HOUSTON       |      450674 |            88.0 |         67571.0 |               67659.0 |
| TX    | EL PASO       |      450877 |           230.0 |         45179.0 |               45409.0 |
| CO    | VAIL          |       60096 |           261.0 |         38651.0 |               38912.0 |
| TN    | MEMPHIS       |      440152 |            90.0 |         35698.0 |               35788.0 |
| LA    | BATON ROUGE   |      190128 |            87.0 |         34369.0 |               34456.0 |
| LA    | LEESVILLE     |      190297 |           560.0 |         32611.0 |               33172.0 |
| CA    | SAN FRANCISCO |       50454 |           157.0 |         32803.0 |               32959.0 |
| CA    | LOS ANGELES   |       50262 |           120.0 |         31795.0 |               31916.0 |
+-------+---------------+-------------+-----------------+-----------------+-----------------------+

各州诊断条件排名和平均付款金额比较

在此查询中,我们尝试回答以下问题:

  • 美国最常见的住院诊断条件是什么?

  • 针对每种诊断条件,哪个城市的病例数量最多?

  • 在这些城市中,针对这些条件的平均付款金额是多少?与全国平均水平相比如何?

网页界面

在网页界面中打开以下查询

#standardSQL
SELECT
  drg_definition AS Diagnosis,
  provider_city AS City,
  provider_state AS State,
  cityrank AS City_Rank,
  CAST(ROUND(citywise_avg_total_payments) AS INT64) AS Citywise_Avg_Payments,
  CONCAT(CAST(ROUND(citywise_avg_total_payments /national_avg_total_payments * 100, 0) AS STRING), " %") AS Avg_Payments_City_vs_National
FROM (
  SELECT
    drg_definition,
    provider_city,
    provider_state,
    cityrank,
    national_num_cases,
    citywise_avg_total_payments,
    national_sum_total_payments,
    (national_sum_total_payments /national_num_cases) AS national_avg_total_payments
  FROM (
    SELECT
      drg_definition,
      provider_city,
      provider_state,
      citywise_avg_total_payments,
      RANK() OVER (PARTITION BY drg_definition ORDER BY citywise_num_cases DESC ) AS cityrank,
      SUM(citywise_num_cases) OVER (PARTITION BY drg_definition ) AS national_num_cases,
      SUM(citywise_sum_total_payments) OVER (PARTITION BY drg_definition ) AS national_sum_total_payments
    FROM (
      SELECT
        drg_definition,
        provider_city,
        provider_state,
        SUM(total_discharges) AS citywise_num_cases,
        SUM(average_total_payments * total_discharges)/ SUM(total_discharges) AS citywise_avg_total_payments,
        SUM(average_total_payments * total_discharges) AS citywise_sum_total_payments
      FROM
        `bigquery-public-data.medicare.inpatient_charges_2014`
      GROUP BY
        drg_definition,
        provider_city,
        provider_state))
  WHERE
    cityrank <=3)  # Limit to top 3 cities for each Diagnosis
ORDER BY
  national_num_cases DESC,
  cityrank
LIMIT
  9;  # Limit Results to the top 3 cities for the top 3 diagnosis

命令行

bq query --use_legacy_sql=false '
SELECT
  drg_definition AS Diagnosis,
  provider_city AS City,
  provider_state AS State,
  cityrank AS City_Rank,
  CAST(ROUND(citywise_avg_total_payments) AS INT64) AS Citywise_Avg_Payments,
  CONCAT(CAST(ROUND(citywise_avg_total_payments /national_avg_total_payments * 100, 0) AS STRING), " %") AS Avg_Payments_City_vs_National
FROM (
  SELECT
    drg_definition,
    provider_city,
    provider_state,
    cityrank,
    national_num_cases,
    citywise_avg_total_payments,
    national_sum_total_payments,
    (national_sum_total_payments /national_num_cases) AS national_avg_total_payments
  FROM (
    SELECT
      drg_definition,
      provider_city,
      provider_state,
      citywise_avg_total_payments,
      RANK() OVER (PARTITION BY drg_definition ORDER BY citywise_num_cases DESC ) AS cityrank,
      SUM(citywise_num_cases) OVER (PARTITION BY drg_definition ) AS national_num_cases,
      SUM(citywise_sum_total_payments) OVER (PARTITION BY drg_definition ) AS national_sum_total_payments
    FROM (
      SELECT
        drg_definition,
        provider_city,
        provider_state,
        SUM(total_discharges) AS citywise_num_cases,
        SUM(average_total_payments * total_discharges)/ SUM(total_discharges) AS citywise_avg_total_payments,
        SUM(average_total_payments * total_discharges) AS citywise_sum_total_payments
      FROM
        `bigquery-public-data.medicare.inpatient_charges_2014`
      GROUP BY
        drg_definition,
        provider_city,
        provider_state))
  WHERE
    cityrank <=3)  # Limit to top 3 cities for each Diagnosis
ORDER BY
  national_num_cases DESC,
  cityrank
LIMIT # Limit Results to the top 3 cities for the top 3 diagnosis
  9;'

结果如下所示:

+--------------------------------------------------------------------------+-----------+-------+-----------+-----------------------+-------------------------------+
|                                Diagnosis                                 |   City    | State | City_Rank | Citywise_Avg_Payments | Avg_Payments_City_vs_National |
+--------------------------------------------------------------------------+-----------+-------+-----------+-----------------------+-------------------------------+
| 470 - MAJOR JOINT REPLACEMENT OR REATTACHMENT OF LOWER EXTREMITY W/O MCC | NEW YORK  | NY    |         1 |                 21656 | 144 %                         |
| 470 - MAJOR JOINT REPLACEMENT OR REATTACHMENT OF LOWER EXTREMITY W/O MCC | BOSTON    | MA    |         2 |                 18324 | 122 %                         |
| 470 - MAJOR JOINT REPLACEMENT OR REATTACHMENT OF LOWER EXTREMITY W/O MCC | BALTIMORE | MD    |         3 |                 23056 | 154 %                         |
| 871 - SEPTICEMIA OR SEVERE SEPSIS W/O MV 96+ HOURS W MCC                 | HOUSTON   | TX    |         1 |                 14592 | 106 %                         |
| 871 - SEPTICEMIA OR SEVERE SEPSIS W/O MV 96+ HOURS W MCC                 | CHICAGO   | IL    |         2 |                 16071 | 117 %                         |
| 871 - SEPTICEMIA OR SEVERE SEPSIS W/O MV 96+ HOURS W MCC                 | BROOKLYN  | NY    |         3 |                 23398 | 170 %                         |
| 291 - HEART FAILURE & SHOCK W MCC                                        | CHICAGO   | IL    |         1 |                 13287 | 122 %                         |
| 291 - HEART FAILURE & SHOCK W MCC                                        | HOUSTON   | TX    |         2 |                 12572 | 116 %                         |
| 291 - HEART FAILURE & SHOCK W MCC                                        | BALTIMORE | MD    |         3 |                 16992 | 157 %                         |
+--------------------------------------------------------------------------+-----------+-------+-----------+-----------------------+-------------------------------+

关于数据

数据集来源美国医疗保险与医疗补助服务中心

类别:医疗保健

使用:此数据集公开供任何人使用,前提是使用者必须遵守数据集来源规定的以下条款:http://www.data.gov/privacy-policy#data_policy;Google“按原样”提供数据集,对此不作任何明示或暗示的保证。对于因使用数据集而导致的任何直接或间接损害,Google 不承担任何法律责任。

更新频率:每年

在 BigQuery 中查看转到“医疗保险”数据集

此页内容是否有用?请给出您的反馈和评价:

发送以下问题的反馈:

此网页
需要帮助?请访问我们的支持页面