United States Census Data

The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of the data is available in summarized data and graphs. The raw data is often difficult to obtain, is typically divided by region, and it must be processed and combined to provide information about the nation as a whole.

The United States census dataset includes nationwide population counts from the 2000 and 2010 censuses. Data is broken out by gender, age and location using zip code tabular areas (ZCTAs) and GEOIDs. ZCTAs are generalized representations of zip codes, and often, though not always, are the same as the zip code for an area. GEOIDs are numeric codes that uniquely identify all administrative, legal, and statistical geographic areas for which the Census Bureau tabulates data. GEOIDs are useful for correlating census data with other censuses and surveys.

You can start exploring this data in the BigQuery console:

Go to United States census data

Sample queries

Here are some examples of SQL queries you can run on this data in BigQuery.

These samples use BigQuery’s support for standard SQL. Use the #standardSQL tag to let BigQuery know you want to use standard SQL. For more information about the #standardSQL prefix, see Setting a query prefix.

Census data, by itself, provides useful information about the population, but it can also be joined with other datasets to provide valuable insight about the population. One area where this can be beneficial is in healthcare and population health. Census data can be combined with other publicly available datasets such as Medicare data, Center for Disease Control (CDC) data, and many others to provide valuable insights such as hospitalization trends, healthcare utilization, and so on. In the following sample queries, census data is examined on its own and by joining it with the Centers for Medicare and Medicaid Services BigQuery public dataset.

What are the ten most populous zip codes in the US in the 2010 census?

The census data uses zip code tabular areas (ZCTAs). In this query, ZCTAs are used to approximate the ten most populated zip codes.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  zipcode,
  population
FROM
  `bigquery-public-data.census_bureau_usa.population_by_zip_2010`
WHERE
  gender = ''
ORDER BY
  population DESC
LIMIT
  10

Command-line

bq query --use_legacy_sql=false '
SELECT
  zipcode,
  population
FROM
  `bigquery-public-data.census_bureau_usa.population_by_zip_2010`
WHERE
  gender = ''
ORDER BY
  population DESC
LIMIT
  10'

The results are shown here:

+---------+-------------+
| zipcode | population  |
+---------+-------------+
| 60629   | 113916      |
| 79936   | 111086      |
| 11368   | 109931      |
| 00926   | 108862      |
| 90650   | 105549      |
| 90011   | 103892      |
| 91331   | 103689      |
| 11226   | 101572      |
| 90201   | 101279      |
| 11373   | 100820      |
+---------+-------------+

If you remove LIMIT 10 from the query, you can use the query results to visualize the population of each zip code. For example, you can create a visualization like the following in Google Data Studio.

Population by zip code

For a tutorial on using Google Data Studio with BigQuery, see Visualizing BigQuery Data Using Google Data Studio.

What are the top 10 zip codes that experienced the greatest change in population between the 2000 and 2010 censuses?

In this query, zip code tabular areas (ZCTAs) are used to approximate zip codes. Using ZCTAs, this query analyzes population changes. This analysis does not account for any changes in census methodology that affect population data. Interested users are encouraged to explore these population changes further.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  zipcode,
  pop_2000,
  pop_2010,
  pop_chg,
  pop_pct_chg
FROM (
  SELECT
    r1.zipcode AS zipcode,
    r2.population AS pop_2000,
    r1.population AS pop_2010,
    r1.population - r2.population AS pop_chg,
    ROUND((r1.population - r2.population)/NULLIF(r2.population,0) * 100, 2) AS pop_pct_chg,
    ABS((r1.population - r2.population)/NULLIF(r2.population,0)) AS abs_pct_chg
  FROM
    `bigquery-public-data.census_bureau_usa.population_by_zip_2010` AS r1
  INNER JOIN
    `bigquery-public-data.census_bureau_usa.population_by_zip_2000` AS r2
  ON
    r1.zipcode = r2.zipcode WHERE --following criteria selects total population without breaking down by age/gender
    r1.minimum_age IS NULL
    AND r2.minimum_age IS NULL
    AND r1.maximum_age IS NULL
    AND r2.maximum_age IS NULL
    AND r1.gender = ''
    AND r2.gender = '' )
ORDER BY
  abs_pct_chg DESC
LIMIT
  10

Command-line

bq query --use_legacy_sql=false '
SELECT
  zipcode,
  pop_2000,
  pop_2010,
  pop_chg,
  pop_pct_chg
FROM (
  SELECT
    r1.zipcode AS zipcode,
    r2.population AS pop_2000,
    r1.population AS pop_2010,
    r1.population - r2.population AS pop_chg,
    ROUND((r1.population - r2.population)/NULLIF(r2.population,0) * 100, 2) AS pop_pct_chg,
    ABS((r1.population - r2.population)/NULLIF(r2.population,0)) AS abs_pct_chg
  FROM
    `bigquery-public-data.census_bureau_usa.population_by_zip_2010` AS r1
  INNER JOIN
    `bigquery-public-data.census_bureau_usa.population_by_zip_2000` AS r2
  ON
    r1.zipcode = r2.zipcode WHERE --following criteria selects total population without breaking down by age/gender
    r1.minimum_age IS NULL
    AND r2.minimum_age IS NULL
    AND r1.maximum_age IS NULL
    AND r2.maximum_age IS NULL
    AND r1.gender = ''
    AND r2.gender = '' )
ORDER BY
  abs_pct_chg DESC
LIMIT
  10'

The results are shown here:

+-----------+-----------+-----------+----------+--------------+
| zipcode   | pop_2000  | pop_2010  | pop_chg  | pop_pct_chg  |
+-----------+-----------+-----------+----------+--------------+
| 60654     | 7         | 14875     | 14868    | 212400.0     |
| 90263     | 2         | 1612      | 1610     | 80500.0      |
| 70373     | 19        | 7141      | 7122     | 37484.21     |
| 95937     | 4         | 1491      | 1487     | 37175.0      |
| 98164     | 1         | 141       | 140      | 14000.0      |
| 25644     | 4         | 536       | 532      | 13300.0      |
| 89011     | 175       | 19550     | 19375    | 11071.43     |
| 76177     | 45        | 4891      | 4846     | 10768.89     |
| 89141     | 262       | 25150     | 24888    | 9499.24      |
| 52235     | 8         | 698       | 690      | 8625.0       |
+-----------+-----------+-----------+----------+--------------+

What are the ten zip codes with highest number of physicians per population (based on Medicare reimbursement data)?

The BigQuery Centers for Medicare and Medicaid Services public dataset provides information on services and procedures provided to Medicare beneficiaries by physicians and other healthcare professionals in 2012. This query analyzes the number of physicians in each zip code (as listed in the Medicare data) compared to underlying population in each zip code according to the 2010 census data. ZCTAs are used in the census data to approximate zip codes.

There are multiple caveats to these results that deserve attention. These caveats include (among others):

  • The years of the datasets used in the comparison differ (2010 versus 2012)
  • The Medicare data doesn’t include all physicians
  • The census zip codes are approximate (based on ZCTAs)
  • Aggregation of zip codes may provide a more accurate view of the true underlying population

Caveats aside, this query provides an interesting illustration of how to combine census data with other datasets.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  t1.zip5 AS zip5,
  t1.provider_cnt AS provider_count,
  t2.population AS population,
  ROUND(t1.provider_cnt / NULLIF(t2.population,0),2) AS ratio
FROM (
  SELECT
    CASE
      WHEN LENGTH(nppes_provider_zip)=5 THEN nppes_provider_zip
      WHEN LENGTH(nppes_provider_zip)=9 THEN SUBSTR(nppes_provider_zip,0,5)
      ELSE '0'
    END AS zip5,
    COUNT(*) AS provider_cnt
  FROM
    `bigquery-public-data.cms_medicare.physicians_and_other_supplier_2012`
  WHERE
    REGEXP_CONTAINS(nppes_credentials, r'(\W|^)[mM]\.*[Dd]')
  GROUP BY
    zip5 ) AS t1
INNER JOIN (
  SELECT
    population,
    zipcode
  FROM
    `bigquery-public-data.census_bureau_usa.population_by_zip_2010`
  WHERE
    gender =''
    AND population > 1000) AS t2 --limit to populations > 1000 to avoid any oddities that occur with very small zipcodes
ON
  t2.zipcode=t1.zip5
ORDER BY
  ratio DESC
LIMIT
  10

Command-line

bq query --use_legacy_sql=false '
SELECT
  t1.zip5 AS zip5,
  t1.provider_cnt AS provider_count,
  t2.population AS population,
  ROUND(t1.provider_cnt / NULLIF(t2.population,0),2) AS ratio
FROM (
  SELECT
    CASE
      WHEN LENGTH(nppes_provider_zip)=5 THEN nppes_provider_zip
      WHEN LENGTH(nppes_provider_zip)=9 THEN SUBSTR(nppes_provider_zip,0,5)
      ELSE '0'
    END AS zip5,
    COUNT(*) AS provider_cnt
  FROM
    `bigquery-public-data.cms_medicare.physicians_and_other_supplier_2012`
  WHERE
    REGEXP_CONTAINS(nppes_credentials, r'(\W|^)[mM]\.*[Dd]')
  GROUP BY
    zip5 ) AS t1
INNER JOIN (
  SELECT
    population,
    zipcode
  FROM
    `bigquery-public-data.census_bureau_usa.population_by_zip_2010`
  WHERE
    gender =''
    AND population > 1000) AS t2 --limit to populations > 1000 to avoid any oddities that occur with very small zipcodes
ON
  t2.zipcode=t1.zip5
ORDER BY
  ratio DESC
LIMIT
  10'

The results are shown here:

+---------+-----------------+-------------+------------+
| zip5    | provider_count  | population  | ratio      |
+---------+-----------------+-------------+------------+
| 35233   | 5494            | 1308        | 4.2        |
| 52242   | 5679            | 1592        | 3.57       |
| 30322   | 5711            | 2023        | 2.82       |
| 75246   | 7439            | 2770        | 2.69       |
| 55455   | 2798            | 1050        | 2.66       |
| 78215   | 2774            | 1150        | 2.41       |
| 73104   | 3942            | 1664        | 2.37       |
| 39216   | 8058            | 3454        | 2.33       |
| 77030   | 23282           | 10258       | 2.27       |
| 02462   | 2951            | 1554        | 1.9        |
+---------+-----------------+-------------+------------+

About the data

Dataset Source: United States Census Bureau

Category: Encyclopedic

Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Update Frequency: Historical

View in BigQuery: Go to United States census data

Send feedback about...