USA Name Data

How to query public datasets using BigQuery

BigQuery is a fully managed data warehouse and analytics platform. Public datasets are available for you to analyze using SQL queries. You can access BigQuery public datasets using the web UI, the command-line tool, or calls to the BigQuery REST API through client libraries for languages such as Java, .NET, or Python.

Currently, BigQuery public datasets are stored in the US multi-region location. When you query a public dataset, supply the --location=US flag on the command line, choose US as the processing location in the BigQuery web UI, or specify the location property in the jobReference section of the job resource when you use the API. Because the public datasets are stored in the US, you cannot write public data query results to a table in another region, and you cannot join tables in public datasets with tables in another region.
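As a rough illustration of the API option, the following sketch builds the JSON body of a query job with `location` set in the `jobReference` section. The project ID `my-project`, the helper name `build_query_job`, and the placeholder query are assumptions made for illustration only. The sketch builds the request body and does not send it.

```python
import json


def build_query_job(project_id, sql, location="US"):
    """Return a job resource body (a plain dict) for a BigQuery query job."""
    return {
        "jobReference": {
            "projectId": project_id,  # hypothetical project ID supplied by the caller
            "location": location,     # public datasets are stored in the US multi-region
        },
        "configuration": {
            "query": {
                "query": sql,
                "useLegacySql": False,  # run the placeholder query as standard SQL
            }
        },
    }


body = build_query_job("my-project", "SELECT 1")
print(json.dumps(body, indent=2))
```

Keeping the location as a parameter with a default of `US` makes the processing location explicit at every call site while still matching where the public datasets are stored.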

To get started using a BigQuery public dataset, create or select a project. The first terabyte of data processed per month is free, so you can start querying public datasets without enabling billing. If you intend to go beyond the free tier, you should also enable billing.

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. Select or create a GCP project.

    Go to the Manage resources page

  3. Make sure that billing is enabled for your project.

    Learn how to enable billing

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, enable the BigQuery API.

    Enable the API

Dataset overview

This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

You can start exploring this data in the BigQuery web UI.

Go to the USA Names dataset

Sample queries

Here are some examples of SQL queries you can run on this data in BigQuery.

These samples use BigQuery’s legacy SQL by setting the #legacySQL prefix. For more information, see Setting a query prefix.
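As a minimal sketch of how the prefix works, the helper below places `#legacySQL` on the first line of a query, and a second function shows one way to run the result with the `google-cloud-bigquery` Python client. The function names are hypothetical, and the client call assumes that the library is installed and that default credentials and a default project are configured.

```python
def with_legacy_prefix(sql):
    """Prepend the #legacySQL prefix unless the query already starts with it."""
    if sql.startswith("#legacySQL"):
        return sql
    # The prefix goes on its own line, before the query text.
    return "#legacySQL\n" + sql


def run_legacy_query(sql):
    """Run a legacy SQL query in the US location and return the rows as a list."""
    # Imported here so the helper above works without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default project and credentials
    # The Python client defaults to standard SQL, so the job setting
    # is made to agree with the prefix.
    job_config = bigquery.QueryJobConfig(use_legacy_sql=True)
    job = client.query(with_legacy_prefix(sql), job_config=job_config, location="US")
    return list(job.result())


print(with_legacy_prefix("SELECT 1"))
```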

What are the most common names?
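The query for this sample did not survive on this page, so what follows is a sketch of one way to ask the question, not the original query. It assumes the table `bigquery-public-data:usa_names.usa_1910_2013` with columns `name`, `gender`, and `number`; none of these names are stated on this page. The query is held in a Python string so that it can be passed to whichever client you use.

```python
# Assumed table name; adjust it if your copy of the dataset differs.
TABLE = "[bigquery-public-data:usa_names.usa_1910_2013]"

# Sum the occurrences of each (name, gender) pair and keep the ten largest totals.
MOST_COMMON_NAMES = f"""#legacySQL
SELECT
  name,
  gender,
  SUM(number) AS total
FROM
  {TABLE}
GROUP BY
  name,
  gender
ORDER BY
  total DESC
LIMIT
  10
"""

print(MOST_COMMON_NAMES)
```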

Notice that the most common female name is 6th on the list.

What are the most common female names?
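One possible form of this query is sketched below; it is a reconstruction, not the original sample. It assumes the table `bigquery-public-data:usa_names.usa_1910_2013`, the columns `name`, `gender`, and `number`, and that female records are marked with the value `'F'` in `gender`.

```python
# Assumed table name and gender encoding; neither is stated on this page.
TABLE = "[bigquery-public-data:usa_names.usa_1910_2013]"

# Restrict to female records, then total the occurrences of each name.
MOST_COMMON_FEMALE_NAMES = f"""#legacySQL
SELECT
  name,
  SUM(number) AS total
FROM
  {TABLE}
WHERE
  gender = 'F'
GROUP BY
  name
ORDER BY
  total DESC
LIMIT
  10
"""

print(MOST_COMMON_FEMALE_NAMES)
```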

Are there more female or male names?
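The sketch below reads this question as a count of distinct names per gender, which is an interpretation and not necessarily what the original sample computed. It assumes the table `bigquery-public-data:usa_names.usa_1910_2013` with columns `name` and `gender`, and it uses the legacy SQL function `EXACT_COUNT_DISTINCT` to get an exact count.

```python
# Assumed table name; not stated on this page.
TABLE = "[bigquery-public-data:usa_names.usa_1910_2013]"

# Count how many different names appear for each gender.
NAMES_PER_GENDER = f"""#legacySQL
SELECT
  gender,
  EXACT_COUNT_DISTINCT(name) AS distinct_names
FROM
  {TABLE}
GROUP BY
  gender
ORDER BY
  distinct_names DESC
"""

print(NAMES_PER_GENDER)
```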

There are more female names, by a wide margin.

About the data

Dataset Source:

Category: Social

Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

View in BigQuery: Go to the USA Names dataset
