Explore query results in notebooks
You can explore BigQuery query results by using Colab Enterprise notebooks in BigQuery.
In this tutorial, you query data from a BigQuery public dataset and explore the query results in a notebook.
Objectives
Create and run a query in BigQuery.
Explore query results in a notebook.
Costs
This tutorial uses a dataset available through the Google Cloud Public Datasets Program.
Google pays for the storage of these datasets and provides public access to the data. You incur charges for the queries that you perform on the data. For more information, see BigQuery pricing.
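On-demand query pricing is based on the amount of data a query scans. As a rough illustration only (the per-TiB rate below is a placeholder, not a quoted price; check the BigQuery pricing page for current rates), the charge can be estimated like this:

```python
def estimate_query_cost(bytes_processed: int, price_per_tib: float = 6.25) -> float:
    """Estimate the on-demand cost of a query from bytes scanned.

    price_per_tib is an illustrative placeholder; see the BigQuery
    pricing page for the current rate in your region.
    """
    tib = bytes_processed / 2**40  # 1 TiB = 2**40 bytes
    return tib * price_per_tib

# For example, a query that scans 10 GiB:
print(round(estimate_query_cost(10 * 2**30), 4))  # 0.061
```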
Before you begin
In the Google Cloud console, on the project selector page,
select or create a Google Cloud project.
Verify that billing is enabled for your Google Cloud project.
For new projects, BigQuery is automatically enabled.
Set the default region for code assets
If this is the first time that you are creating a code asset, set the default region for code assets. You can't change the region of a code asset after it is created.
All code assets in BigQuery Studio use the same default region.
To set the default region for code assets, follow these steps:
1. In the Google Cloud console, go to the BigQuery page.
2. In the Explorer pane, find the project in which you have enabled code assets.
3. Click more_vert View actions next to the project, and then click Change my default code region.
4. For Region, select the region that you want to use for code assets.
5. Click Select.
Required permissions
To create and run notebooks, you need the following Identity and Access Management (IAM) roles:
BigQuery User (roles/bigquery.user)
Notebook Runtime User (roles/aiplatform.notebookRuntimeUser)
Code Creator (roles/dataform.codeCreator)
Open query results in a notebook
You can run a SQL query and then use a notebook to explore the data. This approach is useful if you want to modify the data in BigQuery before working with it, or if you need only a subset of the fields in the table.
1. In the Google Cloud console, go to the BigQuery page.
2. In the Type to search field, enter bigquery-public-data.
   If the project is not shown, enter bigquery in the search field, and then click Search all projects to match the search string with the existing projects.
3. Select bigquery-public-data > ml_datasets > penguins.
4. For the penguins table, click more_vert View actions, and then click Query.
5. Add an asterisk (*) for field selection to the generated query, so that it reads like the following example:
   SELECT * FROM `bigquery-public-data.ml_datasets.penguins` LIMIT 1000;
6. Click play_circle Run.
7. In the Query results section, click Explore data, and then click Explore with Python notebook.
Prepare the notebook for use
Prepare the notebook for use by connecting to a runtime and setting application default values.
1. In the notebook header, click Connect to connect to the default runtime.
2. In the Setup code block, click play_circle Run cell.
Explore the data
1. To load the penguins data into a BigQuery DataFrame and show the results, click play_circle Run cell in the code block in the Result set loaded from BigQuery job as a DataFrame section.
2. To get descriptive metrics for the data, click play_circle Run cell in the code block in the Show descriptive statistics using describe() section.
3. Optional: Use other Python functions or packages to explore and analyze the data.
The following code sample shows how to use bigframes.pandas to analyze data, and bigframes.ml to create a linear regression model from penguins data in a BigQuery DataFrame:
import bigframes.pandas as bpd

# Load data from BigQuery
query_or_table = "bigquery-public-data.ml_datasets.penguins"
bq_df = bpd.read_gbq(query_or_table)

# Inspect one of the columns (or series) of the DataFrame:
bq_df["body_mass_g"]

# Compute the mean of this series:
average_body_mass = bq_df["body_mass_g"].mean()
print(f"average_body_mass: {average_body_mass}")

# Find the heaviest species using the groupby operation to calculate the
# mean body_mass_g:
(
    bq_df["body_mass_g"]
    .groupby(by=bq_df["species"])
    .mean()
    .sort_values(ascending=False)
    .head(10)
)

# Create the Linear Regression model
from bigframes.ml.linear_model import LinearRegression

# Filter down to the data we want to analyze
adelie_data = bq_df[bq_df.species == "Adelie Penguin (Pygoscelis adeliae)"]

# Drop the columns we don't care about
adelie_data = adelie_data.drop(columns=["species"])

# Drop rows with nulls to get our training data
training_data = adelie_data.dropna()

# Pick feature columns and label column
X = training_data[
    [
        "island",
        "culmen_length_mm",
        "culmen_depth_mm",
        "flipper_length_mm",
        "sex",
    ]
]
y = training_data[["body_mass_g"]]

model = LinearRegression(fit_intercept=False)
model.fit(X, y)
model.score(X, y)
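BigQuery DataFrames mirrors the pandas API, so the same exploration steps can be prototyped locally on a small in-memory sample before running them against the full table. A minimal sketch, assuming synthetic penguin-like data and plain pandas/NumPy in place of bigframes (the column names match the public table; the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic penguin-like sample (values are made up for illustration).
df = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
        "flipper_length_mm": [181.0, 186.0, 217.0, 221.0],
        "body_mass_g": [3750.0, 3800.0, 5200.0, 5400.0],
    }
)

# Descriptive statistics, as describe() shows in the notebook:
stats = df["body_mass_g"].describe()
print(stats["mean"])  # 4537.5

# Mean body mass per species, heaviest first:
by_species = (
    df["body_mass_g"].groupby(by=df["species"]).mean().sort_values(ascending=False)
)
print(by_species.index[0])  # Gentoo

# Least-squares fit of body_mass_g on flipper_length_mm with no intercept,
# analogous to LinearRegression(fit_intercept=False):
X = df[["flipper_length_mm"]].to_numpy()
y = df["body_mass_g"].to_numpy()
w, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ w
```

Because the bigframes API follows pandas, code validated this way usually carries over to the notebook with only the data-loading line changed.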
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
The easiest way to eliminate billing is to delete the Google Cloud project that you created for this tutorial.
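If you prefer the command line, the project can also be deleted with the gcloud CLI. A sketch assuming gcloud is installed and you are authenticated; the project ID is a placeholder, and the delete command is left commented out because it is irreversible:

```shell
# Hypothetical project ID -- replace with the project you created.
PROJECT_ID="my-tutorial-project"

# Deleting a project removes every resource it contains. Uncomment to run:
# gcloud projects delete "${PROJECT_ID}" --quiet

echo "Would delete project: ${PROJECT_ID}"
```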
1. In the Google Cloud console, go to the Manage resources page.
2. In the project list, select the project that you want to delete, and then click Delete.
3. In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
Learn more about creating notebooks in BigQuery.
Learn more about exploring data with BigQuery DataFrames.
Except as otherwise noted, the content of this page is licensed under a Creative Commons Attribution 4.0 License, and code samples are licensed under an Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025/09/04 (UTC).