Try BigQuery DataFrames

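In this quickstart, you use the BigQuery DataFrames API in a BigQuery notebook to perform the following analysis and machine learning (ML) tasks:

Create a DataFrame over the bigquery-public-data.ml_datasets.penguins public dataset.
Calculate the average body mass of a penguin.
Create a linear regression model.
Create a DataFrame over a subset of the penguin data to use as training data.
Clean up the training data.
Set the model parameters.
Fit the model.
Score the model.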

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Verify that the BigQuery API is enabled.

Enable the API

If you created a new project, the BigQuery API is enabled automatically.

Required permissions

To create and run notebooks, you need the following Identity and Access Management (IAM) roles:

Create a notebook

Follow the instructions in "Create a notebook from the BigQuery editor" to create a new notebook.

Try BigQuery DataFrames

To try BigQuery DataFrames, follow these steps:

Create a new code cell in the notebook.

Add the following code to the code cell:

import bigframes.pandas as bpd

# Set BigQuery DataFrames options
# Note: The project option is not required in all environments.
# On BigQuery Studio, the project ID is automatically detected.
bpd.options.bigquery.project = your_gcp_project_id

# Use "partial" ordering mode to generate more efficient queries. The order
# of the rows in a DataFrame may not be deterministic unless you have
# explicitly sorted it. Some operations that depend on row order, such as
# head(), will not work until you explicitly order the DataFrame. Set the
# ordering mode to "strict" (the default) for more pandas compatibility.
bpd.options.bigquery.ordering_mode = "partial"

# Create a DataFrame from a BigQuery table
query_or_table = "bigquery-public-data.ml_datasets.penguins"
df = bpd.read_gbq(query_or_table)

# Efficiently preview the results using the .peek() method.
df.peek()

Modify the bpd.options.bigquery.project = your_gcp_project_id line to specify your Google Cloud project ID. For example: bpd.options.bigquery.project = "myProjectID"
Run the code cell.

The code returns a DataFrame object that contains data about penguins.

Create a new code cell in the notebook and add the following code:

# Use the DataFrame just as you would a pandas DataFrame, but calculations
# happen in the BigQuery query engine instead of the local system.
average_body_mass = df["body_mass_g"].mean()
print(f"average_body_mass: {average_body_mass}")

Run the code cell.

The code calculates the average body mass of the penguins and prints it to the Google Cloud console.
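
As a rough local illustration of what this average computes, the following plain-Python sketch averages a handful of made-up body masses while skipping missing entries, which is how a pandas-style mean() treats nulls by default. The sample values and the mean_skipping_missing helper are hypothetical and are not part of BigQuery DataFrames; in the notebook, the real calculation runs in the BigQuery query engine.

```python
import math
from statistics import fmean

# Hypothetical body masses in grams (not taken from the penguins table).
# None and NaN stand in for missing values.
body_mass_g = [3750.0, 3800.0, None, 3250.0, float("nan"), 3450.0]


def mean_skipping_missing(values):
    """Average the values that are present, ignoring None and NaN."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    return fmean(present) if present else float("nan")


average_body_mass = mean_skipping_missing(body_mass_g)
print(f"average_body_mass: {average_body_mass}")
```

Only the four present values contribute to the result, so missing rows do not pull the average toward zero.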

Create a new code cell in the notebook and add the following code:

# Create the Linear Regression model
from bigframes.ml.linear_model import LinearRegression

# Filter down to the data we want to analyze
adelie_data = df[df.species == "Adelie Penguin (Pygoscelis adeliae)"]

# Drop the columns we don't care about
adelie_data = adelie_data.drop(columns=["species"])

# Drop rows with nulls to get our training data
training_data = adelie_data.dropna()

# Pick feature columns and label column
X = training_data[
    [
        "island",
        "culmen_length_mm",
        "culmen_depth_mm",
        "flipper_length_mm",
        "sex",
    ]
]
y = training_data[["body_mass_g"]]

model = LinearRegression(fit_intercept=False)
model.fit(X, y)
model.score(X, y)

Run the code cell.

The code returns the model's evaluation metrics.
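
To make the steps in that cell more concrete, here is a minimal plain-Python sketch of the same sequence: filter, drop a column, drop rows with nulls, fit without an intercept, and score. It is an illustration under stated assumptions, not how BigQuery DataFrames or BigQuery ML implements them. The rows, the shortened "Adelie" label, the use of a single feature, and the choice of mean absolute error and R² as metrics are all assumptions made for the example.

```python
# Hypothetical rows (not taken from the penguins table).
# None marks a missing value, as NULL would in BigQuery.
rows = [
    {"species": "Adelie", "flipper_length_mm": 180.0, "body_mass_g": 3500.0},
    {"species": "Adelie", "flipper_length_mm": 190.0, "body_mass_g": 3900.0},
    {"species": "Adelie", "flipper_length_mm": 200.0, "body_mass_g": 4000.0},
    {"species": "Adelie", "flipper_length_mm": None, "body_mass_g": 3700.0},
    {"species": "Gentoo", "flipper_length_mm": 220.0, "body_mass_g": 5200.0},
]

# Filter down to one species, then drop the now-constant species column.
adelie = [r for r in rows if r["species"] == "Adelie"]
adelie = [{k: v for k, v in r.items() if k != "species"} for r in adelie]

# Drop rows that contain any missing value to get the training data.
training = [r for r in adelie if all(v is not None for v in r.values())]

xs = [r["flipper_length_mm"] for r in training]
ys = [r["body_mass_g"] for r in training]

# Least-squares fit with no intercept: the line is forced through the origin,
# so for one feature the slope is sum(x * y) / sum(x * x).
slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
predictions = [slope * x for x in xs]

# Score on the training data: mean absolute error and R squared.
mae = sum(abs(y - p) for y, p in zip(ys, predictions)) / len(ys)
y_mean = sum(ys) / len(ys)
ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
ss_tot = sum((y - y_mean) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot

print(f"rows used: {len(training)}, slope: {slope:.3f}")
print(f"mean_absolute_error: {mae:.2f}, r2_score: {r2:.3f}")
```

Like the notebook cell, this sketch scores the model on the same rows it was fitted on, so the metrics describe fit quality rather than performance on unseen data.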

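Clean up

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

Caution: Deleting a project has the following effects:

Everything in the project is deleted. If you used an existing project for the tasks in this document, deleting it also deletes any other work that you've done in the project.
Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.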

In the Google Cloud console, go to the Manage resources page.
Go to Manage resources
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.

What's next

Learn more about using BigQuery DataFrames.
Learn how to visualize graphs using BigQuery DataFrames.
Learn how to use a BigQuery DataFrames notebook.