In this section, you write SQL directly in notebook cells and read data from BigQuery into the Python notebook.

Magic commands that use a single or double percentage character (% or %%) let you use minimal syntax to interact with BigQuery within the notebook. The BigQuery client library for Python is automatically installed in a managed notebooks instance. Behind the scenes, the %%bigquery magic command uses the BigQuery client library for Python to run the given query, convert the results to a pandas DataFrame, optionally save the results to a variable, and then display the results.

Note: As of version 1.26.0 of the google-cloud-bigquery Python package, the BigQuery Storage API is used by default to download results from the %%bigquery magics.
To open a notebook file, select File > New > Notebook.

In the Select Kernel dialog, select Python (Local), and then click Select.

Your new IPYNB file opens.
To get the number of regions by country in the international_top_terms dataset, enter the following statement:
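```sql
%%bigquery
SELECT
  country_code,
  country_name,
  COUNT(DISTINCT region_code) AS num_regions
FROM
  `bigquery-public-data.google_trends.international_top_terms`
WHERE
  refresh_date = DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY)
GROUP BY
  country_code,
  country_name
ORDER BY
  num_regions DESC;
```

Click Run cell. The output is similar to the following: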
Query complete after 0.07s: 100%|██████████| 4/4 [00:00<00:00, 1440.60query/s]
Downloading: 100%|██████████| 41/41 [00:02<00:00, 20.21rows/s]
country_code country_name num_regions
0 TR Turkey 81
1 TH Thailand 77
2 VN Vietnam 63
3 JP Japan 47
4 RO Romania 42
5 NG Nigeria 37
6 IN India 36
7 ID Indonesia 34
8 CO Colombia 33
9 MX Mexico 32
10 BR Brazil 27
11 EG Egypt 27
12 UA Ukraine 27
13 CH Switzerland 26
14 AR Argentina 24
15 FR France 22
16 SE Sweden 21
17 HU Hungary 20
18 IT Italy 20
19 PT Portugal 20
20 NO Norway 19
21 FI Finland 18
22 NZ New Zealand 17
23 PH Philippines 17
...
Note: Your results might differ from the preceding output because the google_trends dataset being queried is refreshed with new data on an ongoing basis.

In the next cell (below the output from the previous cell), enter the following command to run the same query, but this time save the results to a new pandas DataFrame named regions_by_country. You provide that name by using an argument with the %%bigquery magic command:
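```sql
%%bigquery regions_by_country
SELECT
  country_code,
  country_name,
  COUNT(DISTINCT region_code) AS num_regions
FROM
  `bigquery-public-data.google_trends.international_top_terms`
WHERE
  refresh_date = DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY)
GROUP BY
  country_code, country_name
ORDER BY
  num_regions DESC;
```

Note: For more information about available arguments for the %%bigquery command, see the client library magics documentation (/python/docs/reference/bigquery/latest/magics).

Click Run cell.

In the next cell, enter the following command to look at the first few rows of the query results that you just read in, and then click Run cell:

```python
regions_by_country.head()
```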
The pandas DataFrame regions_by_country is ready to plot.
Query data by using the BigQuery client library directly
In this section, you use the BigQuery client library for Python directly to read data into the Python notebook.

The client library gives you more control over your queries and lets you use more complex configurations for queries and jobs. The library's integrations with pandas let you combine the power of declarative SQL with imperative code (Python) to analyze, visualize, and transform your data.

Note: You can use a number of Python data analysis, data wrangling, and visualization libraries, such as numpy, pandas, and matplotlib. Several of these libraries are built on top of a DataFrame object.
In the next cell, enter the following Python code to import the BigQuery client library for Python and initialize a client.
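A minimal version of that cell, assuming the notebook environment's default project and credentials, looks like this:

```python
from google.cloud import bigquery

# Initialize a BigQuery client by using the notebook environment's
# default project and credentials.
client = bigquery.Client()
```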
In the next cell, enter the following code to get the percentage of the daily top search terms in the US top_terms table that overlap by date over time. You look at the top terms for each date and see what percentage of them overlap with the top terms from the day before, 2 days before, 3 days before, and so on (across all pairs of dates over about a month's span).
sql="""WITH TopTermsByDate AS ( SELECT DISTINCT refresh_date AS date, term FROM `bigquery-public-data.google_trends.top_terms` ), DistinctDates AS ( SELECT DISTINCT date FROM TopTermsByDate )SELECT DATE_DIFF(Dates2.date, Date1Terms.date, DAY) AS days_apart, COUNT(DISTINCT (Dates2.date || Date1Terms.date)) AS num_date_pairs, COUNT(Date1Terms.term) AS num_date1_terms, SUM(IF(Date2Terms.term IS NOT NULL, 1, 0)) AS overlap_terms, SAFE_DIVIDE( SUM(IF(Date2Terms.term IS NOT NULL, 1, 0)), COUNT(Date1Terms.term) ) AS pct_overlap_termsFROM TopTermsByDate AS Date1TermsCROSS JOIN DistinctDates AS Dates2LEFT JOIN TopTermsByDate AS Date2Terms ON Dates2.date = Date2Terms.date AND Date1Terms.term = Date2Terms.termWHERE Date1Terms.date <= Dates2.dateGROUP BY days_apartORDER BY days_apart;"""pct_overlap_terms_by_days_apart=client.query(sql).to_dataframe()pct_overlap_terms_by_days_apart.head()
The SQL is encapsulated in a Python string and then passed to the query() method to run the query. The to_dataframe method waits for the query to finish and downloads the results to a pandas DataFrame by using the BigQuery Storage API.
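To illustrate the finer-grained control over queries and jobs mentioned earlier, here is a brief sketch (not part of the original walkthrough) that passes an explicit job configuration; the dry-run step and the roughly 1 GB billing cap are assumptions chosen for the example:

```python
from google.cloud import bigquery

# Dry-run the query first to estimate how many bytes it would process.
dry_run_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
dry_run_job = client.query(sql, job_config=dry_run_config)
print(f"This query would process {dry_run_job.total_bytes_processed} bytes.")

# Run the query for real, but let it fail if it would bill more than ~1 GB.
job_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**9)
pct_overlap_terms_by_days_apart = client.query(sql, job_config=job_config).to_dataframe()
```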
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-09-04(UTC)"],[],[],null,["# Explore and visualize data in BigQuery from within JupyterLab\n=============================================================\n\n\n| Vertex AI Workbench managed notebooks is\n| [deprecated](/vertex-ai/docs/deprecations). On\n| April 14, 2025, support for\n| managed notebooks will end and the ability to create managed notebooks instances\n| will be removed. Existing instances will continue to function\n| but patches, updates, and upgrades won't be available. To continue using\n| Vertex AI Workbench, we recommend that you\n| [migrate\n| your managed notebooks instances to Vertex AI Workbench instances](/vertex-ai/docs/workbench/managed/migrate-to-instances).\n\n\u003cbr /\u003e\n\nThis page shows you some examples of how to explore and visualize data\nthat is stored in BigQuery from within the JupyterLab interface\nof your Vertex AI Workbench managed notebooks instance.\n\nBefore you begin\n----------------\n\nIf you haven't already,\n[create\na managed notebooks instance](/vertex-ai/docs/workbench/managed/create-instance#create).\n\nOpen JupyterLab\n---------------\n\n1. In the Google Cloud console, go to the **Managed notebooks** page.\n\n [Go to Managed notebooks](https://console.cloud.google.com/vertex-ai/workbench/managed)\n2. Next to your managed notebooks instance's name,\n click **Open JupyterLab**.\n\n Your managed notebooks instance opens JupyterLab.\n\nRead data from BigQuery\n-----------------------\n\nIn the next two sections, you read data from BigQuery\nthat you will use to visualize later. These steps are identical to those\nin [Query data in BigQuery from\nwithin JupyterLab](/vertex-ai/docs/workbench/managed/bigquery), so if you've completed\nthem already, you can skip to\n[Get a summary of data in a BigQuery table](#summary).\n\n### Query data by using the %%bigquery magic command\n\nIn this section, you write SQL directly in notebook cells and read data from\nBigQuery into the Python notebook.\n\nMagic commands that use a single or double percentage character (`%` or `%%`)\nlet you use minimal syntax to interact with BigQuery within the\nnotebook. The BigQuery client library for Python is automatically\ninstalled in a managed notebooks instance. Behind the scenes, the `%%bigquery` magic\ncommand uses the BigQuery client library for Python to run the\ngiven query, convert the results to a pandas DataFrame, optionally save the\nresults to a variable, and then display the results.\n\n**Note** : As of version 1.26.0 of the `google-cloud-bigquery` Python package,\nthe [BigQuery Storage API](/bigquery/docs/reference/storage)\nis used by default to download results from the `%%bigquery` magics.\n\n1. To open a notebook file, select **File \\\u003e New \\\u003e\n Notebook**.\n\n \u003cbr /\u003e\n\n2. In the **Select Kernel** dialog, select **Python (Local)** , and then click\n **Select**.\n\n Your new IPYNB file opens.\n3. 
Get a summary of data in a BigQuery table

In this section, you use a notebook shortcut to get summary statistics and visualizations for all fields of a BigQuery table. This can be a fast way to profile your data before exploring further.

The BigQuery client library provides a magic command, %bigquery_stats, that you can call with a specific table name to provide an overview of the table and detailed statistics on each of the table's columns.

In the next cell, enter the following code to run that analysis on the US top_terms table (https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=google_trends&t=top_terms&page=table):

```
%bigquery_stats bigquery-public-data.google_trends.top_terms
```

Click Run cell. After running for some time, an image appears with various statistics on each of the 7 variables in the top_terms table.
Note: Your results might differ from the example output because the google_trends dataset being queried is refreshed with new data on an ongoing basis.

Visualize BigQuery data

In this section, you use plotting capabilities to visualize the results from the queries that you previously ran in your Jupyter notebook.

In the next cell, enter the following code to use the pandas DataFrame.plot() method to create a bar chart that visualizes the results of the query that returns the number of regions by country:

```python
regions_by_country.plot(kind="bar", x="country_name", y="num_regions", figsize=(15, 10))
```

Click Run cell.

In the next cell, enter the following code to use the pandas DataFrame.plot() method to create a scatter plot that visualizes the results from the query for the percentage of overlap in the top search terms by days apart:

```python
pct_overlap_terms_by_days_apart.plot(
    kind="scatter",
    x="days_apart",
    y="pct_overlap_terms",
    s=pct_overlap_terms_by_days_apart["num_date_pairs"] * 20,
    figsize=(15, 10),
)
```

Click Run cell. The size of each point in the chart reflects the number of date pairs that are that many days apart in the data. For example, there are more pairs that are 1 day apart than 30 days apart, because the top search terms are surfaced daily over about a month's time.

For more information about data visualization, see the pandas documentation: https://pandas.pydata.org/pandas-docs/stable/visualization.html