Special types
-------------

The following sections describe the special data types that BigQuery DataFrames uses.

### JSON

Within BigQuery DataFrames, columns that use the BigQuery [JSON format](/bigquery/docs/reference/standard-sql/data-types#json_type) (a lightweight standard) are represented by `pandas.ArrowDtype`. The exact underlying Arrow type depends on your library versions: with pandas 3.0 and later and PyArrow 19.0 and later, JSON columns are exposed as `pandas.ArrowDtype(pa.json_(pa.string()))`; older environments instead use `pandas.ArrowDtype(db_dtypes.JSONArrowType())` for compatibility, an Arrow extension type that acts as a light wrapper around `pa.string()`. This feature is in Preview.
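To see which representation applies in your environment, you can construct the dtype locally. The following is a minimal sketch that keys only on the PyArrow version; it assumes the `db_dtypes` package is available as a fallback, which it is wherever BigQuery DataFrames is installed:

    import pandas as pd
    import pyarrow as pa

    try:
        # PyArrow 19.0 and later provide a built-in JSON extension type.
        arrow_type = pa.json_(pa.string())
    except AttributeError:
        # Older environments fall back to the db-dtypes extension type,
        # a light wrapper around pa.string().
        import db_dtypes

        arrow_type = db_dtypes.JSONArrowType()

    json_dtype = pd.ArrowDtype(arrow_type)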
### `timedelta`
The `timedelta` type lacks a direct equivalent in the BigQuery native type system. To manage duration data, BigQuery DataFrames uses the `INT64` type as the underlying storage format in BigQuery tables. You can expect the results of your computations to be consistent with the behavior of equivalent operations performed with the pandas library.
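For example, subtracting a timestamp from a timestamp-typed `Series` yields a duration-typed result, as it does in pandas. The following is a minimal sketch (output formatting is approximate, and it assumes a BigQuery DataFrames version with `timedelta` support):

    import pandas as pd

    import bigframes.pandas as bpd

    s = bpd.read_pandas(pd.Series([pd.Timestamp("2025-01-02"), pd.Timestamp("2025-01-03")]))
    s - pd.Timestamp("2025-01-01")
    # 0    1 days
    # 1    2 days
    # dtype: duration[us][pyarrow]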
You can load `timedelta` values directly into BigQuery DataFrames `DataFrame` and `Series` objects, as shown in the following example:
    import pandas as pd

    import bigframes.pandas as bpd

    s = pd.Series([pd.Timedelta("1s"), pd.Timedelta("2m")])
    bpd.read_pandas(s)
    # 0    0 days 00:00:01
    # 1    0 days 00:02:00
    # dtype: duration[us][pyarrow]
Unlike pandas, BigQuery DataFrames supports only `timedelta` values with microsecond precision. If your data includes nanoseconds, you must round them to avoid potential exceptions, as shown in the following example:
    import pandas as pd

    import bigframes.pandas as bpd

    s = pd.Series([pd.Timedelta("999ns")])
    bpd.read_pandas(s.dt.round("us"))
    # 0    0 days 00:00:00.000001
    # dtype: duration[us][pyarrow]
You can use the `bigframes.pandas.to_timedelta` function to cast a BigQuery DataFrames `Series` object to the `timedelta` type, as shown in the following example:
    import bigframes.pandas as bpd

    bpd.to_timedelta([1, 2, 3], unit="s")
    # 0    0 days 00:00:01
    # 1    0 days 00:00:02
    # 2    0 days 00:00:03
    # dtype: duration[us][pyarrow]
When you load data containing `timedelta` values into a BigQuery table, the values are converted to microseconds and stored in `INT64` columns. To preserve the type information, BigQuery DataFrames appends the `#microseconds` string to the descriptions of these columns. Some operations, such as SQL query executions and UDF invocations, don't preserve column descriptions, and the `timedelta` type information is lost after these operations complete.
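The stored integer is simply the duration expressed in microseconds. The following sketch reproduces that convention locally with plain pandas (it illustrates the storage format; it is not a BigQuery DataFrames API):

    import pandas as pd

    td = pd.Timedelta("2m")
    micros = td // pd.Timedelta("1us")  # the value stored in the INT64 column
    print(micros)
    # 120000000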
Tools for composite types
-------------------------
For certain composite types, BigQuery DataFrames provides tools that let you access and process the elemental values within those types.
### List accessor
The `ListAccessor` object lets you perform operations on each list element by using the `list` property of a `Series` object, as shown in the following example:
    import bigframes.pandas as bpd

    s = bpd.Series([[1, 2, 3], [4, 5], [6]])  # dtype: list<item: int64>[pyarrow]

    # Access the first elements of each list
    s.list[0]
    # 0    1
    # 1    4
    # 2    6
    # dtype: Int64

    # Get the lengths of each list
    s.list.len()
    # 0    3
    # 1    2
    # 2    1
    # dtype: Int64
### Struct accessor
The `StructAccessor` object can access and process fields in a series of structs. The accessor is exposed as `series.struct`, as shown in the following example:
    import bigframes.pandas as bpd

    structs = [
        {"id": 101, "category": "A"},
        {"id": 102, "category": "B"},
        {"id": 103, "category": "C"},
    ]
    s = bpd.Series(structs)

    # Get the 'id' field of each struct
    s.struct.field("id")
    # 0    101
    # 1    102
    # 2    103
    # Name: id, dtype: Int64
If the struct field you plan to access is unambiguous from other `Series` properties, you can skip calling `struct`, as shown in the following example:
    import bigframes.pandas as bpd

    structs = [
        {"id": 101, "category": "A"},
        {"id": 102, "category": "B"},
        {"id": 103, "category": "C"},
    ]
    s = bpd.Series(structs)

    # not explicitly using the "struct" property
    s.id
    # 0    101
    # 1    102
    # 2    103
    # Name: id, dtype: Int64
However, it's a best practice to use `struct` for accessing fields, because it makes your code easier to understand and less error-prone.
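For example, a struct field whose name collides with an existing `Series` attribute can't be reached with attribute access, because the attribute takes precedence. The following sketch uses a hypothetical `name` field, which collides with the `Series.name` attribute:

    import bigframes.pandas as bpd

    s = bpd.Series([{"name": "a"}, {"name": "b"}])

    # s.name resolves to the Series.name attribute, not the struct field,
    # so use the accessor instead:
    s.struct.field("name")
    # 0    a
    # 1    b
    # Name: name, dtype: string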
### String accessor
You can access the `StringAccessor` object with the `str` property on a `Series` object, as shown in the following example:
importbigframes.pandasasbpds=bpd.Series(["abc","de","1"])# dtype: string[pyarrow]# Get the first character of each strings.str[0]# 0 a# 1 d# 2 1# dtype: string# Check whether there are only alphabetic characters in each strings.str.isalpha()# 0 True# 1 True# 2 False# dtype: boolean# Cast the alphabetic characters to their upper cases for each strings.str.upper()# 0 ABC# 1 DE# 2 1# dtype: string
### Geography accessor
BigQuery DataFrames provides a `GeographyAccessor` object that shares similar APIs with the `GeoSeries` structure provided by the GeoPandas library. You can invoke the `GeographyAccessor` object with the `geo` property on a `Series` object, as shown in the following example:
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-09-04(UTC)"],[],[],null,["# Use the BigQuery DataFrames data type system\n============================================\n\nThe BigQuery DataFrames data type system is built upon BigQuery data\ntypes. This design ensures seamless integration and alignment with the\nGoogle Cloud data warehouse, reflecting the built-in types used for data\nstorage in BigQuery.\n\nType mappings\n-------------\n\nThe following table shows data type equivalents in BigQuery,\nBigQuery DataFrames, and other Python libraries as well as their levels\nof support:\n\n| **Note:** BigQuery DataFrames doesn't support the following BigQuery data types: `INTERVAL` and `RANGE`. All other BigQuery data types display as the object type.\n\n### Type conversions\n\nWhen used with local data, BigQuery DataFrames converts data types to\ntheir corresponding BigQuery DataFrames equivalents wherever a\n[type mapping is defined](#type-mappings), as shown in the following example: \n\n import pandas as pd\n\n import bigframes.pandas as bpd\n\n s = pd.Series([pd.Timestamp(\"20250101\")])\n assert s.dtype == \"datetime64[ns]\"\n assert bpd.read_pandas(s).dtype == \"timestamp[us][pyarrow]\"\n\nPyArrow dictates behavior when there are discrepancies between the data type\nequivalents. In rare cases when the Python built-in type functions differently\nfrom its PyArrow counterpart, BigQuery DataFrames generally favors the\nPyArrow behavior to ensure consistency.\n\nThe following code sample uses the `datetime.date + timedelta` operation to\nshow that, unlike the Python datetime library that still returns a date\ninstance, BigQuery DataFrames follows the PyArrow behavior by returning\na timestamp instance: \n\n import datetime\n\n import pandas as pd\n\n import bigframes.pandas as bpd\n\n s = pd.Series([datetime.date(2025, 1, 1)])\n s + pd.Timedelta(hours=12)\n # 0\t2025-01-01\n # dtype: object\n\n bpd.read_pandas(s) + pd.Timedelta(hours=12)\n # 0 2025-01-01 12:00:00\n # dtype: timestamp[us][pyarrow]\n\nSpecial types\n-------------\n\nThe following sections describe the special data types that\nBigQuery DataFrames uses.\n\n### JSON\n\nWithin BigQuery DataFrames, columns using the BigQuery\n[JSON format](/bigquery/docs/reference/standard-sql/data-types#json_type)\n(a lightweight standard) are represented by `pandas.ArrowDtype`. The exact\nunderlying Arrow type depends on your library versions. Older environments\ntypically use `db_dtypes.JSONArrowType()` for compatibility, which is an Arrow\nextension type that acts as a light wrapper around `pa.string()`. In contrast,\nnewer setups (pandas 3.0 and later and PyArrow 19.0 and later) utilize the more\nrecent `pa.json_(pa.string())` representation.\n\n### `timedelta`\n\nThe `timedelta` type lacks a direct equivalent within the\nBigQuery native type system. To manage duration data,\nBigQuery DataFrames utilizes the `INT64` type as the underlying storage\nformat in BigQuery tables. 
You can expect the results of your\ncomputations to be consistent with the behavior you would expect from\nequivalent operations performed with the pandas library.\n\nYou can directly load `timedelta` values into BigQuery DataFrames and\n`Series` objects, as shown in the following example: \n\n import pandas as pd\n\n import bigframes.pandas as bpd\n\n s = pd.Series([pd.Timedelta(\"1s\"), pd.Timedelta(\"2m\")])\n bpd.read_pandas(s)\n # 0 0 days 00:00:01\n # 1 0 days 00:02:00\n # dtype: duration[us][pyarrow]\n\nUnlike pandas, BigQuery DataFrames only supports `timedelta` values with\nmicrosecond precision. If your data includes nanoseconds, you must round them to\navoid potential exceptions, as shown in the following example: \n\n import pandas as pd\n\n s = pd.Series([pd.Timedelta(\"999ns\")])\n bpd.read_pandas(s.dt.round(\"us\"))\n # 0 0 days 00:00:00.000001\n # dtype: duration[us][pyarrow]\n\nYou can use the `bigframes.pandas.to_timedelta` function to cast a\nBigQuery DataFrames `Series` object to the `timedelta` type, as shown\nin the following example: \n\n import bigframes.pandas as bpd\n\n bpd.to_timedelta([1, 2, 3], unit=\"s\")\n # 0 0 days 00:00:01\n # 1 0 days 00:00:02\n # 2 0 days 00:00:03\n # dtype: duration[us][pyarrow]\n\nWhen you load data containing `timedelta` values to a BigQuery table, the\nvalues are converted to microseconds and stored in `INT64` columns. To\npreserve the type information, BigQuery DataFrames appends the\n`#microseconds` string to the descriptions of these columns. Some operations,\nsuch as SQL query executions and UDF invocations, don't preserve column\ndescriptions, and the `timedelta` type information is lost after these\noperations are completed.\n\nTools for composite types\n-------------------------\n\nFor certain composite types, BigQuery DataFrames provides tools that\nlet you access and process the elemental values within those types.\n\n### List accessor\n\nThe `ListAccessor` object can help you perform operations on each list element\nby using the list property of the `Series` object, as shown in the\nfollowing example: \n\n import bigframes.pandas as bpd\n\n s = bpd.Series([[1, 2, 3], [4, 5], [6]]) # dtype: list\u003citem: int64\u003e[pyarrow]\n\n # Access the first elements of each list\n s.list[0]\n # 0 1\n # 1 4\n # 2 6\n # dtype: Int64\n\n # Get the lengths of each list\n s.list.len()\n # 0 3\n # 1 2\n # 2 1\n # dtype: Int64\n\n### Struct accessor\n\nThe `StructAccessor` object can access and process fields in a series of\nstructs. 
The API accessor object is `series.struct`, as shown in the\nfollowing example: \n\n import bigframes.pandas as bpd\n\n structs = [\n {\"id\": 101, \"category\": \"A\"},\n {\"id\": 102, \"category\": \"B\"},\n {\"id\": 103, \"category\": \"C\"},\n ]\n s = bpd.Series(structs)\n # Get the 'id' field of each struct\n s.struct.field(\"id\")\n # 0 101\n # 1 102\n # 2 103\n # Name: id, dtype: Int64\n\nIf the `struct` field you plan to access is unambiguous from other `Series`\nproperties, you can skip calling `struct`, as shown in the following example: \n\n import bigframes.pandas as bpd\n\n structs = [\n {\"id\": 101, \"category\": \"A\"},\n {\"id\": 102, \"category\": \"B\"},\n {\"id\": 103, \"category\": \"C\"},\n ]\n s = bpd.Series(structs)\n\n # not explicitly using the \"struct\" property\n s.id\n # 0 101\n # 1 102\n # 2 103\n # Name: id, dtype: Int64\n\nHowever, it's a best practice to use `struct` for accessing fields, because\nit makes your code easier to understand and less error-prone.\n\n### String accessor\n\nYou can access the `StringAccessor` object with the `str` property on a `Series`\nobject, as shown in the following example: \n\n import bigframes.pandas as bpd\n\n s = bpd.Series([\"abc\", \"de\", \"1\"]) # dtype: string[pyarrow]\n\n # Get the first character of each string\n s.str[0]\n # 0 a\n # 1 d\n # 2 1\n # dtype: string\n\n # Check whether there are only alphabetic characters in each string\n s.str.isalpha()\n # 0 True\n # 1 True\n # 2 False\n # dtype: boolean\n\n # Cast the alphabetic characters to their upper cases for each string\n s.str.upper()\n # 0 ABC\n # 1 DE\n # 2 1\n # dtype: string\n\n### Geography accessor\n\nBigQuery DataFrames provides a `GeographyAccessor` object that shares\nsimilar APIs with the GeoSeries structure provided by the GeoPandas library. You\ncan invoke the `GeographyAccessor` object with the `geo` property on a `Series`\nobject, as shown in the following example: \n\n from shapely.geometry import Point\n\n import bigframes.pandas as bpd\n\n s = bpd.Series([Point(1, 0), Point(2, 1)]) # dtype: geometry\n\n s.geo.y\n # 0 0.0\n # 1 1.0\n # dtype: Float64\n\nWhat's next\n-----------\n\n- Learn how to [use BigQuery DataFrames](/bigquery/docs/use-bigquery-dataframes).\n- Learn about [BigQuery DataFrames sessions and I/O](/bigquery/docs/dataframes-sessions-io).\n- Learn how to [visualize graphs using BigQuery DataFrames](/bigquery/docs/dataframes-visualizations).\n- Explore the [BigQuery DataFrames API reference](/python/docs/reference/bigframes/latest/summary_overview)."]]