API documentation for bigquery package.
Packages Functions
approx_top_count
approx_top_count(
series: bigframes.series.Series, number: int
) -> bigframes.series.Series
Returns the approximate top elements of expression as an array of STRUCTs.
The number parameter specifies the number of elements returned.
Each STRUCT contains two fields. The first field (named value) contains an input
value. The second field (named count) contains an INT64 specifying the number
of times the value was returned.
Returns NULL if there are zero input rows.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(["apple", "apple", "pear", "pear", "pear", "banana"])
>>> bbq.approx_top_count(s, number=2)
[{'value': 'pear', 'count': 3}, {'value': 'apple', 'count': 2}]
| Parameters | |
|---|---|
| Name | Description |
| series | bigframes.series.Series: The Series with any data type that the |
| number | int: An integer specifying the number of elements returned. |
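As a rough local analogue of the result shape, the sketch below counts values exactly with Python's collections.Counter. It is not how BigQuery computes the approximation, and the helper name top_count_exact is hypothetical.

```python
from collections import Counter

def top_count_exact(values, number):
    """Return the `number` most frequent values as value/count records (exact, local)."""
    counts = Counter(values)
    return [{"value": v, "count": c} for v, c in counts.most_common(number)]

result = top_count_exact(["apple", "apple", "pear", "pear", "pear", "banana"], number=2)
print(result)
```

Each record mirrors the two STRUCT fields described above: value and count.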
array_agg
array_agg(
obj: groupby.SeriesGroupBy | groupby.DataFrameGroupBy,
) -> series.Series | dataframe.DataFrame
Group data and create arrays from selected columns, omitting NULLs to avoid BigQuery errors (NULLs not allowed in arrays).
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
For a SeriesGroupBy object:
>>> lst = ['a', 'a', 'b', 'b', 'a']
>>> s = bpd.Series([1, 2, 3, 4, np.nan], index=lst)
>>> bbq.array_agg(s.groupby(level=0))
a [1. 2.]
b [3. 4.]
dtype: list<item: double>[pyarrow]
For a DataFrameGroupBy object:
>>> l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
>>> df = bpd.DataFrame(l, columns=["a", "b", "c"])
>>> bbq.array_agg(df.groupby(by=["b"]))
a c
b
1.0 [2] [3]
2.0 [1 1] [3 2]
<BLANKLINE>
[2 rows x 2 columns]
| Parameter | |
|---|---|
| Name | Description |
| obj | groupby.SeriesGroupBy or groupby.DataFrameGroupBy: A GroupBy object to which the function is applied. |
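The NULL-omitting behavior can be pictured with a small local sketch in plain Python. It only mimics the grouping semantics on in-memory lists; array_agg_local is a hypothetical helper, and None and NaN stand in for SQL NULL.

```python
import math
from collections import defaultdict

def array_agg_local(keys, values):
    """Collect values per key into lists, skipping None and NaN (stand-ins for NULL)."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue  # NULL-like values are left out of the arrays
        groups[key].append(value)
    return dict(groups)

grouped = array_agg_local(["a", "a", "b", "b", "a"], [1.0, 2.0, 3.0, 4.0, float("nan")])
print(grouped)
```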
array_length
array_length(series: bigframes.series.Series) -> bigframes.series.Series
Compute the length of each array element in the Series.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([[1, 2, 8, 3], [], [3, 4]])
>>> bbq.array_length(s)
0 4
1 0
2 2
dtype: Int64
You can also apply this function directly to Series.
>>> s.apply(bbq.array_length, by_row=False)
0 4
1 0
2 2
dtype: Int64
| Parameter | |
|---|---|
| Name | Description |
| series | bigframes.series.Series: A Series with array columns. |
array_to_string
array_to_string(
series: bigframes.series.Series, delimiter: str
) -> bigframes.series.Series
Converts array elements within a Series into delimited strings.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([["H", "i", "!"], ["Hello", "World"], np.nan, [], ["Hi"]])
>>> bbq.array_to_string(s, delimiter=", ")
0 H, i, !
1 Hello, World
2
3
4 Hi
dtype: string
| Parameters | |
|---|---|
| Name | Description |
| series | bigframes.series.Series: A Series containing arrays. |
| delimiter | str: The string used to separate array elements. |
create_vector_index
create_vector_index(
table_id: str,
column_name: str,
*,
replace: bool = False,
index_name: Optional[str] = None,
distance_type="cosine",
stored_column_names: Collection[str] = (),
index_type: str = "ivf",
ivf_options: Optional[Mapping] = None,
tree_ah_options: Optional[Mapping] = None,
session: Optional[bigframes.session.Session] = None
) -> None
Creates a new vector index on a column of a table.
This method calls the CREATE VECTOR INDEX DDL statement (https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_vector_index_statement).
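To make the DDL connection concrete, the sketch below renders a CREATE VECTOR INDEX statement from a few of the parameters above. It is an illustration under assumptions: render_create_vector_index is a hypothetical helper, the table, column, and index names are made up, and the SQL that BigFrames actually generates may differ.

```python
def render_create_vector_index(table_id, column_name, *, index_name,
                               replace=False, distance_type="cosine",
                               index_type="ivf", stored_column_names=()):
    """Build an illustrative CREATE VECTOR INDEX statement (hypothetical helper)."""
    create = "CREATE OR REPLACE VECTOR INDEX" if replace else "CREATE VECTOR INDEX"
    storing = ""
    if stored_column_names:
        storing = " STORING(" + ", ".join(stored_column_names) + ")"
    options = (
        f"index_type = '{index_type.upper()}', "
        f"distance_type = '{distance_type.upper()}'"
    )
    return (
        f"{create} `{index_name}` ON `{table_id}`({column_name})"
        f"{storing} OPTIONS({options})"
    )

# All identifiers below are placeholders, not real resources.
ddl = render_create_vector_index(
    "my_dataset.my_table", "embedding", index_name="my_index"
)
print(ddl)
```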
json_extract
json_extract(
input: bigframes.series.Series, json_path: str
) -> bigframes.series.Series
Extracts a JSON value and converts it to a SQL JSON-formatted STRING or
JSON value. This function uses single quotes and brackets to escape invalid
JSONPath characters in JSON keys.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['{"class": {"students": [{"id": 5}, {"id": 12}]}}'])
>>> bbq.json_extract(s, json_path="$.class")
0 {"students":[{"id":5},{"id":12}]}
dtype: string
| Parameters | |
|---|---|
| Name | Description |
| input | bigframes.series.Series: The Series containing JSON data (as native JSON objects or JSON-formatted strings). |
| json_path | str: The JSON path identifying the data that you want to obtain from the input. |
json_extract_array
json_extract_array(
input: bigframes.series.Series, json_path: str = "$"
) -> bigframes.series.Series
Extracts a JSON array and converts it to a SQL array of JSON-formatted
STRING or JSON values. This function uses single quotes and brackets to
escape invalid JSONPath characters in JSON keys.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['[1, 2, 3]', '[4, 5]'])
>>> bbq.json_extract_array(s)
0 ['1' '2' '3']
1 ['4' '5']
dtype: list<item: string>[pyarrow]
>>> s = bpd.Series([
... '{"fruits": [{"name": "apple"}, {"name": "cherry"}]}',
... '{"fruits": [{"name": "guava"}, {"name": "grapes"}]}'
... ])
>>> bbq.json_extract_array(s, "$.fruits")
0 ['{"name":"apple"}' '{"name":"cherry"}']
1 ['{"name":"guava"}' '{"name":"grapes"}']
dtype: list<item: string>[pyarrow]
>>> s = bpd.Series([
... '{"fruits": {"color": "red", "names": ["apple","cherry"]}}',
... '{"fruits": {"color": "green", "names": ["guava", "grapes"]}}'
... ])
>>> bbq.json_extract_array(s, "$.fruits.names")
0 ['"apple"' '"cherry"']
1 ['"guava"' '"grapes"']
dtype: list<item: string>[pyarrow]
| Parameters | |
|---|---|
| Name | Description |
| input | bigframes.series.Series: The Series containing JSON data (as native JSON objects or JSON-formatted strings). |
| json_path | str: The JSON path identifying the data that you want to obtain from the input. |
json_extract_string_array
json_extract_string_array(
input: bigframes.series.Series,
json_path: str = "$",
value_dtype: typing.Optional[
typing.Union[
pandas.core.arrays.boolean.BooleanDtype,
pandas.core.arrays.floating.Float64Dtype,
pandas.core.arrays.integer.Int64Dtype,
pandas.core.arrays.string_.StringDtype,
pandas.core.dtypes.dtypes.ArrowDtype,
geopandas.array.GeometryDtype,
typing.Literal[
"boolean",
"Float64",
"Int64",
"int64[pyarrow]",
"string",
"string[pyarrow]",
"timestamp[us, tz=UTC][pyarrow]",
"timestamp[us][pyarrow]",
"date32[day][pyarrow]",
"time64[us][pyarrow]",
"decimal128(38, 9)[pyarrow]",
"decimal256(76, 38)[pyarrow]",
"binary[pyarrow]",
"duration[us][pyarrow]",
],
]
] = None,
) -> bigframes.series.Series
Extracts a JSON array and converts it to a SQL array of STRING values.
A value_dtype can be provided to further coerce the data type of the
values in the array. This function uses single quotes and brackets to escape
invalid JSONPath characters in JSON keys.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['[1, 2, 3]', '[4, 5]'])
>>> bbq.json_extract_string_array(s)
0 ['1' '2' '3']
1 ['4' '5']
dtype: list<item: string>[pyarrow]
>>> bbq.json_extract_string_array(s, value_dtype='Int64')
0 [1 2 3]
1 [4 5]
dtype: list<item: int64>[pyarrow]
>>> s = bpd.Series([
... '{"fruits": {"color": "red", "names": ["apple","cherry"]}}',
... '{"fruits": {"color": "green", "names": ["guava", "grapes"]}}'
... ])
>>> bbq.json_extract_string_array(s, "$.fruits.names")
0 ['apple' 'cherry']
1 ['guava' 'grapes']
dtype: list<item: string>[pyarrow]
| Parameters | |
|---|---|
| Name | Description |
| input | bigframes.series.Series: The Series containing JSON data (as native JSON objects or JSON-formatted strings). |
| json_path | str: The JSON path identifying the data that you want to obtain from the input. |
| value_dtype | dtype, Optional: The data type supported by BigFrames DataFrame. |
json_query
json_query(
input: bigframes.series.Series, json_path: str
) -> bigframes.series.Series
Extracts a JSON value and converts it to a SQL JSON-formatted STRING
or JSON value. This function uses double quotes to escape invalid JSONPath
characters in JSON keys. For example: "a.b".
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['{"class": {"students": [{"id": 5}, {"id": 12}]}}'])
>>> bbq.json_query(s, json_path="$.class")
0 {"students":[{"id":5},{"id":12}]}
dtype: string
| Parameters | |
|---|---|
| Name | Description |
| input | bigframes.series.Series: The Series containing JSON data (as native JSON objects or JSON-formatted strings). |
| json_path | str: The JSON path identifying the data that you want to obtain from the input. |
json_query_array
json_query_array(
input: bigframes.series.Series, json_path: str = "$"
) -> bigframes.series.Series
Extracts a JSON array and converts it to a SQL array of JSON-formatted
STRING or JSON values. This function uses double quotes to escape invalid
JSONPath characters in JSON keys. For example: "a.b".
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['[1, 2, 3]', '[4, 5]'])
>>> bbq.json_query_array(s)
0 ['1' '2' '3']
1 ['4' '5']
dtype: list<item: string>[pyarrow]
>>> s = bpd.Series([
... '{"fruits": [{"name": "apple"}, {"name": "cherry"}]}',
... '{"fruits": [{"name": "guava"}, {"name": "grapes"}]}'
... ])
>>> bbq.json_query_array(s, "$.fruits")
0 ['{"name":"apple"}' '{"name":"cherry"}']
1 ['{"name":"guava"}' '{"name":"grapes"}']
dtype: list<item: string>[pyarrow]
>>> s = bpd.Series([
... '{"fruits": {"color": "red", "names": ["apple","cherry"]}}',
... '{"fruits": {"color": "green", "names": ["guava", "grapes"]}}'
... ])
>>> bbq.json_query_array(s, "$.fruits.names")
0 ['"apple"' '"cherry"']
1 ['"guava"' '"grapes"']
dtype: list<item: string>[pyarrow]
| Parameters | |
|---|---|
| Name | Description |
| input | bigframes.series.Series: The Series containing JSON data (as native JSON objects or JSON-formatted strings). |
| json_path | str: The JSON path identifying the data that you want to obtain from the input. |
json_set
json_set(
input: bigframes.series.Series,
json_path_value_pairs: typing.Sequence[typing.Tuple[str, typing.Any]],
) -> bigframes.series.Series
Produces a new JSON value within a Series by inserting or replacing values at specified paths.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> s = bpd.read_gbq("SELECT JSON '{\"a\": 1}' AS data")["data"]
>>> bbq.json_set(s, json_path_value_pairs=[("$.a", 100), ("$.b", "hi")])
0 {"a":100,"b":"hi"}
Name: data, dtype: extension<dbjson<JSONArrowType>>[pyarrow]
| Parameters | |
|---|---|
| Name | Description |
| input | bigframes.series.Series: The Series containing JSON data (as native JSON objects or JSON-formatted strings). |
| json_path_value_pairs | Sequence[Tuple[str, Any]]: Pairs of JSON path and the new value to insert/replace. |
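The insert-or-replace behavior can be sketched locally with the standard json module. This is a simplified model, not the BigQuery implementation: json_set_local is a hypothetical helper that only understands paths made of object keys (such as $.a), and creating missing intermediate objects is an assumption of the sketch.

```python
import json

def json_set_local(doc, json_path_value_pairs):
    """Apply (path, value) pairs in order, replacing existing keys or inserting new ones."""
    root = json.loads(doc)
    for path, value in json_path_value_pairs:
        keys = path.split(".")[1:]  # drop the leading "$"
        node = root
        for key in keys[:-1]:
            node = node.setdefault(key, {})  # assumption: create missing parents
        node[keys[-1]] = value
    return json.dumps(root, separators=(",", ":"))

updated = json_set_local('{"a": 1}', [("$.a", 100), ("$.b", "hi")])
print(updated)
```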
json_value
json_value(
input: bigframes.series.Series, json_path: str = "$"
) -> bigframes.series.Series
Extracts a JSON scalar value and converts it to a SQL STRING value. In
addition, this function:
- Removes the outermost quotes and unescapes the values.
- Returns a SQL NULL if a non-scalar value is selected.
- Uses double quotes to escape invalid JSON_PATH characters in JSON keys.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['{"name": "Jakob", "age": "6"}', '{"name": "Jakob", "age": []}'])
>>> bbq.json_value(s, json_path="$.age")
0 6
1 <NA>
dtype: string
| Parameters | |
|---|---|
| Name | Description |
| input | bigframes.series.Series: The Series containing JSON data (as native JSON objects or JSON-formatted strings). |
| json_path | str: The JSON path identifying the data that you want to obtain from the input. |
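The difference between json_query (JSON-formatted output) and json_value (unquoted scalar or NULL) can be illustrated with a local sketch built on the standard json module. The helpers resolve, json_query_local, and json_value_local are hypothetical, handle only simple $.key paths, and use None in place of SQL NULL.

```python
import json

def resolve(doc, json_path):
    """Follow a simple `$.a.b` path made of object keys only (a simplification)."""
    node = json.loads(doc)
    for key in json_path.split(".")[1:]:
        node = node[key]
    return node

def json_query_local(doc, json_path):
    """Return the selected subtree as compact JSON text."""
    return json.dumps(resolve(doc, json_path), separators=(",", ":"))

def json_value_local(doc, json_path):
    """Return a scalar as an unquoted string, or None for objects, arrays, and null."""
    node = resolve(doc, json_path)
    if node is None or isinstance(node, (dict, list)):
        return None
    return node if isinstance(node, str) else json.dumps(node)

doc = '{"name": "Jakob", "age": "6", "tags": []}'
print(json_query_local(doc, "$.age"))   # keeps the JSON quotes
print(json_value_local(doc, "$.age"))   # outermost quotes removed
print(json_value_local(doc, "$.tags"))  # non-scalar, so None
```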
json_value_array
json_value_array(
input: bigframes.series.Series, json_path: str = "$"
) -> bigframes.series.Series
Extracts a JSON array of scalar values and converts it to a SQL ARRAY<STRING>
value. In addition, this function:
- Removes the outermost quotes and unescapes the values.
- Returns a SQL NULL if the selected value isn't an array, or is an array that contains non-scalar values.
- Uses double quotes to escape invalid JSON_PATH characters in JSON keys.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['[1, 2, 3]', '[4, 5]'])
>>> bbq.json_value_array(s)
0 ['1' '2' '3']
1 ['4' '5']
dtype: list<item: string>[pyarrow]
>>> s = bpd.Series([
...     '{"fruits": ["apples", "oranges", "grapes"]}',
... '{"fruits": ["guava", "grapes"]}'
... ])
>>> bbq.json_value_array(s, "$.fruits")
0 ['apples' 'oranges' 'grapes']
1 ['guava' 'grapes']
dtype: list<item: string>[pyarrow]
>>> s = bpd.Series([
... '{"fruits": {"color": "red", "names": ["apple","cherry"]}}',
... '{"fruits": {"color": "green", "names": ["guava", "grapes"]}}'
... ])
>>> bbq.json_value_array(s, "$.fruits.names")
0 ['apple' 'cherry']
1 ['guava' 'grapes']
dtype: list<item: string>[pyarrow]
| Parameters | |
|---|---|
| Name | Description |
| input | bigframes.series.Series: The Series containing JSON data (as native JSON objects or JSON-formatted strings). |
| json_path | str: The JSON path identifying the data that you want to obtain from the input. |
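The element quoting that separates json_query_array from json_value_array can be reproduced locally. The sketch below is a plain-Python model with hypothetical helper names; it takes the array document directly (no JSON path handling) and uses None in place of SQL NULL.

```python
import json

def json_query_array_local(doc):
    """Each element stays JSON-formatted, so strings keep their quotes.

    Assumes `doc` is a JSON array.
    """
    return [json.dumps(item, separators=(",", ":")) for item in json.loads(doc)]

def json_value_array_local(doc):
    """Each scalar becomes a plain string; non-arrays and nested values yield None."""
    items = json.loads(doc)
    if not isinstance(items, list) or any(isinstance(i, (dict, list)) for i in items):
        return None
    return [i if isinstance(i, str) else json.dumps(i) for i in items]

names = '["apple", "cherry"]'
print(json_query_array_local(names))
print(json_value_array_local(names))
```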
parse_json
parse_json(input: bigframes.series.Series) -> bigframes.series.Series
Converts a series with a JSON-formatted STRING value to a JSON value.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['{"class": {"students": [{"id": 5}, {"id": 12}]}}'])
>>> s
0 {"class": {"students": [{"id": 5}, {"id": 12}]}}
dtype: string
>>> bbq.parse_json(s)
0 {"class":{"students":[{"id":5},{"id":12}]}}
dtype: extension<dbjson<JSONArrowType>>[pyarrow]
| Parameter | |
|---|---|
| Name | Description |
| input | bigframes.series.Series: The Series containing JSON-formatted strings. |
sql_scalar
sql_scalar(
sql_template: str, columns: typing.Sequence[bigframes.series.Series]
) -> bigframes.series.Series
Create a Series from a SQL template.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import pandas as pd
>>> import pyarrow as pa
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(["1.5", "2.5", "3.5"])
>>> s = s.astype(pd.ArrowDtype(pa.decimal128(38, 9)))
>>> bbq.sql_scalar("ROUND({0}, 0, 'ROUND_HALF_EVEN')", [s])
0 2.000000000
1 2.000000000
2 4.000000000
dtype: decimal128(38, 9)[pyarrow]
| Parameters | |
|---|---|
| Name | Description |
| sql_template | str: A SQL format string with Python-style {0} placeholders for each of the Series objects in |
| columns | Sequence[bigframes.pandas.Series]: Series objects representing the column inputs to the |
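Two ideas in the example above can be checked locally: how a Python-style {0} placeholder is filled, and why half-even rounding maps 1.5, 2.5, and 3.5 to 2, 2, and 4. The column reference `col_0` is a made-up name for illustration; how BigFrames substitutes column expressions internally is not described here.

```python
from decimal import Decimal, ROUND_HALF_EVEN

template = "ROUND({0}, 0, 'ROUND_HALF_EVEN')"

# Fill the {0} placeholder with a hypothetical column reference.
rendered = template.format("`col_0`")
print(rendered)

# Half-even ("banker's") rounding sends ties to the nearest even integer.
rounded = [
    Decimal(v).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)
    for v in ["1.5", "2.5", "3.5"]
]
print(rounded)
```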
st_area
st_area(
series: typing.Union[
bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries
],
) -> bigframes.series.Series
Returns the area in square meters covered by the polygons in the input
GEOGRAPHY.
If geography_expression is a point or a line, returns zero. If geography_expression is a collection, returns the area of the polygons in the collection; if the collection doesn't contain polygons, returns zero.
Examples:
>>> import bigframes.geopandas
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> from shapely.geometry import Polygon, LineString, Point
>>> bpd.options.display.progress_bar = None
>>> series = bigframes.geopandas.GeoSeries(
... [
... Polygon([(0.0, 0.0), (0.1, 0.1), (0.0, 0.1)]),
... Polygon([(0.10, 0.4), (0.9, 0.5), (0.10, 0.5)]),
... Polygon([(0.1, 0.1), (0.2, 0.1), (0.2, 0.2)]),
... LineString([(0, 0), (1, 1), (0, 1)]),
... Point(0, 1),
... ]
... )
>>> series
0 POLYGON ((0 0, 0.1 0.1, 0 0.1, 0 0))
1 POLYGON ((0.1 0.4, 0.9 0.5, 0.1 0.5, 0.1 0.4))
2 POLYGON ((0.1 0.1, 0.2 0.1, 0.2 0.2, 0.1 0.1))
3 LINESTRING (0 0, 1 1, 0 1)
4 POINT (0 1)
dtype: geometry
>>> bbq.st_area(series)
0 61821689.855985
1 494563347.88721
2 61821689.855841
3 0.0
4 0.0
dtype: Float64
Use round() to round the output areas to the nearest ten million:
>>> bbq.st_area(series).round(-7)
0 60000000.0
1 490000000.0
2 60000000.0
3 0.0
4 0.0
dtype: Float64
| Parameter | |
|---|---|
| Name | Description |
| series | bigframes.pandas.Series or bigframes.geopandas.GeoSeries: A series containing geography objects. |
st_buffer
st_buffer(
series: typing.Union[
bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries
],
buffer_radius: float,
num_seg_quarter_circle: float = 8.0,
use_spheroid: bool = False,
) -> bigframes.series.Series
Computes a GEOGRAPHY that represents all points whose distance from the
input GEOGRAPHY is less than or equal to buffer_radius meters.
Examples:
>>> import bigframes.geopandas
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> from shapely.geometry import Point
>>> bpd.options.display.progress_bar = None
>>> series = bigframes.geopandas.GeoSeries(
... [
... Point(0, 0),
... Point(1, 1),
... ]
... )
>>> series
0 POINT (0 0)
1 POINT (1 1)
dtype: geometry
>>> buffer = bbq.st_buffer(series, 100)
>>> bbq.st_area(buffer) > 0
0 True
1 True
dtype: boolean
| Parameters | |
|---|---|
| Name | Description |
| series | bigframes.pandas.Series or bigframes.geopandas.GeoSeries: A series containing geography objects. |
| buffer_radius | float: The distance in meters. |
| num_seg_quarter_circle | float, optional: Specifies the number of segments that are used to approximate a quarter circle. The default value is 8.0. |
| use_spheroid | bool, optional: Determines how this function measures distance. If use_spheroid is FALSE, the function measures distance on the surface of a perfect sphere. The use_spheroid parameter currently only supports the value FALSE. The default value of use_spheroid is FALSE. |
st_centroid
st_centroid(
series: typing.Union[
bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries
],
) -> bigframes.series.Series
Computes the geometric centroid of a GEOGRAPHY type.
For POINT and MULTIPOINT types, this is the arithmetic mean of the
input coordinates. For LINESTRING and POLYGON types, this is the
center of mass. For GEOMETRYCOLLECTION types, this is the center of
mass of the collection's elements.
Examples:
>>> import bigframes.geopandas
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> from shapely.geometry import Polygon, LineString, Point
>>> bpd.options.display.progress_bar = None
>>> series = bigframes.geopandas.GeoSeries(
... [
... Polygon([(0.0, 0.0), (0.1, 0.1), (0.0, 0.1)]),
... LineString([(0, 0), (1, 1), (0, 1)]),
... Point(0, 1),
... ]
... )
>>> series
0 POLYGON ((0 0, 0.1 0.1, 0 0.1, 0 0))
1 LINESTRING (0 0, 1 1, 0 1)
2 POINT (0 1)
dtype: geometry
>>> bbq.st_centroid(series)
0 POINT (0.03333 0.06667)
1 POINT (0.49998 0.70712)
2 POINT (0 1)
dtype: geometry
| Parameter | |
|---|---|
| Name | Description |
| series | bigframes.pandas.Series or bigframes.geopandas.GeoSeries: A series containing geography objects. |
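For a small triangle, the center of mass is close to the planar mean of its three vertices, which can be checked locally. This is a flat-plane approximation used only for illustration (BigQuery works on the sphere), and planar_triangle_centroid is a hypothetical helper.

```python
def planar_triangle_centroid(vertices):
    """Mean of the vertex coordinates, which is the centroid of a planar triangle."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return sum(xs) / len(xs), sum(ys) / len(ys)

cx, cy = planar_triangle_centroid([(0.0, 0.0), (0.1, 0.1), (0.0, 0.1)])
print(round(cx, 5), round(cy, 5))
```

At five decimal places this agrees with the POINT reported for the polygon in the example above.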
st_convexhull
st_convexhull(
series: typing.Union[
bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries
],
) -> bigframes.series.Series
Computes the convex hull of a GEOGRAPHY type.
The convex hull is the smallest convex set that contains all of the
points in the input GEOGRAPHY.
Examples:
>>> import bigframes.geopandas
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> from shapely.geometry import Polygon, LineString, Point
>>> bpd.options.display.progress_bar = None
>>> series = bigframes.geopandas.GeoSeries(
... [
... Polygon([(0.0, 0.0), (0.1, 0.1), (0.0, 0.1)]),
... LineString([(0, 0), (1, 1), (0, 1)]),
... Point(0, 1),
... ]
... )
>>> series
0 POLYGON ((0 0, 0.1 0.1, 0 0.1, 0 0))
1 LINESTRING (0 0, 1 1, 0 1)
2 POINT (0 1)
dtype: geometry
>>> bbq.st_convexhull(series)
0 POLYGON ((0 0, 0.1 0.1, 0 0.1, 0 0))
1 POLYGON ((0 0, 1 1, 0 1, 0 0))
2 POINT (0 1)
dtype: geometry
| Parameter | |
|---|---|
| Name | Description |
| series | bigframes.pandas.Series or bigframes.geopandas.GeoSeries: A series containing geography objects. |
st_difference
st_difference(
series: typing.Union[
bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries
],
other: typing.Union[
bigframes.series.Series,
bigframes.geopandas.geoseries.GeoSeries,
shapely.geometry.base.BaseGeometry,
],
) -> bigframes.series.Series
Returns a GEOGRAPHY that represents the point set difference of
geography_1 and geography_2. Therefore, the result consists of the part
of geography_1 that doesn't intersect with geography_2.
If geography_1 is completely contained in geography_2, then ST_DIFFERENCE
returns an empty GEOGRAPHY.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import bigframes.geopandas
>>> from shapely.geometry import Polygon, LineString, Point
>>> bpd.options.display.progress_bar = None
We can check two GeoSeries against each other, row by row:
>>> s1 = bigframes.geopandas.GeoSeries(
... [
... Polygon([(0, 0), (2, 2), (0, 2)]),
... Polygon([(0, 0), (2, 2), (0, 2)]),
... LineString([(0, 0), (2, 2)]),
... LineString([(2, 0), (0, 2)]),
... Point(0, 1),
... ],
... )
>>> s2 = bigframes.geopandas.GeoSeries(
... [
... Polygon([(0, 0), (1, 1), (0, 1)]),
... LineString([(1, 0), (1, 3)]),
... LineString([(2, 0), (0, 2)]),
... Point(1, 1),
... Point(0, 1),
... ],
... index=range(1, 6),
... )
>>> s1
0 POLYGON ((0 0, 2 2, 0 2, 0 0))
1 POLYGON ((0 0, 2 2, 0 2, 0 0))
2 LINESTRING (0 0, 2 2)
3 LINESTRING (2 0, 0 2)
4 POINT (0 1)
dtype: geometry
>>> s2
1 POLYGON ((0 0, 1 1, 0 1, 0 0))
2 LINESTRING (1 0, 1 3)
3 LINESTRING (2 0, 0 2)
4 POINT (1 1)
5 POINT (0 1)
dtype: geometry
>>> bbq.st_difference(s1, s2)
0 None
1 POLYGON ((0.99954 1, 2 2, 0 2, 0 1, 0.99954 1))
2 LINESTRING (0 0, 1 1.00046, 2 2)
3 GEOMETRYCOLLECTION EMPTY
4 POINT (0 1)
5 None
dtype: geometry
We can also compute the difference between a GeoSeries and a single shapely geometry:
>>> polygon = Polygon([(0, 0), (10, 0), (10, 10), (0, 0)])
>>> bbq.st_difference(s1, polygon)
0 POLYGON ((1.97082 2.00002, 0 2, 0 0, 1.97082 2...
1 POLYGON ((1.97082 2.00002, 0 2, 0 0, 1.97082 2...
2 GEOMETRYCOLLECTION EMPTY
3 LINESTRING (0.99265 1.00781, 0 2)
4 POINT (0 1)
dtype: geometry
| Parameters | |
|---|---|
| Name | Description |
| series | bigframes.pandas.Series or bigframes.geopandas.GeoSeries: A series containing geography objects. |
| other | bigframes.pandas.Series, bigframes.geopandas.GeoSeries, or shapely.Geometry: The series or geometric object to subtract from the geography objects in |
st_distance
st_distance(
series: typing.Union[
bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries
],
other: typing.Union[
bigframes.series.Series,
bigframes.geopandas.geoseries.GeoSeries,
shapely.geometry.base.BaseGeometry,
],
*,
use_spheroid: bool = False
) -> bigframes.series.Series
Returns the shortest distance in meters between two non-empty
GEOGRAPHY objects.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import bigframes.geopandas
>>> from shapely.geometry import Polygon, LineString, Point
>>> bpd.options.display.progress_bar = None
We can check two GeoSeries against each other, row by row.
>>> s1 = bigframes.geopandas.GeoSeries(
... [
... Point(0, 0),
... Point(0.00001, 0),
... Point(0.00002, 0),
... ],
... )
>>> s2 = bigframes.geopandas.GeoSeries(
... [
... Point(0.00001, 0),
... Point(0.00003, 0),
... Point(0.00005, 0),
... ],
... )
>>> bbq.st_distance(s1, s2, use_spheroid=True)
0 1.113195
1 2.22639
2 3.339585
dtype: Float64
We can also calculate the distance between each geometry and a single shapely geometry:
>>> bbq.st_distance(s2, Point(0.00001, 0))
0 0.0
1 2.223902
2 4.447804
dtype: Float64
| Parameters | |
|---|---|
| Name | Description |
| series | bigframes.pandas.Series or bigframes.geopandas.GeoSeries: A series containing geography objects. |
| other | bigframes.pandas.Series, bigframes.geopandas.GeoSeries, or shapely.Geometry: The series or geometric object to calculate the distance in meters from the geography objects in |
| use_spheroid | bool, optional, default False: Determines how this function measures distance. If |
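The two sets of numbers in the examples above are consistent with arc lengths along the equator, which can be checked with a few lines of arithmetic. The sphere radius of 6,371,010 m is an assumption inferred from those outputs rather than something stated here; 6,378,137 m is the WGS84 semi-major axis. The helper equator_distance_m is hypothetical.

```python
import math

SPHERE_RADIUS_M = 6_371_010.0            # assumed sphere radius (inferred from the outputs)
WGS84_EQUATORIAL_RADIUS_M = 6_378_137.0  # WGS84 semi-major axis

def equator_distance_m(lon1_deg, lon2_deg, radius_m):
    """Arc length between two points on the equator for a given radius."""
    return radius_m * math.radians(abs(lon2_deg - lon1_deg))

# Point(0.00003, 0) to Point(0.00001, 0) on a perfect sphere.
sphere_m = equator_distance_m(0.00001, 0.00003, SPHERE_RADIUS_M)
# Point(0, 0) to Point(0.00001, 0) along the spheroid's equator.
spheroid_m = equator_distance_m(0.0, 0.00001, WGS84_EQUATORIAL_RADIUS_M)
print(round(sphere_m, 6), round(spheroid_m, 6))
```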
st_intersection
st_intersection(
series: typing.Union[
bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries
],
other: typing.Union[
bigframes.series.Series,
bigframes.geopandas.geoseries.GeoSeries,
shapely.geometry.base.BaseGeometry,
],
) -> bigframes.series.Series
Returns a GEOGRAPHY that represents the point set intersection of the two
input GEOGRAPHYs. Thus, every point in the intersection appears in both
geography_1 and geography_2.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import bigframes.geopandas
>>> from shapely.geometry import Polygon, LineString, Point
>>> bpd.options.display.progress_bar = None
We can check two GeoSeries against each other, row by row.
>>> s1 = bigframes.geopandas.GeoSeries(
... [
... Polygon([(0, 0), (2, 2), (0, 2)]),
... Polygon([(0, 0), (2, 2), (0, 2)]),
... LineString([(0, 0), (2, 2)]),
... LineString([(2, 0), (0, 2)]),
... Point(0, 1),
... ],
... )
>>> s2 = bigframes.geopandas.GeoSeries(
... [
... Polygon([(0, 0), (1, 1), (0, 1)]),
... LineString([(1, 0), (1, 3)]),
... LineString([(2, 0), (0, 2)]),
... Point(1, 1),
... Point(0, 1),
... ],
... index=range(1, 6),
... )
>>> s1
0 POLYGON ((0 0, 2 2, 0 2, 0 0))
1 POLYGON ((0 0, 2 2, 0 2, 0 0))
2 LINESTRING (0 0, 2 2)
3 LINESTRING (2 0, 0 2)
4 POINT (0 1)
dtype: geometry
>>> s2
1 POLYGON ((0 0, 1 1, 0 1, 0 0))
2 LINESTRING (1 0, 1 3)
3 LINESTRING (2 0, 0 2)
4 POINT (1 1)
5 POINT (0 1)
dtype: geometry
>>> bbq.st_intersection(s1, s2)
0 None
1 POLYGON ((0 0, 0.99954 1, 0 1, 0 0))
2 POINT (1 1.00046)
3 LINESTRING (2 0, 0 2)
4 GEOMETRYCOLLECTION EMPTY
5 None
dtype: geometry
We can also intersect each geometry with a single shapely geometry:
>>> bbq.st_intersection(s1, Polygon([(0, 0), (1, 1), (0, 1)]))
0 POLYGON ((0 0, 0.99954 1, 0 1, 0 0))
1 POLYGON ((0 0, 0.99954 1, 0 1, 0 0))
2 LINESTRING (0 0, 0.99954 1)
3 GEOMETRYCOLLECTION EMPTY
4 POINT (0 1)
dtype: geometry
| Parameters | |
|---|---|
| Name | Description |
| series | bigframes.pandas.Series or bigframes.geopandas.GeoSeries: A series containing geography objects. |
| other | bigframes.pandas.Series, bigframes.geopandas.GeoSeries, or shapely.Geometry: The series or geometric object to intersect with the geography objects in |
st_isclosed
st_isclosed(
series: typing.Union[
bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries
],
) -> bigframes.series.Series
Returns TRUE for a non-empty Geography, where each element in the Geography has an empty boundary.
Examples:
>>> import bigframes.geopandas
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> from shapely.geometry import Point, LineString, Polygon
>>> bpd.options.display.progress_bar = None
>>> series = bigframes.geopandas.GeoSeries(
... [
... Point(0, 0), # Point
... LineString([(0, 0), (1, 1)]), # Open LineString
... LineString([(0, 0), (1, 1), (0, 1), (0, 0)]), # Closed LineString
... Polygon([(0, 0), (1, 1), (0, 1), (0, 0)]),
... None,
... ]
... )
>>> series
0 POINT (0 0)
1 LINESTRING (0 0, 1 1)
2 LINESTRING (0 0, 1 1, 0 1, 0 0)
3 POLYGON ((0 0, 1 1, 0 1, 0 0))
4 None
dtype: geometry
>>> bbq.st_isclosed(series)
0 True
1 False
2 True
3 False
4 <NA>
dtype: boolean
| Parameter | |
|---|---|
| Name | Description |
| series | bigframes.pandas.Series or bigframes.geopandas.GeoSeries: A series containing geography objects. |
st_length
st_length(
series: typing.Union[
bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries
],
*,
use_spheroid: bool = False
) -> bigframes.series.Series
Returns the total length in meters of the lines in the input GEOGRAPHY.
If a series element is a point or a polygon, returns zero for that row. If a series element is a collection, returns the length of the lines in the collection; if the collection doesn't contain lines, returns zero.
The optional use_spheroid parameter determines how this function measures distance. If use_spheroid is FALSE, the function measures distance on the surface of a perfect sphere.
The use_spheroid parameter currently only supports the value FALSE. The default value of use_spheroid is FALSE. See: https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions#st_length
Examples:
>>> import bigframes.geopandas
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> from shapely.geometry import Polygon, LineString, Point, GeometryCollection
>>> bpd.options.display.progress_bar = None
>>> series = bigframes.geopandas.GeoSeries(
... [
... LineString([(0, 0), (1, 0)]), # Length will be approx 1 degree in meters
... Polygon([(0.0, 0.0), (0.1, 0.1), (0.0, 0.1)]), # Length is 0
... Point(0, 1), # Length is 0
... GeometryCollection([LineString([(0,0),(0,1)]), Point(1,1)]) # Length of LineString only
... ]
... )
>>> result = bbq.st_length(series)
>>> result
0 111195.101177
1 0.0
2 0.0
3 111195.101177
dtype: Float64
| Parameters | |
|---|---|
| Name | Description |
| series | bigframes.series.Series or bigframes.geopandas.GeoSeries: A series containing geography objects. |
| use_spheroid | bool, optional: Determines how this function measures distance. If FALSE (default), measures distance on a perfect sphere. Currently, only FALSE is supported. |
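The "one degree in meters" figure in the example can be approximated by summing great-circle segment lengths on a perfect sphere. The sketch below uses the haversine formula with an assumed radius of 6,371,010 m (inferred from the output above, not stated here); haversine_m and line_length_m are hypothetical helpers, not part of the library.

```python
import math

SPHERE_RADIUS_M = 6_371_010.0  # assumed sphere radius (inferred from the example output)

def haversine_m(p, q, radius_m=SPHERE_RADIUS_M):
    """Great-circle distance between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))

def line_length_m(coords):
    """Sum the great-circle lengths of consecutive segments of a line."""
    return sum(haversine_m(p, q) for p, q in zip(coords, coords[1:]))

length_m = line_length_m([(0, 0), (1, 0)])
print(round(length_m, 3))
```

A point has no segments, so the same helper returns zero for a single coordinate, in line with the rule that non-line inputs contribute no length.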
struct
struct(value: dataframe.DataFrame) -> series.Series
Takes a DataFrame and converts it into a Series of structs, with each struct entry corresponding to a DataFrame row and each struct field corresponding to a DataFrame column.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import bigframes.series as series
>>> bpd.options.display.progress_bar = None
>>> srs = series.Series([{"version": 1, "project": "pandas"}, {"version": 2, "project": "numpy"},])
>>> df = srs.struct.explode()
>>> bbq.struct(df)
0 {'project': 'pandas', 'version': 1}
1 {'project': 'numpy', 'version': 2}
dtype: struct<project: string, version: int64>[pyarrow]
Args:
value (bigframes.dataframe.DataFrame):
The DataFrame to be converted to a Series of structs
Returns:
bigframes.series.Series: A new Series with struct entries representing rows of the original DataFrame
to_json
to_json(input: bigframes.series.Series) -> bigframes.series.Series
Converts a series to a JSON value.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([1, 2, 3])
>>> bbq.to_json(s)
0 1
1 2
2 3
dtype: extension<dbjson<JSONArrowType>>[pyarrow]
>>> s = bpd.Series([{"int": 1, "str": "pandas"}, {"int": 2, "str": "numpy"}])
>>> bbq.to_json(s)
0 {"int":1,"str":"pandas"}
1 {"int":2,"str":"numpy"}
dtype: extension<dbjson<JSONArrowType>>[pyarrow]
| Parameter | |
|---|---|
| Name | Description |
| input | bigframes.series.Series: The Series containing JSON or JSON-formatted string values. |
to_json_string
to_json_string(input: bigframes.series.Series) -> bigframes.series.Series
Converts a series to a JSON-formatted STRING value.
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([1, 2, 3])
>>> bbq.to_json_string(s)
0 1
1 2
2 3
dtype: string
>>> s = bpd.Series([{"int": 1, "str": "pandas"}, {"int": 2, "str": "numpy"}])
>>> bbq.to_json_string(s)
0 {"int":1,"str":"pandas"}
1 {"int":2,"str":"numpy"}
dtype: string
| Parameter | |
|---|---|
| Name | Description |
| input | bigframes.series.Series: The Series to be converted. |
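The strings in the example output above use a compact form with no spaces after separators. For comparison, a minimal local sketch with Python's standard json module produces the same text for these simple values. The helper name to_compact_json is hypothetical, and this is not what bigframes runs; the actual conversion happens in BigQuery.

```python
import json


def to_compact_json(value) -> str:
    """Hypothetical helper: serialize a Python value without extra whitespace."""
    # separators=(",", ":") removes the spaces json.dumps inserts by default.
    return json.dumps(value, separators=(",", ":"))


print(to_compact_json(1))
print(to_compact_json({"int": 1, "str": "pandas"}))
```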
unix_micros
unix_micros(input: bigframes.series.Series) -> bigframes.series.Series
Converts a timestamp series to Unix epoch microseconds.
Examples:
>>> import pandas as pd
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([pd.Timestamp("1970-01-02", tz="UTC"), pd.Timestamp("1970-01-03", tz="UTC")])
>>> bbq.unix_micros(s)
0 86400000000
1 172800000000
dtype: Int64
| Parameter | |
|---|---|
| Name | Description |
| input | bigframes.pandas.Series: A timestamp series. |
unix_millis
unix_millis(input: bigframes.series.Series) -> bigframes.series.Series
Converts a timestamp series to Unix epoch milliseconds.
Examples:
>>> import pandas as pd
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([pd.Timestamp("1970-01-02", tz="UTC"), pd.Timestamp("1970-01-03", tz="UTC")])
>>> bbq.unix_millis(s)
0 86400000
1 172800000
dtype: Int64
| Parameter | |
|---|---|
| Name | Description |
| input | bigframes.pandas.Series: A timestamp series. |
unix_seconds
unix_seconds(input: bigframes.series.Series) -> bigframes.series.Series
Converts a timestamp series to Unix epoch seconds.
Examples:
>>> import pandas as pd
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([pd.Timestamp("1970-01-02", tz="UTC"), pd.Timestamp("1970-01-03", tz="UTC")])
>>> bbq.unix_seconds(s)
0 86400
1 172800
dtype: Int64
| Parameter | |
|---|---|
| Name | Description |
| input | bigframes.pandas.Series: A timestamp series. |
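The three unix_* functions differ only in the unit of the result. The arithmetic behind the example outputs can be checked with a standard-library sketch. The helper names below are hypothetical and operate on a single timezone-aware datetime rather than a bigframes Series; sub-unit precision is assumed to be rounded down.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_micros(ts: datetime) -> int:
    """Microseconds since the Unix epoch for a timezone-aware datetime."""
    delta = ts - EPOCH
    # Integer arithmetic avoids the rounding that float timestamps can introduce.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def to_unix_millis(ts: datetime) -> int:
    """Milliseconds since the epoch, rounding down any finer precision."""
    return to_unix_micros(ts) // 1_000


def to_unix_seconds(ts: datetime) -> int:
    """Seconds since the epoch, rounding down any finer precision."""
    return to_unix_micros(ts) // 1_000_000


ts = datetime(1970, 1, 2, tzinfo=timezone.utc)
print(to_unix_micros(ts), to_unix_millis(ts), to_unix_seconds(ts))
```

For 1970-01-02 UTC this yields one day expressed in each unit, which agrees with the first row of each example above.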
vector_search
vector_search(
base_table: str,
column_to_search: str,
query: Union[dataframe.DataFrame, series.Series],
*,
query_column_to_search: Optional[str] = None,
top_k: Optional[int] = None,
distance_type: Optional[Literal["euclidean", "cosine", "dot_product"]] = None,
fraction_lists_to_search: Optional[float] = None,
use_brute_force: Optional[bool] = None,
allow_large_results: Optional[bool] = None
) -> dataframe.DataFrame
Conducts a vector search, which searches embeddings to find semantically similar entities.
This method calls the VECTOR_SEARCH() SQL function (https://cloud.google.com/bigquery/docs/reference/standard-sql/search_functions#vector_search).
Examples:
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
DataFrame embeddings for which to find nearest neighbors. The ARRAY<FLOAT64> column
is used as the search query:
>>> search_query = bpd.DataFrame({"query_id": ["dog", "cat"],
... "embedding": [[1.0, 2.0], [3.0, 5.2]]})
>>> bbq.vector_search(
... base_table="bigframes-dev.bigframes_tests_sys.base_table",
... column_to_search="my_embedding",
... query=search_query,
... top_k=2).sort_values("id")
query_id embedding id my_embedding distance
0 dog [1. 2.] 1 [1. 2.] 0.0
1 cat [3. 5.2] 2 [2. 4.] 1.56205
0 dog [1. 2.] 4 [1. 3.2] 1.2
1 cat [3. 5.2] 5 [5. 5.4] 2.009975
<BLANKLINE>
[4 rows x 5 columns]
Series embeddings for which to find nearest neighbors:
>>> search_query = bpd.Series([[1.0, 2.0], [3.0, 5.2]],
... index=["dog", "cat"],
... name="embedding")
>>> bbq.vector_search(
... base_table="bigframes-dev.bigframes_tests_sys.base_table",
... column_to_search="my_embedding",
... query=search_query,
... top_k=2,
... use_brute_force=True).sort_values("id")
embedding id my_embedding distance
dog [1. 2.] 1 [1. 2.] 0.0
cat [3. 5.2] 2 [2. 4.] 1.56205
dog [1. 2.] 4 [1. 3.2] 1.2
cat [3. 5.2] 5 [5. 5.4] 2.009975
<BLANKLINE>
[4 rows x 4 columns]
You can specify the name of the column in the query DataFrame that contains the embeddings, as well as the distance type. If you specify query_column_to_search, the provided column, which contains the embeddings for which to find nearest neighbors, is used. Otherwise, the column_to_search value is used.
>>> search_query = bpd.DataFrame({"query_id": ["dog", "cat"],
... "embedding": [[1.0, 2.0], [3.0, 5.2]],
... "another_embedding": [[0.7, 2.2], [3.3, 5.2]]})
>>> bbq.vector_search(
... base_table="bigframes-dev.bigframes_tests_sys.base_table",
... column_to_search="my_embedding",
... query=search_query,
... distance_type="cosine",
... query_column_to_search="another_embedding",
... top_k=2).sort_values("id")
query_id embedding another_embedding id my_embedding distance
1 cat [3. 5.2] [3.3 5.2] 1 [1. 2.] 0.005181
1 cat [3. 5.2] [3.3 5.2] 2 [2. 4.] 0.005181
0 dog [1. 2.] [0.7 2.2] 3 [1.5 7. ] 0.004697
0 dog [1. 2.] [0.7 2.2] 4 [1. 3.2] 0.000013
<BLANKLINE>
[4 rows x 6 columns]
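The distance values in the outputs above can be reproduced locally with a small brute-force sketch. It is meant only to show what the "euclidean" and "cosine" distance types compute and how top_k selects neighbors; it is not how BigQuery executes VECTOR_SEARCH. The base rows are an assumption: their ids and vectors are copied from the example outputs, and the real table may hold rows that never appear there. All function names are hypothetical.

```python
import math

# Assumed base rows, copied from the ids and vectors visible in the outputs above.
BASE_ROWS = {
    1: [1.0, 2.0],
    2: [2.0, 4.0],
    3: [1.5, 7.0],
    4: [1.0, 3.2],
    5: [5.0, 5.4],
}


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus cosine similarity; raises ZeroDivisionError for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def brute_force_top_k(query, base_rows, top_k, distance=euclidean_distance):
    """Score every base row against the query and keep the top_k closest."""
    scored = [(row_id, distance(query, vector)) for row_id, vector in base_rows.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


dog = brute_force_top_k([1.0, 2.0], BASE_ROWS, top_k=2)
cat = brute_force_top_k([3.0, 5.2], BASE_ROWS, top_k=2)
dog_cosine = brute_force_top_k([0.7, 2.2], BASE_ROWS, top_k=2, distance=cosine_distance)
print(dog, cat, dog_cosine)
```

Scanning every row is what a brute-force search amounts to; a vector index trades some recall for speed by examining only part of the data.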
| Parameters | |
|---|---|
| Name | Description |
| base_table | str: The table to search for nearest neighbor embeddings. |
| column_to_search | str: The name of the base table column to search for nearest neighbor embeddings. The column must have a type of |
| query | bigframes.dataframe.DataFrame or bigframes.series.Series: A Series or DataFrame that provides the embeddings for which to find nearest neighbors. |
| query_column_to_search | str: Specifies the name of the column in the query that contains the embeddings for which to find nearest neighbors. The column must have a type of |
| top_k | int: Specifies the number of nearest neighbors to return. Defaults to 10. |
| distance_type | str, default "euclidean": Specifies the type of metric to use to compute the distance between two vectors. Possible values are "euclidean", "cosine", and "dot_product". Defaults to "euclidean". |
| fraction_lists_to_search | float, range in [0.0, 1.0]: Specifies the percentage of lists to search. Specifying a higher percentage leads to higher recall and slower performance, and the converse is true when specifying a lower percentage. It is only used when a vector index is also used. You can only specify |
| use_brute_force | bool: Determines whether to use brute force search by skipping the vector index if one is available. Defaults to False. |
| allow_large_results | bool, optional: Whether to allow large query results. If |