Changelog

PyPI History

0.14.0 (2023-11-14)

Features

Add ‘cross’ join support (#176) (765446a)
Add ‘index’, ‘pad’, ‘nearest’ interpolate methods (#162) (6a28403)
Add series.sample (identical to existing dataframe.sample) (#187) (37914a4)
Add unordered sql compilation (#156) (58f420c)
Log most recent API calls as recent-bigframes-api-xx labels on BigQuery jobs (#145) (4ea33b7)
Read_gbq creates order deterministically without table copy (#191) (8ab81de)
Support date_series.astype("string[pyarrow]") to cast DATE to STRING (#186) (aee0e8e)
Support series.at[row_label] = scalar (#173) (0c8bd33)
Temporary resources no longer use BigQuery Sessions (#194) (4a02cac)
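
For readers unfamiliar with the 'cross' join mentioned in this list: a cross join pairs every row of one table with every row of the other. The sketch below only illustrates that semantics in plain Python; the cross_join helper and the sample rows are hypothetical and are not part of the bigframes API.

```python
from itertools import product


def cross_join(left_rows, right_rows):
    """Pair every left row with every right row (Cartesian product).

    Hypothetical helper for illustration only. Rows are dicts, and the
    column names of the two sides are assumed not to collide.
    """
    return [{**left, **right} for left, right in product(left_rows, right_rows)]


sizes = [{"size": "S"}, {"size": "M"}]
colors = [{"color": "red"}, {"color": "blue"}, {"color": "green"}]

combos = cross_join(sizes, colors)
print(len(combos))  # 6 rows: 2 x 3
print(combos[0])    # {'size': 'S', 'color': 'red'}
```

The result size is the product of the input sizes, so a cross join of two large tables grows quickly.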

Bug Fixes

All sort operations are now stable (#195) (3a2761f)
Default to 7 days expiration for read_csv, read_json, read_parquet (#193) (03606cd)
Deprecate the remote_service_type in llm model (#180) (a8a409a)
For reset_index on unnamed multiindex, always use level_[n] label (#182) (f95000d)
Match pandas behavior when assigning listlike to empty dfs (#172) (c1d1f42)
Use anonymous dataset instead of session dataset for temp tables (#181) (800d44e)
Use random table for read_pandas (#192) (741c75e)
Use random table when loading data for read_csv, read_json, read_parquet (#175) (9d2e6dc)
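
The stable-sort entry in this list means that rows with equal sort keys keep their original relative order. A minimal illustration using Python's built-in sorted(), which is also stable; the sample rows are made up and the snippet does not call bigframes.

```python
# (name, score) pairs; two pairs tie on each score value.
rows = [
    ("alice", 2),
    ("bob", 1),
    ("carol", 2),
    ("dave", 1),
]

# sorted() is guaranteed stable: ties on the key keep their input order,
# so "bob" stays ahead of "dave" and "alice" stays ahead of "carol".
by_score = sorted(rows, key=lambda row: row[1])
print(by_score)
# [('bob', 1), ('dave', 1), ('alice', 2), ('carol', 2)]
```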

Documentation

Add code samples for read_gbq_function using community UDFs (#188) (7506eab)
Add docstring code samples for Series.apply and DataFrame.map (#185) (c816d84)
Add llm kmeans notebook as an included example (#177) (d49ae42)
Use head() to get top n results, not to preview results (#190) (87f84c9)

0.13.0 (2023-11-07)

Features

to_gbq without a destination table writes to a temporary table (#158) (e1817c9)
Add DataFrame.__iter__, DataFrame.iterrows, DataFrame.itertuples, and DataFrame.keys methods (#164) (c065071)
Add Series.__iter__ method (#164) (c065071)
Add interpolate() to series and dataframe (#157) (b9cb55c)
Support 32k text-generation and multilingual embedding models (#161) (5f0ea37)

Bug Fixes

Update default temp table expiration to 7 days (#174) (4ff26cd)

0.12.0 (2023-11-01)

Features

Add DataFrame.melt (#113) (4e4409c)
Add DataFrame.to_pandas_batches() to download large DataFrame objects (#136) (3afd4a3)
Add bigframes.options.compute.maximum_bytes_billed option that sets maximum bytes billed on query jobs (#133) (63c7919)
Add pandas.qcut (#104) (8e44518)
Add pd.get_dummies (#149) (d8baad5)
Add unstack to series, add level param (#115) (5edcd19)
Implement operator @ for DataFrame.dot (#139) (79a638e)
Populate ibis version in user agent (#140) (c639a36)
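
The entry about operator @ refers to Python's matrix-multiplication operator, which a class supports by defining __matmul__. The sketch below shows the general pattern of delegating @ to a dot method; the Matrix class is a hypothetical stand-in and does not reflect how bigframes implements DataFrame.dot.

```python
class Matrix:
    """Tiny illustrative matrix type (not a bigframes class)."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def dot(self, other):
        # Transpose the right-hand side so each column is easy to iterate.
        cols = list(zip(*other.rows))
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in cols]
             for row in self.rows]
        )

    def __matmul__(self, other):
        # `a @ b` is evaluated as `a.__matmul__(b)`, which delegates to dot().
        return self.dot(other)


a = Matrix([[1, 2], [3, 4]])
b = Matrix([[5, 6], [7, 8]])
print((a @ b).rows)  # [[19, 22], [43, 50]]
```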

Bug Fixes

Don’t override the global logging config (#138) (2ddbf74)
Fix bug with column names under repeated column assignment (#150) (29032d0)
Resolve plotly rendering issue by using ipython html for job pro… (#134) (39df43e)
Use indexee’s session for loc listlike cases (#152) (27c5725)

Documentation

Add arithmetic df sample code (#153) (ac44ccd)
Fix indentation on read_gbq_function code sample (#163) (0801d96)
Link to ML.EVALUATE BQML page for score() methods (#137) (45c617f)

0.11.0 (2023-10-26)

Features

Add back reset_session as an alias for close_session (#124) (694a85a)
Change query parameter to query_or_table in read_gbq (#127) (f9bb3c4)

Bug Fixes

Expose bigframes.pandas.reset_session as a public API (#128) (b17e1f4)
Use series’s own session in series.reindex listlike case (#135) (95bff3f)

Documentation

Add runnable code samples for DataFrames I/O methods and property (#129) (6fea8ef)
Add runnable code samples for reading methods (#125) (a669919)

0.10.0 (2023-10-19)

Features

Implement DataFrame.dot for matrix multiplication (#67) (29dd414)

0.9.0 (2023-10-18)

⚠ BREAKING CHANGES

rename bigframes.pandas.reset_session to close_session (#101)

Features

Add bigframes.options.bigquery.application_name for partner attribution (#117) (52d64ff)
Add AtIndexer getitems (#107) (752b01f)
Rename bigframes.pandas.reset_session to close_session (#101) (36693bf)
Send BigQuery cancel request when canceling bigframes process (#103) (e325fbb)
Support external packages in remote_function (#98) (ec10c4a)
Use ArrowDtype for STRUCT columns in to_pandas (#85) (9238fad)

Bug Fixes

Support multiindex for three loc getitem overloads (#113) (68e3cd3)

Performance Improvements

If primary keys are defined, read_gbq avoids copying table data (#112) (e6c0cd1)

Documentation

Add documentation for Series.struct.field and Series.struct.explode (#114) (a6dab9c)
Add open-source link in API doc (#106) (db51fe3)
Update ML overview API doc (#105) (1b3f3a5)

0.8.0 (2023-10-12)

⚠ BREAKING CHANGES

The default behavior of to_parquet is changing from no compression to 'snappy' compression.

Features

Support compression in to_parquet (a8c286f)

Bug Fixes

Create session dataset for remote functions only when needed (#94) (1d385be)

0.7.0 (2023-10-11)

Features

Add aliases for several series properties (#80) (c0efec8)
Add equals methods to series/dataframe (#76) (636a209)
Add iat and iloc accessing by tuples of integers (#90) (228aeba)
Add level param to DataFrame.stack (#88) (97b8bec)
Allow df.drop to take an index object (#68) (740c451)
Use default session connection (#87) (4ae4ef9)

Bug Fixes

Change the invalid url in docs (#93) (969800d)

Documentation

Add more preprocessing models into the docs menu. (#97) (1592315)

0.6.0 (2023-10-04)

Features

Add df.unstack (#63) (4a84714)
Add idxmin, idxmax to series, dataframe (#74) (781307e)
Add ml.preprocessing.KBinsDiscretizer (#81) (24c6256)
Add multi-column dataframe merge (#73) (c9fa85c)
Add update and align methods to dataframe (#57) (bf050cf)
Support STRUCT data type with Series.struct.field to extract child fields (#71) (17afac9)

Bug Fixes

Avoid 403 response too large to return error with read_gbq and large query results (#77) (8f3b5b2)
Change return type of Series.loc[scalar] (#40) (fff3d45)
Fix df/series.iloc by list with multiindex (#79) (971d091)

0.5.0 (2023-09-28)

Features

Add DataFrame.kurtosis / DF.kurt method (c1900c2)
Add DataFrame.rolling and DataFrame.expanding methods (c1900c2)
Add items, apply methods to DataFrame. (#43) (3adc1b3)
Add axis param to simple df aggregations (#52) (9cf9972)
Add index dtype, astype, drop, fillna, aggregate attributes. (#38) (1a254a4)
Add ml.preprocessing.LabelEncoder (#50) (2510461)
Add ml.preprocessing.MaxAbsScaler (#56) (14b262b)
Add ml.preprocessing.MinMaxScaler (#64) (392113b)
Add more index methods (#54) (a6e32aa)
Support calculate_p_values parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support class_weights="balanced" in LogisticRegression model (c1900c2)
Support df[column_name] = df_only_one_column (c1900c2)
Support early_stop parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support enable_global_explain parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support l2_reg parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support learn_rate_strategy parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support ls_init_learn_rate parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support max_iterations parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support min_rel_progress parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support optimize_strategy parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support casting string to integer or float (#59) (3502f83)
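
As a reminder of what the rolling and expanding entries in this list compute: a rolling window aggregates a fixed number of trailing rows, while an expanding window aggregates everything seen so far. The helpers below are a plain-Python sketch of those semantics for sums, assuming the pandas defaults (incomplete rolling windows yield a missing value, shown here as None); they are hypothetical and do not call bigframes.

```python
def rolling_sum(values, window):
    """Sum over the trailing `window` items; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough rows yet
        else:
            out.append(sum(values[i + 1 - window : i + 1]))
    return out


def expanding_sum(values):
    """Running sum over all items seen so far."""
    out, total = [], 0
    for value in values:
        total += value
        out.append(total)
    return out


data = [1, 2, 3, 4, 5]
print(rolling_sum(data, 3))  # [None, None, 6, 9, 12]
print(expanding_sum(data))   # [1, 3, 6, 10, 15]
```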

Bug Fixes

Fix header skipping logic in read_csv (#49) (d56258c)
Generate unique ids on join to avoid id collisions (#65) (7ab65e8)
LabelEncoder params consistent with Sklearn (#60) (632caec)
Loosen filter items tests to accommodate shifting pandas impl (#41) (edabdbb)

Performance Improvements

Add ability to cache dataframe and series to session table (#51) (416d7cb)
Inline small Series and DataFrames in query text (#45) (5e199ec)
Reimplement unpivot to use cross join rather than union (#47) (f9a93ce)
Simplify join order to use multiple order keys instead of string. (#36) (5056da6)

Documentation

Link to Remote Functions code samples from README and API reference (c1900c2)

0.4.0 (2023-09-16)

Features

Add axis parameter to droplevel and reorder_levels (7c6b0dd)
Add bfill and ffill to DataFrame and Series (7c6b0dd)
Add DataFrame.combine and DataFrame.combine_first (#27) (7c6b0dd)
Add DataFrame.nlargest, nsmallest (7c6b0dd)
Add DataFrame.pct_change and Series.pct_change (7c6b0dd)
Add DataFrame.skew and GroupBy.skew (7c6b0dd)
Add DataFrame.to_dict, to_excel, to_latex, to_records, to_string, to_markdown, to_pickle, to_orc (7c6b0dd)
Add diff method to DataFrame and GroupBy (7c6b0dd)
Add filter and reindex to Series and DataFrame (7c6b0dd)
Add reindex_like to DataFrame and Series (7c6b0dd)
Add swaplevel to DataFrame and Series (7c6b0dd)
Add partial support for Series.replace (7c6b0dd)
Support DataFrame.loc[bool_series, column] = scalar (7c6b0dd)
Support a persistent name in remote_function (7c6b0dd)
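
Several entries in this list (bfill, ffill, pct_change) are easiest to understand on a small example. The functions below are a plain-Python sketch of the intended semantics, using None for missing values; they are hypothetical helpers written for illustration and are not the bigframes implementation.

```python
def ffill(values):
    """Forward fill: replace each None with the last non-missing value."""
    out, last = [], None
    for value in values:
        if value is not None:
            last = value
        out.append(last)
    return out


def bfill(values):
    """Backward fill: forward fill applied to the reversed sequence."""
    return ffill(values[::-1])[::-1]


def pct_change(values):
    """Fractional change from the previous item (input assumed gap-free)."""
    out = [None]  # the first item has no predecessor
    for prev, cur in zip(values, values[1:]):
        out.append((cur - prev) / prev)
    return out


gappy = [None, 1, None, None, 4]
print(ffill(gappy))                # [None, 1, 1, 1, 4]
print(bfill(gappy))                # [1, 1, 4, 4, 4]
print(pct_change([100, 110, 99]))  # [None, 0.1, -0.1]
```

Note that a leading gap survives ffill and a trailing gap survives bfill, because there is no neighbouring value to copy from.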

Bug Fixes

remote_function uses same credentials as other APIs (7c6b0dd)
Add type hints to models (7c6b0dd)
Raise error when ARIMAPlus is used with Pipeline (7c6b0dd)
Remove transforms parameter in model.fit (breaking change) (7c6b0dd)
Support column joins with “None indexer” (7c6b0dd)
Use Int64Dtype for literals in cut (7c6b0dd)
Use lowercase strings for parameter literals in bigframes.ml (breaking change) (7c6b0dd)

Performance Improvements

Add bigframes-api label to I/O query jobs (7c6b0dd)

Documentation

Document possible parameter values for PaLM2TextGenerator (7c6b0dd)
Document region logic in README (7c6b0dd)
Fix OneHotEncoder sample (7c6b0dd)

0.3.2 (2023-09-06)

Bug Fixes

Make release.sh script for PyPI upload executable (#20) (9951610)

0.3.1 (2023-09-05)

Bug Fixes

release: Use correct directory name for release build config (#17) (3dd25b3)

0.3.0 (2023-09-02)

Features

Add bigframes.get_global_session() and bigframes.reset_session() aliases (a32b747)
Add bigframes.pandas.read_pickle function (a32b747)
Add components_, explained_variance_, and explained_variance_ratio_ properties to bigframes.ml.decomposition.PCA (89b9503)
Add fit_transform to bigquery.ml transformers (a32b747)
Add Series.dropna and DataFrame.fillna (8fab755)
Add Series.str methods isalpha, isdigit, isdecimal, isalnum, isspace, islower, isupper, zfill, center (a32b747)
Support bigframes.pandas.merge() (8fab755)
Support DataFrame.isin with list and dict inputs (8fab755)
Support DataFrame.pivot (a32b747)
Support DataFrame.stack (89b9503)
Support DataFrame-DataFrame binary operations (8fab755)
Support df[my_column] = [a python list] (89b9503)
Support Index.is_monotonic (8fab755)
Support np.arcsin, np.arccos, np.arctan, np.sinh, np.cosh, np.tanh, np.arcsinh, np.arccosh, np.arctanh, np.exp with Series argument (89b9503)
Support np.sin, np.cos, np.tan, np.log, np.log10, np.sqrt, np.abs with Series argument (89b9503)
Support pow() and power operator in DataFrame and Series (8fab755)
Support read_json with engine=bigquery for newline-delimited JSON files (89b9503)
Support Series.corr (89b9503)
Support Series.map (8fab755)
Support for np.add, np.subtract, np.multiply, np.divide, np.power (8fab755)
Support MultiIndex for DataFrame columns (a32b747)
Use pandas.Index for column labels (a32b747)
Use default session and connection in ml.llm and ml.imported (8fab755)
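
The Series.str methods named in this list share their names with Python's built-in str methods, applied element by element. The sketch below uses only the built-in str type to show what each predicate tests; it assumes the Series versions follow the same definitions for simple ASCII input, and results computed in BigQuery may differ for Unicode edge cases.

```python
values = ["abc", "42", "a1", " ", "Hi"]

# Element-wise predicates, written as list comprehensions over plain strings.
print([v.isalpha() for v in values])  # [True, False, False, False, True]
print([v.isdigit() for v in values])  # [False, True, False, False, False]
print([v.isalnum() for v in values])  # [True, True, True, False, True]
print([v.isspace() for v in values])  # [False, False, False, True, False]

# Padding helpers: zfill pads with zeros on the left, center pads both sides.
print("42".zfill(5))        # 00042
print("ab".center(6, "*"))  # **ab**
```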

Bug Fixes

Add error message to set_index (a32b747)
Align column names with pandas in DataFrame.agg results (89b9503)
Allow (but still not recommended) ORDER BY in read_gbq input when an index_col is defined (89b9503)
Check for IAM role on the BigQuery connection when initializing a remote_function (89b9503)
Check that types are specified in read_gbq_function (a32b747)
Don’t use query cache for Session construction (a32b747)
Include survey link in abstract NotImplementedError exception messages (89b9503)
Label temp table creation jobs with source=bigquery-dataframes-temp label (89b9503)
Make X_train argument names consistent across methods (8fab755)
Raise AttributeError for unimplemented pandas methods (89b9503)
Raise exception for invalid function in read_gbq_function (a32b747)
Support spaces in column names in DataFrame initializer (89b9503)

Performance Improvements

Add local cache for __repr_*__ methods (a32b747)
Lazily instantiate client library objects (89b9503)
Use row_number() filter for head / tail (8fab755)
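
The row_number() entry in this list describes selecting the first or last rows by numbering them and filtering on that number. The snippet below is a plain-Python analogy of that idea, with enumerate standing in for SQL's row_number(); the helper names are hypothetical and the actual SQL that bigframes generates is not shown here.

```python
def head(rows, n):
    """Keep rows whose 1-based row number is at most n."""
    numbered = enumerate(rows, start=1)  # plays the role of row_number()
    return [row for number, row in numbered if number <= n]


def tail(rows, n):
    """Keep rows whose 1-based row number falls in the last n positions."""
    total = len(rows)
    numbered = enumerate(rows, start=1)
    return [row for number, row in numbered if number > total - n]


rows = ["a", "b", "c", "d", "e"]
print(head(rows, 2))  # ['a', 'b']
print(tail(rows, 2))  # ['d', 'e']
```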

Documentation

Add ML section under Overview (a32b747)
Add release status to table of contents (a32b747)
Add samples and best practices to read_gbq docs (a32b747)
Correct the return types of DataFrame and Series (a32b747)
Create subfolders for notebooks (a32b747)
Fix link to GitHub (89b9503)
Highlight bigframes is open-source (a32b747)
Sample ML Drug Name Generation notebook (a32b747)
Set options.bigquery.project in sample code (89b9503)
Transform remote function user guide into sample code (a32b747)
Update remote function notebook with read_gbq_function usage (8fab755)

0.2.0 (2023-08-17)

Features

Add KMeans.cluster_centers_.
Allow column labels to be any type handled by bq df, column labels can be integers now.
Add dataframegroupby.agg().
Add Series Property is_monotonic_increasing and is_monotonic_decreasing.
Add match, fullmatch, get, pad str methods.
Add series isin function.

Bug Fixes

Update ML package to use sessions for queries.
Optimize read_gbq with index_col set to cluster by index_col.
Raise ValueError if the location is mismatched.
read_gbq no longer uses ‘time travel’ with query inputs.

Documentation

Add docstring to _uniform_sampling to discourage users from calling it.

0.1.1 (2023-08-14)

Documentation

Correct link to code repository in setup.py and use correct terminology for console.cloud.google.com links.

0.1.0 (2023-08-11)

Features

Add bigframes.pandas package with an API compatible with pandas. Supported data sources include: BigQuery SQL queries, BigQuery tables, CSV (local and GCS), Parquet (local and Cloud Storage), and more.
Add bigframes.ml package with an API inspired by scikit-learn. Train machine learning models and run batch prediction, powered by BigQuery ML.

0.0.0 (2023-02-22)

Empty package to reserve package name.