Changelog

PyPI History

0.14.0 (2023-11-14)

Features

  • Add ‘cross’ join support (#176) (765446a)

  • Add ‘index’, ‘pad’, ‘nearest’ interpolate methods (#162) (6a28403)

  • Add series.sample (identical to existing dataframe.sample) (#187) (37914a4)

  • Add unordered sql compilation (#156) (58f420c)

  • Log most recent API calls as recent-bigframes-api-xx labels on BigQuery jobs (#145) (4ea33b7)

  • read_gbq creates order deterministically without table copy (#191) (8ab81de)

  • Support date_series.astype("string[pyarrow]") to cast DATE to STRING (#186) (aee0e8e)

  • Support series.at[row_label] = scalar (#173) (0c8bd33)

  • Temporary resources no longer use BigQuery Sessions (#194) (4a02cac)
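
As a rough illustration of what the ‘cross’ join entry above means, the sketch below pairs every row of one table with every row of another. It is plain Python, not bigframes code; the sizes and colors tables and the cross_join helper are hypothetical names invented for this example.

```python
from itertools import product

# Hypothetical in-memory "tables"; the names are illustrative only.
sizes = [{"size": "S"}, {"size": "M"}]
colors = [{"color": "red"}, {"color": "blue"}, {"color": "green"}]


def cross_join(left, right):
    """Pair every row of `left` with every row of `right`."""
    return [{**l_row, **r_row} for l_row, r_row in product(left, right)]


rows = cross_join(sizes, colors)
# A cross join yields len(left) * len(right) rows.
print(len(rows))
```

A cross join takes no join key, so the result size is the product of the two input sizes.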

Bug Fixes

  • All sort operations are now stable (#195) (3a2761f)

  • Default to 7 days expiration for read_csv, read_json, read_parquet (#193) (03606cd)

  • Deprecate the remote_service_type in llm model (#180) (a8a409a)

  • For reset_index on unnamed multiindex, always use level_[n] label (#182) (f95000d)

  • Match pandas behavior when assigning listlike to empty dfs (#172) (c1d1f42)

  • Use anonymous dataset instead of session dataset for temp tables (#181) (800d44e)

  • Use random table for read_pandas (#192) (741c75e)

  • Use random table when loading data for read_csv, read_json, read_parquet (#175) (9d2e6dc)
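
For readers unfamiliar with the term used in the sort fix above, a stable sort keeps rows with equal keys in their original relative order. The sketch below shows the property with Python's built-in sorted, which is stable; it is not bigframes code, and the sample rows are made up for illustration.

```python
# Rows as (label, value) pairs; two pairs share each value.
rows = [("b", 2), ("a", 1), ("c", 2), ("d", 1)]

# Sort by value only. Because the sort is stable, "a" stays ahead of "d"
# and "b" stays ahead of "c", matching their order in the input.
by_value = sorted(rows, key=lambda row: row[1])
print(by_value)
```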

Documentation

  • Add code samples for read_gbq_function using community UDFs (#188) (7506eab)

  • Add docstring code samples for Series.apply and DataFrame.map (#185) (c816d84)

  • Add llm kmeans notebook as an included example (#177) (d49ae42)

  • Use head() to get top n results, not to preview results (#190) (87f84c9)

0.13.0 (2023-11-07)

Features

  • to_gbq without a destination table writes to a temporary table (#158) (e1817c9)

  • Add DataFrame.__iter__, DataFrame.iterrows, DataFrame.itertuples, and DataFrame.keys methods (#164) (c065071)

  • Add Series.__iter__ method (#164) (c065071)

  • Add interpolate() to series and dataframe (#157) (b9cb55c)

  • Support 32k text-generation and multilingual embedding models (#161) (5f0ea37)

Bug Fixes

  • Update default temp table expiration to 7 days (#174) (4ff26cd)

0.12.0 (2023-11-01)

Features

  • Add DataFrame.melt (#113) (4e4409c)

  • Add DataFrame.to_pandas_batches() to download large DataFrame objects (#136) (3afd4a3)

  • Add bigframes.options.compute.maximum_bytes_billed option that sets maximum bytes billed on query jobs (#133) (63c7919)

  • Add pandas.qcut (#104) (8e44518)

  • Add pd.get_dummies (#149) (d8baad5)

  • Add unstack to series, add level param (#115) (5edcd19)

  • Implement operator @ for DataFrame.dot (#139) (79a638e)

  • Populate ibis version in user agent (#140) (c639a36)
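
The @ operator entry above relies on Python's __matmul__ hook. The sketch below shows the general pattern of delegating @ to a dot method. The Matrix class is a hypothetical stand-in written for this example and says nothing about how bigframes implements DataFrame.dot.

```python
class Matrix:
    """Hypothetical minimal matrix type, used only to illustrate `@`."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def dot(self, other):
        # Classic row-by-column product; zip(*...) iterates over columns.
        columns = list(zip(*other.rows))
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in columns]
             for row in self.rows]
        )

    def __matmul__(self, other):
        # `left @ right` is routed to the existing dot() method.
        return self.dot(other)


result = Matrix([[1, 2], [3, 4]]) @ Matrix([[5, 6], [7, 8]])
print(result.rows)
```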

Bug Fixes

  • Don’t override the global logging config (#138) (2ddbf74)

  • Fix bug with column names under repeated column assignment (#150) (29032d0)

  • Resolve plotly rendering issue by using ipython html for job pro… (#134) (39df43e)

  • Use indexee’s session for loc listlike cases (#152) (27c5725)

Documentation

  • Add arithmetic df sample code (#153) (ac44ccd)

  • Fix indentation on read_gbq_function code sample (#163) (0801d96)

  • Link to ML.EVALUATE BQML page for score() methods (#137) (45c617f)

0.11.0 (2023-10-26)

Features

  • Add back reset_session as an alias for close_session (#124) (694a85a)

  • Change query parameter to query_or_table in read_gbq (#127) (f9bb3c4)

Bug Fixes

  • Expose bigframes.pandas.reset_session as a public API (#128) (b17e1f4)

  • Use series’s own session in series.reindex listlike case (#135) (95bff3f)

Documentation

  • Add runnable code samples for DataFrames I/O methods and property (#129) (6fea8ef)

  • Add runnable code samples for reading methods (#125) (a669919)

0.10.0 (2023-10-19)

Features

  • Implement DataFrame.dot for matrix multiplication (#67) (29dd414)

0.9.0 (2023-10-18)

⚠ BREAKING CHANGES

  • Rename bigframes.pandas.reset_session to close_session (#101)

Features

  • Add bigframes.options.bigquery.application_name for partner attribution (#117) (52d64ff)

  • Add AtIndexer getitems (#107) (752b01f)

  • Rename bigframes.pandas.reset_session to close_session (#101) (36693bf)

  • Send BigQuery cancel request when canceling bigframes process (#103) (e325fbb)

  • Support external packages in remote_function (#98) (ec10c4a)

  • Use ArrowDtype for STRUCT columns in to_pandas (#85) (9238fad)

Bug Fixes

  • Support multiindex for three loc getitem overloads (#113) (68e3cd3)

Performance Improvements

  • If primary keys are defined, read_gbq avoids copying table data (#112) (e6c0cd1)

Documentation

  • Add documentation for Series.struct.field and Series.struct.explode (#114) (a6dab9c)

  • Add open-source link in API doc (#106) (db51fe3)

  • Update ML overview API doc (#105) (1b3f3a5)

0.8.0 (2023-10-12)

⚠ BREAKING CHANGES

  • The default behavior of to_parquet is changing from no compression to 'snappy' compression.

Features

  • Support compression in to_parquet (a8c286f)

Bug Fixes

  • Create session dataset for remote functions only when needed (#94) (1d385be)

0.7.0 (2023-10-11)

Features

  • Add aliases for several series properties (#80) (c0efec8)

  • Add equals methods to series/dataframe (#76) (636a209)

  • Add iat and iloc accessing by tuples of integers (#90) (228aeba)

  • Add level param to DataFrame.stack (#88) (97b8bec)

  • Allow df.drop to take an index object (#68) (740c451)

  • Use default session connection (#87) (4ae4ef9)

Bug Fixes

Documentation

  • Add more preprocessing models into the docs menu. (#97) (1592315)

0.6.0 (2023-10-04)

Features

  • Add df.unstack (#63) (4a84714)

  • Add idxmin, idxmax to series, dataframe (#74) (781307e)

  • Add ml.preprocessing.KBinsDiscretizer (#81) (24c6256)

  • Add multi-column dataframe merge (#73) (c9fa85c)

  • Add update and align methods to dataframe (#57) (bf050cf)

  • Support STRUCT data type with Series.struct.field to extract child fields (#71) (17afac9)

Bug Fixes

  • Avoid “403 response too large to return” error with read_gbq and large query results (#77) (8f3b5b2)

  • Change return type of Series.loc[scalar] (#40) (fff3d45)

  • Fix df/series.iloc by list with multiindex (#79) (971d091)

0.5.0 (2023-09-28)

Features

  • Add DataFrame.kurtosis / DataFrame.kurt method (c1900c2)

  • Add DataFrame.rolling and DataFrame.expanding methods (c1900c2)

  • Add items, apply methods to DataFrame. (#43) (3adc1b3)

  • Add axis param to simple df aggregations (#52) (9cf9972)

  • Add index dtype, astype, drop, fillna, aggregate attributes. (#38) (1a254a4)

  • Add ml.preprocessing.LabelEncoder (#50) (2510461)

  • Add ml.preprocessing.MaxAbsScaler (#56) (14b262b)

  • Add ml.preprocessing.MinMaxScaler (#64) (392113b)

  • Add more index methods (#54) (a6e32aa)

  • Support calculate_p_values parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support class_weights="balanced" in LogisticRegression model (c1900c2)

  • Support df[column_name] = df_only_one_column (c1900c2)

  • Support early_stop parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support enable_global_explain parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support l2_reg parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support learn_rate_strategy parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support ls_init_learn_rate parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support max_iterations parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support min_rel_progress parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support optimize_strategy parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support casting string to integer or float (#59) (3502f83)

Bug Fixes

  • Fix header skipping logic in read_csv (#49) (d56258c)

  • Generate unique ids on join to avoid id collisions (#65) (7ab65e8)

  • LabelEncoder params consistent with Sklearn (#60) (632caec)

  • Loosen filter items tests to accommodate shifting pandas impl (#41) (edabdbb)

Performance Improvements

  • Add ability to cache dataframe and series to session table (#51) (416d7cb)

  • Inline small Series and DataFrames in query text (#45) (5e199ec)

  • Reimplement unpivot to use cross join rather than union (#47) (f9a93ce)

  • Simplify join order to use multiple order keys instead of string. (#36) (5056da6)

Documentation

  • Link to Remote Functions code samples from README and API reference (c1900c2)

0.4.0 (2023-09-16)

Features

  • Add axis parameter to droplevel and reorder_levels (7c6b0dd)

  • Add bfill and ffill to DataFrame and Series (7c6b0dd)

  • Add DataFrame.combine and DataFrame.combine_first (#27) (7c6b0dd)

  • Add DataFrame.nlargest, nsmallest (7c6b0dd)

  • Add DataFrame.pct_change and Series.pct_change (7c6b0dd)

  • Add DataFrame.skew and GroupBy.skew (7c6b0dd)

  • Add DataFrame.to_dict, to_excel, to_latex, to_records, to_string, to_markdown, to_pickle, to_orc (7c6b0dd)

  • Add diff method to DataFrame and GroupBy (7c6b0dd)

  • Add filter and reindex to Series and DataFrame (7c6b0dd)

  • Add reindex_like to DataFrame and Series (7c6b0dd)

  • Add swaplevel to DataFrame and Series (7c6b0dd)

  • Add partial support for Series.replace (7c6b0dd)

  • Support DataFrame.loc[bool_series, column] = scalar (7c6b0dd)

  • Support a persistent name in remote_function (7c6b0dd)

Bug Fixes

  • remote_function uses same credentials as other APIs (7c6b0dd)

  • Add type hints to models (7c6b0dd)

  • Raise error when ARIMAPlus is used with Pipeline (7c6b0dd)

  • Remove transforms parameter in model.fit (breaking change) (7c6b0dd)

  • Support column joins with “None indexer” (7c6b0dd)

  • Use Int64Dtype for literals in cut (7c6b0dd)

  • Use lowercase strings for parameter literals in bigframes.ml (breaking change) (7c6b0dd)

Performance Improvements

  • Add bigframes-api label to I/O query jobs (7c6b0dd)

Documentation

  • Document possible parameter values for PaLM2TextGenerator (7c6b0dd)

  • Document region logic in README (7c6b0dd)

  • Fix OneHotEncoder sample (7c6b0dd)

0.3.2 (2023-09-06)

Bug Fixes

  • Make release.sh script for PyPI upload executable (#20) (9951610)

0.3.1 (2023-09-05)

Bug Fixes

  • release: Use correct directory name for release build config (#17) (3dd25b3)

0.3.0 (2023-09-02)

Features

  • Add bigframes.get_global_session() and bigframes.reset_session() aliases (a32b747)

  • Add bigframes.pandas.read_pickle function (a32b747)

  • Add components_, explained_variance_, and explained_variance_ratio_ properties to bigframes.ml.decomposition.PCA (89b9503)

  • Add fit_transform to bigframes.ml transformers (a32b747)

  • Add Series.dropna and DataFrame.fillna (8fab755)

  • Add Series.str methods isalpha, isdigit, isdecimal, isalnum, isspace, islower, isupper, zfill, center (a32b747)

  • Support bigframes.pandas.merge() (8fab755)

  • Support DataFrame.isin with list and dict inputs (8fab755)

  • Support DataFrame.pivot (a32b747)

  • Support DataFrame.stack (89b9503)

  • Support DataFrame-DataFrame binary operations (8fab755)

  • Support df[my_column] = [a python list] (89b9503)

  • Support Index.is_monotonic (8fab755)

  • Support np.arcsin, np.arccos, np.arctan, np.sinh, np.cosh, np.tanh, np.arcsinh, np.arccosh, np.arctanh, np.exp with Series argument (89b9503)

  • Support np.sin, np.cos, np.tan, np.log, np.log10, np.sqrt, np.abs with Series argument (89b9503)

  • Support pow() and power operator in DataFrame and Series (8fab755)

  • Support read_json with engine=bigquery for newline-delimited JSON files (89b9503)

  • Support Series.corr (89b9503)

  • Support Series.map (8fab755)

  • Support for np.add, np.subtract, np.multiply, np.divide, np.power (8fab755)

  • Support MultiIndex for DataFrame columns (a32b747)

  • Use pandas.Index for column labels (a32b747)

  • Use default session and connection in ml.llm and ml.imported (8fab755)

Bug Fixes

  • Add error message to set_index (a32b747)

  • Align column names with pandas in DataFrame.agg results (89b9503)

  • Allow ORDER BY (still not recommended) in read_gbq input when an index_col is defined (89b9503)

  • Check for IAM role on the BigQuery connection when initializing a remote_function (89b9503)

  • Check that types are specified in read_gbq_function (a32b747)

  • Don’t use query cache for Session construction (a32b747)

  • Include survey link in abstract NotImplementedError exception messages (89b9503)

  • Label temp table creation jobs with source=bigquery-dataframes-temp label (89b9503)

  • Make X_train argument names consistent across methods (8fab755)

  • Raise AttributeError for unimplemented pandas methods (89b9503)

  • Raise exception for invalid function in read_gbq_function (a32b747)

  • Support spaces in column names in DataFrame initializer (89b9503)

Performance Improvements

  • Add local cache for __repr_*__ methods (a32b747)

  • Lazily instantiate client library objects (89b9503)

  • Use row_number() filter for head / tail (8fab755)

Documentation

  • Add ML section under Overview (a32b747)

  • Add release status to table of contents (a32b747)

  • Add samples and best practices to read_gbq docs (a32b747)

  • Correct the return types of DataFrame and Series (a32b747)

  • Create subfolders for notebooks (a32b747)

  • Fix link to GitHub (89b9503)

  • Highlight bigframes is open-source (a32b747)

  • Sample ML Drug Name Generation notebook (a32b747)

  • Set options.bigquery.project in sample code (89b9503)

  • Transform remote function user guide into sample code (a32b747)

  • Update remote function notebook with read_gbq_function usage (8fab755)

0.2.0 (2023-08-17)

Features

  • Add KMeans.cluster_centers_.

  • Allow column labels to be any type handled by BigQuery DataFrames; column labels can now be integers.

  • Add dataframegroupby.agg().

  • Add Series properties is_monotonic_increasing and is_monotonic_decreasing.

  • Add match, fullmatch, get, pad str methods.

  • Add series isin function.

Bug Fixes

  • Update ML package to use sessions for queries.

  • Optimize read_gbq with index_col set to cluster by index_col.

  • Raise ValueError if the locations do not match.

  • read_gbq no longer uses ‘time travel’ with query inputs.

Documentation

  • Add docstring to _uniform_sampling to discourage users from calling it.

0.1.1 (2023-08-14)

Documentation

  • Correct link to code repository in setup.py and use correct terminology for console.cloud.google.com links.

0.1.0 (2023-08-11)

Features

  • Add bigframes.pandas package with an API compatible with pandas. Supported data sources include: BigQuery SQL queries, BigQuery tables, CSV (local and Cloud Storage), Parquet (local and Cloud Storage), and more.

  • Add bigframes.ml package with an API inspired by scikit-learn. Train machine learning models and run batch prediction, powered by BigQuery ML.

0.0.0 (2023-02-22)

  • Empty package to reserve package name.