Matrix Decomposition models. This module is styled after Scikit-Learn's decomposition module: https://scikit-learn.org/stable/modules/decomposition.html.
Classes
MatrixFactorization
MatrixFactorization(
*,
feedback_type: typing.Literal["explicit", "implicit"] = "explicit",
num_factors: int,
user_col: str,
item_col: str,
rating_col: str = "rating",
l2_reg: float = 1.0
)
Matrix Factorization (MF).
Examples:
>>> import bigframes.pandas as bpd
>>> from bigframes.ml.decomposition import MatrixFactorization
>>> bpd.options.display.progress_bar = None
>>> X = bpd.DataFrame({
... "row": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
... "column": [0,1] * 7,
... "value": [1, 1, 2, 1, 3, 1.2, 4, 1, 5, 0.8, 6, 1, 2, 3],
... })
>>> model = MatrixFactorization(feedback_type='explicit', num_factors=6, user_col='row', item_col='column', rating_col='value', l2_reg=2.06)
>>> W = model.fit(X)
Parameters

| Name | Description |
|---|---|
| feedback_type | 'explicit' or 'implicit', default 'explicit'. Specifies the feedback type for the model. The feedback type determines the algorithm that is used during training. |
| num_factors | int or auto, default auto. Specifies the number of latent factors to use. |
| user_col | str. The user column name. |
| item_col | str. The item column name. |
| rating_col | str, default "rating". The rating column name. |
| l2_reg | float, default 1.0. A floating point value for L2 regularization. |
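Once fitted, the estimator can be used like other bigframes.ml models. The sketch below is illustrative rather than part of the reference: it assumes the fitted MatrixFactorization exposes the standard bigframes `predict` and `to_gbq` methods, and the model path is hypothetical.

>>> # Predicted ratings for the user/item pairs in X.
>>> ratings = model.predict(X)  # doctest:+SKIP
>>> # Persist the trained model to BigQuery (model path is hypothetical).
>>> model.to_gbq("my_dataset.my_mf_model", replace=True)  # doctest:+SKIP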
PCA
PCA(
n_components: typing.Optional[typing.Union[int, float]] = None,
*,
svd_solver: typing.Literal["full", "randomized", "auto"] = "auto"
)
Principal component analysis (PCA).
Examples:
>>> import bigframes.pandas as bpd
>>> from bigframes.ml.decomposition import PCA
>>> bpd.options.display.progress_bar = None
>>> X = bpd.DataFrame({"feat0": [-1, -2, -3, 1, 2, 3], "feat1": [-1, -1, -2, 1, 1, 2]})
>>> pca = PCA(n_components=2).fit(X)
>>> pca.predict(X) # doctest:+SKIP
principal_component_1 principal_component_2
0 -0.755243 0.157628
1 -1.05405 -0.141179
2 -1.809292 0.016449
3 0.755243 -0.157628
4 1.05405 0.141179
5 1.809292 -0.016449
<BLANKLINE>
[6 rows x 2 columns]
>>> pca.explained_variance_ratio_ # doctest:+SKIP
principal_component_id explained_variance_ratio
0 1 0.00901
1 0 0.99099
<BLANKLINE>
[2 rows x 2 columns]
Parameters

| Name | Description |
|---|---|
| n_components | int, float or None, default None. Number of components to keep. If n_components is not set, all components are kept: n_components = min(n_samples, n_features). If 0 < n_components < 1, select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components. |
| svd_solver | "full", "randomized" or "auto", default "auto". The solver to use to calculate the principal components. Details: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-pca#pca_solver. |