Class PCA (0.24.0)

PCA(n_components: int = 3)

Principal component analysis (PCA).

Linear dimensionality reduction that uses Singular Value Decomposition (SVD) of the data to project it to a lower-dimensional space. Each feature of the input data is centered, but not scaled, before the SVD is applied.

It uses the LAPACK implementation of the full SVD or a randomized truncated SVD by the method of Halko et al. (2009), depending on the shape of the input data and the number of components to extract.

It can also use the scipy.sparse.linalg ARPACK implementation of the truncated SVD.
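As an illustration of the centering step and of how principal axes relate to variance, here is a minimal pure-Python sketch for two numeric features. It uses the closed-form eigendecomposition of the 2x2 sample covariance matrix, which yields the same principal axes and variances as an SVD of the centered data. It is not the library's implementation, and `pca_2d` is a hypothetical helper.

```python
import math


def pca_2d(rows):
    """Illustrative PCA for two numeric features (hypothetical helper).

    Each feature is centered but not scaled, then the 2x2 sample covariance
    matrix is eigendecomposed in closed form. The eigenvectors are the
    principal axes and the eigenvalues are the explained variances.
    """
    n = len(rows)
    # Center each feature by subtracting its mean; no division by std dev.
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    centered = [(x - mean_x, y - mean_y) for x, y in rows]

    # Sample covariance matrix [[a, b], [b, c]] with an n - 1 denominator.
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b * b)
    eigenvalues = [half_trace + spread, half_trace - spread]

    if b == 0:
        # Diagonal covariance: the principal axes are the feature axes.
        axes = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    else:
        # (b, lambda - a) solves (C - lambda * I) v = 0; normalize to unit length.
        axes = []
        for lam in eigenvalues:
            vx, vy = b, lam - a
            norm = math.hypot(vx, vy)
            axes.append((vx / norm, vy / norm))
    return eigenvalues, axes


eigenvalues, axes = pca_2d([(2.0, 2.0), (4.0, 4.0), (6.0, 6.0)])
```

Because features are centered but not scaled, a feature measured in larger units contributes more variance and tends to dominate the first component; standardize the features beforehand if that is not wanted.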

Parameter
Name	Description
`n_components`	`Optional[int], default 3` Number of components to keep. If `n_components` is set to `None`, all components are kept.

Properties

components_

Principal axes in feature space, representing the directions of maximum variance in the data.

Returns
Type	Description
`bigframes.dataframe.DataFrame`	DataFrame of principal components, containing the following columns: principal_component_id: An integer that identifies the principal component. feature: The name of the column that contains the feature. numerical_value: If the feature is numeric, the value of the feature for the principal component that principal_component_id identifies; if the feature isn't numeric, the value is NULL. categorical_value: A list of mappings containing information about categorical features. Each mapping contains the following fields: categorical_value.category: The name of each category. categorical_value.value: The value of categorical_value.category for the principal component that principal_component_id identifies. The output contains one row per feature per component.
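Because `components_` is in long format (one row per feature per component), it is often convenient to regroup it into one loading vector per component. The sketch below does this in plain Python on hypothetical rows with made-up values; `loadings_by_component` is a hypothetical helper, not a bigframes API. Categorical features, whose `numerical_value` is NULL (represented here as `None`), are skipped.

```python
from collections import defaultdict

# Hypothetical rows in the long format described above; values are made up.
rows = [
    {"principal_component_id": 1, "feature": "height", "numerical_value": 0.8},
    {"principal_component_id": 1, "feature": "weight", "numerical_value": 0.6},
    {"principal_component_id": 1, "feature": "color", "numerical_value": None},
    {"principal_component_id": 2, "feature": "height", "numerical_value": -0.6},
    {"principal_component_id": 2, "feature": "weight", "numerical_value": 0.8},
]


def loadings_by_component(rows):
    """Group numeric loadings into {component_id: {feature: value}}."""
    table = defaultdict(dict)
    for row in rows:
        # Non-numeric features carry their values in categorical_value instead.
        if row["numerical_value"] is None:
            continue
        table[row["principal_component_id"]][row["feature"]] = row["numerical_value"]
    return dict(table)


loadings = loadings_by_component(rows)
```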

explained_variance_

The amount of variance explained by each of the selected components.

Returns
Type	Description
`bigframes.dataframe.DataFrame`	DataFrame containing the following columns: principal_component_id: An integer that identifies the principal component. explained_variance: The factor by which the eigenvector is scaled. Eigenvalue and explained variance are the same concept in PCA.
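To make "the factor by which the eigenvector is scaled" concrete: the explained variance of a component equals the sample variance of the data projected onto that component's unit-length axis. The plain-Python sketch below checks this on a small made-up dataset whose first principal axis is (1/√2, 1/√2). It assumes the n - 1 sample-variance convention, which this page does not specify, and the helper names are hypothetical.

```python
def sample_variance(values):
    """Sample variance with an n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def project(rows, direction):
    """Dot product of each row with a unit-length direction.

    Centering is unnecessary here because variance ignores a constant shift.
    """
    return [sum(x * d for x, d in zip(row, direction)) for row in rows]


rows = [(2.0, 2.0), (4.0, 4.0), (6.0, 6.0)]
first_axis = (2 ** -0.5, 2 ** -0.5)
second_axis = (2 ** -0.5, -(2 ** -0.5))

explained_first = sample_variance(project(rows, first_axis))
explained_second = sample_variance(project(rows, second_axis))
```

For this perfectly correlated data, the second axis captures no variance at all, so that component would explain none of it.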

explained_variance_ratio_

Percentage of variance explained by each of the selected components.

Returns
Type	Description
`bigframes.dataframe.DataFrame`	DataFrame containing the following columns: principal_component_id: An integer that identifies the principal component. explained_variance_ratio: The ratio between the variance (also known as the eigenvalue) of that principal component and the total variance, where the total variance is the sum of the variances (eigenvalues) of all individual principal components.
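The ratio is each eigenvalue divided by the sum of all eigenvalues. A common way to pick `n_components` is to keep the smallest number of leading components whose cumulative ratio reaches a target; the plain-Python sketch below shows both steps. The eigenvalues and the 0.85 target are made-up illustration values, and the selection heuristic is general PCA practice rather than something `PCA` does for you.

```python
def explained_variance_ratio(eigenvalues):
    """Each component's eigenvalue divided by the sum of all eigenvalues."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def components_needed(ratios, target):
    """Smallest k such that the first k ratios sum to at least `target`."""
    cumulative = 0.0
    for k, ratio in enumerate(ratios, start=1):
        cumulative += ratio
        if cumulative >= target:
            return k
    return len(ratios)


ratios = explained_variance_ratio([6.0, 3.0, 1.0])
k = components_needed(ratios, target=0.85)
```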

Methods

__repr__

__repr__()

Return a string representation of the estimator's constructor call, including all non-default parameter values.
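The "non-default parameters only" behavior can be sketched with `inspect.signature`: compare each constructor argument with its declared default and show only the ones that differ. `ReprSketch` is a hypothetical stand-in, not the bigframes implementation; by analogy, a `PCA` built with default arguments would be expected to display an empty argument list.

```python
import inspect


class ReprSketch:
    """Hypothetical estimator used only to illustrate the repr behavior."""

    def __init__(self, n_components: int = 3):
        self.n_components = n_components

    def __repr__(self):
        params = inspect.signature(type(self).__init__).parameters
        shown = []
        for name, param in params.items():
            if name == "self":
                continue
            value = getattr(self, name)
            # Only arguments that differ from the declared default are shown.
            if value != param.default:
                shown.append(f"{name}={value!r}")
        return f"{type(self).__name__}({', '.join(shown)})"
```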

detect_anomalies

detect_anomalies(
    X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    *,
    contamination: float = 0.1
) -> bigframes.dataframe.DataFrame

Detect anomalous data points in the input.

Parameters
Name	Description
`X`	`bigframes.dataframe.DataFrame or bigframes.series.Series` Series or DataFrame in which to detect anomalies.
`contamination`	`float, default 0.1` The proportion of anomalies in the training dataset that was used to create the model. The value must be in the range [0, 0.5].

Returns
Type	Description
`bigframes.dataframe.DataFrame`	DataFrame containing the anomaly detection results.
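PCA-based anomaly detection is commonly framed as follows: reconstruct each row from the kept principal axes, measure the reconstruction error, and flag rows whose error exceeds a threshold chosen so that about `contamination` of the training data is flagged. The plain-Python sketch below shows that idea. Using mean squared error and a simple top-k threshold are assumptions, `reconstruction_mse` and `flag_anomalies` are hypothetical helpers rather than bigframes APIs, and the `errors` passed to `flag_anomalies` stand in for training-set errors.

```python
import math


def reconstruction_mse(centered_row, axes):
    """MSE between a centered row and its reconstruction from unit-length axes."""
    recon = [0.0] * len(centered_row)
    for axis in axes:
        # Score along this axis, then add the axis back scaled by that score.
        score = sum(x * a for x, a in zip(centered_row, axis))
        recon = [r + score * a for r, a in zip(recon, axis)]
    return sum((x - r) ** 2 for x, r in zip(centered_row, recon)) / len(centered_row)


def flag_anomalies(errors, contamination=0.1):
    """Flag roughly the `contamination` share of points with the largest errors.

    Points tied with the threshold error are all flagged.
    """
    if not 0 <= contamination <= 0.5:
        raise ValueError("contamination must be in the range [0, 0.5]")
    n_flag = math.floor(len(errors) * contamination)
    if n_flag == 0:
        return [False] * len(errors)
    threshold = sorted(errors, reverse=True)[n_flag - 1]
    return [e >= threshold for e in errors]
```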

fit

fit(
    X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    y: typing.Optional[
        typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]
    ] = None,
) -> bigframes.ml.base._T

Fit the model according to the given training data.

Parameters
Name	Description
`X`	`bigframes.dataframe.DataFrame or bigframes.series.Series` Series or DataFrame of shape (n_samples, n_features). Training vector, where `n_samples` is the number of samples and `n_features` is the number of features.
`y`	`default None` Ignored.

Returns
Type	Description
`PCA`	Fitted estimator.

get_params

get_params(deep: bool = True) -> typing.Dict[str, typing.Any]

Get parameters for this estimator.

Parameter
Name	Description
`deep`	`bool, default True` If True, return the parameters for this estimator and for contained subobjects that are estimators.

Returns
Type	Description
`Dictionary`	A dictionary of parameter names mapped to their values.
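The effect of `deep` can be sketched as follows, assuming the scikit-learn convention of exposing a nested estimator's parameters as `<name>__<parameter>`. `Estimator` is a hypothetical class for illustration only. Since `PCA` has only the integer `n_components` parameter, `deep=True` and `deep=False` presumably return the same dictionary for it.

```python
class Estimator:
    """Hypothetical minimal estimator illustrating the `deep` flag."""

    def __init__(self, **params):
        self._params = params

    def get_params(self, deep=True):
        out = dict(self._params)
        if deep:
            # Expand nested estimators as "<name>__<sub_parameter>" entries.
            for name, value in self._params.items():
                if isinstance(value, Estimator):
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        out[f"{name}__{sub_name}"] = sub_value
        return out


inner = Estimator(n_components=3)
outer = Estimator(decomposer=inner, verbose=False)
```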

predict

predict(
    X: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]
) -> bigframes.dataframe.DataFrame

Project each sample in X onto the principal components.

Parameter
Name	Description
`X`	`bigframes.dataframe.DataFrame or bigframes.series.Series` Series or DataFrame to predict.

Returns
Type	Description
`bigframes.dataframe.DataFrame`	Predicted DataFrame.
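For PCA, projecting a sample onto the principal axes means centering it with the training means and taking its dot product with each unit-length axis. The plain-Python sketch below shows that arithmetic. `project_rows`, the means, and the axes are hypothetical, and the exact output columns that `predict` returns are not reproduced here.

```python
def project_rows(rows, means, axes):
    """Return principal-component scores for each row: (x - mean) . axis."""
    scores = []
    for row in rows:
        centered = [x - m for x, m in zip(row, means)]
        scores.append([sum(c * a for c, a in zip(centered, axis)) for axis in axes])
    return scores


s = 2 ** -0.5
# Hypothetical fitted state: feature means and two orthonormal axes.
scores = project_rows(
    [(2.0, 2.0), (6.0, 6.0)],
    means=(4.0, 4.0),
    axes=[(s, s), (s, -s)],
)
```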

register

register(vertex_ai_model_id: typing.Optional[str] = None) -> bigframes.ml.base._T

Register the model to Vertex AI. After registering, go to the Google Cloud console (https://console.cloud.google.com/vertex-ai/models) to manage the model registries. Refer to https://cloud.google.com/vertex-ai/docs/model-registry/introduction for more options.

Parameter
Name	Description
`vertex_ai_model_id`	`Optional[str], default None` Optional string ID to use as the model ID in Vertex AI. If not set, defaults to 'bigframes_{bq_model_id}'. The Vertex AI model ID is truncated to 63 characters because of Vertex AI's length limit.
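The documented defaulting and length limit for the Vertex AI model ID can be expressed as a small function. `vertex_model_id` is a hypothetical helper, not a bigframes API, and it assumes truncation keeps the first 63 characters. In bigframes itself you would simply call `model.register()` or `model.register(vertex_ai_model_id="my_pca")`.

```python
def vertex_model_id(bq_model_id, vertex_ai_model_id=None, max_len=63):
    """Hypothetical helper mirroring the documented default and length limit."""
    if vertex_ai_model_id is None:
        # Default naming described above: 'bigframes_{bq_model_id}'.
        model_id = f"bigframes_{bq_model_id}"
    else:
        model_id = vertex_ai_model_id
    # Assumed truncation: keep the leading max_len characters.
    return model_id[:max_len]
```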

score

score(X=None, y=None) -> bigframes.dataframe.DataFrame

Calculate evaluation metrics of the model.

Parameters
Name	Description
`X`	`default None` Ignored.
`y`	`default None` Ignored.

Returns
Type	Description
`bigframes.dataframe.DataFrame`	DataFrame that represents model metrics.

to_gbq

to_gbq(model_name: str, replace: bool = False) -> bigframes.ml.decomposition.PCA

Save the model to BigQuery.

Parameters
Name	Description
`model_name`	`str` The name of the model.
`replace`	`bool, default False` Whether to replace the model if it already exists.

Returns
Type	Description
`PCA`	Saved model.