Package metrics (0.26.0)

API documentation for metrics package.

Modules

pairwise

API documentation for pairwise module.

Packages Functions

accuracy_score

accuracy_score(
    y_true: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    y_pred: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    *,
    normalize=True
) -> float

Accuracy classification score.

Parameters
Name	Description
`y_true`	Ground truth (correct) labels.
`y_pred`	Predicted labels, as returned by a classifier.
`normalize`	Defaults to `True`. If `False`, return the number of correctly classified samples. Otherwise, return the fraction of correctly classified samples.
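The effect of `normalize` can be illustrated with a minimal plain-Python sketch. The helper `accuracy_sketch` below is hypothetical and works on ordinary lists; it is not the bigframes implementation, only an illustration of the definition given above.

```python
def accuracy_sketch(y_true, y_pred, normalize=True):
    """Illustrative accuracy on plain lists (hypothetical helper, not bigframes code)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count the positions where the predicted label equals the true label.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    # normalize=True gives a fraction; normalize=False gives the raw count.
    return correct / len(y_true) if normalize else correct


y_true = [0, 1, 2, 3]
y_pred = [0, 2, 1, 3]
print(accuracy_sketch(y_true, y_pred))                   # 0.5
print(accuracy_sketch(y_true, y_pred, normalize=False))  # 2
```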

auc

auc(
    x: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    y: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
) -> float

Compute Area Under the Curve (AUC) using the trapezoidal rule.

This is a general function that computes the area from any set of points on a curve. To compute the area under the ROC curve, see `roc_auc_score`. For an alternative way to summarize a precision-recall curve, see `average_precision_score`.

Parameters
Name	Description
`x`	X coordinates. These must be either monotonically increasing or monotonically decreasing.
`y`	Y coordinates.
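The trapezoidal rule itself can be sketched in a few lines of plain Python. The helper `auc_sketch` is hypothetical and operates on lists rather than bigframes objects. Flipping the sign for decreasing `x` so that the area stays positive is an assumption of this sketch, not a documented behavior of `auc`.

```python
def auc_sketch(x, y):
    """Trapezoidal area under the points (x[i], y[i]) (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two points and equal-length inputs")
    # Width of each interval between consecutive x coordinates.
    dx = [b - a for a, b in zip(x, x[1:])]
    if all(d >= 0 for d in dx):
        sign = 1.0
    elif all(d <= 0 for d in dx):
        sign = -1.0  # assumption: decreasing x is treated as a reversed curve
    else:
        raise ValueError("x must be monotonically increasing or decreasing")
    # Each trapezoid has area width * (mean of its two heights).
    area = sum(d * (y0 + y1) / 2 for d, y0, y1 in zip(dx, y, y[1:]))
    return sign * area


print(auc_sketch([0, 0.5, 1], [0, 0.75, 1]))  # 0.625
print(auc_sketch([1, 0.5, 0], [1, 0.75, 0]))  # 0.625
```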

confusion_matrix

confusion_matrix(
    y_true: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    y_pred: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
) -> pandas.core.frame.DataFrame

Compute confusion matrix to evaluate the accuracy of a classification.

By definition, a confusion matrix :math:`C` is such that :math:`C_{i, j}` is equal to the number of observations known to be in group :math:`i` and predicted to be in group :math:`j`.

Thus, in binary classification, the count of true negatives is :math:`C_{0,0}`, the count of false negatives is :math:`C_{1,0}`, the count of true positives is :math:`C_{1,1}`, and the count of false positives is :math:`C_{0,1}`.

Parameters
Name	Description
`y_true`	Ground truth (correct) target values.
`y_pred`	Estimated targets as returned by a classifier.
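The indexing convention (rows are true groups, columns are predicted groups) can be checked with a small plain-Python sketch. The helper `confusion_matrix_sketch` is hypothetical and returns nested lists rather than a pandas DataFrame. Ordering the labels by sorting their union is an assumption made here for illustration.

```python
def confusion_matrix_sketch(y_true, y_pred):
    """Count matrix C where C[i][j] = samples in true group i predicted as group j."""
    # Assumption: label order is the sorted union of observed labels.
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return labels, matrix


y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0]
labels, c = confusion_matrix_sketch(y_true, y_pred)
print(c)  # [[2, 0], [2, 2]]
# Binary reading: tn = c[0][0], fp = c[0][1], fn = c[1][0], tp = c[1][1]
tn, fp, fn, tp = c[0][0], c[0][1], c[1][0], c[1][1]
print(tn, fp, fn, tp)  # 2 0 2 2
```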

f1_score

f1_score(
    y_true: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    y_pred: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    *,
    average: str = "binary"
) -> pandas.core.series.Series

Compute the F1 score, also known as balanced F-score or F-measure.

The F1 score can be interpreted as a harmonic mean of the precision and recall, where an F1 score reaches its best value at 1 and its worst value at 0. The relative contributions of precision and recall to the F1 score are equal. The formula for the F1 score is: F1 = 2 * (precision * recall) / (precision + recall).

In the multi-class and multi-label case, this is the average of the F1 score of each class with weighting depending on the average parameter.

Parameters
Name	Description
`y_true`	Series or DataFrame of shape (n_samples,). Ground truth (correct) target values.
`y_pred`	Series or DataFrame of shape (n_samples,). Estimated targets as returned by a classifier.
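The F1 formula can be evaluated directly from a precision and a recall value. The helper `f1_from_precision_recall` below is a hypothetical plain-Python sketch. Returning 0.0 when both inputs are zero is an assumption of this sketch, since the formula is undefined in that case.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        # Assumption: treat the undefined 0/0 case as the worst score.
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


print(f1_from_precision_recall(0.5, 0.5))   # 0.5
print(f1_from_precision_recall(0.75, 0.5))  # 0.6
print(f1_from_precision_recall(1.0, 0.0))   # 0.0
```

The last call shows why the harmonic mean is used: a perfect precision cannot compensate for a recall of zero.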

precision_score

precision_score(
    y_true: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    y_pred: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    *,
    average: str = "binary"
) -> pandas.core.series.Series

Compute the precision.

The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.

The best value is 1 and the worst value is 0.

Parameters
Name	Description
`y_true`	Series or DataFrame of shape (n_samples,). Ground truth (correct) target values.
`y_pred`	Series or DataFrame of shape (n_samples,). Estimated targets as returned by a classifier.
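The ratio tp / (tp + fp) can be sketched for the binary case in plain Python. The helper `precision_sketch` and its `positive` argument are hypothetical names introduced for illustration. Returning 0.0 when nothing is predicted positive is an assumption of this sketch.

```python
def precision_sketch(y_true, y_pred, positive=1):
    """Binary precision tp / (tp + fp) on plain lists (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0  # assumption: no positive predictions yields 0.0
    return tp / (tp + fp)


y_true = [0, 1, 1, 0, 1, 1]
y_pred = [1, 1, 0, 0, 1, 1]
print(precision_sketch(y_true, y_pred))  # 0.75
```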

r2_score

r2_score(
    y_true: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    y_pred: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    *,
    force_finite=True
) -> float

:math:`R^2` (coefficient of determination) regression score function.

The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse). In the general case, when the true y is non-constant, a constant model that always predicts the average y, disregarding the input features, would get an :math:`R^2` score of 0.0.

In the particular case when `y_true` is constant, the :math:`R^2` score is not finite: it is either NaN (perfect predictions) or -Inf (imperfect predictions). To prevent such non-finite numbers from polluting higher-level experiments such as a grid search cross-validation, by default these cases are replaced with 1.0 (perfect predictions) or 0.0 (imperfect predictions), respectively.

Examples:

>>> import bigframes.pandas as bpd
>>> import bigframes.ml.metrics
>>> bpd.options.display.progress_bar = None

>>> y_true = bpd.DataFrame([3, -0.5, 2, 7])
>>> y_pred = bpd.DataFrame([2.5, 0.0, 2, 8])
>>> r2_score = bigframes.ml.metrics.r2_score(y_true, y_pred)
>>> r2_score
0.9486081370449679

Parameters
Name	Description
`y_true`	Ground truth (correct) target values.
`y_pred`	Estimated target values.
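The score and the constant-`y_true` special case can be reproduced with a plain-Python sketch of the standard definition, 1 - SS_res / SS_tot. The helper `r2_sketch` is hypothetical and is not the bigframes implementation; it only mirrors the behavior described above for `force_finite`.

```python
import math


def r2_sketch(y_true, y_pred, force_finite=True):
    """Coefficient of determination on plain lists (hypothetical helper)."""
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        # y_true is constant, so the ratio is undefined.
        if force_finite:
            return 1.0 if ss_res == 0 else 0.0
        return math.nan if ss_res == 0 else -math.inf
    return 1 - ss_res / ss_tot


# Same data as the example above; the result is approximately 0.9486.
print(r2_sketch([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]))
# Constant y_true with imperfect predictions.
print(r2_sketch([2, 2, 2], [1, 2, 3]))                      # 0.0
print(r2_sketch([2, 2, 2], [1, 2, 3], force_finite=False))  # -inf
```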

recall_score

recall_score(
    y_true: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    y_pred: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    *,
    average: str = "binary"
) -> pandas.core.series.Series

Compute the recall.

The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.
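The ratio tp / (tp + fn) can be sketched for the binary case in plain Python. The helper `recall_sketch` and its `positive` argument are hypothetical names introduced for illustration. Returning 0.0 when there are no positive samples is an assumption of this sketch.

```python
def recall_sketch(y_true, y_pred, positive=1):
    """Binary recall tp / (tp + fn) on plain lists (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    # True positives: actually positive and predicted positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False negatives: actually positive but predicted negative.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        return 0.0  # assumption: no positive samples yields 0.0
    return tp / (tp + fn)


y_true = [1, 1, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print(recall_sketch(y_true, y_pred))  # 0.5
```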