Scores

enum BinaryLabel(value)[source]

Simple enum for positive and negative classes.

Valid values are as follows:

pos = <BinaryLabel.pos: 'pos'>
neg = <BinaryLabel.neg: 'neg'>
class BootstrapConfig(nb_samples=1000, bootstrap_method='bca', sampling_method='dynamic', stratified_sampling=None, smoothing=False, ratio=None)[source]

Bootstrap configuration for creating bootstrap samples and computing CIs.

Parameters:
  • nb_samples (int) – Number of samples to use for bootstrapping.

  • bootstrap_method (str) –

    Method to compute the CI from the bootstrap samples. Possible values are

    • “quantile” uses the alpha/2 and 1-alpha/2 quantiles of the empirical metric distribution.

    • “bc” applies bias correction to correct for the bias of the median of the empirical distribution.

    • “bca” applies bias correction and acceleration to correct for non-constant standard error.

    See Ch. 11 of Computer Age Statistical Inference by Efron and Hastie for details.

  • sampling_method (Union[str, Callable[[Scores], Scores]]) –

    Sampling method to create bootstrap sample. Supported methods are

    • “replacement” creates a sample with the same number of positive and negative scores using sampling with replacement.

    • “single_pass” approximates replacement sampling using a Poisson distribution to determine how often each score would be selected in the bootstrap sample. This speeds up sampling by up to ~40% compared to replacement sampling since the sampled scores are already sorted. However, the method cannot guarantee that each bootstrap sample has the same number of scores. This should not matter if the number of scores is >100 (per group, if stratified sampling is used).

    • “dynamic” chooses between replacement and single pass sampling. It chooses single pass sampling if >100 scores are present per group and smoothing is disabled, and reverts to replacement sampling otherwise. The threshold can be changed by setting the variable SINGLE_PASS_SAMPLE_THRESHOLD.

    • “proportion” creates a sample of size defined by ratio using sampling without replacement. This is similar to cross-validation, where a proportion of data is used in each iteration.

    • A callable with signature:

      method(source: Scores, **kwargs) -> Scores
      

      creating one sample from a source Scores object.

  • stratified_sampling (Optional[str]) –

    Stratified sampling is only supported for replacement sampling. Possible values are

    • None. No stratification is used.

    • “by_label”. Sampling preserves the proportion of positive and negative samples as well as the proportion of easy positive and negative samples.

    • “by_group”. Sampling preserves the proportion of samples in each group. Defaults to non-stratified sampling if no groups are present.

  • smoothing (bool) – Optional smoothing of sampled scores.

  • ratio (Optional[float]) – Size of sample when using proportional sampling. In range (0, 1).
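
The difference between “replacement” and “single_pass” sampling can be illustrated with a small standalone sketch. It does not use the library; the helper names (poisson1, replacement_sample, single_pass_sample) are hypothetical, and the Poisson rate of 1 is an assumption based on the usual Poisson approximation of the bootstrap.

```python
import math
import random


def poisson1(rng):
    """Draw from a Poisson distribution with rate 1 (Knuth's method)."""
    limit = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def replacement_sample(sorted_scores, rng):
    """Classic bootstrap: draw n scores with replacement, then re-sort."""
    n = len(sorted_scores)
    return sorted(rng.choice(sorted_scores) for _ in range(n))


def single_pass_sample(sorted_scores, rng):
    """Walk the sorted scores once and repeat each one Poisson(1) times.

    The output is already sorted, but its length is only n on average.
    """
    sample = []
    for score in sorted_scores:
        sample.extend([score] * poisson1(rng))
    return sample


rng = random.Random(0)
source = sorted(rng.random() for _ in range(1000))
by_replacement = replacement_sample(source, rng)
by_single_pass = single_pass_sample(source, rng)
```

In this sketch the single pass sample needs no sorting step, which is where the speed-up described above would come from, while its size fluctuates around the size of the source.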

class Scores(pos, neg, *, nb_easy_pos=0, nb_easy_neg=0, score_class='pos', equal_class='pos', is_sorted=False)[source]
Parameters:
  • pos – Scores for positive samples.

  • neg – Scores for negative samples.

  • nb_easy_pos (int) – Number of positive samples that we assume are always correctly classified when computing metrics. These parameters are useful when evaluating a highly accurate classifier on only the hardest samples, which speeds up evaluation.

  • nb_easy_neg (int) – Number of negative samples that we assume are always correctly classified.

  • score_class (Union[BinaryLabel, str]) – Do scores indicate membership of the positive or the negative class?

  • equal_class (Union[BinaryLabel, str]) – Do samples with score equal to the threshold get assigned to the positive or negative class?

  • is_sorted (bool) – If True, we assume the scores are already sorted. Can be used to speed up Scores object creation.
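
As a rough illustration of how score_class, equal_class and nb_easy_pos might interact, the standalone sketch below computes a true positive rate by hand. The function tpr_sketch is hypothetical and reflects one reading of the parameter descriptions above, not the library's code.

```python
def tpr_sketch(pos, threshold, nb_easy_pos=0, score_class="pos", equal_class="pos"):
    """True positive rate for a list of positive scores (illustrative only)."""

    def predicted_positive(score):
        if score == threshold:
            # Ties go to whichever class equal_class names.
            return equal_class == "pos"
        if score_class == "pos":
            # Higher scores indicate the positive class.
            return score > threshold
        # Higher scores indicate the negative class.
        return score < threshold

    # Easy positives are assumed to be classified correctly at any threshold.
    true_positives = sum(predicted_positive(s) for s in pos) + nb_easy_pos
    return true_positives / (len(pos) + nb_easy_pos)


hard_positives = [0.2, 0.5, 0.9]
plain = tpr_sketch(hard_positives, 0.5)
ties_negative = tpr_sketch(hard_positives, 0.5, equal_class="neg")
with_easy = tpr_sketch(hard_positives, 0.5, nb_easy_pos=7)
reversed_scores = tpr_sketch(hard_positives, 0.5, score_class="neg")
```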

acceptance_rate(threshold)[source]

Acceptance Rate at threshold(s).

Alias for topr().

auc(lower=0.0, upper=1.0, *, x_axis='fpr', y_axis='tpr')[source]

Computes the (partial) AUC for the given Scores object using the trapezoid integration rule.

Parameters:
  • lower (float) – Lower limit of integration.

  • upper (float) – Upper limit of integration.

  • x_axis (str) – Metric to plot on x-axis. Defaults to FPR.

  • y_axis (str) – Metric to plot on y-axis. Defaults to TPR.
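
The trapezoid rule with integration limits can be sketched on an explicit list of curve points. The function trapezoid_auc is a hypothetical standalone helper; it assumes the x values are sorted in ascending order and treats the curve as piecewise linear.

```python
def trapezoid_auc(xs, ys, lower=0.0, upper=1.0):
    """Area under a piecewise-linear curve, restricted to [lower, upper]."""
    area = 0.0
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        a, b = max(x0, lower), min(x1, upper)
        if x1 == x0 or b <= a:
            continue  # vertical segment, or segment outside the limits
        slope = (y1 - y0) / (x1 - x0)
        ya = y0 + slope * (a - x0)  # curve height at the clipped left edge
        yb = y0 + slope * (b - x0)  # curve height at the clipped right edge
        area += 0.5 * (ya + yb) * (b - a)
    return area


# A small ROC-like curve given as (FPR, TPR) points.
fpr_points = [0.0, 0.0, 0.5, 1.0]
tpr_points = [0.0, 0.5, 1.0, 1.0]
full_auc = trapezoid_auc(fpr_points, tpr_points)
partial_auc = trapezoid_auc(fpr_points, tpr_points, lower=0.0, upper=0.5)
```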

bootstrap_ci(metric, alpha=0.05, config=BootstrapConfig(nb_samples=1000, bootstrap_method='bca', sampling_method='dynamic', stratified_sampling=None, smoothing=False, ratio=None), **kwargs)[source]

Calculates the confidence interval with approximate coverage 1-alpha for metric by bootstrapping nb_samples from the positive and negative scores.

Parameters:
  • metric (Union[str, Callable]) –

    Can be a string indicating a member function of the Scores class or a callable with signature:

    metric(sample: Scores, **kwargs) -> Union[float, np.ndarray]
    

  • alpha (float) – Significance level. In range (0, 1).

  • config (BootstrapConfig) – Bootstrap config.

  • **kwargs – Arguments that are passed to the metric function.

Returns:

Array of shape (Y, 2) with the lower and upper bounds of the CI, for a metric returning shape (Y,).

Return type:

ndarray
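
To make the idea of a bootstrap CI concrete, the standalone sketch below implements only the simplest “quantile” variant on a plain list of values; it is not the default “bca” method and does not use the Scores class. The function quantile_ci and its seed argument are hypothetical.

```python
import random
import statistics


def quantile_ci(values, metric, alpha=0.05, nb_samples=1000, seed=0):
    """Quantile bootstrap CI with approximate coverage 1 - alpha."""
    rng = random.Random(seed)
    n = len(values)
    # Evaluate the metric on nb_samples resamples drawn with replacement.
    stats = sorted(metric(rng.choices(values, k=n)) for _ in range(nb_samples))

    def quantile(p):
        # Linear interpolation between neighbouring order statistics.
        position = p * (nb_samples - 1)
        lo = int(position)
        hi = min(lo + 1, nb_samples - 1)
        return stats[lo] + (position - lo) * (stats[hi] - stats[lo])

    return quantile(alpha / 2), quantile(1 - alpha / 2)


rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
lower, upper = quantile_ci(data, statistics.fmean)
```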

bootstrap_metric(metric, config=BootstrapConfig(nb_samples=1000, bootstrap_method='bca', sampling_method='dynamic', stratified_sampling=None, smoothing=False, ratio=None), **kwargs)[source]

Calculates nb_samples samples of metric using bootstrapping.

Parameters:
  • metric (Union[str, Callable]) –

    Can be a string indicating a member function of the Scores class or a callable with signature:

    metric(sample: Scores, **kwargs) -> np.ndarray
    

  • config (BootstrapConfig) – Bootstrap config.

  • **kwargs – Arguments that are passed to the metric function.

Returns:

Array of samples from metric. If metric returns arrays of shape (X,), the function will return an array of shape (nb_samples, X).

Return type:

ndarray

bootstrap_sample(config=BootstrapConfig(nb_samples=1000, bootstrap_method='bca', sampling_method='dynamic', stratified_sampling=None, smoothing=False, ratio=None))[source]

Creates one bootstrap sample by sampling with the specified configuration.

Parameters:

config (BootstrapConfig) – Bootstrap configuration.

Returns:

Scores object with the sampled scores.

Return type:

Scores

cm(threshold)[source]

Computes confusion matrices at the given threshold(s).

Parameters:

threshold – Can be a scalar or array-like.

Returns:

A binary confusion matrix.

Return type:

ConfusionMatrix

confusion_matrix(threshold)

Computes confusion matrices at the given threshold(s).

Parameters:

threshold – Can be a scalar or array-like.

Returns:

A binary confusion matrix.

Return type:

ConfusionMatrix

eer()[source]

Calculates Equal Error Rate, i.e., where FPR = FNR (or, equivalently, where FAR = FRR).

Returns:

Tuple (threshold, eer), consisting of the threshold at which EER is achieved and the EER value itself.

Return type:

Tuple[float, float]
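
A coarse way to approximate the EER is to scan the observed scores and keep the threshold where FPR and FNR are closest. The standalone function eer_sketch below is hypothetical and assumes that higher scores indicate the positive class and that ties are assigned to the positive class; the library may interpolate differently.

```python
def eer_sketch(pos, neg):
    """Return (threshold, eer) at the observed score where |FPR - FNR| is smallest."""
    best = None
    for t in sorted(set(pos) | set(neg)):
        fnr = sum(s < t for s in pos) / len(pos)   # positives rejected
        fpr = sum(s >= t for s in neg) / len(neg)  # negatives accepted
        gap = abs(fpr - fnr)
        if best is None or gap < best[0]:
            best = (gap, t, (fpr + fnr) / 2)
    return best[1], best[2]


pos_scores = [0.6, 0.7, 0.8, 0.9]
neg_scores = [0.1, 0.2, 0.3, 0.65]
eer_threshold, eer_value = eer_sketch(pos_scores, neg_scores)
```

With these scores, a threshold of 0.65 rejects one of four positives and accepts one of four negatives, so both error rates are 0.25.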

far(threshold)[source]

False Acceptance Rate at threshold(s).

Alias for fpr().

fnr(threshold)[source]

False Negative Rate at threshold(s).

fpr(threshold)[source]

False Positive Rate at threshold(s).

static from_labels(labels, scores, *, pos_label=1, nb_easy_pos=0, nb_easy_neg=0, score_class='pos', equal_class='pos', is_sorted=False)[source]
Parameters:
  • labels – Array with sample labels.

  • scores – Array with sample scores.

  • pos_label (Any) – The label of the positive class. All other labels are treated as negative labels.

  • nb_easy_pos (int) – Number of positive samples that we assume are always correctly classified when computing metrics. These parameters are useful when evaluating a highly accurate classifier on only the hardest samples, which speeds up evaluation.

  • nb_easy_neg (int) – Number of negative samples that we assume are always correctly classified.

  • score_class (Union[BinaryLabel, str]) – Do scores indicate membership of the positive or the negative class?

  • equal_class (Union[BinaryLabel, str]) – Do samples with score equal to the threshold get assigned to the positive or negative class?

  • is_sorted (bool) – If True, we assume the scores are already sorted. Can be used to speed up Scores object creation.

Returns:

A Scores instance.

Return type:

Scores

frr(threshold)[source]

False Rejection Rate at threshold(s).

Alias for fnr().

rejection_rate(threshold)[source]

Rejection Rate at threshold(s).

Alias for tonr().

swap()[source]

Swaps positive and negative scores. Also reverses the decision logic, so that fpr of original scores equals fnr of reversed scores.

Returns:

Scores object with positive and negative scores reversed.

Return type:

Scores

tar(threshold)[source]

True Acceptance Rate at threshold(s).

Alias for tpr().

threshold_at_acceptance_rate(acceptance_rate, *, method='linear')[source]

Set threshold at Acceptance Rate.

Alias for threshold_at_topr().

Parameters:

method (str) – Possible values are “linear”, “lower”, “higher”. See threshold_at_topr() for details.

threshold_at_far(far, *, method='linear')[source]

Set threshold at False Acceptance Rate.

Alias for threshold_at_fpr().

Parameters:

method (str) – Possible values are “linear”, “lower”, “higher”. See threshold_at_fpr() for details.

threshold_at_fnr(fnr, *, method='linear')[source]

Set threshold at False Negative Rate.

Parameters:
  • fnr – FNR values at which to set threshold.

  • method (str) – Possible values are “linear”, “lower”, “higher”. If “lower” or “higher”, we return the closest score at which the metric is lower or higher than the target. If “linear”, we apply linear interpolation between the lower and higher values.
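
The three values of method can be sketched on a table of candidate thresholds and the metric evaluated at each. The function threshold_at below is a hypothetical standalone helper, not the library's implementation; it assumes the metric is non-decreasing in the threshold (as FNR is when higher scores indicate the positive class) and that the target lies within the range of tabulated values.

```python
def threshold_at(thresholds, values, target, method="linear"):
    """Pick a threshold for a target metric value from tabulated points."""
    # Last point whose metric does not exceed the target, and first point
    # whose metric is at least the target.
    lo = max(i for i, v in enumerate(values) if v <= target)
    hi = min(i for i, v in enumerate(values) if v >= target)
    if method == "lower":
        return thresholds[lo]
    if method == "higher":
        return thresholds[hi]
    if values[hi] == values[lo]:
        return thresholds[lo]  # target is hit exactly, nothing to interpolate
    weight = (target - values[lo]) / (values[hi] - values[lo])
    return thresholds[lo] + weight * (thresholds[hi] - thresholds[lo])


candidate_thresholds = [0.1, 0.2, 0.3, 0.4]
fnr_values = [0.0, 0.25, 0.5, 0.75]
t_lower = threshold_at(candidate_thresholds, fnr_values, 0.4, method="lower")
t_higher = threshold_at(candidate_thresholds, fnr_values, 0.4, method="higher")
t_linear = threshold_at(candidate_thresholds, fnr_values, 0.4, method="linear")
```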

threshold_at_fpr(fpr, *, method='linear')[source]

Set threshold at False Positive Rate.

Parameters:
  • fpr – FPR values at which to set threshold.

  • method (str) – Possible values are “linear”, “lower”, “higher”. If “lower” or “higher”, we return the closest score at which the metric is lower or higher than the target. If “linear”, we apply linear interpolation between the lower and higher values.

threshold_at_frr(frr, *, method='linear')[source]

Set threshold at False Rejection Rate.

Alias for threshold_at_fnr().

Parameters:

method (str) – Possible values are “linear”, “lower”, “higher”. See threshold_at_fnr() for details.

threshold_at_metric(target, metric, points=None)[source]

General function for setting thresholds at arbitrary metrics. No assumption is made about the metric being monotone or the threshold being unique.

Given a metric function and a target value, the function will find all values for the threshold such that metric(threshold) = target.

If N = len(pos) + len(neg) is the number of scores and T = len(target) is the number of thresholds we want to set, this function has complexity O(N*T), because it searches over the whole score space to find all solutions. We can speed up the function by considering only a subset of points.

Parameters:
  • target – Target points at which to set the threshold.

  • metric (Union[str, Callable]) –

    Can be a string indicating a member function of the Scores class or a callable with signature:

    metric(sample: Scores, threshold: np.ndarray) -> np.ndarray
    

  • points (Optional[Union[int, ndarray]]) – If a scalar, we use this many linearly spaced scores between min(pos, neg) and max(pos, neg). If given an array, we evaluate the metric at exactly these points.

Returns:

A list of thresholds of the same length as target, such that threshold[j] is a strictly increasing array containing all solutions of the equation metric(theta) = target[j].

Return type:

List[ndarray]
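
The search for all solutions of metric(theta) = target can be sketched as a scan over a grid of points, looking for sign changes of metric - target. The function all_crossings is a hypothetical standalone helper for a single target value; it interpolates linearly between grid points and makes no claim about how the library locates solutions.

```python
def all_crossings(metric, target, points):
    """Return every grid-interpolated solution of metric(theta) = target."""
    diffs = [metric(p) - target for p in points]
    solutions = []
    for p0, d0, p1, d1 in zip(points, diffs, points[1:], diffs[1:]):
        if d0 == 0:
            solutions.append(p0)  # exact hit on a grid point
        elif d0 * d1 < 0:
            # Sign change: interpolate linearly to locate the crossing.
            solutions.append(p0 + (p1 - p0) * d0 / (d0 - d1))
    if diffs[-1] == 0:
        solutions.append(points[-1])
    return solutions


# A non-monotone metric that reaches the target at two thresholds.
grid = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
crossings = all_crossings(lambda theta: abs(theta - 0.5), 0.25, grid)
```

The cost of this scan grows with the number of grid points for each target, which matches the O(N*T) behaviour described above and shows why passing fewer points speeds things up.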

threshold_at_rejection_rate(rejection_rate, *, method='linear')[source]

Set threshold at Rejection Rate.

Alias for threshold_at_tonr().

Parameters:

method (str) – Possible values are “linear”, “lower”, “higher”. See threshold_at_tonr() for details.

threshold_at_tar(tar, *, method='linear')[source]

Set threshold at True Acceptance Rate.

Alias for threshold_at_tpr().

Parameters:

method (str) – Possible values are “linear”, “lower”, “higher”. See threshold_at_tpr() for details.

threshold_at_tnr(tnr, *, method='linear')[source]

Set threshold at True Negative Rate.

Parameters:
  • tnr – TNR values at which to set threshold.

  • method (str) – Possible values are “linear”, “lower”, “higher”. If “lower” or “higher”, we return the closest score at which the metric is lower or higher than the target. If “linear”, we apply linear interpolation between the lower and higher values.

threshold_at_tonr(tonr, *, method='linear')[source]

Set threshold at Test Outcome Negative Rate.

This is the proportion of samples where the test outcome is negative, i.e. the test does not detect the condition.

Parameters:
  • tonr – TONR values at which to set threshold.

  • method (str) – Possible values are “linear”, “lower”, “higher”. If “lower” or “higher”, we return the closest score at which the metric is lower or higher than the target. If “linear”, we apply linear interpolation between the lower and higher values.

threshold_at_topr(topr, *, method='linear')[source]

Set threshold at Test Outcome Positive Rate.

This is the proportion of samples where the test outcome is positive, i.e. the test detects the condition.

Parameters:
  • topr – TOPR values at which to set threshold.

  • method (str) – Possible values are “linear”, “lower”, “higher”. If “lower” or “higher”, we return the closest score at which the metric is lower or higher than the target. If “linear”, we apply linear interpolation between the lower and higher values.

threshold_at_tpr(tpr, *, method='linear')[source]

Set threshold at True Positive Rate.

Parameters:
  • tpr – TPR values at which to set threshold.

  • method (str) – Possible values are “linear”, “lower”, “higher”. If “lower” or “higher”, we return the closest score at which the metric is lower or higher than the target. If “linear”, we apply linear interpolation between the lower and higher values.

threshold_at_trr(trr, *, method='linear')[source]

Set threshold at True Rejection Rate.

Alias for threshold_at_tnr().

Parameters:

method (str) – Possible values are “linear”, “lower”, “higher”. See threshold_at_tnr() for details.

tnr(threshold)[source]

True Negative Rate at threshold(s).

tonr(threshold)[source]

Test Outcome Negative Rate at threshold(s).

topr(threshold)[source]

Test Outcome Positive Rate at threshold(s).
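
Unlike TPR or FPR, the test outcome rates are computed over all samples regardless of their label. The standalone sketch below uses hypothetical helpers topr_sketch and tonr_sketch and assumes that higher scores indicate the positive class and that ties are assigned to the positive class.

```python
def topr_sketch(pos, neg, threshold):
    """Share of all samples whose test outcome is positive (accepted)."""
    accepted = sum(s >= threshold for s in pos) + sum(s >= threshold for s in neg)
    return accepted / (len(pos) + len(neg))


def tonr_sketch(pos, neg, threshold):
    """Share of all samples whose test outcome is negative (rejected)."""
    return 1.0 - topr_sketch(pos, neg, threshold)


pos_scores = [0.6, 0.8]
neg_scores = [0.1, 0.7]
acceptance = topr_sketch(pos_scores, neg_scores, 0.65)
rejection = tonr_sketch(pos_scores, neg_scores, 0.65)
```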

tpr(threshold)[source]

True Positive Rate at threshold(s).

trr(threshold)[source]

True Rejection Rate at threshold(s).

Alias for tnr().