Scores
- enum BinaryLabel(value)
Simple enum for positive and negative classes.
Valid values are as follows:
- pos = <BinaryLabel.pos: 'pos'>
- neg = <BinaryLabel.neg: 'neg'>
- class BootstrapConfig(nb_samples=1000, bootstrap_method='bca', sampling_method='dynamic', stratified_sampling=None, smoothing=False, ratio=None)
Bootstrap configuration for creating bootstrap samples and computing CIs.
- Parameters:
nb_samples (int) – Number of samples to use for bootstrapping.
bootstrap_method (str) –
Method to compute the CI from the bootstrap samples. Possible values are:
- “quantile” uses the alpha/2 and 1-alpha/2 quantiles of the empirical metric distribution.
- “bc” applies bias correction to correct for the bias of the median of the empirical distribution.
- “bca” applies bias correction and acceleration to correct for non-constant standard error.
See Ch. 11 of Computer Age Statistical Inference by Efron and Hastie for details.
sampling_method (Union[str, Callable[[Scores], Scores]]) –
Sampling method used to create a bootstrap sample. Supported methods are:
- “replacement” creates a sample with the same number of positive and negative scores using sampling with replacement.
- “single_pass” approximates replacement sampling by using a Poisson distribution to determine how often each score would be selected in the bootstrap sample. This speeds up sampling by up to ~40% compared to replacement sampling, since the sampled scores are already sorted. However, the method cannot guarantee that each bootstrap sample has the same number of scores. This should not matter if there are more than 100 scores (per group, if stratified sampling is used).
- “dynamic” chooses between replacement and single-pass sampling. It chooses single-pass sampling if more than 100 scores are present per group and smoothing is disabled, and falls back to replacement sampling otherwise. The threshold can be changed by setting the variable SINGLE_PASS_SAMPLE_THRESHOLD.
- “proportion” creates a sample whose size is defined by ratio, using sampling without replacement. This is similar to cross-validation, where a proportion of the data is used in each iteration.
- A callable with signature method(source: Scores, **kwargs) -> Scores, which creates one sample from a source Scores object.
stratified_sampling (Optional[str]) –
Stratified sampling is only supported for replacement sampling. Possible values are:
- None: no stratification is used.
- “by_label”: sampling preserves the proportion of positive and negative samples as well as the proportion of easy positive and negative samples.
- “by_group”: sampling preserves the proportion of samples in each group. Defaults to non-stratified sampling if no groups are present.
smoothing (bool) – Optional smoothing of sampled scores.
ratio (Optional[float]) – Size of sample when using proportional sampling. In range (0, 1).
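The “single_pass” idea and the “dynamic” rule described above can be sketched in plain Python, assuming the scores are held in a sorted list. poisson, single_pass_sample and choose_sampling_method are hypothetical helpers, not part of this API, and SINGLE_PASS_SAMPLE_THRESHOLD below is only a local stand-in for the library variable of the same name.

```python
import math
import random
from typing import List

# Local stand-in for the library variable of the same name (the ">100" rule).
SINGLE_PASS_SAMPLE_THRESHOLD = 100


def poisson(rng: random.Random, lam: float = 1.0) -> int:
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for lam = 1)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def single_pass_sample(sorted_scores: List[float], rng: random.Random) -> List[float]:
    """Repeat each score Poisson(1) times in one pass over the sorted input.

    The output stays sorted, but its length only equals len(sorted_scores)
    on average.
    """
    out: List[float] = []
    for s in sorted_scores:
        out.extend([s] * poisson(rng))
    return out


def choose_sampling_method(nb_scores_per_group: int, smoothing: bool) -> str:
    """The "dynamic" rule: single pass only for large groups without smoothing."""
    if nb_scores_per_group > SINGLE_PASS_SAMPLE_THRESHOLD and not smoothing:
        return "single_pass"
    return "replacement"
```

Poisson(1) is the limit of the Binomial(n, 1/n) count with which each score appears in an ordinary bootstrap sample, which is why the approximation improves as the number of scores grows.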
- class Scores(pos, neg, *, nb_easy_pos=0, nb_easy_neg=0, score_class='pos', equal_class='pos', is_sorted=False)
- Parameters:
pos – Scores for positive samples.
neg – Scores for negative samples.
nb_easy_pos (int) – Number of positive samples that we assume are always correctly classified when computing metrics. These parameters are useful when evaluating a highly accurate classifier on only the hardest samples, to speed up evaluation.
nb_easy_neg (int) – Number of negative samples that we assume are always correctly classified.
score_class (Union[BinaryLabel, str]) – Do scores indicate membership of the positive or the negative class?
equal_class (Union[BinaryLabel, str]) – Do samples with score equal to the threshold get assigned to the positive or negative class?
is_sorted (bool) – If True, we assume the scores are already sorted. Can be used to speed up Scores object creation.
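One reading of score_class and equal_class, written as a decision rule for a single score, is sketched below. predicted_positive is a hypothetical helper that reflects our interpretation of the parameter descriptions above, not the library's code.

```python
def predicted_positive(score: float, threshold: float,
                       score_class: str = "pos", equal_class: str = "pos") -> bool:
    """Hypothetical decision rule for a single score at a single threshold."""
    if score == threshold:
        # Ties with the threshold go to the class named by equal_class.
        return equal_class == "pos"
    above = score > threshold
    # score_class="pos": high scores indicate the positive class;
    # score_class="neg": high scores indicate the negative class.
    return above if score_class == "pos" else not above
```

With the defaults, a sample is predicted positive exactly when its score is greater than or equal to the threshold.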
- auc(lower=0.0, upper=1.0, *, x_axis='fpr', y_axis='tpr')
Computes the (partial) AUC for the given Scores object using the trapezoid integration rule.
- Parameters:
lower (float) – Lower limit of integration.
upper (float) – Upper limit of integration.
x_axis (str) – Metric to plot on x-axis. Defaults to FPR.
y_axis (str) – Metric to plot on y-axis. Defaults to TPR.
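The trapezoid rule restricted to [lower, upper] can be sketched on an explicit list of curve points, for example (FPR, TPR) pairs sorted by x. partial_auc is a hypothetical helper: it clips every segment of the piecewise-linear curve to the integration limits and returns the raw, unnormalized area; whether auc() normalizes partial areas is not stated here.

```python
from typing import List


def partial_auc(x: List[float], y: List[float],
                lower: float = 0.0, upper: float = 1.0) -> float:
    """Trapezoid-rule area under the curve through (x[i], y[i]) on [lower, upper].

    Assumes x is sorted ascending; vertical steps (repeated x) add no area.
    """
    area = 0.0
    for x0, y0, x1, y1 in zip(x, y, x[1:], y[1:]):
        a, b = max(x0, lower), min(x1, upper)
        if a >= b:
            continue  # segment is vertical or lies outside [lower, upper]
        slope = (y1 - y0) / (x1 - x0)
        ya = y0 + slope * (a - x0)  # curve value at the clipped left end
        yb = y0 + slope * (b - x0)  # curve value at the clipped right end
        area += (b - a) * (ya + yb) / 2
    return area
```

For the diagonal ROC curve the full area is 0.5 and the area on [0, 0.5] is 0.125.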
- bootstrap_ci(metric, alpha=0.05, config=BootstrapConfig(nb_samples=1000, bootstrap_method='bca', sampling_method='dynamic', stratified_sampling=None, smoothing=False, ratio=None), **kwargs)
Calculates the confidence interval with approximate coverage 1-alpha for metric by drawing nb_samples bootstrap samples from the positive and negative scores.
- Parameters:
metric (Union[str, Callable]) –
Can be a string indicating a member function of the Scores class or a callable with signature:
metric(sample: Scores, **kwargs) -> Union[float, np.ndarray]
alpha (float) – Significance level. In range (0, 1).
config (BootstrapConfig) – Bootstrap config.
**kwargs – Arguments that are passed to the metric function.
- Returns:
Returns an array of shape (Y, 2) with lower and upper bounds of the CI, for a metric returning shape (Y,).
- Return type:
ndarray
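The “quantile” and “bc” options of bootstrap_method can be compared in a small plain-Python sketch. It is not this library's implementation: empirical_quantile, quantile_ci and bc_ci are hypothetical helpers, boot stands for the bootstrap values of a scalar metric and theta_hat for the metric evaluated on the original scores. The acceleration term of “bca” is omitted.

```python
import math
from statistics import NormalDist
from typing import List, Tuple


def empirical_quantile(values: List[float], q: float) -> float:
    """Quantile of values with linear interpolation between order statistics."""
    xs = sorted(values)
    idx = q * (len(xs) - 1)
    lo, hi = math.floor(idx), math.ceil(idx)
    return xs[lo] + (idx - lo) * (xs[hi] - xs[lo])


def quantile_ci(boot: List[float], alpha: float = 0.05) -> Tuple[float, float]:
    """'quantile': the alpha/2 and 1-alpha/2 quantiles of the bootstrap values."""
    return empirical_quantile(boot, alpha / 2), empirical_quantile(boot, 1 - alpha / 2)


def bc_ci(boot: List[float], theta_hat: float, alpha: float = 0.05) -> Tuple[float, float]:
    """'bc': shift the quantile levels by the bias-correction constant z0."""
    nd = NormalDist()
    b = len(boot)
    # Fraction of bootstrap values below the point estimate, clipped away from
    # 0 and 1 so that inv_cdf stays finite.
    p0 = sum(v < theta_hat for v in boot) / b
    p0 = min(max(p0, 1 / (2 * b)), 1 - 1 / (2 * b))
    z0 = nd.inv_cdf(p0)
    lo_level = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
    hi_level = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))
    return empirical_quantile(boot, lo_level), empirical_quantile(boot, hi_level)
```

With boot = [1.0, ..., 10.0] and theta_hat = 5.5, half of the values lie below theta_hat, so z0 = 0 and both methods give (1.225, 9.775); moving theta_hat down to 3.5 makes z0 negative and shifts both “bc” bounds down.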
- bootstrap_metric(metric, config=BootstrapConfig(nb_samples=1000, bootstrap_method='bca', sampling_method='dynamic', stratified_sampling=None, smoothing=False, ratio=None), **kwargs)
Calculates nb_samples samples of metric using bootstrapping.
- Parameters:
metric (Union[str, Callable]) –
Can be a string indicating a member function of the Scores class or a callable with signature:
metric(sample: Scores, **kwargs) -> np.ndarray
config (BootstrapConfig) – Bootstrap config.
**kwargs – Arguments that are passed to the metric function.
- Returns:
Array of samples from metric. If metric returns arrays of shape (X,), the function will return an array of shape (nb_samples, X).
- Return type:
ndarray
- bootstrap_sample(config=BootstrapConfig(nb_samples=1000, bootstrap_method='bca', sampling_method='dynamic', stratified_sampling=None, smoothing=False, ratio=None))
Creates one bootstrap sample by sampling with the specified configuration.
- Parameters:
config (BootstrapConfig) – Bootstrap configuration.
- Returns:
Scores object with the sampled scores.
- Return type:
Scores
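Drawing a single sample with the “replacement” and “proportion” methods can be sketched on plain lists of positive and negative scores. sample_once is a hypothetical helper; in particular, subsampling each class separately for “proportion” is our assumption, since the text above only states that the sample size is defined by ratio.

```python
import random
from typing import List, Optional, Tuple


def sample_once(pos: List[float], neg: List[float], method: str, rng: random.Random,
                ratio: Optional[float] = None) -> Tuple[List[float], List[float]]:
    """Hypothetical sketch of drawing one sample from positive and negative scores."""
    if method == "replacement":
        # Same number of positive and negative scores as the source, with replacement.
        new_pos = rng.choices(pos, k=len(pos))
        new_neg = rng.choices(neg, k=len(neg))
    elif method == "proportion":
        # Assumption: each class is subsampled separately, without replacement.
        if ratio is None or not 0 < ratio < 1:
            raise ValueError("ratio must lie in (0, 1)")
        new_pos = rng.sample(pos, max(1, round(ratio * len(pos))))
        new_neg = rng.sample(neg, max(1, round(ratio * len(neg))))
    else:
        raise ValueError(f"unsupported method: {method}")
    return sorted(new_pos), sorted(new_neg)
```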
- cm(threshold)
Computes confusion matrices at the given threshold(s).
- Parameters:
threshold – Can be a scalar or array-like.
- Returns:
A binary confusion matrix.
- Return type:
- confusion_matrix(threshold)
Computes confusion matrices at the given threshold(s).
- Parameters:
threshold – Can be a scalar or array-like.
- Returns:
A binary confusion matrix.
- Return type:
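The four counts behind a binary confusion matrix can be sketched as follows. confusion_counts is a hypothetical helper that returns plain dicts, because the layout of the object returned by cm() is not specified here. It assumes a sample is predicted positive when its score is greater than or equal to the threshold, and it counts easy positives as true positives and easy negatives as true negatives, since they are always correctly classified.

```python
from typing import Dict, List, Sequence, Union


def confusion_counts(pos: Sequence[float], neg: Sequence[float],
                     threshold: Union[float, Sequence[float]],
                     nb_easy_pos: int = 0, nb_easy_neg: int = 0) -> List[Dict[str, int]]:
    """Counts of TP, FN, FP and TN at each threshold (positive if score >= threshold)."""
    thresholds = [threshold] if isinstance(threshold, (int, float)) else list(threshold)
    out: List[Dict[str, int]] = []
    for t in thresholds:
        tp_hard = sum(s >= t for s in pos)
        fp = sum(s >= t for s in neg)
        out.append({
            "tp": tp_hard + nb_easy_pos,        # easy positives are always correct
            "fn": len(pos) - tp_hard,
            "fp": fp,
            "tn": len(neg) - fp + nb_easy_neg,  # easy negatives are always correct
        })
    return out
```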
- eer()
Calculates the Equal Error Rate (EER), i.e., the error rate at which FPR = FNR (or, equivalently, FAR = FRR).
- Returns:
Tuple (threshold, eer), consisting of the threshold at which the EER is achieved and the EER value itself.
- Return type:
Tuple[float, float]
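A brute-force sketch of the EER, assuming a sample is predicted positive when its score is greater than or equal to the threshold (score_class='pos', equal_class='pos'). It scans the observed scores as thresholds, picks the one where |FPR - FNR| is smallest and reports the mean of the two rates there. eer_sketch is hypothetical; the library may interpolate between scores, so values can differ on small data.

```python
from typing import List, Tuple


def eer_sketch(pos: List[float], neg: List[float]) -> Tuple[float, float]:
    """Return (threshold, eer) by scanning all observed scores as thresholds."""
    best = None  # (|FPR - FNR|, threshold, mean of FPR and FNR)
    for t in sorted(set(pos) | set(neg)):
        fpr = sum(s >= t for s in neg) / len(neg)  # negatives accepted
        fnr = sum(s < t for s in pos) / len(pos)   # positives rejected
        gap = abs(fpr - fnr)
        if best is None or gap < best[0]:
            best = (gap, t, (fpr + fnr) / 2)
    return best[1], best[2]
```

For pos = [0.4, 0.6, 0.8] and neg = [0.2, 0.5, 0.7], both rates equal 1/3 at threshold 0.6, so the sketch returns (0.6, 1/3).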
- static from_labels(labels, scores, *, pos_label=1, nb_easy_pos=0, nb_easy_neg=0, score_class='pos', equal_class='pos', is_sorted=False)
Creates a Scores object from arrays of sample labels and scores.
- Parameters:
labels – Array with sample labels.
scores – Array with sample scores.
pos_label (Any) – The label of the positive class. All other labels are treated as negative labels.
nb_easy_pos (int) – Number of positive samples that we assume are always correctly classified when computing metrics. These parameters are useful when evaluating a highly accurate classifier on only the hardest samples, to speed up evaluation.
nb_easy_neg (int) – Number of negative samples that we assume are always correctly classified.
score_class (Union[BinaryLabel, str]) – Do scores indicate membership of the positive or the negative class?
equal_class (Union[BinaryLabel, str]) – Do samples with score equal to the threshold get assigned to the positive or negative class?
is_sorted (bool) – If True, we assume the scores are already sorted. Can be used to speed up Scores object creation.
- Returns:
A Scores instance.
- Return type:
Scores
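The label handling described for pos_label amounts to a split like the following plain-Python sketch. split_by_label is hypothetical; the two resulting lists would play the role of the pos and neg arguments of Scores.

```python
from typing import Any, List, Sequence, Tuple


def split_by_label(labels: Sequence[Any], scores: Sequence[float],
                   pos_label: Any = 1) -> Tuple[List[float], List[float]]:
    """Split scores into positive and negative lists; any label != pos_label is negative."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for lab, s in zip(labels, scores) if lab == pos_label]
    neg = [s for lab, s in zip(labels, scores) if lab != pos_label]
    return pos, neg
```

Note that a third label value (2 in the first check below) ends up among the negatives, as the pos_label description states.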
- swap()
Swaps positive and negative scores. Also reverses the decision logic, so that the FPR of the original scores equals the FNR of the swapped scores.
- Returns:
Scores object with positive and negative scores reversed.
- Return type:
Scores
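Why swapping the score lists together with the decision logic exchanges FPR and FNR can be checked with a small sketch. The helpers are hypothetical, and the reversed rule used here (a score below the threshold means positive) is an assumption that also moves ties at the threshold to the other class.

```python
from typing import List, Tuple


def is_predicted_positive(score: float, t: float, score_class: str) -> bool:
    # "pos": high scores indicate positives; "neg": the decision logic is reversed.
    return score >= t if score_class == "pos" else score < t


def rates(pos: List[float], neg: List[float], t: float,
          score_class: str) -> Tuple[float, float]:
    """Return (FPR, FNR) at threshold t."""
    fpr = sum(is_predicted_positive(s, t, score_class) for s in neg) / len(neg)
    fnr = sum(not is_predicted_positive(s, t, score_class) for s in pos) / len(pos)
    return fpr, fnr


def swap_sketch(pos: List[float], neg: List[float],
                score_class: str) -> Tuple[List[float], List[float], str]:
    """Exchange the two score lists and reverse the decision logic."""
    return neg, pos, ("neg" if score_class == "pos" else "pos")
```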
- threshold_at_acceptance_rate(acceptance_rate, *, method='linear')
Set threshold at Acceptance Rate.
Alias for threshold_at_topr().
- Parameters:
method (str) – See threshold_at_topr().
- threshold_at_far(far, *, method='linear')
Set threshold at False Acceptance Rate.
Alias for threshold_at_fpr().
- Parameters:
method (str) – See threshold_at_fpr().
- threshold_at_fnr(fnr, *, method='linear')
Set threshold at False Negative Rate.
- Parameters:
fnr – FNR values at which to set threshold.
method (str) – Possible values are “linear”, “lower”, “higher”. If “lower” or “higher”, we return the closest score at which the metric is lower or higher, respectively, than the target. If “linear”, we apply linear interpolation between the lower and higher values.
- threshold_at_fpr(fpr, *, method='linear')
Set threshold at False Positive Rate.
- Parameters:
fpr – FPR values at which to set threshold.
method (str) – Possible values are “linear”, “lower”, “higher”. If “lower” or “higher”, we return the closest score at which the metric is lower or higher, respectively, than the target. If “linear”, we apply linear interpolation between the lower and higher values.
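A sketch of the three methods for a single FPR target, assuming distinct negative scores, the rule "positive if score >= threshold" and no easy negatives. The FPR is evaluated at each negative score, and “linear” interpolates between the two neighbouring thresholds that bracket the target. threshold_at_fpr_sketch is hypothetical and only handles targets between 1/len(neg) and 1.

```python
from typing import List


def threshold_at_fpr_sketch(neg: List[float], target: float, method: str = "linear") -> float:
    """Threshold whose FPR is (close to) target; positive means score >= threshold."""
    if method not in ("linear", "lower", "higher"):
        raise ValueError(f"unknown method: {method}")
    ts = sorted(neg)
    m = len(ts)
    # FPR at threshold ts[i] is the share of negatives with score >= ts[i];
    # with distinct scores it falls from 1 to 1/m as i grows.
    fpr = [(m - i) / m for i in range(m)]
    if not fpr[-1] <= target <= fpr[0]:
        raise ValueError("target outside the FPR range this sketch supports")
    for i in range(m):
        if fpr[i] == target:
            return ts[i]  # exact match, nothing to interpolate
    for i in range(m - 1):
        hi_fpr, lo_fpr = fpr[i], fpr[i + 1]
        if hi_fpr > target > lo_fpr:
            if method == "higher":
                return ts[i]      # closest threshold with FPR above target
            if method == "lower":
                return ts[i + 1]  # closest threshold with FPR below target
            w = (hi_fpr - target) / (hi_fpr - lo_fpr)
            return ts[i] + w * (ts[i + 1] - ts[i])
    raise RuntimeError("unreachable for targets inside the supported range")
```

Because the empirical FPR is a step function of the threshold, the “linear” result (0.26 for neg = [0.1, 0.2, 0.3, 0.4] and target 0.6) is an interpolated threshold rather than one at which the empirical FPR equals 0.6 exactly.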
- threshold_at_frr(frr, *, method='linear')
Set threshold at False Rejection Rate.
Alias for threshold_at_fnr().
- Parameters:
method (str) – See threshold_at_fnr().
- threshold_at_metric(target, metric, points=None)
General function for setting thresholds at arbitrary metrics. No assumption is made about the metric being monotone or the threshold being unique.
Given a metric function and a target value, the function will find all values for the threshold such that metric(threshold) = target. If N = len(pos) + len(neg) is the number of scores and T = len(target) is the number of thresholds we want to set, this function has complexity O(N*T), because it searches over the whole score space to find all solutions. The function can be sped up by considering only a subset of points.
- Parameters:
target – Target points at which to set the threshold.
metric (Union[str, Callable]) –
Can be a string indicating a member function of the Scores class or a callable with signature:
metric(sample: Scores, threshold: np.ndarray) -> np.ndarray
points (Optional[Union[int, ndarray]]) – If a scalar, we use this many linearly spaced scores between min(pos, neg) and max(pos, neg). If given an array, we evaluate the metric at exactly these points.
- Returns:
A list of thresholds of the same length as target, such that threshold[j] is a strictly increasing array containing all solutions of the equation metric(theta) = target[j].
- Return type:
List[ndarray]
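A grid-based sketch of the idea: evaluate metric(theta) - target at increasing points, keep points where it is exactly zero, and linearly interpolate every strict sign change. all_solutions is a hypothetical helper working on plain floats and a single target; like the documented method it makes no monotonicity assumption, and its cost grows linearly with the number of points.

```python
from typing import Callable, List, Sequence


def all_solutions(metric: Callable[[float], float], target: float,
                  points: Sequence[float]) -> List[float]:
    """Return all theta in [points[0], points[-1]] with metric(theta) ~= target.

    points must be increasing; solutions between grid points are found by
    linear interpolation, so a finer grid gives more accurate thresholds.
    """
    diffs = [metric(p) - target for p in points]
    solutions: List[float] = []
    for i, (p, d) in enumerate(zip(points, diffs)):
        if d == 0.0:
            solutions.append(p)  # exact hit on a grid point
        if i + 1 < len(points):
            q, d_next = points[i + 1], diffs[i + 1]
            if d * d_next < 0:  # strict sign change inside (p, q)
                solutions.append(p + (q - p) * d / (d - d_next))
    return solutions
```

For the non-monotone metric abs(theta - 0.5) and target 0.25 on the grid 0.0, 0.1, ..., 1.0, the sketch finds both solutions, 0.25 and 0.75. A scalar points value could be turned into a grid with [lo + (hi - lo) * i / (k - 1) for i in range(k)].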
- threshold_at_rejection_rate(rejection_rate, *, method='linear')
Set threshold at Rejection Rate.
Alias for threshold_at_tonr().
- Parameters:
method (str) – See threshold_at_tonr().
- threshold_at_tar(tar, *, method='linear')
Set threshold at True Acceptance Rate.
Alias for threshold_at_tpr().
- Parameters:
method (str) – See threshold_at_tpr().
- threshold_at_tnr(tnr, *, method='linear')
Set threshold at True Negative Rate.
- Parameters:
tnr – TNR values at which to set threshold.
method (str) – Possible values are “linear”, “lower”, “higher”. If “lower” or “higher”, we return the closest score at which the metric is lower or higher, respectively, than the target. If “linear”, we apply linear interpolation between the lower and higher values.
- threshold_at_tonr(tonr, *, method='linear')
Set threshold at Test Outcome Negative Rate.
This is the proportion of samples where the test outcome is negative, i.e., the test does not detect the condition.
- Parameters:
tonr – TONR values at which to set threshold.
method (str) – Possible values are “linear”, “lower”, “higher”. If “lower” or “higher”, we return the closest score at which the metric is lower or higher, respectively, than the target. If “linear”, we apply linear interpolation between the lower and higher values.
- threshold_at_topr(topr, *, method='linear')
Set threshold at Test Outcome Positive Rate.
This is the proportion of samples where the test outcome is positive, i.e., the test detects the condition.
- Parameters:
topr – TOPR values at which to set threshold.
method (str) – Possible values are “linear”, “lower”, “higher”. If “lower” or “higher”, we return the closest score at which the metric is lower or higher, respectively, than the target. If “linear”, we apply linear interpolation between the lower and higher values.
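The two test-outcome rates can be sketched as follows, assuming a positive outcome means score >= threshold. Counting easy positives as positive outcomes and easy negatives as negative outcomes follows from their being always correctly classified, but whether the library includes them in these rates is an assumption. topr and tonr here are hypothetical helpers.

```python
from typing import List


def topr(pos: List[float], neg: List[float], threshold: float,
         nb_easy_pos: int = 0, nb_easy_neg: int = 0) -> float:
    """Fraction of all samples with a positive test outcome (score >= threshold)."""
    # Assumption: easy positives always test positive, easy negatives always negative.
    outcome_pos = (sum(s >= threshold for s in pos)
                   + sum(s >= threshold for s in neg) + nb_easy_pos)
    total = len(pos) + len(neg) + nb_easy_pos + nb_easy_neg
    return outcome_pos / total


def tonr(pos: List[float], neg: List[float], threshold: float,
         nb_easy_pos: int = 0, nb_easy_neg: int = 0) -> float:
    """Fraction of all samples with a negative test outcome: the complement of TOPR."""
    return 1.0 - topr(pos, neg, threshold, nb_easy_pos, nb_easy_neg)
```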
- threshold_at_tpr(tpr, *, method='linear')
Set threshold at True Positive Rate.
- Parameters:
tpr – TPR values at which to set threshold.
method (str) – Possible values are “linear”, “lower”, “higher”. If “lower” or “higher”, we return the closest score at which the metric is lower or higher, respectively, than the target. If “linear”, we apply linear interpolation between the lower and higher values.
- threshold_at_trr(trr, *, method='linear')
Set threshold at True Rejection Rate.
Alias for threshold_at_tnr().
- Parameters:
method (str) – See threshold_at_tnr().