API Reference

Documentation for the IAIBase public interface.

Data Preparation

IAI.FeatureInputType

Permissible types for specifying the feature data.

The features can be supplied as a Matrix of Reals or as a DataFrame as follows:

  • numeric features are specified using numeric vectors
  • categoric and ordinal features are specified using CategoricalVectors
  • mixed features are specified using vectors of MixedDatum (see make_mixed_data)
  • missing values are specified using missing

For more details, refer to the data preparation guide in the manual.
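
For illustration, a minimal sketch of assembling feature data as a DataFrame with numeric, categoric, and missing values (the column names here are arbitrary):

using DataFrames, CategoricalArrays

X = DataFrame(
    age    = [25, 41, missing, 63],                             # numeric feature with a missing value
    region = categorical(["North", "South", "North", "East"]),  # categoric feature
)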

IAI.MixedDatumType
MixedDatum{T}

Represents a mixed feature value that can either be categoric or of type T.

The value has the following fields:

  • iscat::Bool: true if the value is categoric
  • value_cat: the value if categoric
  • value_else: the value if non-categoric

Mixed features are specified in the data using vectors of MixedDatum. It is recommended to create and work with these vectors of MixedDatum values via make_mixed_data and undo_mixed_data.
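
As a brief sketch of this workflow (exact element types and printed values may differ), the elements produced by make_mixed_data expose the fields listed above:

v = IAI.make_mixed_data([13, "Unknown", 2])
v[1].iscat       # false: 13 is treated as numeric
v[1].value_else  # 13.0
v[2].iscat       # true: "Unknown" is treated as categoric
v[2].value_cat   # "Unknown"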

IAI.make_mixed_dataFunction
make_mixed_data(input)

Construct a vector of mixed categoric and numeric data from input. All numeric values from input are treated as numeric data, and all remaining values are treated as categoric data.

Examples

Construct a mixed data vector with a numeric score and two additional levels ("Unknown" and "NA")

IAI.make_mixed_data([13, "Unknown", "NA", 2, 4, missing])
6-element Vector{Union{Missing, NumericMixedDatum}}:
 13.0
 "Unknown"
 "NA"
 2.0
 4.0
 missing
make_mixed_data(input, ordinal_levels)

Construct a vector of mixed categoric and ordinal data from input. All values from input that are in ordinal_levels are treated as ordinal data with the ordering indicated by the order of ordinal_levels, and all remaining values are treated as categoric data.

Examples

Construct a mixed data vector with three ordered levels (A < B < C) and two additional levels ("Unknown" and "NA")

IAI.make_mixed_data(["B", "Unknown", "NA", "C", "A", missing], ["A", "B", "C"])
6-element Vector{Union{Missing, OrdinalMixedDatum{String}}}:
 "B"
 "Unknown"
 "NA"
 "C"
 "A"
 missing
IAI.undo_mixed_dataFunction
undo_mixed_data(mixed_data)

Convert a vector of mixed data back to a normal Vector with mixed types.

Examples

Undo the conversion to a numeric mixed data vector

numeric_mixed = IAI.make_mixed_data([13, "Unknown", "NA", 2, 4, missing])
IAI.undo_mixed_data(numeric_mixed)
6-element Vector{Any}:
 13.0
   "Unknown"
   "NA"
  2.0
  4.0
   missing

Undo the conversion to an ordinal mixed data vector

ordinal_mixed = IAI.make_mixed_data(["B", "Unknown", "NA", "C", "A", missing],
                                    ["A", "B", "C"])
IAI.undo_mixed_data(ordinal_mixed)
6-element Vector{Union{Missing, String}}:
 "B"
 "Unknown"
 "NA"
 "C"
 "A"
 missing
IAI.TargetInputType

Permissible types for specifying the problem target. The number and types of the target arguments depend on the problem type (for more information, refer to the data preparation guide in the manual):

Classification

  • y: AbstractVector of class labels

Regression

  • y: AbstractVector of numeric values

Prescription

  • treatments: AbstractVector of treatment labels
  • outcomes: AbstractVector of numeric outcomes

Survival

  • deaths: AbstractVector{Bool} indicating which observations are deaths
  • times: AbstractVector of times for each observation

Imputation

No target required

IAI.MultiTargetInputType

Permissible types for specifying the problem target for multi-task problems.

Multi-task targets are supplied either as an AbstractMatrix or AbstractDataFrame, where the columns contain the targets for each task.

IAI.SampleWeightInputType

Permissible types for specifying sample weights:

  • nothing (default) will assign equal weight to all points
  • Vector or StatsBase.Weights of the weights for each point

Additionally, for problems with discrete outcomes (classification/prescription):

  • Dict giving the weight for each label
  • :autobalance to use weights that give each label equal weight

For more information, refer to the data preparation guide in the manual.
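
As an illustrative sketch (assuming a concrete learner such as IAI.OptimalTreeClassifier is available), the different weight specifications are passed via the sample_weight keyword of fit!:

lnr = IAI.OptimalTreeClassifier()    # assumed learner type, for illustration only
X = [1 2; 3 4; 5 6; 7 8]
y = ["A", "B", "A", "B"]

IAI.fit!(lnr, X, y, sample_weight=[1.0, 2.0, 1.0, 2.0])         # explicit weight per point
IAI.fit!(lnr, X, y, sample_weight=Dict("A" => 2.0, "B" => 1.0)) # weight per label
IAI.fit!(lnr, X, y, sample_weight=:autobalance)                 # equal total weight per label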

IAI.RelativeParameterInputType

Permissible types for specifying parameters relative to the number of samples or features:

  • :all: use all of the samples/features
  • a non-negative Integer: the value to be used
  • a Real between 0 and 1: use this proportion of the total number
  • :sqrt: use the square root of the total number
  • :log2: use the base-2 logarithm of the total number
IAI.split_dataFunction
split_data(task::Symbol, X::FeatureInput, y::TargetInput...;
           keyword_arguments...)

Split the data (X and y) into a tuple of training and testing data: (X_train, y_train...), (X_test, y_test...).

The mechanism used to split the data is determined by task:

  • Stratified:
    • :classification gives a stratified split on the class labels
    • :prescription or :policy_categorical gives a stratified split on the treatments
  • Non-stratified:
    • :regression, :survival, or :policy_numeric gives randomly split data

Keyword Arguments

  • train_proportion=0.7: proportion of data in training set
  • shuffle=true: whether the data is shuffled before splitting. Disabling shuffling can be useful if the training/testing split should be based on the order of the data (e.g. if the data is sorted by time). If shuffle=false, the split will not be stratified even if the task defaults to a stratified approach
  • seed=nothing: random seed for splitting, uses the global random state if nothing is specified

Examples

Classification:

X = [1 2; 3 4; 5 6; 7 8]
y = ["A", "B", "A", "B"]
(train_X, train_y), (test_X, test_y) =
    IAI.split_data(:classification, X, y, seed=1)

Regression:

X = [1 2; 3 4; 5 6; 7 8]
y = [0.1, 0.2, 0.3, 0.4]
(train_X, train_y), (test_X, test_y) =
    IAI.split_data(:regression, X, y, seed=1)

Survival:

X = [1 2; 3 4; 5 6; 7 8]
deaths = [true, false, true, false]
times = [1, 2, 3, 4]
(train_X, train_deaths, train_times), (test_X, test_deaths, test_times) =
    IAI.split_data(:survival, X, deaths, times, seed=1)

Prescription:

X = [1 2; 3 4; 5 6; 7 8]
treatments = ["A", "B", "A", "B"]
outcomes = [0.1, 0.2, 0.3, 0.4]
(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) =
    IAI.split_data(:prescription, X, treatments, outcomes, seed=1)
split_data(task::Symbol, inds::AbstractRange, y::TargetInput...;
           keyword_arguments...)

Split a range of indices (inds) into sets of indices for training and testing.

The mechanism used to split the data is determined by task and y as in split_data, and the same keyword arguments are supported.

Examples

y = ["A", "B", "A", "B"]
train_inds, test_inds = IAI.split_data(:classification, 1:length(y), y, seed=1)
([1, 2], [3, 4])
split_data(task::Symbol, n::Integer, y::TargetInput...;
           keyword_arguments...)

Construct sets of indices for training and testing for data with n samples.

Same as split_data with the range 1:n.

Examples

y = ["A", "B", "A", "B"]
train_inds, test_inds = IAI.split_data(:classification, length(y), y, seed=1)
([1, 2], [3, 4])

Learners

IAI.FeatureSetType

Permissible types for specifying a set of features in a DataFrame. Refer to Indexing in DataFrames.jl for a full list of supported rules:

Input Type                        Description                          Examples
All                               Use all columns                      All()
Integer or a vector of Integers   Specify indices of columns to use    1, [1, 3, 4]
Symbol or a vector of Symbols     Specify names of columns to use      :x1, [:x1, :x3]
String or a vector of Strings     Specify names of columns to use      "x1", ["x1", "x3"]
Not                               Specify columns not to use           Not(1), Not(["x2", "x4"])
Between                           Specify range of columns to use      Between("x1", "x4")
IAI.FeatureMappingType

Permissible types for specifying a mapping from features to values. One or more mappings from FeatureSet to value can be supplied, with each mapping in one of the following formats:

  • a Pair from key to value, e.g. "A" => 1
  • a Tuple containing key and value, e.g. ("A", 1)
  • a Vector containing key and value, e.g. ["A", 1]

If there is only one such mapping, it can be supplied directly. If more than one mapping is present, they can be supplied in one of the following containers:

  • a Dict with entries for each mapping, e.g. Dict("A" => 1, "B" => 2)
  • a NamedTuple with entries for each mapping, e.g. (A=1, B=2)
  • a Tuple containing each mapping, e.g. ("A" => 1, "B" => 2)
  • a Vector containing each mapping, e.g. ["A" => 1, "B" => 2]

Fitting

IAI.fit!Method
fit!(lnr::Learner, X::FeatureInput, y::TargetInput...;
     sample_weight::SampleWeightInput=nothing)

Fits a model using the parameters in lnr and the data X and y.
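
A minimal usage sketch, assuming a concrete learner such as IAI.OptimalTreeClassifier is available:

X = [1 2; 3 4; 5 6; 7 8]
y = ["A", "B", "A", "B"]

lnr = IAI.OptimalTreeClassifier(max_depth=2)  # assumed learner type and parameter, for illustration only
IAI.fit!(lnr, X, y)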

Evaluation (for supervised learners only)

IAI.predictMethod
predict(lnr::SupervisedLearner, X::FeatureInput)

Return the predictions made by the trained model in lnr for each point in the data X.

IAI.scoreMethod
score(lnr::SupervisedLearner, X::FeatureInput, y::TargetInput...;
      keyword_arguments...)

Calculates the score for lnr on data X and y. By default, all scores are calibrated such that higher is better (and 1 is the maximum possible score).

Keyword Arguments

  • sample_weight::SampleWeightInput=nothing: the weighting to give to each data point.
  • criterion=:default: the scoring criterion to use when evaluating the score (refer to the documentation on scoring criteria for more information). Uses the criterion in lnr if left as :default.
  • scale::Bool=true: whether to scale the score so that higher is better and 1 is the maximum possible.
  • extra keyword arguments are passed through to configure the specified scoring criterion (e.g. tweedie_variance_power for :tweedie)
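
Continuing the fit! sketch above, predictions and scores for a trained classification learner might look as follows (criterion=:auc and positive_label are only relevant for two-class problems):

IAI.predict(lnr, X)                                       # predicted label for each row of X
IAI.score(lnr, X, y)                                      # uses the criterion stored in lnr
IAI.score(lnr, X, y, criterion=:auc, positive_label="B")  # extra keyword configures the criterion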

Utilities

IAI.write_jsonFunction
write_json(f, obj; keyword_arguments...)

Write obj (can be a Learner or GridSearch) to f in JSON format.

Keyword Arguments

  • indent=2: level of indentation in JSON, or set to nothing to disable.
IAI.read_jsonFunction
read_json(f)

Read in a Learner or GridSearch saved in JSON format from f, which can be either a filepath from which to read the saved JSON, or a dictionary containing already-parsed JSON data.
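
A brief sketch of saving and reloading a learner (assuming lnr is a previously fitted learner; the filename is arbitrary):

IAI.write_json("learner.json", lnr)   # save a Learner or GridSearch to disk
lnr2 = IAI.read_json("learner.json")  # read it back later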

IAI.variable_importanceMethod
variable_importance(lnr::Learner; keyword_arguments...)

Generate a ranking of the variables in lnr according to their importance during training. The results are normalized so that they sum to one.

For linear models (e.g. linear/logistic regression, hyperplane splits) the importance is determined using the coefficients in the resulting model after scaling to account for features of different magnitudes.

IAI.get_features_usedFunction
get_features_used(lnr::Learner)

Return a Vector of Symbols giving the names of the features used by lnr.

IAI.get_paramsFunction
get_params(lnr::Learner)

Return a Dict containing the values of user-specified parameters in lnr.

IAI.set_params!Function
set_params!(lnr::Learner; params...)

Update user-specified parameters in lnr with all supplied key-value pairs in params.

IAI.cloneFunction
clone(lnr::Learner)
clone(grid::GridSearch)

Return an unfitted copy of lnr or grid with the same user-specified parameters.
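
A short sketch of inspecting and updating learner parameters (the parameter names shown are assumptions for illustration):

params = IAI.get_params(lnr)                # Dict of user-specified parameters
IAI.set_params!(lnr, max_depth=3, cp=0.01)  # update parameters in place (assumed parameter names)
lnr_new = IAI.clone(lnr)                    # unfitted copy with the same parameters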

Visualization in Browser

IAI.write_htmlMethod
write_html(f, vis::AbstractVisualization; keyword_arguments...)

Generic function for saving a visualization vis to f in HTML format.

IAI.show_in_browserMethod
show_in_browser(vis::AbstractVisualization; keyword_arguments...)

Generic function for showing a visualization vis in the browser.

IAI.QuestionnaireMethod
Questionnaire(lnr::Learner; keyword_arguments...)

Construct an interactive questionnaire from the learner lnr.

IAI.MultiQuestionnaireMethod
MultiQuestionnaire(grid::GridSearch; keyword_arguments...)

Constructs an interactive questionnaire containing the final fitted learner as well as the learner found for each parameter combination.

IAI.make_html_tableMethod
make_html_table(df::DataFrame)

Return a string representing df as an HTML table.

Grid Search and Parameter Validation

IAI.GridSearchType
GridSearch(lnr::Learner, param_grid)

Controls grid search over parameter combinations in param_grid to find the best combination of parameters for lnr.

lnr is a learner with any parameters that should be included in all combinations of parameters tested.

param_grid contains the parameter ranges to search over. These can be supplied in multiple ways, which we demonstrate with examples that create identical GridSearches to tune lnr over the parameters criterion and normalize_X:

  • one or more keyword arguments to the GridSearch constructor containing key=value pairs for all desired parameters and their ranges:

    IAI.GridSearch(lnr, criterion=[:gini, :entropy], normalize_X=[true, false])
  • a Dict or NamedTuple where the keys are the names of the parameters to tune, and the corresponding values are the range over which to vary each parameter:

    IAI.GridSearch(lnr, Dict(:criterion => [:gini, :entropy],
                             :normalize_X => [true, false]))
    IAI.GridSearch(lnr, (criterion=[:gini, :entropy], normalize_X=[true, false]))
  • a Vector{Dict} or Vector{NamedTuple} where each entry specifies a grid of parameters to test (refer to the documentation on multiple parameter grids):

    IAI.GridSearch(lnr, [
        (criterion=:gini,    normalize_X=[true, false]),
        (criterion=:entropy, normalize_X=[true, false]),
    ])

Keyword Arguments

  • train_proportion: if specified, fit! will use a single train-validation split with this proportion of the data as training. Defaults to nothing.
  • n_folds: if specified, fit! will use cross-validation with this number of folds, rather than a single split. Defaults to nothing.
  • validation_criterion: if specified, will be used as the validation criterion during grid search when validation_criterion is not passed explicitly as a keyword argument. Defaults to :default, meaning the criterion from the learner in the grid search will be used.
  • fit_kwargs: additional keyword arguments to pass by default to fit!/fit_cv, such as keyword arguments used to configure validation_criterion (e.g. positive_label for certain classification criteria)
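
Putting these pieces together, a hedged sketch of a complete grid search workflow (again assuming IAI.OptimalTreeClassifier is available):

X = [1 2; 3 4; 5 6; 7 8]
y = ["A", "B", "A", "B"]

grid = IAI.GridSearch(
    IAI.OptimalTreeClassifier(),  # assumed learner type, for illustration only
    max_depth=1:3,
    criterion=[:gini, :entropy],
)
IAI.fit!(grid, X, y, train_proportion=0.7)

IAI.get_best_params(grid)         # best parameter combination found
best_lnr = IAI.get_learner(grid)  # final learner fitted with the best parameters
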
IAI.fit!Method
fit!(grid::GridSearch, X::FeatureInput, y::TargetInput...;
     keyword_arguments...)

Fit a grid with data X and y... by randomly splitting the data into training and validation sets in the same way as split_data.

Keyword Arguments

  • train_proportion::Real: a number between 0 and 1 indicating the proportion of data to use in training. If not specified, will default first to any value specified when creating grid, and then to 0.7. If no value is provided for train_proportion but n_folds was specified when creating grid, then cross-validation with n_folds will be used (see fit_cv!).
  • sample_weight::SampleWeightInput=nothing: the weighting to give to each data point.
  • validation_criterion::Symbol=:default: the scoring criterion that should be used to evaluate the parameter combinations to determine which is best (refer to the documentation on scoring criteria for more information). If left as :default, will default first to any value specified when creating grid, and then to the value of criterion in the learner used by grid.
  • run_gc::Bool=false: if true, runs the Julia garbage collector between each parameter combination to reclaim any unused memory immediately. This is usually unnecessary as Julia will automatically run garbage collection itself as needed.
  • verbose::Bool=false: if true, prints out the score for each parameter combination during the grid search. Can only be set to true if the learner in the grid search has show_progress set to false.
  • extra keyword arguments are passed through to configure the specified scoring criterion (e.g. tweedie_variance_power for :tweedie)

fit!(grid::GridSearch, train_X::FeatureInput, train_y::TargetInput...,
     valid_X::FeatureInput, valid_y::TargetInput...; keyword_arguments...)

Fit a grid with explicit training and validation sets.

Supports the same keyword arguments as above, with the exception of train_proportion, as the data has already been split. sample_weight additionally accepts a Tuple of sample weight vectors if you would like to specify explicit weight vectors for the training and validation sets.

IAI.fit_cv!Method
fit_cv!(grid::GridSearch, X::FeatureInput, y::TargetInput...;
        keyword_arguments...)

Fit a grid with data X and y... using k-fold cross-validation.

The keyword arguments are the same as for fitting the grid with randomly split data using IAI.fit!, except the train_proportion argument is replaced by n_folds, which indicates the number of folds to use in the cross-validation (defaulting first to any value of n_folds specified when creating grid, and then to 5).

IAI.get_learnerFunction
get_learner(grid::GridSearch)

Return the final fitted learner using the best parameter combination from the grid.

IAI.get_best_paramsFunction
get_best_params(grid::GridSearch)

Return the best parameter combination from the grid.

Examples

Example output from a GridSearch used to tune an OptimalTreeClassifier:

IAI.get_best_params(grid)
Dict{Symbol, Any} with 2 entries:
  :cp        => 0.0357143
  :max_depth => 3
IAI.get_grid_result_summaryFunction
get_grid_result_summary(grid::GridSearch)

Return a DataFrame summarizing the results from the grid search.

Each row corresponds to a single parameter combination from the grid search, and contains:

  • the value of each parameter
  • the training and validation scores of the learner trained using these parameters
  • the rank of this parameter combination according to the validation score (where a rank of 1 indicates the best parameter combination)

When fitting the grid using cross-validation, the training and validation scores for each fold are shown, along with the mean and standard deviation of these scores.

Examples

Example output from a GridSearch used to tune an OptimalTreeClassifier:

IAI.get_grid_result_summary(grid)
3×5 DataFrame
 Row │ max_depth  cp         train_score  valid_score  rank_valid_score
     │ Int64      Float64    Float64      Float64      Int64
─────┼──────────────────────────────────────────────────────────────────
   1 │         1  0.25          0.666667     0.666667                 3
   2 │         2  0.228571      0.971429     0.911111                 2
   3 │         3  0.0357143     0.980952     0.915556                 1
IAI.get_grid_result_detailsFunction
get_grid_result_details(grid::GridSearch)

Return a Vector of Dicts corresponding to each combination of parameters tested in the grid search, where each Dict contains the following entries:

  • :params: a Dict of parameter values for this combination
  • :valid_score: the validation score for this parameter combination (for cross-validation, this is the mean validation score across all folds)
  • :rank: the rank of this parameter combination based on the validation score, where lower is better
  • :fold_results: a Vector containing a Dict for each fold in the grid search (for grids without cross-validation, there is a single fold corresponding to the training/validation sets). Each Dict contains the following entries:
    • :train_score: the training score for the fold
    • :valid_score: the validation score for the fold
    • :learner: the trained learner in the fold
    as well as other result fields for specific learner types.
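
A short sketch of drilling into these results, continuing from a fitted grid search:

details = IAI.get_grid_result_details(grid)
best = details[findfirst(d -> d[:rank] == 1, details)]  # entry for the best combination
best[:params]                                           # Dict of parameter values
best[:valid_score]                                      # validation score (mean across folds if cross-validated)
best[:fold_results][1][:learner]                        # trained learner from the first fold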

Task-specific Functions

These functions are only available to learners of the appropriate type for the problem.

Classification

IAI.predict_probaMethod
predict_proba(lnr::ClassificationLearner, X::FeatureInput)

Return the probabilities of class membership predicted by the trained model in lnr for each point in the features X.

IAI.ROCCurveType

Container for ROC curve information.

The data underlying the curve can be extracted with get_roc_curve_data.

The resulting curve can be visualized in the browser using show_in_browser, or with write_html to save the visualization in HTML format. You can also use plot from Plots.jl to visualize the curve.

IAI.get_roc_curve_dataFunction
get_roc_curve_data(curve::ROCCurve)

Extract the underlying data from curve as a Dict with two keys:

  • :coords: a Vector of Dicts representing the points on the curve. Each Dict contains the following keys:
    • :fpr: false positive rate at the given threshold
    • :tpr: true positive rate at the given threshold
    • :threshold: the threshold
  • :auc: the area-under-the-curve (AUC)
IAI.ROCCurveMethod
ROCCurve(lnr::ClassificationLearner, X::FeatureInput, y::AbstractVector;
         positive_label)

Construct a ROCCurve using trained lnr on the features X and labels y, treating positive_label as the positive label.

Info

Can only be applied to classification problems with $K=2$ classes.

IAI.write_htmlMethod
write_html(f, roc::ROCCurve)

Write interactive browser visualization of roc to f in HTML format.

IAI.show_in_browserMethod
show_in_browser(roc::ROCCurve)

Display an interactive visualization of roc in the browser.

Regression

Survival

IAI.predict_hazardMethod
predict_hazard(lnr::SurvivalLearner, X::FeatureInput)

Return the fitted hazard coefficient estimate made by the trained model in lnr for each point in the data X. A higher hazard coefficient estimate corresponds to a smaller predicted survival time.

IAI.predict_expected_survival_timeMethod
predict_expected_survival_time(lnr::SurvivalLearner, X::FeatureInput)

Return the expected survival time according to the trained model in lnr for each point in the data X.

IAI.predictMethod
predict(lnr::SurvivalLearner, X::FeatureInput)

Return the SurvivalCurve predicted by the trained model in lnr for each point in the data X.


predict(lnr::SurvivalLearner, X::FeatureInput; t::Number)

Return the probability that death occurs at or before time t, as predicted by the trained model in lnr for each point in the data X.

IAI.SurvivalCurveType

Container for survival curve information.

Use curve[t] to get the mortality probability prediction from curve at time t. This returns the cumulative distribution function evaluated at time t, i.e., the probability that death occurs at or before time t.

The data underlying the curve can be extracted with get_survival_curve_data.

IAI.get_survival_curve_dataFunction
get_survival_curve_data(curve::SurvivalCurve)

Extract the underlying data from curve as a Dict with four keys:

  • :times: the time for each breakpoint on the curve
  • :coefs: the mortality probability for each breakpoint on the curve
  • :expected_time: the expected survival time
  • :linear: whether the curve coefficients come from a smoothed linear interpolation (see smooth_survival_curves for more information)
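
A hedged sketch of working with survival predictions, assuming a trained survival learner lnr and feature data X:

curves = IAI.predict(lnr, X)                 # one SurvivalCurve per point in X
curves[1][10]                                # probability of death at or before time 10 for the first point
IAI.predict(lnr, X, t=10)                    # the same probabilities for all points at once
IAI.predict_expected_survival_time(lnr, X)   # expected survival time for each point
IAI.get_survival_curve_data(curves[1])       # Dict with :times, :coefs, :expected_time and :linear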

Prescription

IAI.predict_outcomesMethod
predict_outcomes(lnr::PrescriptionLearner, X::FeatureInput)

Return a DataFrame containing the predicted outcome for each treatment option made by the trained model in lnr for each point in the features X.

Policy

IAI.predict_outcomesMethod
predict_outcomes(lnr::PolicyLearner, X::FeatureInput, rewards::FeatureInput)

Return the outcome from rewards for each point in the features X under the prescriptions made by the trained model in lnr.

IAI.predict_treatment_rankFunction
predict_treatment_rank(lnr::PolicyLearner, X::FeatureInput)

Return a Matrix containing the treatments in ranked order of effectiveness for each point in the features X as predicted by the trained model in lnr. For example, the first column contains the best treatment for each point, the second column contains the second-best treatment, and so on.

IAI.predict_treatment_outcomeFunction
predict_treatment_outcome(lnr::PolicyLearner, X::FeatureInput)

Return a DataFrame containing the estimated quality of each treatment in the trained model from lnr for each point in the features X. These quality estimates are the values used by the model to determine the treatment ranks in predict_treatment_rank and are based on aggregate statistics. For an individualized prediction of outcomes under the model prescription policy, use predict_outcomes instead.

IAI.predict_treatment_outcome_standard_errorFunction
predict_treatment_outcome_standard_error(lnr::PolicyLearner,
                                         X::FeatureInput)

Return a DataFrame containing the standard error for the estimated quality of each treatment in the trained model from lnr for each point in the features X. These errors can be used to form confidence intervals around results from predict_treatment_outcome.
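
A brief sketch tying these policy functions together, assuming a trained policy learner lnr, feature data X, and a rewards table with one column per treatment:

ranks = IAI.predict_treatment_rank(lnr, X)  # Matrix of treatments, best first
ranks[:, 1]                                 # prescribed (best) treatment for each point
IAI.predict_treatment_outcome(lnr, X)       # aggregate quality estimate for each treatment
IAI.predict_outcomes(lnr, X, rewards)       # outcome from rewards under the prescribed treatments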

Imputation

IAI.transformFunction
transform(lnr::ImputationLearner, X::FeatureInput)

Return a DataFrame containing the features X with all missing values imputed by the fitted imputation model in lnr.

IAI.fit_transform!Function
fit_transform!(lnr::ImputationLearner, X::FeatureInput; kwargs...)

Fit lnr with an imputation model on features X and return a DataFrame containing the features X with all missing values imputed by lnr. Similar to calling fit!(lnr, X; kwargs...) followed by transform(lnr, X).


fit_transform!(grid::GridSearch, X::FeatureInput; kwargs...)

As fit_transform! for an imputation learner, but performs validation over the grid parameters during training before returning the final imputed DataFrame.


fit_transform!(grid::GridSearch, train_X::FeatureInput,
               valid_X::FeatureInput; kwargs...)

As fit_transform! but performs validation with the pre-split training and validation sets train_X and valid_X.
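
A minimal sketch of the imputation workflow, assuming an imputation learner such as IAI.OptKNNImputationLearner is available:

using DataFrames
X = DataFrame(x1=[1.0, missing, 3.0], x2=[missing, 5.0, 6.0])

lnr = IAI.OptKNNImputationLearner()    # assumed learner type, for illustration only
X_imputed = IAI.fit_transform!(lnr, X) # fit the imputation model and impute X in one step
X_again = IAI.transform(lnr, X)        # impute (new) data with the already-fitted learner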

Reward Estimation

Multi-task Learners

IAI.predictMethod
predict(lnr::SupervisedMultiLearner, X::FeatureInput)

Variant of predict for multi-task problems that returns the predictions for all tasks as a dictionary.

IAI.predictMethod
predict(lnr::SupervisedMultiLearner, X::FeatureInput, task_label::Symbol)

Variant of predict for multi-task problems that returns the predictions for the task given by task_label.

IAI.scoreMethod
score(lnr::SupervisedMultiLearner, X::FeatureInput, y::TargetInput...;
      keyword_arguments...)

Variant of score for multi-task problems that returns the average score across all tasks.

IAI.scoreMethod
score(lnr::SupervisedMultiLearner, X::FeatureInput, y::TargetInput...,
      task_label::Symbol; keyword_arguments...)

Variant of score for multi-task problems that returns the score for the task given by task_label.

IAI.predict_probaMethod
predict_proba(lnr::ClassificationMultiLearner, X::FeatureInput)

Variant of predict_proba for multi-task problems that returns the predictions for all tasks as a dictionary.

IAI.predict_probaMethod
predict_proba(lnr::ClassificationMultiLearner, X::FeatureInput,
              task_label::Symbol)

Variant of predict_proba for multi-task problems that returns the predictions for the task given by task_label.

IAI.ROCCurveMethod
ROCCurve(lnr::ClassificationMultiLearner, X::FeatureInput,
         y::MultiTargetInput)

Variant of ROCCurve for multi-task problems that returns the curves for all tasks as a dictionary.

IAI.ROCCurveMethod
ROCCurve(lnr::ClassificationMultiLearner, X::FeatureInput,
         y::MultiTargetInput, task_label::Symbol)

Variant of ROCCurve for multi-task problems that returns the curve for the task given by task_label.

Model-free Utilities

IAI.scoreMethod
score(task::Symbol, predictions, truths; keyword_arguments...)

Calculates the score attained by predictions against the true target truths for the problem type indicated by task. By default, all scores are calibrated such that higher is better (and 1 is the maximum possible score).

The type and number of arguments for predictions and truths depend on task and the value of criterion.

The permissible values for task are:

  • :classification for classification problems
  • :regression for regression problems
  • :survival for survival problems

Keyword Arguments

  • sample_weight::SampleWeightInput=nothing: the weighting to give to each data point.
  • criterion: the scoring criterion to use when evaluating the score (refer to the documentation on scoring criteria for more information).
  • scale::Bool=true: whether to scale the score so that higher is better and 1 is the maximum possible.

Additional keyword arguments are passed to the criterion as usual.


score(:classification, y_pred::AbstractVector, y_true::AbstractVector;
      criterion=:misclassification, keyword_arguments...)

Calculates the misclassification score of predicted labels y_pred against true labels y_true.

Examples

y_pred = ["A", "B", "B", "B"]
y_true = ["A", "B", "A", "B"]
IAI.score(:classification, y_pred, y_true, criterion=:misclassification)
0.75

score(:classification, y_pred::AbstractDataFrame, y_true::AbstractVector;
      criterion=:gini, keyword_arguments...)

Calculates the gini impurity score of predicted probabilities y_pred against true labels y_true.

Also applies for calculating entropy with criterion=:entropy.

Examples

import DataFrames
y_pred = DataFrames.DataFrame(A=[0.9, 0.2, 0.6, 0.7], B=[0.1, 0.8, 0.4, 0.3])
y_true = ["A", "B", "A", "B"]
IAI.score(:classification, y_pred, y_true, criterion=:gini)
0.30000000000000004

score(:classification, y_pred::AbstractDataFrame, y_true::AbstractVector;
      criterion=:auc, positive_label, keyword_arguments...)
score(:classification, y_pred::AbstractVector, y_true::AbstractVector;
      criterion=:auc, keyword_arguments...)

Calculates the AUC of predicted probabilities y_pred against true labels y_true.

The predicted probabilities y_pred can either be:

  • an AbstractDataFrame of predicted probabilities for each label, in which case one of the labels in the data must be identified as the positive label using the positive_label keyword argument
  • an AbstractVector of predicted probabilities for the positive label

Also applies for calculating any threshold-based criteria.

Examples

import DataFrames
y_pred = DataFrames.DataFrame(A=[0.9, 0.2, 0.6, 0.7], B=[0.1, 0.8, 0.4, 0.3])
y_true = ["A", "B", "A", "B"]
IAI.score(:classification, y_pred, y_true, criterion=:auc, positive_label="B")
0.75
y_pred = [0.1, 0.8, 0.4, 0.3]
IAI.score(:classification, y_pred, y_true, criterion=:auc)
0.75

score(:regression, y_pred::AbstractVector{<:Real},
      y_true::AbstractVector{<:Real}; criterion=:mse, keyword_arguments...)

Calculates the mean-squared error of predicted values y_pred against true values y_true.

Also applies for calculating tweedie or hinge loss criteria.

Examples

y_pred = [0.9, 0.2, 0.6, 0.7]
y_true = [1.0, 0.2, 0.5, 0.5]
IAI.score(:regression, y_pred, y_true, criterion=:mse)
0.8181818181818182

score(:survival, y_pred::AbstractVector{<:Real},
      deaths::AbstractVector{Bool}, times::AbstractVector;
      criterion=:localfulllikelihood, keyword_arguments...)

Calculates the local full likelihood of predicted hazards y_pred against the true data deaths and times.

Also applies for calculating Harrell's c-statistic.

Examples

y_pred = [2, 0.5, 1.2, 0.7]
deaths = [true, true, false, false]
times = [1, 10, 3, 7]
IAI.score(:survival, y_pred, deaths, times, criterion=:localfulllikelihood)
0.15105918059568402
IAI.ROCCurveMethod
ROCCurve(probs::AbstractVector{<:Real}, y::AbstractVector, positive_label)
ROCCurve(probs::AbstractDataFrame, y::AbstractVector, positive_label)

Construct a ROCCurve from predicted probabilities probs and true labels y. It is required to specify one of the labels contained in y as the positive_label so that probs gives the predicted probability of being equal to positive_label for each sample.

probs can be either:

  • a vector of probabilities indicating the probability of positive_label
  • a dataframe where the positive_label is one of the column names, giving the probability of positive_label (for example, the output of predict_proba)

Examples

Calculate AUC from a vector of predicted probabilities and true labels:

probs = [0.1, 0.8, 0.4, 0.3]
y = ["A", "B", "A", "B"]
IAI.ROCCurve(probs, y, positive_label="B").auc
0.75

Calculate AUC from a dataframe of predicted probabilities and true labels:

import DataFrames
probs = DataFrames.DataFrame(A=[0.9, 0.2, 0.6, 0.7], B=[0.1, 0.8, 0.4, 0.3])
y = ["A", "B", "A", "B"]
IAI.ROCCurve(probs, y, positive_label="B").auc
0.75