API Reference
Documentation for the IAIBase public interface.
Index
IAI.AbstractVisualization
IAI.CategoricalRewardEstimationLearner
IAI.ClassificationLearner
IAI.ClassificationMultiLearner
IAI.FeatureInput
IAI.FeatureMapping
IAI.FeatureSet
IAI.GridSearch
IAI.Learner
IAI.MixedDatum
IAI.MultiLearner
IAI.MultiQuestionnaire
IAI.MultiTargetInput
IAI.NumericMixedDatum
IAI.NumericRewardEstimationLearner
IAI.OrdinalMixedDatum
IAI.PolicyLearner
IAI.PrescriptionLearner
IAI.Questionnaire
IAI.ROCCurve
IAI.RegressionLearner
IAI.RegressionMultiLearner
IAI.RelativeParameterInput
IAI.RewardEstimationLearner
IAI.SampleWeightInput
IAI.SupervisedLearner
IAI.SupervisedMultiLearner
IAI.SurvivalCurve
IAI.SurvivalLearner
IAI.TargetInput
IAI.UnsupervisedLearner
IAI.clone
IAI.delete_rich_output_param!
IAI.fit!
IAI.fit_cv!
IAI.fit_transform!
IAI.fit_transform_cv!
IAI.get_best_params
IAI.get_features_used
IAI.get_grid_result_details
IAI.get_grid_result_summary
IAI.get_learner
IAI.get_params
IAI.get_rich_output_params
IAI.get_roc_curve_data
IAI.get_survival_curve_data
IAI.make_html_table
IAI.make_mixed_data
IAI.predict
IAI.predict_expected_survival_time
IAI.predict_hazard
IAI.predict_outcomes
IAI.predict_proba
IAI.predict_treatment_outcome
IAI.predict_treatment_outcome_standard_error
IAI.predict_treatment_rank
IAI.read_json
IAI.resume_from_checkpoint
IAI.score
IAI.set_params!
IAI.set_rich_output_param!
IAI.show_in_browser
IAI.split_data
IAI.transform
IAI.undo_mixed_data
IAI.variable_importance
IAI.write_html
IAI.write_json
Data Preparation
IAI.FeatureInput — Type
Permissible types for specifying the feature data.
The features can be supplied as a Matrix of Reals or as a DataFrame, as follows:
- numeric features are specified using numeric vectors
- categoric and ordinal features are specified using CategoricalVectors
- mixed features are specified using vectors of MixedDatum (see make_mixed_data)
- missing values are specified using missing
For more details, refer to the data preparation guide in the manual.
IAI.MixedDatum — Type
MixedDatum{T}
Represents a mixed feature value that can be either categoric or of type T.
The value has the following fields:
- iscat::Bool: true if the value is categoric
- value_cat: the value if categoric
- value_else: the value if non-categoric
Mixed features are specified in the data using vectors of MixedDatum. It is recommended to create and work with these vectors of MixedDatum values via make_mixed_data and undo_mixed_data.
IAI.NumericMixedDatum — Type
A MixedDatum that holds either numeric or categoric values.
IAI.OrdinalMixedDatum — Type
A MixedDatum that holds either ordinal or categoric values.
IAI.make_mixed_data — Function
make_mixed_data(input)
Construct a vector of mixed categoric and numeric data from input. All numeric values from input are treated as numeric data, and all remaining values are treated as categoric data.
Examples
Construct a mixed data vector with a numeric score and two additional levels ("Unknown" and "NA")
IAI.make_mixed_data([13, "Unknown", "NA", 2, 4, missing])
6-element Vector{Union{Missing, NumericMixedDatum}}:
13.0
"Unknown"
"NA"
2.0
4.0
missing
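The rule above (numbers become numeric data, everything else becomes a category, missing stays missing) can be illustrated with a minimal Julia sketch. SketchMixedDatum, sketch_mixed and sketch_mixed_data are hypothetical names that only mirror the documented fields. They are not the IAI implementation. Using NaN as a placeholder for the unused numeric field is also an assumption.

```julia
# Hypothetical stand-in that mirrors the documented MixedDatum fields
# (iscat, value_cat, value_else); this is not the IAI type.
struct SketchMixedDatum
    iscat::Bool
    value_cat::String    # meaningful only when iscat is true
    value_else::Float64  # meaningful only when iscat is false (NaN otherwise)
end

# Apply the documented rule for make_mixed_data(input): numeric values are
# stored as numeric data and every other value is stored as a category.
function sketch_mixed(x)
    if x isa Real
        return SketchMixedDatum(false, "", Float64(x))
    else
        return SketchMixedDatum(true, string(x), NaN)
    end
end

# missing entries are passed through unchanged rather than wrapped
sketch_mixed_data(input) = [ismissing(x) ? missing : sketch_mixed(x) for x in input]

data = sketch_mixed_data([13, "Unknown", "NA", 2, 4, missing])
```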
make_mixed_data(input, ordinal_levels)
Construct a vector of mixed categoric and ordinal data from input. All values from input that are in ordinal_levels are treated as ordinal data, ordered according to their position in ordinal_levels, and all remaining values are treated as categoric data.
Examples
Construct a mixed data vector with three ordered levels (A < B < C) and two additional levels ("Unknown" and "NA")
IAI.make_mixed_data(["B", "Unknown", "NA", "C", "A", missing], ["A", "B", "C"])
6-element Vector{Union{Missing, OrdinalMixedDatum{String}}}:
"B"
"Unknown"
"NA"
"C"
"A"
missing
IAI.undo_mixed_data — Function
undo_mixed_data(mixed_data)
Convert a vector of mixed data back to a normal Vector with mixed types.
Examples
Undo the conversion to a numeric mixed data vector:
numeric_mixed = IAI.make_mixed_data([13, "Unknown", "NA", 2, 4, missing])
IAI.undo_mixed_data(numeric_mixed)
6-element Vector{Any}:
13.0
"Unknown"
"NA"
2.0
4.0
missing
Undo the conversion to an ordinal mixed data vector:
ordinal_mixed = IAI.make_mixed_data(["B", "Unknown", "NA", "C", "A", missing],
["A", "B", "C"])
IAI.undo_mixed_data(ordinal_mixed)
6-element Vector{Union{Missing, String}}:
"B"
"Unknown"
"NA"
"C"
"A"
missing
IAI.TargetInput — Type
Permissible types for specifying the problem target. The number and types of the target arguments depend on the problem type (for more information, refer to the data preparation guide in the manual):
Classification
- y: AbstractVector of class labels
Regression
- y: AbstractVector of numeric values
Prescription
- treatments: AbstractVector of treatment labels
- outcomes: AbstractVector of numeric outcomes
Survival
- deaths: AbstractVector{Bool} indicating which observations are deaths
- times: AbstractVector of times for each observation
Imputation
- No target required
IAI.MultiTargetInput — Type
Permissible types for specifying the problem target for multi-task problems.
Multi-task targets are supplied either as an AbstractMatrix or as an AbstractDataFrame, where the columns contain the targets for each task.
IAI.SampleWeightInput — Type
Permissible types for specifying sample weights:
- nothing (default): assigns equal weight to all points
- a Vector or StatsBase.Weights giving the weight for each point
Additionally, for problems with discrete outcomes (classification/prescription):
- a Dict giving the weight for each label
- :autobalance to use weights that give each label equal weight
For more information, refer to the data preparation guide in the manual.
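This reference does not give the formula behind :autobalance. The minimal Julia sketch below shows one common convention that gives each label equal total weight, w_i = n / (K * n_label). The helper autobalance_weights is hypothetical, and the exact normalization IAI uses is an assumption.

```julia
# Hypothetical helper: weights that give every label the same total weight.
# The normalization n / (K * count) is an assumed convention, chosen so the
# weights sum to n and each of the K labels contributes n / K in total.
function autobalance_weights(y::AbstractVector)
    n = length(y)
    counts = Dict{eltype(y), Int}()
    for label in y
        counts[label] = get(counts, label, 0) + 1
    end
    k = length(counts)
    return [n / (k * counts[label]) for label in y]
end

y = ["A", "A", "A", "B"]
w = autobalance_weights(y)   # the single "B" point gets the largest weight
```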
IAI.RelativeParameterInput — Type
Permissible types for specifying parameters relative to the number of samples or features:
- :all: allowed to use all
- a non-negative Integer: the value to be used
- a Real between 0 and 1: use this proportion of the total number
- :sqrt: use the square root of the total number
- :log2: use the base-2 logarithm of the total number
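As an illustration, the options above can be resolved into a concrete count with a small Julia sketch. resolve_relative is a hypothetical helper. The rounding rules (round for proportions, ceiling with a floor of 1 for :sqrt and :log2) are assumptions rather than documented IAI behavior.

```julia
# Hypothetical resolver for a parameter given relative to a total count.
# Rounding choices are assumptions, not documented IAI behavior.
function resolve_relative(param, total::Integer)
    if param === :all
        return total
    elseif param isa Integer          # checked before Real, since Integer <: Real
        param >= 0 || throw(ArgumentError("integer values must be non-negative"))
        return param
    elseif param isa Real
        0 <= param <= 1 || throw(ArgumentError("proportions must lie in [0, 1]"))
        return round(Int, param * total)
    elseif param === :sqrt
        return max(1, ceil(Int, sqrt(total)))
    elseif param === :log2
        return max(1, ceil(Int, log2(total)))
    else
        throw(ArgumentError("unsupported value: $param"))
    end
end

resolve_relative(:sqrt, 100)   # square root of 100 features
```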
IAI.split_data — Function
split_data(task::Symbol, X::FeatureInput, y::TargetInput...;
           keyword_arguments...)
Split the data (X and y) into a tuple of training and testing data: (X_train, y_train...), (X_test, y_test...).
The mechanism used to split the data is determined by task:
- Stratified:
  - :classification gives a stratified split on the class labels
  - :prescription or :policy_categorical gives a stratified split on the treatments
- Non-stratified:
  - :regression, :survival and :policy_numeric give randomly split data
Keyword Arguments
- train_proportion=0.7: proportion of data in the training set
- shuffle=true: whether the data is shuffled before splitting. Disabling shuffling can be useful if the training/testing split should be based on the order of the data (e.g. if the data is sorted by time). If shuffle=false, the split will not be stratified even if the task defaults to a stratified approach.
- seed=nothing: random seed for splitting; the global random state is used if nothing is specified
Examples
Classification:
X = [1 2; 3 4; 5 6; 7 8]
y = ["A", "B", "A", "B"]
(train_X, train_y), (test_X, test_y) =
IAI.split_data(:classification, X, y, seed=1)
Regression:
X = [1 2; 3 4; 5 6; 7 8]
y = [0.1, 0.2, 0.3, 0.4]
(train_X, train_y), (test_X, test_y) =
IAI.split_data(:regression, X, y, seed=1)
Survival:
X = [1 2; 3 4; 5 6; 7 8]
deaths = [true, false, true, false]
times = [1, 2, 3, 4]
(train_X, train_deaths, train_times), (test_X, test_deaths, test_times) =
IAI.split_data(:survival, X, deaths, times, seed=1)
Prescription:
X = [1 2; 3 4; 5 6; 7 8]
treatments = ["A", "B", "A", "B"]
outcomes = [0.1, 0.2, 0.3, 0.4]
(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) =
IAI.split_data(:prescription, X, treatments, outcomes, seed=1)
split_data(task::Symbol, inds::AbstractRange, y::TargetInput...;
           keyword_arguments...)
Split a range of indices (inds) into sets of indices for training and testing.
The mechanism used to split the data is determined by task and y as in split_data, and the same keyword arguments are supported.
Examples
y = ["A", "B", "A", "B"]
train_inds, test_inds = IAI.split_data(:classification, 1:length(y), y, seed=1)
([1, 2], [3, 4])
split_data(task::Symbol, n::Integer, y::TargetInput...;
           keyword_arguments...)
Construct sets of indices for training and testing for data with n samples.
Same as split_data with the range 1:n.
Examples
y = ["A", "B", "A", "B"]
train_inds, test_inds = IAI.split_data(:classification, length(y), y, seed=1)
([1, 2], [3, 4])
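The Julia sketch below shows how a stratified index split of this kind can work. Indices are grouped by label and shuffled within each group, and each group is divided using train_proportion. With shuffle=false the data order is kept and no stratification is applied, as documented above. sketch_split is a hypothetical name. Rounding per group is an assumption, so its output for a given seed will not match IAI.split_data.

```julia
using Random

# Hypothetical stratified split over indices 1:length(y); not IAI's code.
function sketch_split(y::AbstractVector; train_proportion=0.7, shuffle=true,
                      seed=nothing)
    n = length(y)
    if !shuffle
        # keep the original order; no stratification, as documented
        n_train = round(Int, train_proportion * n)
        return collect(1:n_train), collect(n_train+1:n)
    end
    rng = seed === nothing ? Random.default_rng() : MersenneTwister(seed)
    train, test = Int[], Int[]
    for label in unique(y)
        inds = findall(==(label), y)   # indices belonging to this class
        Random.shuffle!(rng, inds)
        n_train = round(Int, train_proportion * length(inds))
        append!(train, inds[1:n_train])
        append!(test, inds[n_train+1:end])
    end
    return sort(train), sort(test)
end

y = ["A", "B", "A", "B", "A", "B", "A", "B", "A", "B"]
train_inds, test_inds = sketch_split(y; seed=1)
```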
Learners
IAI.Learner — Type
Abstract type encompassing all learners.
Learners are further divided into groups:
- SupervisedLearner for supervised tasks
- UnsupervisedLearner for unsupervised tasks
IAI.SupervisedLearner — Type
Abstract type encompassing all learners for supervised tasks.
IAI.UnsupervisedLearner — Type
Abstract type encompassing all learners for unsupervised tasks.
IAI.FeatureSet — Type
Permissible types for specifying a set of features in a dataframe. Refer to Indexing in DataFrames.jl for a full list of supported rules:
Input Type | Description | Examples |
---|---|---|
All | Use all columns | All() |
Integer or a vector of Integers | Specify indices of columns to use | 1, [1, 3, 4] |
Symbol or a vector of Symbols | Specify names of columns to use | :x1, [:x1, :x3] |
String or a vector of Strings | Specify names of columns to use | "x1", ["x1", "x3"] |
Not | Specify columns not to use | Not(1), Not(["x2", "x4"]) |
Between | Specify range of columns to use | Between("x1", "x4") |
IAI.FeatureMapping — Type
Permissible types for specifying a mapping from features to values. One or more mappings from FeatureSet to value can be supplied, with each mapping in one of the following formats:
- a Pair from key to value, e.g. "A" => 1
- a Tuple containing key and value, e.g. ("A", 1)
- a Vector containing key and value, e.g. ["A", 1]
If there is only one such mapping, it can be supplied directly. If more than one mapping is present, they can be supplied in one of the following containers:
- a Dict with entries for each mapping, e.g. Dict("A" => 1, "B" => 2)
- a NamedTuple with entries for each mapping, e.g. (A=1, B=2)
- a Tuple containing each mapping, e.g. ("A" => 1, "B" => 2)
- a Vector containing each mapping, e.g. ["A" => 1, "B" => 2]
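To show how these formats relate, the Julia sketch below reduces each one to a Dict. normalize_mapping and to_pair are hypothetical helpers, not IAI functions. A Tuple or Vector is treated as a container only when all of its elements are Pairs. Containers holding Tuple-style mappings are not handled in this simplified sketch.

```julia
# Hypothetical helpers that reduce the documented mapping formats to a Dict.
to_pair(p::Pair) = p
to_pair(x::Union{Tuple,AbstractVector}) =
    length(x) == 2 ? (x[1] => x[2]) : throw(ArgumentError("expected (key, value)"))

function normalize_mapping(m)
    if m isa AbstractDict
        return Dict{Any,Any}(m)
    elseif m isa NamedTuple
        return Dict{Any,Any}(pairs(m))            # keys become Symbols
    elseif m isa Pair
        return Dict{Any,Any}(m)
    elseif (m isa Tuple || m isa AbstractVector) && all(x -> x isa Pair, m)
        return Dict{Any,Any}(to_pair(x) for x in m)   # container of mappings
    else
        return Dict{Any,Any}(to_pair(m))          # single (key, value) mapping
    end
end

normalize_mapping(["A" => 1, "B" => 2])
```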
Fitting
IAI.fit! — Method
fit!(lnr::Learner, X::FeatureInput, y::TargetInput...;
     sample_weight::SampleWeightInput=nothing)
Fits a model using the parameters in lnr and the data X and y.
Evaluation (for supervised learners only)
IAI.predict — Method
predict(lnr::SupervisedLearner, X::FeatureInput)
Return the predictions made by the trained model in lnr for each point in the data X.
IAI.score — Method
score(lnr::SupervisedLearner, X::FeatureInput, y::TargetInput...;
      keyword_arguments...)
Calculates the score for lnr on data X and y. By default, all scores are calibrated such that higher is better (and 1 is the maximum possible score).
Keyword Arguments
- sample_weight::SampleWeightInput=nothing: the weighting to give to each data point.
- criterion=:default: the scoring criterion to use when evaluating the score (refer to the documentation on scoring criteria for more information). Uses the criterion in lnr if left as :default.
- scale::Bool=true: whether to scale the score so that higher is better and 1 is the maximum possible.
- extra keyword arguments are passed through to configure the specified scoring criterion (e.g. tweedie_variance_power for :tweedie)
Utilities
IAI.write_json — Function
write_json(f, obj; keyword_arguments...)
Write obj (which can be a Learner or GridSearch) to f in JSON format.
Keyword Arguments
- indent=2: level of indentation in JSON, or set to nothing to disable.
IAI.read_json — Function
read_json(f)
Read in a Learner or GridSearch saved in JSON format from f, which can be either a filepath from which to read the saved JSON, or a dictionary containing already-parsed JSON data.
IAI.resume_from_checkpoint — Function
resume_from_checkpoint(checkpoint_file)
Resume training from the supplied checkpoint_file.
Refer to the documentation on training checkpoints for more information.
IAI.variable_importance — Method
variable_importance(lnr::Learner; keyword_arguments...)
Generate a ranking of the variables in lnr according to their importance during training. The results are normalized so that they sum to one.
For linear models (e.g. linear/logistic regression, hyperplane splits), the importance is determined using the coefficients in the resulting model after scaling to account for features of different magnitudes.
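The coefficient-based importance described above can be sketched in Julia as follows. Each coefficient is multiplied by the standard deviation of its feature, and the absolute values are normalized to sum to one. linear_importance is a hypothetical helper. Using the standard deviation as the scaling factor is an assumption about how IAI accounts for feature magnitude.

```julia
using Statistics

# Hypothetical coefficient-based importance: scale each coefficient by its
# feature's standard deviation (assumed scaling), take absolute values, and
# normalize so the importances sum to one.
function linear_importance(coefs::AbstractVector, X::AbstractMatrix)
    scaled = abs.(coefs .* vec(std(X; dims=1)))
    return scaled ./ sum(scaled)
end

# The second feature is on a 100x larger scale with a 100x smaller
# coefficient, so both features end up equally important.
X = [1.0 100.0; 2.0 300.0; 3.0 200.0; 4.0 400.0]
coefs = [1.0, 0.01]
imp = linear_importance(coefs, X)
```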
IAI.get_features_used — Function
get_features_used(lnr::Learner)
Return a Vector of Symbols giving the names of the features used by lnr.
IAI.get_params — Function
get_params(lnr::Learner)
Return a Dict containing the values of user-specified parameters in lnr.
IAI.set_params! — Function
set_params!(lnr::Learner; params...)
Update user-specified parameters in lnr with all supplied key-value pairs in params.
IAI.clone — Function
clone(lnr::Learner)
clone(grid::GridSearch)
Return an unfitted copy of lnr or grid with the same user-specified parameters.
Visualization in Browser
IAI.AbstractVisualization — Type
Abstract type encompassing objects related to visualization. Examples include:
IAI.write_html — Method
write_html(f, vis::AbstractVisualization; keyword_arguments...)
Generic function for saving a visualization vis to f in HTML format.
IAI.show_in_browser — Method
show_in_browser(vis::AbstractVisualization; keyword_arguments...)
Generic function for showing a visualization vis in the browser.
IAI.Questionnaire — Method
Questionnaire(lnr::Learner; keyword_arguments...)
Abstract type encompassing objects related to interactive questionnaires. Examples include:
- Tree Questionnaire
- Optimal Feature Selection Questionnaire
IAI.MultiQuestionnaire — Method
MultiQuestionnaire(questions::Pair; keyword_arguments...)
Specifies an interactive questionnaire using multiple learners as specified by questions. Refer to the documentation on multi-learner visualizations for more details.
IAI.MultiQuestionnaire — Method
MultiQuestionnaire(grid::GridSearch; keyword_arguments...)
Constructs an interactive questionnaire containing the final fitted learner as well as the learner found for each parameter combination.
IAI.set_rich_output_param! — Function
set_rich_output_param!(key::Symbol, value)
Sets the global rich output parameter key to value.
For examples of parameters for rich outputs, see write_png or write_html.
IAI.get_rich_output_params — Function
get_rich_output_params()
Return the current global rich output parameter settings.
IAI.delete_rich_output_param! — Function
delete_rich_output_param!(key::Symbol)
Delete the global rich output parameter key.
IAI.make_html_table — Method
make_html_table(df::DataFrame)
Return a string representing df as an HTML table.
Grid Search and Parameter Validation
IAI.GridSearch — Type
GridSearch(lnr::Learner, param_grid)
Controls grid search over parameter combinations in param_grid to find the best combination of parameters for lnr.
lnr is a learner with any parameters that should be included in all combinations of parameters tested.
param_grid contains the parameter ranges to search over. These can be supplied in multiple ways, which we demonstrate with examples that create identical GridSearches to tune lnr over the parameters criterion and normalize_X:
- one or more keyword arguments to the GridSearch constructor containing key=value pairs for all desired parameters and their ranges:
  IAI.GridSearch(lnr, criterion=[:gini, :entropy], normalize_X=[true, false])
- a Dict or NamedTuple where the keys are the names of the parameters to tune, and the corresponding values are the range over which to vary each parameter:
  IAI.GridSearch(lnr, Dict(:criterion => [:gini, :entropy], :normalize_X => [true, false]))
  IAI.GridSearch(lnr, (criterion=[:gini, :entropy], normalize_X=[true, false]))
- a Vector{Dict} or Vector{NamedTuple} where each entry specifies a grid of parameters to test (refer to the documentation on multiple parameter grids):
  IAI.GridSearch(lnr, [
      (criterion=:gini, normalize_X=[true, false]),
      (criterion=:entropy, normalize_X=[true, false]),
  ])
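To see why the single-grid and multiple-grid forms above describe the same search, a grid can be expanded into the individual parameter combinations it represents. The Julia sketch below uses a hypothetical expand_grid helper, not IAI's internal logic. Scalar values are treated as fixed, and vectors are treated as ranges to vary.

```julia
# Hypothetical expansion of a parameter grid into individual combinations.
function expand_grid(grid::NamedTuple)
    param_names = keys(grid)
    # wrap scalars so that every parameter has a range of candidate values
    ranges = [v isa AbstractVector ? v : [v] for v in values(grid)]
    combos = Iterators.product(ranges...)
    return vec([NamedTuple{param_names}(c) for c in combos])
end

# a vector of grids is the union of the combinations of each grid
expand_grid(grids::AbstractVector) = reduce(vcat, [expand_grid(g) for g in grids])

single = expand_grid((criterion=[:gini, :entropy], normalize_X=[true, false]))
multi = expand_grid([(criterion=:gini, normalize_X=[true, false]),
                     (criterion=:entropy, normalize_X=[true, false])])
```

Both forms yield the same four combinations, possibly in a different order.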
Keyword Arguments
- train_proportion: if specified, fit! will use a single train-validation split with this proportion of the data as training. Defaults to nothing.
- n_folds: if specified, fit! will use cross-validation with this number of folds, rather than a single split. Defaults to nothing.
- validation_criterion: if specified, will be used as the validation criterion during grid search when validation_criterion is not passed explicitly as a keyword argument. Defaults to :default, meaning the criterion from the learner in the grid search will be used.
- fit_kwargs: additional keyword arguments to pass by default to fit!/fit_cv!, such as keyword arguments used to configure validation_criterion (e.g. positive_label for certain classification criteria)
IAI.fit! — Method
fit!(grid::GridSearch, X::FeatureInput, y::TargetInput...;
     keyword_arguments...)
Fit a grid with data X and y... by randomly splitting the data into training and validation sets in the same way as split_data.
Keyword Arguments
- train_proportion::Real: a number between 0 and 1 indicating the proportion of data to use in training. If not specified, will default first to any value specified when creating grid, and then to 0.7. If no value is provided for train_proportion but n_folds was specified when creating grid, then cross-validation with n_folds will be used (see fit_cv!).
- sample_weight::SampleWeightInput=nothing: the weighting to give to each data point.
- validation_criterion::Symbol=:default: the scoring criterion that should be used to evaluate the parameter combinations to determine which is best (refer to the documentation on scoring criteria for more information). If left as :default, will default first to any value specified when creating grid, and then to the value of criterion in the learner used by grid.
- run_gc::Bool=false: if true, runs the Julia garbage collector between each parameter combination to reclaim any unused memory immediately. This is usually unnecessary as Julia will automatically run garbage collection itself as needed.
- verbose::Bool=false: if true, prints out the score for each parameter combination during the grid search. Can only be set to true if the learner in the grid search has show_progress set to false.
- extra keyword arguments are passed through to configure the specified scoring criterion (e.g. tweedie_variance_power for :tweedie)
fit!(grid::GridSearch, train_X::FeatureInput, train_y::TargetInput...,
     valid_X::FeatureInput, valid_y::TargetInput...; keyword_arguments...)
Fit a grid with explicit training and validation sets.
Supports the same keyword arguments as above with the exception of train_proportion, as the data has already been split. sample_weight additionally accepts a Tuple of sample weight vectors if you would like to specify explicit weight vectors for the training and validation sets.
IAI.fit_cv! — Method
fit_cv!(grid::GridSearch, X::FeatureInput, y::TargetInput...;
        keyword_arguments...)
Fit a grid with data X and y... using k-fold cross-validation.
The keyword arguments are the same as for fitting the grid with randomly split data using IAI.fit!, except the train_proportion argument is replaced by n_folds, which indicates the number of folds to use in the cross-validation (defaulting first to any value of n_folds specified when creating grid, and then to 5).
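For readers unfamiliar with k-fold cross-validation, the Julia sketch below assigns the indices 1:n to n_folds folds, each of which serves once as the validation set while the rest are used for training. kfold_splits is a hypothetical helper. This reference does not say whether IAI stratifies its folds by label, so this sketch does not.

```julia
using Random

# Hypothetical k-fold assignment: shuffle 1:n, deal indices into n_folds
# groups, and use each group once as the validation set.
function kfold_splits(n::Integer, n_folds::Integer; rng=MersenneTwister(1))
    perm = randperm(rng, n)
    folds = [perm[i:n_folds:n] for i in 1:n_folds]
    # each entry is (training indices, validation indices)
    return [(sort(setdiff(perm, valid)), sort(valid)) for valid in folds]
end

splits = kfold_splits(10, 5)
```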
IAI.get_learner — Function
get_learner(grid::GridSearch)
Return the final fitted learner using the best parameter combination from the grid.
IAI.get_best_params — Function
get_best_params(grid::GridSearch)
Return the best parameter combination from the grid.
Examples
Example output from a GridSearch used to tune an OptimalTreeClassifier:
IAI.get_best_params(grid)
Dict{Symbol, Any} with 2 entries:
:cp => 0.0357143
:max_depth => 3
IAI.get_grid_result_summary — Function
get_grid_result_summary(grid::GridSearch)
Return a DataFrame summarizing the results from the grid search.
Each row corresponds to a single parameter combination from the grid search, and contains:
- the value of each parameter
- the training and validation scores of the learner trained using these parameters
- the rank of this parameter combination according to the validation score (where a rank of 1 indicates the best parameter combination)
When fitting the grid using cross-validation, the training and validation scores for each fold are shown, along with the mean and standard deviation of these scores.
Examples
Example output from a GridSearch used to tune an OptimalTreeClassifier:
IAI.get_grid_result_summary(grid)
3×5 DataFrame
Row │ max_depth cp train_score valid_score rank_valid_score
│ Int64 Float64 Float64 Float64 Int64
─────┼──────────────────────────────────────────────────────────────────
1 │ 1 0.25 0.666667 0.666667 3
2 │ 2 0.228571 0.971429 0.911111 2
3 │ 3 0.0357143 0.980952 0.915556 1
IAI.get_grid_result_details — Function
get_grid_result_details(grid::GridSearch)
Return a Vector of Dicts corresponding to each combination of parameters tested in the grid search, where each Dict contains the following entries:
- :params: a Dict of parameter values for this combination
- :valid_score: the validation score for this parameter combination (for cross-validation, this is the mean validation score across all folds)
- :rank: the rank of this parameter combination based on the validation score, where lower is better
- :fold_results: a Vector containing a Dict for each fold in the grid search (for grids without cross-validation, there is a single fold corresponding to the training/validation sets). Each Dict contains the following entries:
  - :train_score: the training score for the fold
  - :valid_score: the validation score for the fold
  - :learner: the trained learner in the fold
Task-specific Functions
These functions are only available to learners of the appropriate type for the problem.
Classification
IAI.ClassificationLearner — Type
Abstract type encompassing all learners for classification tasks.
IAI.predict_proba — Method
predict_proba(lnr::ClassificationLearner, X::FeatureInput)
Return the probabilities of class membership predicted by the trained model in lnr for each point in the features X.
IAI.ROCCurve — Type
Container for ROC curve information.
The data underlying the curve can be extracted with get_roc_curve_data.
The resulting curve can be visualized in the browser using show_in_browser, or with write_html to save the visualization in HTML format. You can also use plot from Plots.jl to visualize the curve.
IAI.get_roc_curve_data — Function
get_roc_curve_data(curve::ROCCurve)
Extract the underlying data from curve as a Dict with two keys:
- :coords: a Vector of Dicts representing the points on the curve. Each Dict contains the following keys:
  - :fpr: false positive rate at the given threshold
  - :tpr: true positive rate at the given threshold
  - :threshold: the threshold
- :auc: the area under the curve (AUC)
IAI.ROCCurve — Method
ROCCurve(lnr::ClassificationLearner, X::FeatureInput, y::AbstractVector;
         positive_label)
Construct a ROCCurve using the trained lnr on the features X and labels y, treating positive_label as the positive label.
Can only be applied to classification problems with $K=2$ classes.
IAI.write_html — Method
write_html(f, roc::ROCCurve)
Write an interactive browser visualization of roc to f in HTML format.
IAI.show_in_browser — Method
show_in_browser(roc::ROCCurve)
Display an interactive visualization of roc in the browser.
Regression
IAI.RegressionLearner — Type
Abstract type encompassing all learners for regression tasks.
Survival
IAI.SurvivalLearner — Type
Abstract type encompassing all learners for survival tasks.
IAI.predict_hazard — Method
predict_hazard(lnr::SurvivalLearner, X::FeatureInput)
Return the fitted hazard coefficient estimate made by the trained model in lnr for each point in the data X. A higher hazard coefficient estimate corresponds to a shorter predicted survival time.
IAI.predict_expected_survival_time — Method
predict_expected_survival_time(lnr::SurvivalLearner, X::FeatureInput)
Return the expected survival time according to the trained model in lnr for each point in the data X.
IAI.predict — Method
predict(lnr::SurvivalLearner, X::FeatureInput)
Return the SurvivalCurve predicted by the trained model in lnr for each point in the data X.
predict(lnr::SurvivalLearner, X::FeatureInput; t::Number)
Return the probability that death occurs at or before time t, as predicted by the trained model in lnr for each point in the data X.
IAI.SurvivalCurve — Type
Container for survival curve information.
Use curve[t] to get the mortality probability prediction from curve at time t. This returns the cumulative distribution function evaluated at time t, i.e., the probability that death occurs at or before time t.
The data underlying the curve can be extracted with get_survival_curve_data.
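The curve[t] lookup can be illustrated with a hypothetical curve stored as breakpoint times and mortality probabilities, in the spirit of the :times and :coefs data described for get_survival_curve_data. The Julia sketch treats the curve as a step function, returning the probability at the last breakpoint at or before t and 0 before the first breakpoint. SketchSurvivalCurve is a hypothetical type. The piecewise-constant interpretation is an assumption, since IAI curves may also be smoothed (get_survival_curve_data reports this through its :linear key).

```julia
# Hypothetical step-function survival curve; not the IAI SurvivalCurve type.
struct SketchSurvivalCurve
    times::Vector{Float64}   # breakpoint times, sorted ascending
    coefs::Vector{Float64}   # cumulative probability of death at each breakpoint
end

# curve[t]: probability that death occurs at or before time t
function Base.getindex(c::SketchSurvivalCurve, t::Real)
    i = searchsortedlast(c.times, t)   # last breakpoint at or before t
    return i == 0 ? 0.0 : c.coefs[i]
end

curve = SketchSurvivalCurve([1.0, 2.0, 5.0], [0.1, 0.3, 0.6])
curve[3.0]   # between the second and third breakpoints
```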
IAI.get_survival_curve_data — Function
get_survival_curve_data(curve::SurvivalCurve)
Extract the underlying data from curve as a Dict with four keys:
- :times: the time for each breakpoint on the curve
- :coefs: the mortality probability for each breakpoint on the curve
- :expected_time: the expected survival time
- :linear: whether the curve coefficients come from a smoothed linear interpolation (see smooth_survival_curves for more information)
IAI.predict_expected_survival_time — Method
predict_expected_survival_time(curve::SurvivalCurve)
Return the expected survival time according to curve.
Prescription
IAI.PrescriptionLearner — Type
Abstract type encompassing all learners for prescription tasks.
IAI.predict_outcomes — Method
predict_outcomes(lnr::PrescriptionLearner, X::FeatureInput)
Return a DataFrame containing the predicted outcome for each treatment option made by the trained model in lnr for each point in the features X.
Policy
IAI.PolicyLearner — Type
Abstract type encompassing all learners for policy tasks.
IAI.predict_outcomes — Method
predict_outcomes(lnr::PolicyLearner, X::FeatureInput, rewards::FeatureInput)
Return the outcome from rewards for each point in the features X under the prescriptions made by the trained model in lnr.
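Conceptually, evaluating a policy against rewards means picking, for each point, the reward in the column of the prescribed treatment. The Julia sketch below shows that lookup. outcomes_under_policy is a hypothetical helper, and plain matrices and vectors stand in for the FeatureInput arguments and the model's prescriptions.

```julia
# Hypothetical lookup of the reward for each point's prescribed treatment.
function outcomes_under_policy(rewards::AbstractMatrix, treatments::AbstractVector,
                               prescribed::AbstractVector)
    col = Dict(t => j for (j, t) in enumerate(treatments))   # treatment -> column
    return [rewards[i, col[p]] for (i, p) in enumerate(prescribed)]
end

rewards = [1.0 3.0; 2.0 0.5; 4.0 4.5]   # rows = points, columns = treatments A, B
vals = outcomes_under_policy(rewards, ["A", "B"], ["B", "A", "B"])
```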
IAI.predict_treatment_rank — Function
predict_treatment_rank(lnr::PolicyLearner, X::FeatureInput)
Return a Matrix containing the treatments in ranked order of effectiveness for each point in the features X, as predicted by the trained model in lnr. For example, the first column contains the best treatment for each point, the second column contains the second-best treatment, and so on.
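The layout of the ranked Matrix can be illustrated by sorting a matrix of per-treatment estimates row by row. The Julia sketch below uses a hypothetical rank_treatments helper with an assumed higher_is_better flag, since whether larger outcomes are better depends on the problem.

```julia
# Hypothetical ranking: rows = points, columns of the result = rank positions.
function rank_treatments(outcomes::AbstractMatrix, treatments::AbstractVector;
                         higher_is_better::Bool=true)
    ranked = similar(treatments, size(outcomes, 1), length(treatments))
    for i in axes(outcomes, 1)
        order = sortperm(outcomes[i, :]; rev=higher_is_better)
        ranked[i, :] = treatments[order]   # best treatment ends up in column 1
    end
    return ranked
end

outcomes = [0.2 0.5 0.1;
            0.9 0.3 0.4]
ranks = rank_treatments(outcomes, ["A", "B", "C"])
```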
IAI.predict_treatment_outcome — Function
predict_treatment_outcome(lnr::PolicyLearner, X::FeatureInput)
Return a DataFrame containing the estimated quality of each treatment in the trained model from lnr for each point in the features X. These quality estimates are the values used by the model to determine the treatment ranks in predict_treatment_rank and are based on aggregate statistics. For an individualized prediction of outcomes under the model prescription policy, use predict_outcomes instead.
IAI.predict_treatment_outcome_standard_error — Function
predict_treatment_outcome_standard_error(lnr::PolicyLearner,
                                         X::FeatureInput)
Return a DataFrame containing the standard error for the estimated quality of each treatment in the trained model from lnr for each point in the features X. These errors can be used to form confidence intervals around results from predict_treatment_outcome.
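One way to turn estimates and standard errors into intervals is a normal approximation, estimate ± z · SE, with z ≈ 1.96 for roughly 95% coverage. The Julia sketch below applies this elementwise. confidence_interval is a hypothetical helper, and the normal approximation is an assumption rather than something this reference specifies. Matrices stand in for the DataFrames returned by the functions above.

```julia
# Hypothetical normal-approximation intervals: estimate ± z * standard error.
function confidence_interval(estimate::AbstractMatrix, se::AbstractMatrix; z=1.96)
    return estimate .- z .* se, estimate .+ z .* se
end

est = [0.40 0.55; 0.30 0.20]   # rows = points, columns = treatments
se  = [0.05 0.10; 0.02 0.01]
lower, upper = confidence_interval(est, se)
```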
Imputation
IAI.transform — Function
transform(lnr::ImputationLearner, X::FeatureInput)
Return a DataFrame containing the features X with all missing values imputed by the fitted imputation model in lnr.
IAI.fit_transform! — Function
fit_transform!(lnr::ImputationLearner, X::FeatureInput; kwargs...)
Fit lnr with an imputation model on features X and return a DataFrame containing the features X with all missing values imputed by lnr. Similar to calling fit!(lnr, X; kwargs...) followed by transform(lnr, X).
fit_transform!(grid::GridSearch, X::FeatureInput; kwargs...)
As fit_transform! for an imputation learner, but performs validation over the grid parameters during training before returning the final imputed DataFrame.
fit_transform!(grid::GridSearch, train_X::FeatureInput,
               valid_X::FeatureInput; kwargs...)
As fit_transform!, but performs validation with the pre-split training and validation sets train_X and valid_X.
IAI.fit_transform_cv! — Function
fit_transform_cv!(grid::GridSearch, X::FeatureInput; kwargs...)
As fit_transform! for a grid search, but uses k-fold cross-validation to determine the best parameter combination. Similar to calling fit_cv!(lnr, X; kwargs...) followed by transform(lnr, X).
Reward Estimation
IAI.RewardEstimationLearner — Type
Abstract type encompassing all learners for reward estimation.
IAI.CategoricalRewardEstimationLearner — Type
Abstract type encompassing all learners for reward estimation with categorical treatments.
IAI.NumericRewardEstimationLearner — Type
Abstract type encompassing all learners for reward estimation with numeric treatments.
Multi-task Learners
IAI.MultiLearner — Type
Abstract type encompassing all multi-task learners.
Learners are further divided into groups:
- SupervisedMultiLearner for supervised tasks
IAI.SupervisedMultiLearner — Type
Abstract type encompassing all multi-task learners for supervised tasks.
IAI.predict — Method
predict(lnr::SupervisedMultiLearner, X::FeatureInput)
Variant of predict for multi-task problems that returns the predictions for all tasks as a dictionary.
IAI.predict — Method
predict(lnr::SupervisedMultiLearner, X::FeatureInput, task_label::Symbol)
Variant of predict for multi-task problems that returns the predictions for the task given by task_label.
IAI.score — Method
score(lnr::SupervisedMultiLearner, X::FeatureInput, y::TargetInput...;
      keyword_arguments...)
Variant of score for multi-task problems that returns the average score across all tasks.
IAI.score — Method
score(lnr::SupervisedMultiLearner, X::FeatureInput, y::TargetInput...,
      task_label::Symbol; keyword_arguments...)
Variant of score for multi-task problems that returns the score for the task given by task_label.
IAI.ClassificationMultiLearner — Type
Abstract type encompassing all multi-task learners for classification tasks.
IAI.predict_proba — Method
predict_proba(lnr::ClassificationMultiLearner, X::FeatureInput)
Variant of predict_proba for multi-task problems that returns the predictions for all tasks as a dictionary.
IAI.predict_proba — Method
predict_proba(lnr::ClassificationMultiLearner, X::FeatureInput,
              task_label::Symbol)
Variant of predict_proba for multi-task problems that returns the predictions for the task given by task_label.
IAI.ROCCurve — Method
ROCCurve(lnr::ClassificationMultiLearner, X::FeatureInput,
         y::MultiTargetInput)
Variant of ROCCurve for multi-task problems that returns the curves for all tasks as a dictionary.
IAI.ROCCurve — Method
ROCCurve(lnr::ClassificationMultiLearner, X::FeatureInput,
         y::MultiTargetInput, task_label::Symbol)
Variant of ROCCurve for multi-task problems that returns the curve for the task given by task_label.
IAI.RegressionMultiLearner — Type
Abstract type encompassing all multi-task learners for regression tasks.
Model-free Utilities
IAI.score
— Methodscore(task::Symbol, predictions, truths; keyword_arguments...)
Calculates the score attained by predictions
against the true target truths
for the problem type indicated by task
. By default, all scores are calibrated such that higher is better (and 1 is the maximum possible score).
The type and number of arguments for predictions
and truths
depend on task
and the value of criterion
.
The permissible values for task are:
- :classification for classification problems
- :regression for regression problems
- :survival for survival problems
Keyword Arguments
- sample_weight::SampleWeightInput=nothing: the weighting to give to each data point.
- criterion: the scoring criterion to use when evaluating the score (refer to the documentation on scoring criteria for more information).
- scale::Bool=true: whether to scale the score so that higher is better and 1 is the maximum possible.
Any additional keyword arguments are passed through to the criterion.
score(:classification, y_pred::AbstractVector, y_true::AbstractVector;
criterion=:misclassification, keyword_arguments...)
Calculates the misclassification score of predicted labels y_pred against true labels y_true.
Examples
y_pred = ["A", "B", "B", "B"]
y_true = ["A", "B", "A", "B"]
IAI.score(:classification, y_pred, y_true, criterion=:misclassification)
0.75
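The 0.75 above is the fraction of labels predicted correctly (3 of 4). The plain-Julia sketch below reproduces it without IAI. The helper accuracy_score is hypothetical, and treating sample_weight as the weights of a weighted mean is an assumption about how that keyword behaves, not a documented detail.

```julia
# Hypothetical helper: (weighted) fraction of correctly predicted labels.
function accuracy_score(y_pred, y_true; sample_weight = nothing)
    correct = y_pred .== y_true                  # true where the label matches
    # Assumption: unweighted scoring is equivalent to unit weights.
    w = sample_weight === nothing ? ones(length(correct)) : sample_weight
    return sum(w .* correct) / sum(w)            # weighted fraction correct
end

y_pred = ["A", "B", "B", "B"]
y_true = ["A", "B", "A", "B"]

accuracy_score(y_pred, y_true)                                # 0.75
accuracy_score(y_pred, y_true, sample_weight = [1, 1, 2, 1])  # 0.6
```

Upweighting the misclassified third sample lowers the score, as expected from a weighted mean.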
score(:classification, y_pred::AbstractDataFrame, y_true::AbstractVector;
criterion=:gini, keyword_arguments...)
Calculates the Gini impurity score of predicted probabilities y_pred against true labels y_true.
Also applies for calculating entropy with criterion=:entropy.
Examples
import DataFrames
y_pred = DataFrames.DataFrame(A=[0.9, 0.2, 0.6, 0.7], B=[0.1, 0.8, 0.4, 0.3])
y_true = ["A", "B", "A", "B"]
IAI.score(:classification, y_pred, y_true, criterion=:gini)
0.30000000000000004
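The value above is consistent with a Brier-style squared-error loss between each row of predicted probabilities and the one-hot encoding of the true label, scaled against a baseline that predicts the observed class frequencies for every sample (0.35 versus 0.5 here, giving 1 - 0.35/0.5 = 0.3). The plain-Julia sketch below reproduces it up to floating-point rounding. The helper brier_loss is hypothetical, and the formula is inferred from the example rather than taken from IAI's documentation.

```julia
# Hypothetical helper: mean over samples of the squared distance between the
# predicted probability row and the one-hot encoding of the true label.
function brier_loss(probs::AbstractMatrix, labels, classes)
    total = 0.0
    for i in 1:size(probs, 1), (k, c) in enumerate(classes)
        target = labels[i] == c ? 1.0 : 0.0
        total += (probs[i, k] - target)^2
    end
    return total / size(probs, 1)
end

classes = ["A", "B"]
y_true = ["A", "B", "A", "B"]
# Rows are samples; columns are the predicted probabilities of "A" and "B".
y_pred = [0.9 0.1; 0.2 0.8; 0.6 0.4; 0.7 0.3]

# Baseline: predict the observed class frequencies for every sample.
freqs = [count(==(c), y_true) / length(y_true) for c in classes]
baseline = repeat(permutedims(freqs), length(y_true), 1)

gini_score = 1 - brier_loss(y_pred, y_true, classes) /
                 brier_loss(baseline, y_true, classes)  # ≈ 0.3
```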
score(:classification, y_pred::AbstractDataFrame, y_true::AbstractVector;
criterion=:auc, positive_label, keyword_arguments...)
score(:classification, y_pred::AbstractVector, y_true::AbstractVector;
criterion=:auc, keyword_arguments...)
Calculates the AUC of predicted probabilities y_pred against true labels y_true.
The predicted probabilities y_pred can be either:
- an AbstractDataFrame of predicted probabilities for each label, in which case one of the labels in the data must be identified as the positive label using the positive_label keyword argument
- an AbstractVector of predicted probabilities for the positive label
Also applies for calculating any threshold-based criteria.
Examples
import DataFrames
y_pred = DataFrames.DataFrame(A=[0.9, 0.2, 0.6, 0.7], B=[0.1, 0.8, 0.4, 0.3])
y_true = ["A", "B", "A", "B"]
IAI.score(:classification, y_pred, y_true, criterion=:auc, positive_label="B")
0.75
y_pred = [0.1, 0.8, 0.4, 0.3]
IAI.score(:classification, y_pred, y_true, criterion=:auc)
0.75
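The AUC equals the probability that a randomly chosen positive sample receives a higher predicted probability than a randomly chosen negative one. The plain-Julia sketch below uses this pairwise definition to reproduce the 0.75 above (3 of the 4 positive/negative pairs are ordered correctly). The helper pairwise_auc is hypothetical and takes the positive label explicitly; counting ties as one half is the usual convention and an assumption here.

```julia
# Hypothetical helper: AUC by comparing every positive/negative pair.
function pairwise_auc(probs, y, positive_label)
    pos = probs[y .== positive_label]   # scores of positive samples
    neg = probs[y .!= positive_label]   # scores of negative samples
    wins = 0.0
    for p in pos, n in neg
        wins += p > n ? 1.0 : (p == n ? 0.5 : 0.0)  # ties count as one half
    end
    return wins / (length(pos) * length(neg))
end

probs = [0.1, 0.8, 0.4, 0.3]   # predicted probability of "B"
y_true = ["A", "B", "A", "B"]
pairwise_auc(probs, y_true, "B")  # 0.75
```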
score(:regression, y_pred::AbstractVector{<:Real},
y_true::AbstractVector{<:Real}; criterion=:mse, keyword_arguments...)
Calculates the mean-squared error of predicted values y_pred against true values y_true.
Also applies for calculating the Tweedie and hinge loss criteria.
Examples
y_pred = [0.9, 0.2, 0.6, 0.7]
y_true = [1.0, 0.2, 0.5, 0.5]
IAI.score(:regression, y_pred, y_true, criterion=:mse)
0.8181818181818182
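The value above is not the raw mean-squared error (0.015 for this data). It is consistent with scaling the MSE against a baseline that always predicts the mean of y_true, i.e. the coefficient of determination R². The plain-Julia sketch below reproduces the number; that this is exactly how scale=true works for this criterion is an inference from the example, not a documented statement.

```julia
using Statistics  # provides `mean`

y_pred = [0.9, 0.2, 0.6, 0.7]
y_true = [1.0, 0.2, 0.5, 0.5]

mse = mean((y_pred .- y_true) .^ 2)                 # raw mean-squared error
baseline_mse = mean((y_true .- mean(y_true)) .^ 2)  # error of predicting the mean
scaled = 1 - mse / baseline_mse                     # ≈ 0.8182 (R²)
```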
score(:survival, y_pred::AbstractVector{<:Real},
deaths::AbstractVector{Bool}, times::AbstractVector;
criterion=:localfulllikelihood, keyword_arguments...)
Calculates the local full likelihood of predicted hazards y_pred against the true data deaths and times.
Also applies for calculating Harrell's c-statistic.
Examples
y_pred = [2, 0.5, 1.2, 0.7]
deaths = [true, true, false, false]
times = [1, 10, 3, 7]
IAI.score(:survival, y_pred, deaths, times, criterion=:localfulllikelihood)
0.15105918059568402
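Harrell's c-statistic measures how often, among comparable pairs, the sample that died first was assigned the higher hazard. The plain-Julia sketch below implements the textbook definition (pairs where sample i died before sample j's recorded time; tied hazards count as one half). The helper harrell_c is hypothetical, and IAI's handling of tied times and any scaling may differ. On the example data every comparable pair is concordant.

```julia
# Hypothetical helper: textbook Harrell's c-statistic for hazard predictions.
function harrell_c(hazards, deaths, times)
    concordant = 0.0
    comparable = 0
    for i in eachindex(times), j in eachindex(times)
        # A pair is comparable when i is observed to die before j's time.
        if deaths[i] && times[i] < times[j]
            comparable += 1
            if hazards[i] > hazards[j]
                concordant += 1.0      # earlier death has higher hazard
            elseif hazards[i] == hazards[j]
                concordant += 0.5      # tied hazards count as one half
            end
        end
    end
    return concordant / comparable
end

y_pred = [2, 0.5, 1.2, 0.7]
deaths = [true, true, false, false]
times = [1, 10, 3, 7]
harrell_c(y_pred, deaths, times)  # 1.0
```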
IAI.ROCCurve
— Method
ROCCurve(probs::AbstractVector{<:Real}, y::AbstractVector; positive_label)
ROCCurve(probs::AbstractDataFrame, y::AbstractVector; positive_label)
Construct a ROCCurve from predicted probabilities probs and true labels y. One of the labels contained in y must be specified as positive_label, so that probs gives, for each sample, the predicted probability that the label equals positive_label.
probs can be either:
- a vector of probabilities indicating the probability of positive_label
- a dataframe where positive_label is one of the column names, giving the probability of positive_label (for example, the output of predict_proba)
Examples
Calculate AUC from a vector of predicted probabilities and true labels:
probs = [0.1, 0.8, 0.4, 0.3]
y = ["A", "B", "A", "B"]
IAI.ROCCurve(probs, y, positive_label="B").auc
0.75
Calculate AUC from a dataframe of predicted probabilities and true labels:
import DataFrames
probs = DataFrames.DataFrame(A=[0.9, 0.2, 0.6, 0.7], B=[0.1, 0.8, 0.4, 0.3])
y = ["A", "B", "A", "B"]
IAI.ROCCurve(probs, y, positive_label="B").auc
0.75
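To illustrate what an ROC curve contains, the plain-Julia sketch below sweeps a threshold over the distinct predicted probabilities, records the (false positive rate, true positive rate) point at each one, and integrates the resulting piecewise-linear curve with the trapezoid rule. The helpers roc_points and trapezoid_auc are hypothetical and are not how IAI builds a ROCCurve; they only reproduce the 0.75 AUC from the examples above.

```julia
# Hypothetical helper: ROC points from thresholds at each distinct probability.
function roc_points(probs, y, positive_label)
    is_pos = y .== positive_label
    P, N = count(is_pos), count(!, is_pos)
    fpr, tpr = [0.0], [0.0]                  # curve starts at (0, 0)
    for t in sort(unique(probs), rev = true)
        predicted_pos = probs .>= t          # classify as positive above t
        push!(tpr, count(predicted_pos .& is_pos) / P)
        push!(fpr, count(predicted_pos .& .!is_pos) / N)
    end
    return fpr, tpr
end

# Area under the piecewise-linear curve via the trapezoid rule.
trapezoid_auc(fpr, tpr) =
    sum((fpr[k] - fpr[k-1]) * (tpr[k] + tpr[k-1]) / 2 for k in 2:length(fpr))

probs = [0.1, 0.8, 0.4, 0.3]
y = ["A", "B", "A", "B"]
fpr, tpr = roc_points(probs, y, "B")
trapezoid_auc(fpr, tpr)  # 0.75
```

Grouping samples by distinct threshold means tied probabilities move the curve diagonally, which the trapezoid rule scores as one half per tied pair.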