API Reference
Documentation for the IAIBase public interface.
Index
- IAI.AbstractVisualization
- IAI.FeatureInput
- IAI.GridSearch
- IAI.Learner
- IAI.MixedDatum
- IAI.NumericMixedDatum
- IAI.OrdinalMixedDatum
- IAI.ROCCurve
- IAI.RelativeParameterInput
- IAI.SampleWeightInput
- IAI.SurvivalCurve
- IAI.TargetInput
- IAI.clone
- IAI.delete_rich_output_param!
- IAI.fit!
- IAI.fit_cv!
- IAI.fit_predict!
- IAI.fit_transform!
- IAI.fit_transform_cv!
- IAI.get_best_params
- IAI.get_grid_results
- IAI.get_learner
- IAI.get_params
- IAI.get_rich_output_params
- IAI.get_survival_curve_data
- IAI.make_mixed_data
- IAI.predict
- IAI.predict_expected_survival_time
- IAI.predict_hazard
- IAI.predict_outcomes
- IAI.predict_proba
- IAI.read_json
- IAI.score
- IAI.set_params!
- IAI.set_rich_output_param!
- IAI.show_in_browser
- IAI.split_data
- IAI.transform
- IAI.undo_mixed_data
- IAI.variable_importance
- IAI.write_html
- IAI.write_json
Data Preparation
IAI.FeatureInput — Type
Permissible types for specifying the feature data.
The features can be supplied as a Matrix of Reals or a DataFrame as follows:
- numeric features are specified using numeric vectors
- categoric and ordinal features are specified using CategoricalVectors
- mixed features are specified using vectors of MixedDatum (see make_mixed_data)
- missing values are specified using missing
For more details, refer to the data preparation guide in the manual.
IAI.MixedDatum — Type
MixedDatum{T}
Represents a mixed feature value that can either be categoric or of type T.
The value has the following fields:
- iscat::Bool: true if the value is categoric
- value_cat: the value if categoric
- value_else: the value if non-categoric
Mixed features are specified in the data using vectors of MixedDatum. It is recommended to create and work with these vectors of MixedDatum values via make_mixed_data and undo_mixed_data.
IAI.NumericMixedDatum — Type
A MixedDatum that holds either numeric or categoric values.
IAI.OrdinalMixedDatum — Type
A MixedDatum that holds either ordinal or categoric values.
IAI.make_mixed_data — Function
make_mixed_data(input)
Construct a vector of mixed categoric and numeric data from input. All numeric values from input are treated as numeric data, and all remaining values are treated as categoric data.
Examples
Construct a mixed data vector with a numeric score and two additional levels ("Unknown" and "NA")
IAI.make_mixed_data([13, "Unknown", "NA", 2, 4, missing])
6-element Array{MixedDatum{Float64},1}:
13.0
"Unknown"
"NA"
2.0
4.0
missing
make_mixed_data(input, ordinal_levels)
Construct a vector of mixed categoric and ordinal data from input. All values from input that are in ordinal_levels are treated as ordinal data with the ordering indicated by the order of ordinal_levels, and all remaining values are treated as categoric data.
Examples
Construct a mixed data vector with three ordered levels (A < B < C) and two additional levels ("Unknown" and "NA")
IAI.make_mixed_data(["B", "Unknown", "NA", "C", "A", missing], ["A", "B", "C"])
6-element Array{MixedDatum{CategoricalArrays.CategoricalValue{Any,UInt32}},1}:
"B"
"Unknown"
"NA"
"C"
"A"
missing
IAI.undo_mixed_data — Function
undo_mixed_data(mixed_data)
Convert a vector of mixed data back to a normal Vector with mixed types.
Examples
Undo the conversion to numeric mixed data vector
numeric_mixed = IAI.make_mixed_data([13, "Unknown", "NA", 2, 4, missing])
IAI.undo_mixed_data(numeric_mixed)
6-element Array{Any,1}:
13.0
"Unknown"
"NA"
2.0
4.0
missing
Undo the conversion to ordinal mixed data vector
ordinal_mixed = IAI.make_mixed_data(["B", "Unknown", "NA", "C", "A", missing],
                                    ["A", "B", "C"])
IAI.undo_mixed_data(ordinal_mixed)
6-element Array{Union{Missing, String},1}:
"B"
"Unknown"
"NA"
"C"
"A"
missing
IAI.TargetInput — Type
Permissible types for specifying the problem target. The number and types of the target arguments depend on the problem type (for more information, refer to the data preparation guide in the manual):
Classification
- y: AbstractVector of class labels
Regression
- y: AbstractVector of numeric values
Prescription
- treatments: AbstractVector of treatment labels
- outcomes: AbstractVector of numeric outcomes
Survival
- deaths: AbstractVector{Bool} indicating which observations are deaths
- times: AbstractVector of times for each observation
Imputation
- No target required
IAI.SampleWeightInput — Type
Permissible types for specifying sample weights:
- nothing (default) will assign equal weight to all points
- Vector or StatsBase.Weights of the weights for each point
Additionally, for problems with discrete outcomes (classification/prescription):
- Dict giving the weight for each label
- :autobalance to use weights that give each label equal weight
For more information, refer to the data preparation guide in the manual.
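The :autobalance option can be pictured with a small standalone sketch. The helper name autobalance_weights is hypothetical and not part of IAI, and the normalization (weights summing to the number of points) is an assumption; the only property taken from the text above is that every label ends up with the same total weight.

```julia
# Hypothetical helper, not part of IAI: per-point weights that give every
# label the same total weight.
function autobalance_weights(y::AbstractVector)
    counts = Dict{eltype(y),Int}()
    for label in y
        counts[label] = get(counts, label, 0) + 1
    end
    n, k = length(y), length(counts)
    # Assumed normalization: each label receives total weight n / k,
    # shared equally among the points carrying that label.
    return [n / (k * counts[label]) for label in y]
end

y = ["A", "A", "A", "B"]
w = autobalance_weights(y)
total_A = sum(w[y .== "A"])   # total weight of the three "A" points
total_B = sum(w[y .== "B"])   # total weight of the single "B" point
```

Here the rarer label "B" receives a larger per-point weight so that both labels contribute equally in aggregate.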
IAI.RelativeParameterInput — Type
Permissible types for specifying parameters relative to the number of samples or features:
- :all: allowed to use all
- a non-negative Integer: the value to be used
- a Real between 0 and 1: use this proportion of the total number
- :sqrt: use the square root of the total number
- :log2: use the base-2 logarithm of the total number
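As a rough illustration of how these options could map to a concrete count, the sketch below resolves each permitted form against a total. The function resolve_relative is hypothetical and not part of IAI, and rounding down non-integer results is an assumption, since the rounding rule is not stated here.

```julia
# Hypothetical helper, not part of IAI: turn a relative parameter into a count.

# A non-negative Integer is used as-is.
resolve_relative(p::Integer, total::Integer) = p

# A Real between 0 and 1 is a proportion of the total (rounding down is assumed).
function resolve_relative(p::Real, total::Integer)
    0 <= p <= 1 || throw(ArgumentError("proportion must lie between 0 and 1"))
    return floor(Int, p * total)
end

# Symbols select a rule based on the total (rounding down is assumed).
function resolve_relative(p::Symbol, total::Integer)
    p === :all  && return total
    p === :sqrt && return floor(Int, sqrt(total))
    p === :log2 && return floor(Int, log2(total))
    throw(ArgumentError("unsupported relative parameter: $p"))
end

resolve_relative(:sqrt, 16)   # 4
resolve_relative(0.5, 10)     # 5
```

Dispatch separates the Integer and Real cases, so 1 means "one" while 1.0 means "all of them" in this sketch.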
IAI.split_data — Function
split_data(task::Symbol, X::FeatureInput, y::TargetInput...;
           keyword_arguments...)
Split the data (X and y) into a tuple of training and testing data: (X_train, y_train...), (X_test, y_test...).
The mechanism used to split the data is determined by task:
- Stratified:
  - :classification gives a stratified split on the class labels
  - :prescription_minimize or :prescription_maximize gives a stratified split on the treatments
- Non-stratified:
  - :regression and :survival give randomly split data
Keyword Arguments
- train_proportion=0.7: proportion of data in training set
- shuffle=true: whether the returned data is shuffled. If shuffle=false, the split mechanism will not be stratified even if the task defaults to a stratified approach
- seed=nothing: random seed for splitting, uses the global random state if nothing is specified
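To make "stratified" concrete, here is a minimal standalone sketch of a stratified index split. It is not IAI's implementation: stratified_split_indices is a hypothetical helper, and the per-label rounding and the use of MersenneTwister are assumptions made for the illustration.

```julia
using Random

# Hypothetical helper, not IAI's implementation: split indices so that each
# label appears in the training set in roughly the requested proportion.
function stratified_split_indices(y::AbstractVector; train_proportion=0.7, seed=1)
    rng = MersenneTwister(seed)
    train, test = Int[], Int[]
    for label in unique(y)
        idx = shuffle(rng, findall(==(label), y))              # points with this label
        n_train = round(Int, train_proportion * length(idx))   # assumed rounding rule
        append!(train, idx[1:n_train])
        append!(test, idx[n_train+1:end])
    end
    return train, test
end

y = repeat(["A", "B"], 10)                  # 10 points of each class
train, test = stratified_split_indices(y)
n_A_train = count(==("A"), y[train])        # 7 of the 10 "A" points
n_B_train = count(==("B"), y[train])        # 7 of the 10 "B" points
```

A plain random split, as used for :regression and :survival, would shuffle all indices together and so would not guarantee these per-label counts.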
Examples
Classification:
X = [1 2; 3 4; 5 6; 7 8]
y = ["A", "B", "A", "B"]
(train_X, train_y), (test_X, test_y) =
    IAI.split_data(:classification, X, y, seed=1)
Regression:
X = [1 2; 3 4; 5 6; 7 8]
y = [0.1, 0.2, 0.3, 0.4]
(train_X, train_y), (test_X, test_y) =
    IAI.split_data(:regression, X, y, seed=1)
Survival:
X = [1 2; 3 4; 5 6; 7 8]
deaths = [true, false, true, false]
times = [1, 2, 3, 4]
(train_X, train_deaths, train_times), (test_X, test_deaths, test_times) =
    IAI.split_data(:survival, X, deaths, times, seed=1)
Prescription:
X = [1 2; 3 4; 5 6; 7 8]
treatments = ["A", "B", "A", "B"]
outcomes = [0.1, 0.2, 0.3, 0.4]
# or :prescription_maximize
(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) =
    IAI.split_data(:prescription_minimize, X, treatments, outcomes, seed=1)
Learners
IAI.Learner — Type
Abstract type encompassing all learners.
Learners are further divided into two groups:
- SupervisedLearner for supervised tasks, containing:
  - ClassificationLearner
  - RegressionLearner
  - PrescriptionLearner
  - SurvivalLearner
  - RewardEstimationLearner
- UnsupervisedLearner for unsupervised tasks, containing:
  - ImputationLearner
Fitting
IAI.fit! — Method
fit!(lnr::Learner, X::FeatureInput, y::TargetInput...;
     sample_weight::SampleWeightInput=nothing)
Fits a model using the parameters in lnr and the data X and y.
Evaluation (for supervised learners only)
IAI.predict — Method
predict(lnr::SupervisedLearner, X::FeatureInput)
Return the predictions made by the trained model in lnr for each point in the data X.
IAI.score — Function
score(lnr::SupervisedLearner, X::FeatureInput, y::TargetInput...;
      keyword_arguments...)
Calculates the score for lnr on data X and y. All scores are calibrated such that higher is better (and 1 is the maximum possible score).
Keyword Arguments
- sample_weight::SampleWeightInput=nothing: the weighting to give to each data point
- criterion=:default: the scoring criterion to use when evaluating the score (refer to the documentation on scoring criteria for more information). Uses the criterion in lnr if left as :default.
- extra keyword arguments are passed through to configure the specified scoring criterion (e.g. tweedie_variance_power for :tweedie)
Utilities
IAI.write_json — Function
write_json(f, obj; keyword_arguments...)
Write obj (can be a Learner or GridSearch) to f in JSON format.
Keyword Arguments
- indent=2: indent amount in JSON
IAI.read_json — Function
read_json(f)
Read in a Learner or GridSearch saved in JSON format from f.
IAI.variable_importance — Method
variable_importance(lnr::Learner)
Generate a ranking of the variables in lnr according to their importance during training. The results are normalized so that they sum to one.
IAI.get_params — Function
get_params(lnr::Learner)
Return a Dict containing the values of user-specified parameters in lnr.
IAI.set_params! — Function
set_params!(lnr::Learner; params...)
Update user-specified parameters in lnr with all supplied key-value pairs in params.
IAI.clone — Function
clone(lnr::Learner)
Return an unfitted copy of lnr with the same user-specified parameters.
Visualization in Browser
IAI.AbstractVisualization — Type
Abstract type encompassing objects related to visualization. Examples include:
IAI.write_html — Method
write_html(f, vis::AbstractVisualization; keyword_arguments...)
Generic function for saving a visualization vis to f in HTML format.
IAI.show_in_browser — Method
show_in_browser(vis::AbstractVisualization; keyword_arguments...)
Generic function for showing a visualization vis in the browser.
IAI.set_rich_output_param! — Function
set_rich_output_param!(key::Symbol, value)
Sets the global rich output parameter key to value.
For a detailed list of parameters for rich outputs, see:
- IAITrees: write_png or write_html
IAI.get_rich_output_params — Function
get_rich_output_params()
Return the current global rich output parameter settings.
IAI.delete_rich_output_param! — Function
delete_rich_output_param!(key::Symbol)
Delete the global rich output parameter key.
Grid Search and Parameter Validation
IAI.GridSearch — Type
GridSearch(lnr::Learner, param_grid)
Controls grid search over parameter combinations in param_grid to find the best combination of parameters for lnr.
lnr is a learner with any parameters that should be included in all combinations of parameters tested.
param_grid contains the parameter ranges to search over. These can be supplied in multiple ways, which we demonstrate with examples that create identical GridSearch objects to tune lnr over the parameters criterion and normalize_X:
- one or more keyword arguments to the GridSearch constructor containing key=value pairs for all desired parameters and their ranges:
  IAI.GridSearch(lnr, criterion=[:gini, :entropy], normalize_X=[true, false])
- a Dict or NamedTuple where the keys are the names of the parameters to tune, and the corresponding values are the range over which to vary each parameter:
  IAI.GridSearch(lnr, Dict(:criterion => [:gini, :entropy], :normalize_X => [true, false]))
  IAI.GridSearch(lnr, (criterion=[:gini, :entropy], normalize_X=[true, false]))
- a Vector{Dict} or Vector{NamedTuple} where each entry specifies a grid of parameters to test (refer to the documentation on multiple parameter grids):
  IAI.GridSearch(lnr, [
      (criterion=:gini, normalize_X=[true, false]),
      (criterion=:entropy, normalize_X=[true, false]),
  ])
IAI.fit! — Method
fit!(grid::GridSearch, X::FeatureInput, y::TargetInput...;
     keyword_arguments...)
Fit a grid with data X and y... by randomly splitting the data into training and validation sets in the same way as split_data.
Keyword Arguments
- train_proportion::Float64=0.7: the proportion of data used in training
- sample_weight::SampleWeightInput=nothing: the weighting to give to each data point
- verbose::Bool=false: if true, prints out the score for each parameter combination during the grid search
- validation_criterion::Symbol=:default: the scoring criterion that should be used to evaluate the parameter combinations to determine which is best (refer to the documentation on scoring criteria for more information). Uses the criterion of the lnr contained in grid if left as :default.
- extra keyword arguments are passed through to configure the specified scoring criterion (e.g. tweedie_variance_power for :tweedie)
fit!(grid::GridSearch, train_X::FeatureInput, train_y::TargetInput...,
     valid_X::FeatureInput, valid_y::TargetInput...; keyword_arguments...)
Fit a grid with explicit training and validation sets.
Supports the same keyword arguments as above, with the exception of train_proportion, as the data has already been split. sample_weight additionally accepts a Tuple of sample weight vectors if you would like to specify explicit weight vectors for the training and validation sets.
IAI.fit_cv! — Function
fit_cv!(grid::GridSearch, X::FeatureInput, y::TargetInput...;
        keyword_arguments...)
Fit a grid with data X and y... using k-fold cross-validation.
The keyword arguments are the same as for fitting the grid with randomly split data using IAI.fit!, except the train_proportion argument is replaced by n_folds, which indicates the number of folds to use in the cross-validation (defaulting to 5).
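For readers unfamiliar with k-fold cross-validation, the following standalone sketch shows one way to partition the data into n_folds validation folds. It is not IAI's implementation: kfold_indices is a hypothetical helper, and the shuffling and round-robin assignment are assumptions made for the illustration.

```julia
using Random

# Hypothetical helper, not IAI's implementation: partition 1:n into n_folds
# validation folds whose sizes differ by at most one.
function kfold_indices(n::Integer; n_folds=5, seed=1)
    perm = shuffle(MersenneTwister(seed), 1:n)        # random order of all points
    return [perm[k:n_folds:n] for k in 1:n_folds]     # deal points out round-robin
end

n = 12
folds = kfold_indices(n; n_folds=5)

# For fold k, every point outside the fold is used for training.
train_sets = [setdiff(1:n, valid) for valid in folds]
```

Each parameter combination would then be trained once per fold on train_sets[k] and scored on folds[k], which is why per-fold scores can be summarized by a mean and standard deviation.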
IAI.get_learner — Function
get_learner(grid::GridSearch)
Return the final fitted learner using the best parameter combination from the grid.
IAI.get_best_params — Function
get_best_params(grid::GridSearch)
Return the best parameter combination from the grid.
Examples
Example output from a GridSearch used to tune an OptimalTreeClassifier:
IAI.get_best_params(grid)
Dict{Symbol,Any} with 2 entries:
  :cp => 0.0357143
  :max_depth => 3
IAI.get_grid_results — Function
get_grid_results(grid::GridSearch)
Return a DataFrame summarizing the results from the grid search.
Each row corresponds to a single parameter combination from the grid search, and contains:
- the value of each parameter
- the training and validation scores of the learner trained using these parameters
- the rank of this parameter combination according to the validation score (where a rank of 1 indicates the best parameter combination)
When fitting the grid using cross-validation, the training and validation scores for each fold are shown, along with the mean and standard deviation of these scores.
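The rank column can be reproduced by hand from the validation scores. The sketch below uses the three valid_score values from the get_grid_results example output in this section; the double-sortperm idiom is simply one way to compute ranks and is not claimed to be how IAI does it.

```julia
# Validation scores from the example get_grid_results output in this section.
valid_scores = [0.666667, 0.911111, 0.915556]

# Rank 1 goes to the highest score; ties are not treated specially here.
rank_valid_score = sortperm(sortperm(valid_scores, rev=true))

# Row holding the best parameter combination.
best_row = findfirst(==(1), rank_valid_score)
```

The third row ranks first, which agrees with the parameters reported by get_best_params in the earlier example (max_depth of 3 and cp of 0.0357143).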
Examples
Example output from a GridSearch used to tune an OptimalTreeClassifier:
IAI.get_grid_results(grid)
3×5 DataFrame
│ Row │ max_depth │ cp        │ train_score │ valid_score │ rank_valid_score │
│     │ Int64     │ Float64   │ Float64     │ Float64     │ Int64            │
├─────┼───────────┼───────────┼─────────────┼─────────────┼──────────────────┤
│ 1   │ 1         │ 0.25      │ 0.666667    │ 0.666667    │ 3                │
│ 2   │ 2         │ 0.228571  │ 0.971429    │ 0.911111    │ 2                │
│ 3   │ 3         │ 0.0357143 │ 0.980952    │ 0.915556    │ 1                │
Task-specific Functions
These functions are only available to learners of the appropriate type for the problem.
Classification
IAI.predict_proba — Function
predict_proba(lnr::ClassificationLearner, X::FeatureInput)
Return the probabilities of class membership made by the trained model in lnr for each point in the features X.
IAI.ROCCurve — Type
Container for ROC curve information with the following fields:
- coords::Vector{Dict}: Vector of Dicts representing the points on the curve. Each Dict contains the following keys:
  - :fpr: false positive rate at the given threshold
  - :tpr: true positive rate at the given threshold
  - :threshold: the threshold
- auc::Float64: the area-under-the-curve (AUC)
The resulting curve can be visualized in the browser using show_in_browser, or with write_html to save the visualization in HTML format.
IAI.ROCCurve — Method
ROCCurve(probs::AbstractVector{<:Real}, y::AbstractVector, positive_label)
Construct a ROCCurve from predicted probabilities probs and true labels y. It is required to specify one of the labels contained in y as the positive_label so that probs gives the predicted probability of being equal to positive_label for each sample.
Examples
Calculate AUC from predicted probabilities and true labels:
probs = [0.1, 0.8, 0.4, 0.3]
y = ["A", "B", "A", "B"]
ROCCurve(probs, y, positive_label="B").auc
0.75
IAI.ROCCurve — Method
ROCCurve(lnr::ClassificationLearner, X::FeatureInput, y::AbstractVector)
Construct a ROCCurve using trained lnr on the features X and labels y.
Can only be applied to classification problems with K = 2 classes.
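The AUC value in the probability-based ROCCurve example can be cross-checked without IAI. The sketch below uses the standard pairwise interpretation of AUC (the fraction of positive/negative pairs in which the positive sample receives the higher probability, with ties counted as half). The helper pairwise_auc is hypothetical, and this is not claimed to be how ROCCurve computes its auc field.

```julia
# Hypothetical helper, not IAI's implementation: AUC as the fraction of
# (positive, negative) pairs ranked correctly, counting ties as half.
function pairwise_auc(probs::AbstractVector{<:Real}, y::AbstractVector, positive_label)
    pos = probs[y .== positive_label]   # probabilities of the positive samples
    neg = probs[y .!= positive_label]   # probabilities of the negative samples
    wins = sum((p > q) + 0.5 * (p == q) for p in pos, q in neg)
    return wins / (length(pos) * length(neg))
end

# Same inputs as the ROCCurve example.
probs = [0.1, 0.8, 0.4, 0.3]
y = ["A", "B", "A", "B"]
auc = pairwise_auc(probs, y, "B")   # 0.75, matching the .auc value in the example
```

Of the four positive/negative pairs, only (0.3, 0.4) is ordered incorrectly, giving 3/4.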
IAI.write_html — Method
write_html(f, roc::ROCCurve)
Write interactive browser visualization of roc to f in HTML format.
IAI.show_in_browser — Method
show_in_browser(roc::ROCCurve)
Display an interactive visualization of roc in the browser.
Prescription
IAI.predict_outcomes — Method
predict_outcomes(lnr::PrescriptionLearner, X::FeatureInput)
Return a DataFrame containing the predicted outcome for each treatment option made by the trained model in lnr for each point in the features X.
Policy
IAI.predict_outcomes — Method
predict_outcomes(lnr::PolicyLearner, X::FeatureInput, rewards::FeatureInput)
Return the outcome from rewards for each point in the features X under the prescriptions made by the trained model in lnr.
Survival
IAI.SurvivalCurve — Type
Container for survival curve information.
Use curve[t] to get the mortality probability prediction from curve at time t. This returns the cumulative distribution function evaluated at time t, i.e., the probability that death occurs at or before time t.
The data underlying the curve can be extracted with get_survival_curve_data.
IAI.get_survival_curve_data — Function
get_survival_curve_data(curve::SurvivalCurve)
Extract the underlying data from curve as a Dict with two keys:
- :times: the time for each breakpoint on the curve
- :coefs: the mortality probability for each breakpoint on the curve
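One plausible way to read this breakpoint data is as a step function, which the sketch below evaluates at an arbitrary time. This is an assumption rather than a documented guarantee: mortality_at is a hypothetical helper, the data is made up for illustration, and the sketch assumes each coefficient applies from its breakpoint until the next one, with probability 0 before the first breakpoint.

```julia
# Hypothetical helper, not part of IAI: evaluate a step-function mortality curve.
# Assumes `times` is sorted ascending and `coefs[i]` applies from `times[i]` onward.
function mortality_at(times::AbstractVector{<:Real}, coefs::AbstractVector{<:Real}, t::Real)
    i = searchsortedlast(times, t)   # index of the last breakpoint <= t
    return i == 0 ? 0.0 : coefs[i]   # before the first breakpoint: assumed 0
end

# Made-up data in the Dict shape described above, for illustration only.
data = Dict(:times => [1.0, 2.5, 4.0], :coefs => [0.1, 0.35, 0.6])

p_mid = mortality_at(data[:times], data[:coefs], 3.0)     # 0.35
p_early = mortality_at(data[:times], data[:coefs], 0.5)   # 0.0
```

With a real curve, indexing with curve[t] is the supported way to obtain this value; the sketch only illustrates how the extracted data might relate to it.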
IAI.predict_hazard — Function
predict_hazard(lnr::SurvivalLearner, X::FeatureInput)
Return the fitted hazard coefficient estimate made by the trained model in lnr for each point in the data X. A higher hazard coefficient estimate corresponds to a smaller predicted survival time.
IAI.predict_expected_survival_time — Function
predict_expected_survival_time(lnr::SurvivalLearner, X::FeatureInput)
Return the expected survival time predicted by the trained model in lnr for each point in the data X.
Imputation
IAI.transform — Function
transform(lnr::ImputationLearner, X::FeatureInput)
Return a DataFrame containing the features X with all missing values imputed by the fitted imputation model in lnr.
IAI.fit_transform! — Function
fit_transform!(lnr::ImputationLearner, X::FeatureInput; kwargs...)
Fit lnr with an imputation model on features X and return a DataFrame containing the features X with all missing values imputed by lnr. Equivalent to calling fit!(lnr, X; kwargs...) followed by transform(lnr, X).
fit_transform!(grid::GridSearch, X::FeatureInput; kwargs...)
As fit_transform! for an imputation learner, but performs validation over the grid parameters during training before returning the final imputed DataFrame.
fit_transform!(grid::GridSearch, train_X::FeatureInput,
               valid_X::FeatureInput; kwargs...)
As fit_transform! but performs validation with the pre-split training and validation sets train_X and valid_X.
IAI.fit_transform_cv! — Function
fit_transform_cv!(grid::GridSearch, X::FeatureInput; kwargs...)
As fit_transform! for a grid search, but uses k-fold cross-validation to determine the best parameter combination. Equivalent to calling fit_cv!(grid, X; kwargs...) followed by transform(grid, X).
Reward Estimation
IAI.predict — Method
predict(lnr::RewardEstimationLearner, X::FeatureInput,
        treatments::AbstractVector, outcomes::AbstractVector)
Return counterfactual rewards estimated by lnr for each observation in the data given by X, treatments and outcomes.
IAI.fit_predict! — Function
fit_predict!(lnr::RewardEstimationLearner, X::FeatureInput,
             treatments::AbstractVector, outcomes::AbstractVector;
             kwargs...)
Fit lnr with a reward estimation model on features X, treatments treatments, and outcomes outcomes, and return predicted counterfactual rewards for each observation.