Advanced Visualization

A number of advanced features are available when creating interactive browser visualizations of tree learners. To access this additional functionality, use the TreePlot and Questionnaire objects for tree visualizations and questionnaires, respectively.

Each of these objects takes a tree learner as the first argument. Keyword arguments are used to control the additional functionality, as described below. Use write_html or show_in_browser to save or view the visualization. In a Jupyter notebook, the resulting visualization is displayed automatically.

Most of this additional functionality also applies to static images created with write_png or write_dot.

Changing Names

The following keyword arguments enable you to rename various aspects of the data:

  • feature_renames to rename the features in the data, where the keys are the old feature names and the values are the corresponding new feature names:

    Dict("old_feature_1" => "new_feature_1", "old_feature_2" => "new_feature_2")
  • level_renames to rename the categoric/ordinal levels in the data, where the keys are the feature names and each value is a Dict for this feature where the keys are the old level names and the values are the new level names:

    Dict("feature_1" => Dict("old_level_1" => "new_level_1"),
         "feature_2" => Dict("old_level_1" => "new_level_1",
                             "old_level_2" => "new_level_2"))
  • label_renames to rename the labels of the target for classification and prescription problems, where the keys are the old label names and the values are the new label names:

    Dict("old_label_1" => "new_label_1", "old_label_2" => "new_label_2")

In the following example, we use feature_renames to replace some feature codes in the data with more descriptive names:

vis_renamed_features = IAI.TreePlot(lnr, feature_renames=Dict(
    "Disp" => "Displacement",
    "HP" => "Horsepower",
    "WT" => "Weight",
))
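
Because the keys of a rename Dict must match the existing feature names, it can be useful to check them before building the visualization. The following is a minimal sketch of such a check in plain Julia. The helper unknown_rename_keys and the list of feature names are hypothetical and are not part of IAI:

```julia
# Hypothetical helper (not part of IAI): return the keys of a rename Dict
# that do not correspond to any known feature name.
function unknown_rename_keys(renames::AbstractDict, feature_names)
    known = Set(string.(feature_names))
    return sort([k for k in keys(renames) if !(string(k) in known)])
end

# Assumed feature names, for illustration; with a DataFrame `X`
# these could be obtained with `names(X)`.
feature_names = ["Cyl", "Disp", "HP", "WT"]

feature_renames = Dict(
    "Disp" => "Displacement",
    "HP" => "Horsepower",
    "WT" => "Weight",
)

# All keys match a known feature
@assert isempty(unknown_rename_keys(feature_renames, feature_names))

# A mistyped key is reported
typo_renames = Dict("Dsip" => "Displacement")
@assert unknown_rename_keys(typo_renames, feature_names) == ["Dsip"]
```

The same check could be applied to the outer keys of level_renames, since those are also feature names.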

Controlling the Visualization Content

The extra_content keyword argument allows you to add or remove information for each node of the tree in the visualization. To do this, simply construct a vector with one Dict for each node of the tree containing the parameters you would like to control for this node.

Each Dict can contain any of the following fields:

  • :node_color: color in hex format for the node (e.g., #ACACAC). Only applies to tree visualizations.

  • :node_summary_extra: the extra HTML content to display as the node summary. For tree visualizations, this is the content in the node; for questionnaires, this is displayed above the question.

  • :node_details_extra: the extra HTML content to display as the node details. For tree visualizations, this is the tooltip content; for questionnaires, this is the content on the bottom of each question.

  • :node_details_include_default: whether to include the default node details. Defaults to true.

  • :node_summary_include_default: whether to include the default node summary. Defaults to true.

  • :node_split_extra: the extra content to display alongside the split variable below the node.

  • :node_criterion_extra: the extra content to display alongside the split criterion above the node.

  • :node_split_include_default: whether to include the default split variable below the node. Defaults to true.

  • :node_criterion_include_default: whether to include the default split criterion above the node. Defaults to true.

For example, the following code calculates the mean horsepower of the points that fall into each node of the tree and stores it as a vector of Dicts with the key :node_details_extra:

using Statistics
node_inds = IAI.apply_nodes(lnr, X)
extras = map(node_inds) do inds
  Dict(:node_details_extra =>
       string("<b>Mean horsepower in node:</b> ",
              round(mean(X[inds, :HP]), digits=1)))
end
7-element Vector{Dict{Symbol, String}}:
 Dict(:node_details_extra => "<b>Mean horsepower in node:</b> 146.7")
 Dict(:node_details_extra => "<b>Mean horsepower in node:</b> 72.4")
 Dict(:node_details_extra => "<b>Mean horsepower in node:</b> 160.4")
 Dict(:node_details_extra => "<b>Mean horsepower in node:</b> 129.6")
 Dict(:node_details_extra => "<b>Mean horsepower in node:</b> 105.2")
 Dict(:node_details_extra => "<b>Mean horsepower in node:</b> 154.1")
 Dict(:node_details_extra => "<b>Mean horsepower in node:</b> 248.4")

We can then include this information in the visualization by passing the extra_content keyword argument to any of the visualization functions:

vis_extra_text = IAI.TreePlot(lnr, extra_content=extras)
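
Several of the fields listed earlier can be combined in the same Dict. The sketch below builds such a vector in plain Julia, setting both :node_color and :node_details_extra for each node. The helper node_extras, the threshold, and the two colors are hypothetical choices for illustration. The input values are the node means shown in the output above:

```julia
# Hypothetical helper (not part of IAI): build one Dict per node, combining
# a node color with extra tooltip text.
function node_extras(node_means; threshold=150.0)
    map(node_means) do m
        Dict(
            # Hex color for the node, as described for :node_color
            :node_color => m > threshold ? "#E07B7B" : "#ACACAC",
            :node_details_extra =>
                string("<b>Mean horsepower in node:</b> ", round(m, digits=1)),
        )
    end
end

# Node means taken from the output shown earlier
extras = node_extras([146.7, 72.4, 160.4, 129.6, 105.2, 154.1, 248.4])

@assert length(extras) == 7
@assert extras[1][:node_color] == "#ACACAC"   # 146.7 is below the threshold
@assert extras[7][:node_color] == "#E07B7B"   # 248.4 is above the threshold
```

The resulting vector has one entry per node, so it could be passed as extra_content in the same way as before.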

You can incorporate additional information using any combination of HTML/CSS/JavaScript. Note that if you include any <script> tags, the closing tag must be written as <\/script> for the code to function correctly.

For example, the following code uses billboard.js to visualize the split threshold at each split node in the tree:

node_inds = IAI.apply_nodes(lnr, X)
extras = map(enumerate(node_inds)) do (t, inds)
  # Leaves have no split to plot, so return an empty Dict for them
  IAI.is_leaf(lnr, t) && return Dict{Symbol,String}()

  feature = IAI.get_split_feature(lnr, t)
  threshold = round(IAI.get_split_threshold(lnr, t), digits=2)

  Dict(:node_details_extra =>
  """
  <div id="node-plot-$t" style="width: 280px;"></div>
  <script>
    var chart = bb.generate({
      data: {
        xs: {
          y: "x"
        },
        columns: [
          ["x", $(join(X[inds, feature], ","))],
          ["y", $(join(y[inds], ","))],
        ],
        type: "scatter",
      },
      axis: {
        x: {
          label: "$feature",
          tick: {
            count: 3,
            format: function(x) { return x.toFixed(2); }
          }
        },
      },
      grid: {
        x: {
          lines: [
            {
              value: $threshold,
              text: "$feature < $threshold"
            }
          ]
        }
      },
      legend: {
        show: false
      },
      bindto: "#node-plot-$t"
    });
  <\\/script>
  """)
end

vis_extra_plots = IAI.TreePlot(lnr, extra_content=extras)
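
The escaping of the closing script tag in the example above can be checked in isolation. This short sketch uses only standard Julia string behavior. The node index t and the HTML content are hypothetical:

```julia
# In a regular Julia string literal, "\\/" produces the two characters \/
closing_tag = "<\\/script>"
@assert closing_tag == raw"<\/script>"
@assert length(closing_tag) == 10

# Triple-quoted strings follow the same escaping rules and also support
# interpolation, which the example above relies on for the node index
t = 3  # hypothetical node index
html = """
<div id="node-plot-$t"></div>
<script>console.log("node $t");<\\/script>
"""

@assert occursin(raw"<\/script>", html)
@assert occursin("node-plot-3", html)
```

Printing html shows the closing tag as <\/script>, which is the form the visualization expects.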

Advanced Multi-learner Visualization

It is possible to use the advanced visualization controls in conjunction with multi-learner visualizations. To do this, simply pass TreePlot or Questionnaire objects with the appropriate options instead of learners when constructing the visualization. This functionality does not apply to static images.

For example, the following visualization combines the earlier examples with renamed features and extra outputs:

questions = ("Use learner with" => [
    "renamed features" => vis_renamed_features,
    "extra text output" => vis_extra_text,
    "extra plots" => vis_extra_plots,
])
IAI.MultiTreePlot(questions)

Advanced Questionnaire Visualization

Offering binary choices rather than entering values

Sometimes it is a simpler experience for the end user to pick between the two splits of the tree rather than entering a raw value. The binary_choice_features parameter can be used to specify a subset of features in the questionnaire that should be presented in this way, and should be passed as a FeatureSet. For instance, the following code converts all features in the questionnaire to this binary choice format:

using DataFrames
IAI.Questionnaire(lnr, binary_choice_features=All())

Changing the "Not sure" button

The missing_renames parameter allows you to change the text on the "Not sure" button used to indicate a missing value. This can improve the user experience when a missing value has a specific meaning in the context of a particular feature. missing_renames accepts a FeatureMapping that maps features to the desired text on the button. For instance, supposing that missing was used to indicate "First visit" for a given feature, the following code changes the label to reflect this:

IAI.Questionnaire(lnr, missing_renames=Dict("Disp" => "First visit"),
                  include_not_sure_buttons=true)

Choice of multiple units

Often it is helpful to offer the choice of multiple units when entering answers, as each user may have a different unit preference. The units parameter accepts a FeatureMapping that maps features to a set of units. Each unit contains two values: a name and a scaling factor to apply to the answer when this unit is selected.

For example, suppose that our feature represents speed, and the data we have was measured in km/hr. If we also want to allow the user to enter the speed in miles/hr, we can use the following code:

using DataFrames
IAI.Questionnaire(lnr, units=Dict(
    "Disp" => [("km/h", 1), ("mi/h", 1.60934)],
))

Now, if the user selects miles/hr, a scaling factor of 1.60934 is automatically applied to their answer to convert it back to km/hr before passing the answer to the tree logic.
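
The arithmetic behind the scaling factor can be sketched in plain Julia. This is only an illustration of the conversion described above, not the library's implementation. The helper to_data_units is hypothetical:

```julia
# Same structure as the `units` argument above: feature => [(name, factor), ...]
units = Dict(
    "Disp" => [("km/h", 1), ("mi/h", 1.60934)],
)

# Hypothetical helper (not part of IAI): convert an answer entered in the
# selected unit back to the unit used in the training data.
function to_data_units(units, feature, unit_name, answer)
    for (name, factor) in units[feature]
        name == unit_name && return answer * factor
    end
    throw(ArgumentError("unknown unit $unit_name for feature $feature"))
end

# An answer in km/h is passed through unchanged
@assert to_data_units(units, "Disp", "km/h", 100) == 100

# An answer of 60 mi/h is scaled to 60 * 1.60934 km/h
@assert to_data_units(units, "Disp", "mi/h", 60) ≈ 96.5604
```

The unit with a scaling factor of 1 corresponds to the unit in which the data was measured.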

Showing the logic of the decision tree

It is possible to include the logic of the decision tree splits between each question by using the show_tree_logic parameter:

IAI.Questionnaire(lnr, show_tree_logic=true)

It is also possible to control the naming of the features, levels, and missing values in the logic display by passing show_tree_logic as a dictionary:

IAI.Questionnaire(lnr, include_not_sure_buttons=true, show_tree_logic=Dict(
    "feature_renames" => Dict("Disp" => "new feature name"),
    "missing_renames" => Dict("Disp" => "new missing name"),
))

The possible keys for the dictionary are: