Advanced

Parallelization

Many learners support training in parallel across multiple threads, processes or even machines using the parallelism built into Julia.

To parallelize the training process, you must choose between multiple threads and multiple processes; the two approaches cannot be used simultaneously.

Tip

We recommend using multiple threads for the best performance, as their overhead is significantly lower than that of multiple processes.

Parallelization with multiple threads

You can start Julia with multiple threads in two ways:

  • specify the number of threads using the environment variable JULIA_NUM_THREADS (e.g., setting JULIA_NUM_THREADS=8 will use 8 threads)
  • Julia 1.5+ only: specify the number of threads using the -t/--threads flag when starting Julia (e.g., starting Julia with julia -t 8 will use 8 threads)
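To check that either method takes effect, the sketch below starts two fresh Julia processes, one configured through JULIA_NUM_THREADS and one through -t, and reads back the thread count each reports. It assumes Julia 1.6 or later (for addenv) and locates the julia binary via Sys.BINDIR and Base.julia_exename, an unexported Base helper; nothing from IAI is used.

```julia
# Path to the julia binary running this session (Base.julia_exename is an
# unexported Base helper that returns "julia" or "julia.exe").
julia_bin = joinpath(Sys.BINDIR, Base.julia_exename())

# Run `cmd` in a fresh process and parse the thread count it prints.
threads_reported(cmd::Cmd) = parse(Int, read(cmd, String))

# Code executed by each child process.
probe = "print(Threads.nthreads())"

# Method 1: the JULIA_NUM_THREADS environment variable (addenv needs Julia 1.6+).
via_env = threads_reported(addenv(`$julia_bin --startup-file=no -e $probe`,
                                  "JULIA_NUM_THREADS" => "4"))

# Method 2: the -t/--threads flag (Julia 1.5+); it overrides the variable.
via_flag = threads_reported(`$julia_bin --startup-file=no -t 4 -e $probe`)

println("env var: ", via_env, " threads; -t flag: ", via_flag, " threads")
```

Because each configuration needs its own process, this also reflects the fact that the thread count is fixed when Julia starts.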

If using Jupyter, you can create a separate kernel that uses a specific number of threads:

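using IJulia
# Register a Jupyter kernel that starts Julia with the IAI system image and 8 threads
# (system images are .dylib files on macOS, .so on Linux and .dll on Windows)
installkernel("Julia (IAI, 8 threads)", "--sysimage=path/to/sys.dylib",
              env=Dict("JULIA_NUM_THREADS" => "8"))
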
Warning

The number of Julia threads can only be set when Julia starts; once Julia is running, it is not possible to increase the number of threads.

The parallelism of the learner fitting algorithm can be controlled via the num_threads parameter on the learner (see Parameters). This parameter is an Integer specifying the number of threads to use. All threads will be used by default, but a smaller number can be specified if desired.
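As a sketch of choosing a smaller value than the default, the hypothetical helper choose_num_threads (not part of IAI) caps a requested count at Threads.nthreads(), on the assumption that a learner cannot use more threads than the session was started with. The learner named in the final comment is only an example.

```julia
# Hypothetical helper (not part of IAI): choose a value for a learner's
# num_threads parameter, never exceeding the threads this session has.
function choose_num_threads(requested::Integer = Threads.nthreads())
    requested >= 1 || throw(ArgumentError("num_threads must be at least 1"))
    return min(requested, Threads.nthreads())
end

# Leave roughly half of the session's threads free for other work.
half = choose_num_threads(max(1, Threads.nthreads() ÷ 2))
println("using ", half, " of ", Threads.nthreads(), " thread(s)")

# With IAI loaded, the value could then be passed to a learner, for example:
# IAI.OptimalTreeClassifier(num_threads = half)
```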

If using the R or Python interface, you will need to specify the number of threads by setting the JULIA_NUM_THREADS environment variable before initializing the IAI interface:

  • For R:

    Sys.setenv(JULIA_NUM_THREADS=8)
    iai::iai_setup()
  • For Python:

    import os
    os.environ['JULIA_NUM_THREADS'] = '8'
    from interpretableai import iai

Parallelization with multiple processes

You can start Julia with extra worker processes using the -p/--procs flag when running Julia from the terminal. For example, the following shell command will start Julia with three additional processes, for a total of four.

bash$ julia -p 3

You can also add worker processes to an existing Julia session using the addprocs function. The following Julia code adds three processes, for a total of four:

using Distributed
addprocs(3)
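One way to confirm that the workers started and can run code is to ask each one for its own ID using remotecall_fetch from the Distributed standard library. This sketch only adds workers when the session has none yet, so it also works in a session started with julia -p 3; it uses nothing from IAI.

```julia
using Distributed

# Add three workers only if this session has none yet, so re-running the
# code does not keep growing the pool.
nprocs() == 1 && addprocs(3)

# remotecall_fetch runs `myid` on each worker and waits for the result,
# so every worker should report its own ID.
reported = [remotecall_fetch(myid, w) for w in workers()]

println("all processes: ", procs())    # includes the master process (ID 1)
println("workers:       ", workers())  # excludes the master process
```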

The parallelism of the learner fitting algorithm can be controlled via the parallel_processes parameter on the learner (see Parameters). There are two options for specifying this parameter:

  • nothing will use all available processes during training
  • Specify a Vector containing the IDs of the processes to use during training. This needs to be a subset of the available processes, which can be found by running Distributed.procs().
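Before passing a Vector to parallel_processes, it can help to check that every ID refers to a running process. The helper check_parallel_processes is hypothetical (it is not part of IAI) and relies only on procs() and workers() from Distributed; leaving out the master process is a choice made for this sketch, and the learner in the closing comment is only an example.

```julia
using Distributed

# Make sure some workers exist (skipped if the session already has them).
nprocs() == 1 && addprocs(3)

# Hypothetical helper (not part of IAI): confirm that `ids` is a subset of the
# running processes, as required for the parallel_processes parameter.
function check_parallel_processes(ids::AbstractVector{<:Integer})
    unknown = setdiff(ids, procs())
    isempty(unknown) ||
        throw(ArgumentError("not running processes: $(unknown)"))
    return collect(Int, ids)
end

# Use the first two workers and leave the master process (ID 1) out.
subset = check_parallel_processes(workers()[1:2])
println("parallel_processes = ", subset)

# With IAI loaded, this could then be passed to a learner, for example:
# IAI.OptimalTreeClassifier(parallel_processes = subset)
```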

If using the R or Python interface, you can add additional Julia processes using the iai::add_julia_processes (R) or iai.add_julia_processes (Python) functions.

Rich Multimedia Output Control

Many learners and other objects take advantage of Julia's rich multimedia output to produce interactive browser visualizations in Jupyter notebooks. Because these displays happen automatically, there is no opportunity to pass keyword arguments to the underlying display functions. If you would like to customize these visualizations with keyword arguments, you can instead use set_rich_output_param! to specify the argument, which will then be passed to the display function whenever it is called automatically.
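The idea of storing keyword arguments globally and applying them when a display is triggered can be illustrated with a toy model in plain Julia. Everything in it (RICH_PARAMS, set_param!, render, auto_display) is hypothetical: it mimics the mechanism described above but is not how IAI implements set_rich_output_param!, and it uses no real IAI parameter names.

```julia
# Toy model (hypothetical, not IAI's implementation): a global store of
# keyword arguments that automatic displays pick up.
const RICH_PARAMS = Dict{Symbol,Any}()

# Hypothetical analogue of set_rich_output_param!: remember a keyword argument.
function set_param!(key::Symbol, value)
    RICH_PARAMS[key] = value
    return nothing
end

# Stand-in for a visualization function that accepts keyword arguments.
render(obj; title = "untitled", width = 600) =
    string(obj, ": title=", title, ", width=", width)

# Stand-in for the display Jupyter triggers automatically: the caller cannot
# pass keyword arguments, so the stored ones are splatted in instead.
auto_display(obj) = render(obj; RICH_PARAMS...)

before = auto_display("tree")   # uses the defaults
set_param!(:width, 900)
after = auto_display("tree")    # now picks up width = 900
```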