Learning Optimal Pricing Policies for Grocery Stores

In this example, we aim to learn an optimal pricing policy for grocery stores from observational data. We use a publicly available retail dataset ("The Complete Journey" from dunnhumby) that contains household-level transactions over two years. We illustrate how Optimal Prescriptive Trees and Optimal Policy Trees can be used to inform interpretable pricing policies.

Why not just solve a revenue optimization model?

Traditionally pricing is done through a revenue optimization approach where models are built to predict demand as a function of price, and then an optimization model is solved to maximize the revenue and arrive at the optimal price. This approach generally has practical limitations: either the demand estimation is too broad and not personalized to a particular grocery store, or there is not enough data from a store to estimate demand. We will see from this example that these approaches for interpretable policy learning leverage data from all grocery stores and use intrinsic features such as customer demographics to cluster stores and make optimal pricing recommendations. The resulting policy does not rely on the availability of demand information for a particular store, and moreover will give meaningful reasons for each pricing recommendation.

Preparing the dataset

A previous study used the same dataset with a greedy, tree-based approach and found a 67% increase in revenue for strawberries. We are interested in evaluating our Optimal Trees-based approaches to see if they provide an additional lift.

We focus on strawberries as the item of interest, and follow the same data preparation process as this paper. We process the data so that each row of the resulting dataset is a shopping trip, with household characteristics, the price of the item, and whether that item was purchased as the outcome.

For brevity, we omit the details of this data processing, and instead start from the prepared dataset. For full reproducibility, the detailed data preparation script is available here.

First, we load the prepared data and convert the ordinal and mixed columns:

using CSV, DataFrames
using CategoricalArrays
using Statistics

df = CSV.read("grocery_pricing.csv", DataFrame)
variable_dict = Dict(
    :AGE_DESC => ["19-24", "25-34", "35-44", "45-54", "55-64", "65+"],
    :MARITAL_STATUS_CODE => [],
    :INCOME_DESC => ["Under 15K", "15-24K", "25-34K", "35-49K", "50-74K",
                     "75-99K", "100-124K", "125-149K", "150-174K", "175-199K",
                     "200-249K", "250K+"],
    :HOMEOWNER_DESC => [],
    :HH_COMP_DESC => [],
    :HOUSEHOLD_SIZE_DESC => ["1", "2", "3", "4", "5+"],
    :KID_CATEGORY_DESC => ["1", "2", "3+"]
)

for (var, levels) in variable_dict
  if var == :KID_CATEGORY_DESC
    df[!, var] = IAI.make_mixed_data(df[!, var], levels)
  elseif isempty(levels)
    df[!, var] = categorical(df[!, var])
  else
    df[!, var] = categorical(df[!, var], ordered=true, levels=levels)
  end
end
df = df[completecases(df), :]

97295×13 DataFrame
   Row │ household_key  BASKET_ID    DAY    price    outcome  AGE_DESC  MARITA ⋯
       │ Int64          Int64        Int64  Float64  Int64    Cat…      Catego ⋯
───────┼────────────────────────────────────────────────────────────────────────
     1 │           216  27008817920      3     2.99        0  35-44     U      ⋯
     2 │          2324  27008841762      3     2.99        0  35-44     A
     3 │          2324  27008841880      3     2.99        0  35-44     A
     4 │          2305  27008850617      3     2.99        0  45-54     B
     5 │          2110  27009082349      3     2.99        1  35-44     A      ⋯
     6 │           432  27009271101      3     2.99        0  19-24     U
     7 │           304  27009304297      3     2.99        0  25-34     U
     8 │          1929  27021022215      4     2.99        0  35-44     B
   ⋮   │       ⋮             ⋮         ⋮       ⋮        ⋮        ⋮             ⋱
 97289 │          1823  42289907120    711     2.99        0  45-54     A      ⋯
 97290 │          1627  42289907311    711     2.99        0  25-34     U
 97291 │           371  42289910739    711     2.99        1  35-44     A
 97292 │           647  42289919750    711     2.99        0  35-44     A
 97293 │           647  42289919785    711     2.99        0  35-44     A      ⋯
 97294 │           761  42289921056    711     2.99        0  25-34     A
 97295 │          1369  42302712189    711     2.99        0  25-34     B
                                                7 columns and 97280 rows omitted

Next, we create the features, treatments (prices), and outcomes (revenue, which is prices times whether the product was purchased or not). We then separate the data into training and testing.

seed = 1
X = df[:, collect(keys(variable_dict))]
y = df.outcome
t = df.price
(train_X, train_t, train_y), (test_X, test_t, test_y) = IAI.split_data(
    :policy_maximize, X, t, y, train_proportion=.5, seed=seed)

Optimal Prescriptive Tree approach

We first take an Optimal Prescriptive Tree approach. Optimal Prescriptive Trees take in the features, treatments, and outcomes to learn the best prescription to maximize revenue. The trees automatically estimate what would have happened if a different price were assigned, so we do not need to worry about explicitly estimating these counterfactual outcomes.

We train the prescriptive tree with the prices discretized into 50-cent increments in the same fashion as the paper:

prescriptive_grid = IAI.GridSearch(
    IAI.OptimalTreePrescriptionMaximizer(
      random_seed=seed,
    ),
    max_depth=1:6,
)
train_y_revenue = train_y .* train_t
train_t_discrete = round.(train_t .* 2, digits=0) ./ 2
IAI.fit!(prescriptive_grid, train_X, train_t_discrete, train_y_revenue)

Optimal Trees Visualization

This tree suggests that depending on the income levels (INCOME_DESC), marital status (MARITAL_STATUS_CODE), househouse composition (HH_COMP_DESC) and size (HOUSEHOLD_SIZE_DESC), homeownership (HOMEOWNER_DESC), and age (AGE_DESC), different prices should be assigned. For example, node 50 prescribes the highest price of 5 to single males aged above 35 and income above 150k, whereas node 18 prescribes the lowest price of 2 to households with size at least 4 (two adults with kids) with income below 124k.

This is consistent with intuition, as those with higher income and less family burden are probably even less price sensitive, so we can increase the price without reducing the purchase probability much in this group.

Optimal Policy Tree approach

In contrast to Optimal Prescriptive Trees, Optimal Policy Trees separate the tasks of estimating the counterfactuals and learning the prescription policy. This means that in order to train the tree, we require the input of a reward matrix - that is, the outcome under each of the possible policies for each sample in the data. This is particularly useful when the outcome is a more complex function of the features and treatments, as the intrinsic estimation in Optimal Prescriptive Trees may struggle to model the outcomes correctly. We will compare these two approaches to see if this is indeed the case.

The first step is to create the reward matrix. We do so by using a NumericRewardEstimator to predict the purchase probability at our candidate prices. We can then multiply each probability by the corresponding price to get the expected revenue for each shopping trip under each candidate price:

t_options = [2.0, 2.5, 3.0, 3.5, 4.0, 5.0]
reward_lnr = IAI.NumericRewardEstimator(
    outcome_estimator=IAI.XGBoostClassifier(),
    random_seed=seed,
)

function get_rewards(reward_lnr, X, t, y, t_options)
  rewards, score = IAI.fit_predict!(reward_lnr, X, t, y, t_options,
                                    criterion=:auc)
  for t in t_options
    rewards[!, Symbol(t)] = round.(rewards[!, Symbol(t)] .* t, digits=4)
  end
  rewards, score
end
train_rewards, train_reward_score = get_rewards(reward_lnr, train_X, train_t,
                                                train_y, t_options)
train_rewards

48648×6 DataFrame
   Row │ 2.0      2.5      3.0      3.5      4.0      5.0
       │ Float64  Float64  Float64  Float64  Float64  Float64
───────┼──────────────────────────────────────────────────────
     1 │  0.1692   0.1542   0.0559   0.0174   0.0567   0.0222
     2 │  0.1458   0.1036   0.0837   0.039    0.0498   0.0758
     3 │  0.3486   0.1991   0.0746   0.0006   0.0007   0.003
     4 │  0.0862   0.1391   0.0175   0.0006   0.0008   0.0004
     5 │  0.0172   0.0193   0.0145   0.0186   0.0242   0.0753
     6 │  0.0862   0.1391   0.0175   0.0006   0.0008   0.0004
     7 │  0.0797   0.0922   0.0861   0.0236   0.0397   0.0459
     8 │  0.0237   0.0179   0.0209   0.0073   0.0058   0.0005
   ⋮   │    ⋮        ⋮        ⋮        ⋮        ⋮        ⋮
 48642 │  0.4508   0.4439   0.3038   0.6919   0.1856   0.0519
 48643 │  0.058    0.0543   0.0399   0.0117   0.0218   0.0056
 48644 │  0.0333   0.0742   0.0345   0.0084   0.0127   0.0841
 48645 │  0.2412   0.2318   0.1404   0.1375   0.3055   0.0221
 48646 │  0.0166   0.0086   0.0113   0.0133   0.0352   0.0302
 48647 │  0.1303   0.2131   0.0663   0.0253   0.0029   0.0079
 48648 │  0.3613   0.1173   0.1549   0.0138   0.0063   0.0126
                                            48633 rows omitted

train_reward_score

0.7631632887182074

We see that the reward estimation has an internal AUC of 76%, giving us confidence that we can trust these rewards for training.

With the reward matrix as the input in addition to the features, we can now learn an Optimal Policy Tree:

optimal_policy_grid = IAI.GridSearch(
    IAI.OptimalTreePolicyMaximizer(
      random_seed=seed,
    ),
    max_depth=1:6,
)
IAI.fit!(optimal_policy_grid, train_X, train_rewards)

Optimal Trees Visualization

The variables used in the Optimal Policy Tree are similar to those from the prescriptive tree, with age (AGE_DESC), income (INCOME_DESC), household size (HOUSEHOLD_SIZE_DESC), home ownership (HOMEOWNER_DESC) as well as others. It also has multiple paths to prescribing a price of 5. For example, Node 61 is defined by the following:

two unmarried adults with no children
income above 150k
age 34 or below

To compare performance with the original paper, we also fit a greedy policy tree:

greedy_policy_grid = IAI.GridSearch(
    IAI.OptimalTreePolicyMaximizer(
      localsearch=false,
      random_seed=seed,
    ),
    max_depth=1:6,
)
IAI.fit!(greedy_policy_grid, train_X, train_rewards)

Evaluation in the Test Set

Since we do not know the true outcome under each price in the test set, we need another procedure for estimation so we can evaluate in the test set. We use the same procedure as before to estimate the revenue under each candidate price, but now on the test set:

test_rewards, test_reward_score = get_rewards(reward_lnr, test_X, test_t,
                                              test_y, t_options)
test_rewards

48647×6 DataFrame
   Row │ 2.0      2.5      3.0      3.5      4.0      5.0
       │ Float64  Float64  Float64  Float64  Float64  Float64
───────┼──────────────────────────────────────────────────────
     1 │  0.187    0.1467   0.0777   0.0913   0.0261   0.0081
     2 │  0.0813   0.052    0.0379   0.045    0.0158   0.0172
     3 │  0.1785   0.1641   0.1181   0.0726   0.4065   0.3926
     4 │  0.0831   0.0391   0.0165   0.0037   0.0009   0.0004
     5 │  0.0974   0.0344   0.0154   0.0103   0.003    0.001
     6 │  0.117    0.0923   0.033    0.0338   0.0754   0.1254
     7 │  0.0849   0.0774   0.0211   0.0314   0.0111   0.055
     8 │  0.0639   0.0454   0.0189   0.045    0.0331   0.2014
   ⋮   │    ⋮        ⋮        ⋮        ⋮        ⋮        ⋮
 48641 │  0.0103   0.0362   0.035    0.0577   0.0101   0.0094
 48642 │  0.0038   0.0021   0.0022   0.0004   0.0019   0.004
 48643 │  0.0117   0.0259   0.0053   0.0091   0.0014   0.0005
 48644 │  0.0858   0.0454   0.0401   0.007    0.0015   0.0117
 48645 │  0.1313   0.0843   0.0594   0.0104   0.0195   0.0222
 48646 │  0.0625   0.0614   0.0373   0.0464   0.0283   0.083
 48647 │  0.1515   0.1367   0.0978   0.0306   0.0172   0.0177
                                            48632 rows omitted

test_reward_score

0.7606772786309265

Again, we see that the reward estimator has a high AUC of 76%, indicating the rewards will serve as a good basis for evaluating the quality of our prescriptions.

We evaluate the methods by calculating the predicted percentage increase in revenue under the new pricing strategies:

function evaluate(recommendations, outcomes, actual_revenue)
  n = length(recommendations)
  pred_revenue = [outcomes[i, recommendations[i]] for i in 1:n]

  improvement = mean(pred_revenue .- actual_revenue) / mean(actual_revenue)
end

# Get revenue observed for test set in reality
test_revenue = test_y .* test_t

For Optimal Prescriptive Trees:

recommendations_pres = Symbol.(IAI.predict(prescriptive_grid, test_X)[1])
evaluate(recommendations_pres, test_rewards, test_revenue)

0.64291579435594

For Optimal Policy Trees:

recommendations_optimal_policy = IAI.predict(optimal_policy_grid, test_X)
evaluate(recommendations_optimal_policy, test_rewards, test_revenue)

0.73472881774237

For greedy policy trees:

recommendations_greedy_policy = IAI.predict(greedy_policy_grid, test_X)
evaluate(recommendations_greedy_policy, test_rewards, test_revenue)

0.6803592430896

We can see that Optimal Policy Trees have the best performance. The 73% improvement in revenue is significantly higher than Optimal Prescriptive Trees at 64%. This suggests that separating the reward estimation and policy learning tasks provides an edge when faced with outcomes that are a complex function of a numeric treatment.

Optimal Policy Trees also improve over the greedy approach by more than 5%, demonstrating the significant performance gains we can achieve by training the tree with a view to global optimality.