# Learning Optimal Pricing Policies for Grocery Stores

In this example, we aim to learn an optimal pricing policy for grocery stores from observational data. We use a publicly available retail dataset ("The Complete Journey" from dunnhumby) that contains household-level transactions over two years. We illustrate how Optimal Prescriptive Trees and Optimal Policy Trees can be used to inform interpretable pricing policies.

*Why not just solve a revenue optimization model?*

Traditionally, pricing is done through a revenue optimization approach: models are built to predict demand as a function of price, and an optimization model is then solved to find the revenue-maximizing price. This approach has practical limitations: either the demand estimate is too broad and not personalized to a particular grocery store, or a single store does not have enough data to estimate demand. We will see in this example that interpretable policy learning instead leverages data from all grocery stores and uses intrinsic features such as customer demographics to cluster stores and make pricing recommendations. The resulting policy does not rely on demand information being available for a particular store, and it gives meaningful reasons for each pricing recommendation.
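To make the traditional predict-then-optimize approach concrete, here is a minimal sketch in plain Julia. The logistic demand curve and its coefficients are hypothetical and chosen only for illustration; they are not estimated from the dataset, and the function names are our own.

```julia
# Hypothetical demand model: purchase probability falls as price rises.
# The intercept and slope are made-up values, not fitted to any data.
demand(price; intercept=3.0, slope=-1.0) =
    1 / (1 + exp(-(intercept + slope * price)))

# Expected revenue per shopping trip at a given price
expected_revenue(price) = price * demand(price)

# "Optimize" by enumerating a grid of candidate prices
candidate_prices = 2.0:0.5:5.0
revenues = expected_revenue.(candidate_prices)
best_revenue, best_idx = findmax(revenues)
best_price = candidate_prices[best_idx]
```

This yields a single price for everyone who shares the same demand curve, which is exactly the limitation discussed above: personalizing it requires a separate, well-estimated demand model for each store or customer segment.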

## Preparing the dataset

A previous study used the same dataset with a greedy, tree-based approach and found a 67% increase in revenue for strawberries. We are interested in evaluating our Optimal Trees-based approaches to see if they provide an additional lift.

We focus on strawberries as the item of interest, and follow the same data preparation process as that study. We process the data so that each row of the resulting dataset is a shopping trip, with the household characteristics, the price of the item, and an outcome indicating whether the item was purchased.

For brevity, we omit the details of this data processing, and instead start from the prepared dataset. For full reproducibility, the detailed data preparation script is available here.

First, we load the prepared data and convert the ordinal and mixed columns:

```
using CSV, DataFrames
using CategoricalArrays
using Statistics

df = CSV.read("grocery_pricing.csv", DataFrame)

# Level order for each ordinal column; an empty list marks an unordered categorical
variable_dict = Dict(
    :AGE_DESC => ["19-24", "25-34", "35-44", "45-54", "55-64", "65+"],
    :MARITAL_STATUS_CODE => [],
    :INCOME_DESC => ["Under 15K", "15-24K", "25-34K", "35-49K", "50-74K",
                     "75-99K", "100-124K", "125-149K", "150-174K", "175-199K",
                     "200-249K", "250K+"],
    :HOMEOWNER_DESC => [],
    :HH_COMP_DESC => [],
    :HOUSEHOLD_SIZE_DESC => ["1", "2", "3", "4", "5+"],
    :KID_CATEGORY_DESC => ["1", "2", "3+"]
)

for (var, levels) in variable_dict
    if var == :KID_CATEGORY_DESC
        # Mixed column: the listed levels are treated as ordinal
        df[!, var] = IAI.make_mixed_data(df[!, var], levels)
    elseif isempty(levels)
        df[!, var] = categorical(df[!, var])
    else
        df[!, var] = categorical(df[!, var], ordered=true, levels=levels)
    end
end

# Keep only rows without missing values
df = df[completecases(df), :]
```

```
97295×13 DataFrame
Row │ household_key BASKET_ID DAY price outcome AGE_DESC MARITA ⋯
│ Int64 Int64 Int64 Float64 Int64 Cat… Cat… ⋯
───────┼────────────────────────────────────────────────────────────────────────
1 │ 216 27008817920 3 2.99 0 35-44 U ⋯
2 │ 2324 27008841762 3 2.99 0 35-44 A
3 │ 2324 27008841880 3 2.99 0 35-44 A
4 │ 2305 27008850617 3 2.99 0 45-54 B
5 │ 2110 27009082349 3 2.99 1 35-44 A ⋯
6 │ 432 27009271101 3 2.99 0 19-24 U
7 │ 304 27009304297 3 2.99 0 25-34 U
8 │ 1929 27021022215 4 2.99 0 35-44 B
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
97289 │ 1823 42289907120 711 2.99 0 45-54 A ⋯
97290 │ 1627 42289907311 711 2.99 0 25-34 U
97291 │ 371 42289910739 711 2.99 1 35-44 A
97292 │ 647 42289919750 711 2.99 0 35-44 A
97293 │ 647 42289919785 711 2.99 0 35-44 A ⋯
97294 │ 761 42289921056 711 2.99 0 25-34 A
97295 │ 1369 42302712189 711 2.99 0 25-34 B
7 columns and 97280 rows omitted
```

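Next, we create the features, treatments (prices), and outcomes (whether the product was purchased). Revenue, the quantity we ultimately want to maximize, is the price multiplied by this purchase indicator, and is computed later where it is needed. We then split the data into training and testing sets.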

```
seed = 12345
X = df[:, collect(keys(variable_dict))]
y = df.outcome
t = df.price
(train_X, train_t, train_y), (test_X, test_t, test_y) = IAI.split_data(
    :policy_maximize, X, t, y, train_proportion=0.5, seed=seed)
```

## Optimal Prescriptive Tree approach

We first take an Optimal Prescriptive Tree approach. Optimal Prescriptive Trees take in the features, treatments, and outcomes to learn the best prescription to maximize revenue. The trees automatically estimate what would have happened if a different price were assigned, so we do not need to worry about explicitly estimating these counterfactual outcomes.

We train the prescriptive tree with the prices discretized into 50-cent increments in the same fashion as the paper:

```
prescriptive_grid = IAI.GridSearch(
    IAI.OptimalTreePrescriptionMaximizer(
        random_seed=seed,
    ),
    max_depth=1:6,
)

# Revenue outcome: price times the purchase indicator
train_y_revenue = train_y .* train_t
# Round each price to the nearest 50 cents
train_t_discrete = round.(train_t .* 2, digits=0) ./ 2

IAI.fit!(prescriptive_grid, train_X, train_t_discrete, train_y_revenue)
```
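The two preprocessing steps in the block above (building the revenue outcome and discretizing prices) can be checked in isolation. The sketch below uses a handful of hypothetical prices and purchase indicators rather than rows from the dataset.

```julia
# Hypothetical observed prices and purchase indicators (1 = bought, 0 = not bought)
prices = [2.99, 2.49, 3.29, 1.99, 4.79]
purchased = [0, 1, 1, 0, 1]

# Revenue outcome: the price is earned only when the item is purchased
revenue = purchased .* prices

# Snap each price to the nearest multiple of 0.5 by doubling, rounding, and halving
prices_discrete = round.(prices .* 2) ./ 2
```

Note that Julia's default `round` sends exact ties to the nearest even integer, so a price that sits exactly halfway between two 50-cent steps (such as 2.25) may round down rather than up.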

This tree suggests that different prices should be assigned depending on homeowner status (`HOMEOWNER_DESC`), income level (`INCOME_DESC`), age (`AGE_DESC`), marital status (`MARITAL_STATUS_CODE`), and other features. For example, node 10 prescribes the highest price of 5 to households that are homeowners, have income between 150-200k, consist of two adults with no kids or a single female, and are under 35 in age, whereas node 4 prescribes the lowest price of 2 to homeowners with income less than 125k.

This seems consistent with intuition: homeowners with higher income and fewer family obligations are likely to be less price sensitive, so we can increase the price for this group without reducing the purchase probability much.

## Optimal Policy Tree approach

In contrast to Optimal Prescriptive Trees, Optimal Policy Trees separate the tasks of estimating the counterfactuals and learning the prescription policy. This means that in order to train the tree, we must supply a reward matrix, that is, the estimated outcome for each sample in the data under each of the candidate treatments (here, prices). This is particularly useful when the outcome is a more complex function of the features and treatments, as the intrinsic estimation in Optimal Prescriptive Trees may struggle to model the outcomes correctly. We will compare these two approaches to see if this is indeed the case.

The first step is to create the reward matrix. We do so by using a `NumericClassificationRewardEstimator` to predict the purchase probability at our candidate prices. Since we do not observe significant correlation between price and the other features, we use the direct method as the reward estimation method. We can then multiply each probability by the corresponding price to get the expected revenue for each shopping trip under each candidate price:

```
t_options = [2.0, 2.5, 3.0, 3.5, 4.0, 5.0]

reward_lnr = IAI.NumericClassificationRewardEstimator(
    outcome_estimator=IAI.XGBoostClassifier(num_round=10),
    outcome_insample_num_folds=2,
    reward_estimator=:direct_method,
    estimation_kernel_bandwidth=1,
    random_seed=seed,
)

function get_rewards(reward_lnr, X, t, y, t_options)
    predictions, score = IAI.fit_predict!(reward_lnr, X, t, y, t_options,
                                          outcome_score_criterion=:auc)
    rewards = predictions[:reward]
    # Convert purchase probabilities into expected revenue at each candidate price
    for price in t_options
        rewards[!, Symbol(price)] = round.(rewards[!, Symbol(price)] .* price,
                                           digits=3)
    end
    rewards, score
end

train_rewards, train_reward_score = get_rewards(reward_lnr, train_X, train_t,
                                                train_y, t_options)
train_rewards
```

```
48648×6 DataFrame
Row │ 2.0 2.5 3.0 3.5 4.0 5.0
│ Float64 Float64 Float64 Float64 Float64 Float64
───────┼──────────────────────────────────────────────────────
1 │ 0.128 0.199 0.168 0.13 0.104 0.129
2 │ 0.141 0.31 0.139 0.098 0.144 0.178
3 │ 0.119 0.172 0.123 0.144 0.157 0.116
4 │ 0.108 0.136 0.076 0.106 0.104 0.164
5 │ 0.135 0.104 0.099 0.091 0.104 0.127
6 │ 0.14 0.212 0.105 0.134 0.104 0.404
7 │ 0.104 0.179 0.117 0.082 0.107 0.116
8 │ 0.123 0.199 0.147 0.159 0.104 0.14
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
48642 │ 0.084 0.212 0.105 0.113 0.104 0.127
48643 │ 0.13 0.136 0.117 0.086 0.104 0.127
48644 │ 0.119 0.118 0.078 0.103 0.122 0.116
48645 │ 0.162 0.206 0.126 0.084 0.12 0.116
48646 │ 0.135 0.115 0.127 0.102 0.17 0.203
48647 │ 0.147 0.199 0.109 0.089 0.234 0.129
48648 │ 0.054 0.107 0.168 0.159 0.104 0.355
48633 rows omitted
```
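The step from purchase probabilities to expected revenue can be illustrated without the reward estimator. The following is a minimal sketch in plain Julia with hypothetical probabilities for two shopping trips; the variable names are our own and the numbers are not taken from the fitted model.

```julia
# Candidate prices, matching t_options in the example above
price_options = [2.0, 2.5, 3.0, 3.5, 4.0, 5.0]

# Hypothetical purchase probabilities: one row per shopping trip,
# one column per candidate price
purchase_prob = [
    0.064  0.080  0.056  0.037  0.026  0.026;
    0.070  0.085  0.035  0.038  0.026  0.081;
]

# Expected revenue = purchase probability at a price, times that price
expected_revenue = purchase_prob .* price_options'

# Price with the highest expected revenue for each trip
best_price = [price_options[argmax(row)] for row in eachrow(expected_revenue)]
```

Each row of `expected_revenue` plays the role of one row of the reward matrix: the policy learner only needs these per-price rewards, not the underlying probabilities.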

`train_reward_score[:outcome]`

```
Dict{String, Float64} with 6 entries:
"2.0" => 0.62459
"2.5" => 0.651953
"3.0" => 0.691372
"3.5" => 0.746921
"4.0" => 0.708353
"5.0" => 0.586203
```

We see that the reward estimation has internal AUCs between roughly 0.59 and 0.75 across the candidate prices, giving us reasonable confidence that we can use these rewards for training.
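As a reminder of what these scores measure, AUC can be read as the probability that a randomly chosen purchase receives a higher predicted probability than a randomly chosen non-purchase. The sketch below computes it by brute force over all pairs; the `auc` helper and the data are hypothetical and are not part of the IAI API.

```julia
# AUC as the fraction of (positive, negative) pairs that are ranked correctly,
# counting ties as one half. Quadratic in the number of samples, so this is
# suitable only for small illustrations.
function auc(scores, labels)
    pos = scores[labels .== 1]
    neg = scores[labels .== 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos, n in neg)
    return wins / (length(pos) * length(neg))
end

# Hypothetical predicted purchase probabilities and observed purchases
scores = [0.10, 0.40, 0.35, 0.80, 0.20]
labels = [0, 0, 1, 1, 0]
example_auc = auc(scores, labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the scores above should be read.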

With the reward matrix as the input in addition to the features, we can now learn an Optimal Policy Tree:

```
optimal_policy_grid = IAI.GridSearch(
    IAI.OptimalTreePolicyMaximizer(
        random_seed=seed,
    ),
    max_depth=1:6,
)
IAI.fit!(optimal_policy_grid, train_X, train_rewards)
```
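To build intuition for what a policy tree optimizes, the sketch below evaluates two simple policies against a small hypothetical reward matrix: one price for everyone, and one price per group defined by a single hand-picked split. This is only an illustration in plain Julia under those assumptions. An Optimal Policy Tree searches over splits and depths itself, and the feature, numbers, and helper names here are our own.

```julia
using Statistics

# Hypothetical reward matrix: rows are shopping trips, columns are candidate prices
price_options = [2.0, 2.5, 3.0]
rewards = [
    0.13  0.20  0.17;
    0.14  0.31  0.14;
    0.12  0.10  0.25;
    0.11  0.09  0.22;
]

# Hypothetical binary feature that splits the trips into two groups
is_homeowner = [true, true, false, false]

# Mean reward of a policy that assigns column index policy[i] to trip i
policy_value(rewards, policy) =
    mean(rewards[i, policy[i]] for i in eachindex(policy))

# No split: the single price with the best average reward across all trips
single_best = argmax(vec(mean(rewards, dims=1)))
single_policy = fill(single_best, size(rewards, 1))

# One split: the best price within each group
best_in(mask) = argmax(vec(mean(rewards[mask, :], dims=1)))
split_policy = [h ? best_in(is_homeowner) : best_in(.!is_homeowner)
                for h in is_homeowner]

single_value = policy_value(rewards, single_policy)
split_value = policy_value(rewards, split_policy)
prescribed_prices = price_options[split_policy]
```

In this toy example the split raises the mean reward because the two groups prefer different prices, which is the kind of structure the policy tree is designed to find in the real reward matrix.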