# Policy Learning to Reduce Churn

Reducing customer churn is a key topic of interest in digital marketing, as it can have a major impact on revenue and profitability. There are a variety of factors that lead to customers choosing to discontinue their subscription or usage of a product, and a one-size-fits-all approach such as a generic discount across the board can leave a lot on the table. To increase retention, it is important to learn and understand the different patterns in preferences and price sensitivity across various groups of customers.

A very common approach to churn reduction is to build a predictive model that outputs the probability of churning in a given time period, and then to suggest interventions, such as direct marketing or lower prices, for the customers with the highest probability of churning. There is a critical flaw in this approach: when selecting customers for intervention, it does not consider the estimated impact of the intervention on their churn probability. To maximize the efficiency of our interventions, we should select the customers for whom the intervention has the largest impact on the churn probability. It is therefore critical to estimate the *counterfactual outcomes* and design an intervention policy that maximizes the hypothetical return under limited resources, rather than basing the strategy purely on the current churn probability.
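To make the distinction concrete, the following toy sketch compares the two targeting strategies for four hypothetical customers. All numbers are made up for illustration and are not estimated from the Telco data; the variable names are assumptions introduced here.

```julia
# Hypothetical churn probabilities without and with an intervention
# (illustrative numbers only, not estimated from any data)
p_no_action   = [0.90, 0.60, 0.40, 0.20]
p_with_action = [0.88, 0.30, 0.35, 0.05]

# Estimated effect of the intervention: reduction in churn probability
uplift = p_no_action .- p_with_action

budget = 2  # assume we can only afford to treat two customers

# Strategy 1: target the customers most likely to churn
by_risk = sortperm(p_no_action, rev=true)[1:budget]
# Strategy 2: target the customers for whom the intervention helps most
by_uplift = sortperm(uplift, rev=true)[1:budget]

# Expected number of churns avoided under each strategy
avoided_by_risk = sum(uplift[by_risk])
avoided_by_uplift = sum(uplift[by_uplift])
```

In this toy setting, the customer with the highest churn probability barely responds to the intervention, so targeting by risk avoids fewer expected churns than targeting by estimated impact.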

Moreover, customer retention is not just a single-period problem. The customers are continually re-evaluating their decision to churn or not over time, and so this is really a dynamic problem that aims to optimize some long-term outcome. The classical approach of modeling churn as a single probability limits the ability to react to different customer behaviors and preferences over time.

In this case study, we look at the publicly available Telco Customer Churn dataset and showcase how to use Reward Estimation, Optimal Policy Trees, and Survival Analysis to address these issues and provide practical guidance for data-driven churn reduction. For demonstration, we develop personalized pricing policies for two different goals:

- maximizing the expected customer lifetime
- maximizing the probability of retaining a customer over a given time period

At the end, we discuss how to extend the analysis to more complex and realistic outcomes such as Customer Lifetime Value (CLV).

## Data Preparation

We first read in the data and prepare the different data types. We model it as a survival problem, where the variable `MonthlyCharges` is considered the treatment, `tenure` represents the survival time, the indicator `Churn = Yes` is used as the censoring variable (customers who have not churned are treated as censored), and the remaining variables are the features `X`.

```
using CSV, DataFrames, CategoricalArrays

df = CSV.read("Telco-customer-churn.csv", DataFrame, pool=true)

# Recode the phone line information as an ordered categorical feature
replace!(df.MultipleLines, "No phone service" => "0",
                           "No" => "1",
                           "Yes" => "2+")
df[!, :NumPhoneLines] = CategoricalArray(df.MultipleLines, ordered=true,
                                         levels=["0", "1", "2+"])
select!(df, Not(:MultipleLines))

# Features, treatment, churn indicator, and survival time
X = select(df, Not([:customerID, :tenure, :TotalCharges, :MonthlyCharges, :Churn]))
treatments = df.MonthlyCharges
y = df.Churn .== "Yes"
times = df.tenure .+ 1  # shift so that a tenure of 0 becomes a positive time
```

We then split the data 50/50 into training and testing sets, so that the test set is large enough for high-quality reward estimation:

```
seed = 123
(X_train, treatments_train, y_train, times_train), (X_test, treatments_test, y_test, times_test) = IAI.split_data(
    :prescription_maximize, X, treatments, y, times, seed=seed, train_proportion=0.5
)
```

## 1. Maximizing Expected Customer Lifetime

First, we consider the task of designing a pricing policy to maximize customer lifetime. We will use a Reward Estimation Survival Learner to estimate the counterfactual outcomes under different prices. We consider monthly charges between 30 and 110 with an increment of 20 as the treatment options, as they largely cover the range of charges we see in the data and have enough resolution for us to differentiate the effect of different prices.

```
reward_lnr = IAI.NumericSurvivalRewardEstimator(
    propensity_estimator=IAI.RandomForestRegressor(),
    outcome_estimator=IAI.RandomForestSurvivalLearner(),
    reward_estimator=:doubly_robust,
    random_seed=1,
)
treatment_options = 30:20:110
train_predictions, train_reward_score = IAI.fit_predict!(
    reward_lnr, X_train, treatments_train, y_train, times_train,
    treatment_options,
    outcome_score_criterion=:harrell_c_statistic)
train_rewards = train_predictions[:reward]
```
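The outcome estimator above is scored with Harrell's C-statistic, which measures how well predicted risks order the observed churn times. As a rough illustration, the following sketch implements the textbook definition for right-censored data. The function `harrell_c` and the toy data are assumptions introduced here, and the library's own implementation may differ in details such as tie handling.

```julia
# Harrell's concordance index for right-censored data.
# `risk[i]` is a predicted risk score (higher means earlier expected churn),
# `times[i]` is the observed time, and `event[i]` is true if churn was observed.
function harrell_c(risk, times, event)
    concordant = 0.0
    comparable = 0
    n = length(times)
    for i in 1:n, j in 1:n
        # A pair is comparable when i is observed to churn strictly before j's time
        if event[i] && times[i] < times[j]
            comparable += 1
            if risk[i] > risk[j]
                concordant += 1.0
            elseif risk[i] == risk[j]
                concordant += 0.5  # ties in predicted risk count as half
            end
        end
    end
    concordant / comparable
end

# Toy data: four customers, the third one is censored
times = [2, 5, 7, 10]
event = [true, true, false, true]
risk  = [0.9, 0.4, 0.6, 0.1]
c = harrell_c(risk, times, event)
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ordering of comparable pairs.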

### Training Optimal Policy Tree

With the reward matrix, we can now train an Optimal Policy Tree that segments the customers into different cohorts, each with the optimal pricing strategy to maximize the counterfactual customer lifetime.

```
policy_grid = IAI.GridSearch(
    IAI.OptimalTreePolicyMaximizer(
        random_seed=seed,
    ),
    max_depth=2:5,
)
IAI.fit!(policy_grid, X_train, train_rewards)
IAI.get_learner(policy_grid)
```

We see that the trained policy tree segments the customers based on the type of internet service, the contract type, tech support, payment method, and so on, and prescribes the pricing strategy that maximizes expected tenure for each group. For example, for customers with DSL internet service and a one- or two-year contract who pay by bank transfer, a monthly charge of 70 gives the highest expected tenure.
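To illustrate why segmenting customers can beat a single price, here is a minimal sketch with a made-up reward matrix and one binary feature. It is not how Optimal Policy Trees search for splits; it only shows the idea of picking the best treatment within each segment. All names and numbers are hypothetical.

```julia
using Statistics

# Toy reward matrix: rows are customers, columns are candidate prices.
# The numbers are made up for illustration (e.g. expected tenure in months).
prices = [30, 50, 70]
rewards = [40.0 35.0 20.0;
           42.0 36.0 18.0;
           25.0 30.0 33.0;
           22.0 31.0 35.0]
# A single hypothetical binary feature, e.g. whether the customer has a contract
has_contract = [false, false, true, true]

# One-size-fits-all policy: the single price with the best average reward
best_single = argmax(vec(mean(rewards, dims=1)))

# Depth-1 policy: choose the best price separately on each side of the split
best_price_in(group) = argmax(vec(mean(rewards[group, :], dims=1)))
left = best_price_in(.!has_contract)
right = best_price_in(has_contract)

# Average reward achieved by each policy
value_single = mean(rewards[:, best_single])
value_tree = mean(vcat(rewards[.!has_contract, left], rewards[has_contract, right]))
```

In this toy example the single best price is 50, while the split prescribes 30 to customers without a contract and 70 to those with one, which raises the average reward.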

### Evaluation

Now that we have a personalized pricing strategy, we want to evaluate how much better it is compared to the current price offered. In order to estimate rewards under the actual prices observed in the data, we need a function to map them to the closest of the prices for which we estimated rewards:

```
function round_charge(charge)
    # Round to the nearest of 30, 50, 70, 90, 110 and clamp to that range
    charge_rounded = round(Int, (charge - 10) / 20) * 20 + 10
    min(max(charge_rounded, 30), 110)
end
```
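Another way to express a "closest option" mapping is to search the list of options directly. The helper `nearest_option` below is a hypothetical alternative introduced here for illustration, not part of the original analysis. When a charge is exactly halfway between two options, this version picks the lower one.

```julia
# Candidate prices for which rewards were estimated
treatment_options = 30:20:110

# Map an observed charge to the closest candidate price (hypothetical helper)
nearest_option(charge) = treatment_options[argmin(abs.(treatment_options .- charge))]

# A few example charges, including values outside the range of options
mapped = nearest_option.([18.25, 49.0, 75.5, 118.75])
```

This form also works unchanged if the treatment options are not evenly spaced.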

With this function and reward matrix, we can compare the outcome under the prescribed and observed prices and calculate the relative improvement with the following function:

```
using Statistics
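function evaluate(lnr, X, treatment, rewards)
    # Prices prescribed by the policy for each customer
    policy_pred = IAI.predict(lnr, X)
    # Estimated reward under the prescribed price
    outcome_pred = map(1:length(policy_pred)) do i
        rewards[i, policy_pred[i]]
    end
    # Estimated reward under the observed price, mapped to a treatment option
    outcome_actual = map(1:length(policy_pred)) do i
        rewards[i, Symbol(round_charge(treatment[i]))]
    end
    # Relative improvement of the policy over the observed prices
    mean(outcome_pred) / mean(outcome_actual) - 1
end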
```
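In symbols, and using notation introduced here only for illustration, the quantity returned by `evaluate` can be written as:

$$
\text{improvement} = \frac{\frac{1}{n}\sum_{i=1}^{n} \hat{r}_i\big(\pi(x_i)\big)}{\frac{1}{n}\sum_{i=1}^{n} \hat{r}_i\big(\tilde{t}_i\big)} - 1
$$

where $\hat{r}_i(t)$ is the estimated reward for customer $i$ under price $t$, $\pi(x_i)$ is the price prescribed by the policy, and $\tilde{t}_i$ is the observed price after mapping it to a treatment option with `round_charge`.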

We can then calculate the estimated improvement in the training set:

```
evaluate(policy_grid, X_train, treatments_train, train_rewards)
```

`0.1887148425059`

We see that the pricing policy of the Optimal Policy Tree leads to an estimated 19% improvement in tenure.

To make this comparison in the test set, we retrain the reward estimator on the test set to have a fair out-of-sample evaluation:

```
test_predictions, test_reward_score = IAI.fit_predict!(
    reward_lnr, X_test, treatments_test, y_test, times_test,
    treatment_options,
    outcome_score_criterion=:harrell_c_statistic)
test_rewards = test_predictions[:reward]
```

We then evaluate the estimated improvement in the test set:

```
evaluate(policy_grid, X_test, treatments_test, test_rewards)
```

`0.10453348665696`

Following the Optimal Policy Tree also leads to an estimated 10% increase in tenure in the test set, giving us confidence that this policy will generalize well to other customers.

## 2. Maximizing Customer Retention Rate Over a Period of Time

Alternatively, we can aim to maximize the survival probability at a given point in time. We will assume that we want to maximize the probability that a customer is still subscribed at 24 months, and define a Reward Estimation learner to estimate the rewards for this task. Note that the definition of the reward estimation learner is almost identical to what we had in the first task; the only difference is the addition of `evaluation_time` to indicate the time at which we would like to calculate the survival probability.

```
reward_lnr = IAI.NumericSurvivalRewardEstimator(
    propensity_estimator=IAI.RandomForestRegressor(),
    outcome_estimator=IAI.RandomForestSurvivalLearner(),
    reward_estimator=:doubly_robust,
    random_seed=1,
    evaluation_time=24,
)
train_predictions, train_reward_score = IAI.fit_predict!(
    reward_lnr, X_train, treatments_train, y_train, times_train,
    treatment_options,
    outcome_score_criterion=:harrell_c_statistic)
train_rewards = train_predictions[:reward]
```
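To build intuition for what a survival probability at a fixed time means when some customers are censored, the following sketch computes a Kaplan-Meier estimate on toy data. This is only an illustration of the concept: the reward estimator above uses a random forest survival learner rather than this estimator, and the function name and data are assumptions introduced here.

```julia
# Kaplan-Meier estimate of the probability of surviving beyond time `t`.
# `times[i]` is the observed time and `event[i]` is true if churn was observed
# (false means the customer is censored at that time).
function kaplan_meier(times, event, t)
    surv = 1.0
    for u in sort(unique(times[event]))
        u > t && break
        at_risk = count(>=(u), times)  # customers still subscribed just before u
        churned = count(i -> times[i] == u && event[i], eachindex(times))
        surv *= 1 - churned / at_risk
    end
    surv
end

# Toy data: six customers, two of whom are censored
times = [6, 12, 12, 20, 30, 36]
event = [true, true, false, true, false, true]
s24 = kaplan_meier(times, event, 24)
```

Censored customers still count as at risk up to their last observed time, which is why they cannot simply be dropped or treated as churned.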

### Training Optimal Policy Tree

We then train an Optimal Policy Tree on these rewards:

```
policy_grid = IAI.GridSearch(
    IAI.OptimalTreePolicyMaximizer(
        random_seed=seed,
    ),
    max_depth=2:5,
)
IAI.fit!(policy_grid, X_train, train_rewards)
```

Similar to before, the trained policy tree segments the customers based on whether the customer has online backup services, the payment method, the internet service type, and so on, and prescribes a different pricing policy for each group to maximize the 24-month retention rate. For example, for customers who have online backup and a two-year contract, a monthly charge of 110 would result in the highest retention rate.

### Evaluation

We can now evaluate the effect of the new policy against current pricing in the training set:

```
evaluate(policy_grid, X_train, treatments_train, train_rewards)
```

`0.15615616109299`

We see that the pricing policy of the Optimal Policy Tree leads to an estimated 16% improvement in the customer retention rate at 24 months.

As before, we re-estimate the rewards on the test set and evaluate the policy there:

```
test_predictions, test_reward_score = IAI.fit_predict!(
    reward_lnr, X_test, treatments_test, y_test, times_test,
    treatment_options,
    outcome_score_criterion=:harrell_c_statistic)
test_rewards = test_predictions[:reward]
evaluate(policy_grid, X_test, treatments_test, test_rewards)
```

`0.0741208358278`

Similarly, we observe an estimated 7% improvement in the 24-month retention rate in the test set, again giving us confidence that this policy will generalize well to other customers.

## Extensions

In this case, we considered a simplified view of the problem where we aimed to maximize tenure or retention probability. In reality, we may care about optimizing for other outcomes, such as developing dynamic pricing strategies to maximize the Customer Lifetime Value (CLV).

Due to the complexity of this setup and the level of exposition required, we have not demonstrated it here, but the principle of using Reward Estimation and Optimal Policy Trees is the same as the examples shown above:

- we can use the full survival curves in our analysis to calculate the churn probability at each time interval under the current price, and multiply this with the price at each time to derive the expected CLV under each pricing strategy,
- these outcomes can then be incorporated into the Reward Estimation process to estimate fair rewards under each pricing policy,
- we then apply Optimal Policy Trees to these rewards to learn optimal pricing strategies.
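As a rough sketch of the first step, one common formulation (assumed here, not taken from the analysis above) computes the expected CLV as the sum over months of the price multiplied by the probability that the customer is still subscribed. The function `expected_clv` and the survival curves below are hypothetical and purely illustrative.

```julia
# Expected CLV from a survival curve and a constant monthly price.
# `survival[m]` is the (hypothetical) probability that the customer is still
# subscribed in month m, so each month contributes price * survival[m].
function expected_clv(survival, price)
    sum(price * survival[m] for m in eachindex(survival))
end

# Two made-up survival curves for the same customer under different prices:
# the higher price leads to faster churn in this toy example
survival_at_50 = [1.0, 0.9, 0.8, 0.7]
survival_at_70 = [1.0, 0.8, 0.6, 0.4]

clv_50 = expected_clv(survival_at_50, 50)
clv_70 = expected_clv(survival_at_70, 70)
```

In this toy case the higher price yields the larger expected CLV (about 196 versus 170) despite the faster churn, which shows why optimizing tenure alone and optimizing CLV can lead to different pricing policies.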