Policy Learning to Reduce Churn

Reducing customer churn is a key topic of interest in digital marketing, as it can have a major impact on revenue and profitability. There are a variety of factors that lead to customers choosing to discontinue their subscription or usage of a product, and a one-size-fits-all approach such as a generic discount across the board can leave a lot on the table. To increase retention, it is important to learn and understand the different patterns in preferences and price sensitivity across various groups of customers.

A very common approach to churn reduction is to build a predictive model that outputs the probability of churning in a given time period, and then suggest interventions, such as direct marketing or lower prices, for those with the highest probability of churning. There is a critical flaw in this approach: when selecting customers for intervention, it does not consider the estimated impact of the intervention on their churn probability. To maximize the efficiency of our interventions, we should select those customers for whom the intervention has the largest impact on the churn probability. It is therefore critical to estimate the counterfactual outcomes and design an intervention policy that maximizes the expected return under limited resources, rather than basing the strategy purely on the current churn probability.
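To make this concrete, here is a tiny illustrative example (with made-up churn probabilities, not taken from the data) showing why targeting by estimated impact differs from targeting by raw churn probability:

# Hypothetical churn probabilities for three customers, under the current
# price and under a proposed intervention (values assumed for illustration)
p_current      = [0.9, 0.6, 0.3]
p_intervention = [0.85, 0.3, 0.1]
uplift = p_current .- p_intervention   # estimated reduction in churn probability
# Ranking by churn probability targets customer 1 first, but ranking by
# estimated impact targets customer 2, whose churn probability drops by 0.3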

Moreover, customer retention is not just a single-period problem. Customers continually re-evaluate their decision to stay or churn over time, so this is really a dynamic problem in which we aim to optimize a long-term outcome. The classical approach of modeling churn as a single probability limits our ability to react to different customer behaviors and preferences over time.

In this case study, we look at the publicly-available Telco Customer Churn dataset and showcase how to use Reward Estimation, Optimal Policy Trees, and Survival Analysis to address these issues and provide practical guidance for data-driven churn reduction. For demonstration, we develop personalized pricing policies for two different goals:

  1. maximizing the expected customer lifetime
  2. maximizing the probability of retaining a customer over a given time period

At the end, we discuss how to extend the analysis to more complex and realistic outcomes such as Customer Lifetime Value (CLV).

Data Preparation

We first read in the data and prepare the different data types. We model this as a survival problem, where MonthlyCharges is the treatment, tenure represents the survival time, Churn = Yes indicates that the churn event was observed (customers who have not churned are censored), and the remaining variables are the features X.

using CSV, DataFrames, CategoricalArrays

# Read the data and recode MultipleLines as an ordered categorical feature
# counting the number of phone lines
df = CSV.read("Telco-customer-churn.csv", DataFrame, pool=true)
replace!(df.MultipleLines, "No phone service" => "0",
                           "No" => "1",
                           "Yes" => "2+")
df[!, :NumPhoneLines] = CategoricalArray(df.MultipleLines, ordered=true,
                                         levels=["0", "1", "2+"])
select!(df, Not(:MultipleLines))

# Features, treatments (observed monthly charges), event indicator, and
# survival times (tenure shifted by one so all times are strictly positive)
X = select(df, Not([:customerID, :tenure, :TotalCharges, :MonthlyCharges, :Churn]))
treatments = df.MonthlyCharges
y = df.Churn .== "Yes"
times = df.tenure .+ 1
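
As an optional check (not in the original write-up), we can peek at the prepared inputs to confirm they match the survival framing described above:

# Observed monthly charge (treatment), shifted tenure (survival time), and
# whether the churn event was observed, for the first few customers
first(DataFrame(charge=treatments, time=times, churned=y), 5)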

We then split the data into training and testing sets of equal size, so that the test set is large enough for high-quality reward estimation:

seed = 123
(X_train, treatments_train, y_train, times_train), (X_test, treatments_test, y_test, times_test) = IAI.split_data(
    :prescription_maximize, X, treatments, y, times, seed=seed, train_proportion=0.5
)
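
As a quick check on the split (not part of the original code), the two halves should be roughly equal in size:

# Number of customers in the training and test sets
nrow(X_train), nrow(X_test)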

1. Maximizing Expected Customer Lifetime

First, we consider the task of designing a pricing policy to maximize customer lifetime. We will use a Reward Estimation Survival Learner to estimate the counterfactual outcomes under different prices. We consider monthly charges between 30 and 110 with an increment of 20 as the treatment options, as they largely cover the range of charges we see in the data and have enough resolution for us to differentiate the effect of different prices.

reward_lnr = IAI.NumericSurvivalRewardEstimator(
    propensity_estimator=IAI.RandomForestRegressor(),
    outcome_estimator=IAI.RandomForestSurvivalLearner(),
    reward_estimator=:doubly_robust,
    random_seed=1,
)
treatment_options = 30:20:110


train_predictions, train_reward_score = IAI.fit_predict!(
    reward_lnr, X_train, treatments_train, y_train, times_train,
    treatment_options,
    outcome_score_criterion=:harrell_c_statistic)
train_rewards = train_predictions[:reward]
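
Before training a policy, it can be helpful to inspect the estimated rewards. Assuming they come back as a table with one column per treatment option (which is how the evaluation code later in this section indexes them), each entry is the estimated tenure for a customer under the corresponding monthly charge:

# Estimated rewards for the first few training customers under each
# candidate monthly charge
first(train_rewards, 5)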

Training Optimal Policy Tree

With the reward matrix, we can now train an Optimal Policy Tree that segments the customers into different cohorts, each with the optimal pricing strategy to maximize the counterfactual customer lifetime.

policy_grid = IAI.GridSearch(
    IAI.OptimalTreePolicyMaximizer(
        random_seed=seed,
    ),
    max_depth=2:5,
)
IAI.fit!(policy_grid, X_train, train_rewards)
IAI.get_learner(policy_grid)
[Optimal Policy Tree visualization: the tree splits on features such as InternetService, OnlineBackup, OnlineSecurity, Contract, TechSupport, PaymentMethod, and Partner, and each leaf prescribes one of the monthly charges 30, 50, 70, 90, or 110.]

We see that the trained policy tree segments the customers based on the type of internet service, the contract type, tech support, payment method, etc., and prescribes for each group the optimal pricing strategy to maximize expected tenure. For example, for those customers with DSL internet service, one- or two-year contracts, who pay by bank transfer, a monthly charge of 70 gives the highest expected tenure.
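
To look up the prescription for particular customers, we can apply the same predict call that is used in the evaluation below to any subset of the feature matrix:

# Prescribed monthly charge for the first few training customers
IAI.predict(policy_grid, X_train[1:5, :])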

Evaluation

Now that we have a personalized pricing strategy, we want to evaluate how much better it is compared to the current price offered. In order to estimate rewards under the actual prices observed in the data, we need a function to map them to the closest of the prices for which we estimated rewards:

function round_charge(charge)
  # Round to the nearest of the candidate prices (30, 50, 70, 90, 110)
  charge_rounded = round(Int, (charge - 10) / 20) * 20 + 10
  min(max(charge_rounded, 30), 110)
end
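
As a quick check of the mapping (values chosen for illustration):

round_charge(45.0)    # 50: nearest candidate price
round_charge(18.25)   # 30: clamped to the lowest option
round_charge(118.75)  # 110: clamped to the highest option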

With this function and reward matrix, we can compare the outcome under the prescribed and observed prices and calculate the relative improvement with the following function:

using Statistics
function evaluate(lnr, X, treatment, rewards)
  # Estimated outcome under the prices prescribed by the policy
  policy_pred = IAI.predict(lnr, X)
  outcome_pred = map(1:length(policy_pred)) do i
    rewards[i, policy_pred[i]]
  end

  # Estimated outcome under the observed prices, rounded to the nearest option
  outcome_actual = map(1:length(policy_pred)) do i
    rewards[i, Symbol(round_charge(treatment[i]))]
  end

  # Relative improvement of the prescribed policy over current pricing
  mean(outcome_pred) / mean(outcome_actual) - 1
end

We can then calculate the estimated improvement in the training set:

evaluate(policy_grid, X_train, treatments_train, train_rewards)
0.1887148425059

We see that the pricing policy of the Optimal Policy Tree leads to an estimated 19% improvement in tenure.

To make this comparison in the test set, we retrain the reward estimator on the test set to have a fair out-of-sample evaluation:

test_predictions, test_reward_score = IAI.fit_predict!(
    reward_lnr, X_test, treatments_test, y_test, times_test,
    treatment_options,
    outcome_score_criterion=:harrell_c_statistic)
test_rewards = test_predictions[:reward]

We then evaluate the estimated improvement in the test set:

evaluate(policy_grid, X_test, treatments_test, test_rewards)
0.10453348665696

Following the Optimal Policy Tree also leads to an estimated increase in tenure in the test set, giving us confidence this policy will generalize well to other customers.

2. Maximizing Customer Retention Rate Over a Period of Time

Alternatively, we can aim to maximize the survival probability at a given point in time. We will assume that we want to maximize the probability that a customer is still subscribed at 24 months, and define a Reward Estimation learner to estimate the rewards for this task. Note that the definition of the reward estimation learner is almost identical to what we had in the first task; the only exception is the addition of evaluation_time to indicate the time at which we would like to calculate the survival probability.

reward_lnr = IAI.NumericSurvivalRewardEstimator(
    propensity_estimator=IAI.RandomForestRegressor(),
    outcome_estimator=IAI.RandomForestSurvivalLearner(),
    reward_estimator=:doubly_robust,
    random_seed=1,
    evaluation_time=24,
)

train_predictions, train_reward_score = IAI.fit_predict!(
    reward_lnr, X_train, treatments_train, y_train, times_train,
    treatment_options,
    outcome_score_criterion=:harrell_c_statistic)
train_rewards = train_predictions[:reward]

Training Optimal Policy Tree

We then train an Optimal Policy Tree on these rewards:

policy_grid = IAI.GridSearch(
    IAI.OptimalTreePolicyMaximizer(
        random_seed=seed,
    ),
    max_depth=2:5,
)
IAI.fit!(policy_grid, X_train, train_rewards)
[Optimal Policy Tree visualization: the tree splits on features such as OnlineBackup, Contract, PaymentMethod, InternetService, gender, PaperlessBilling, and TechSupport, and each leaf prescribes one of the monthly charges 30, 50, 70, 90, or 110.]

Similar to before, the trained policy tree segments the customers based on whether the customer has online backup services, the payment method, the internet service type, etc., and prescribes a different pricing policy for each group to maximize the 24-month retention rate. For example, for those customers that have online backup and a two-year contract, a monthly charge of 110 would result in the highest retention rate.

Evaluation

We can now evaluate the effect of the new policy against current pricing in the training set:

evaluate(policy_grid, X_train, treatments_train, train_rewards)
0.15615616109299

We see that the pricing policy of the Optimal Policy Tree leads to an estimated 16% improvement in the customer retention rate at 24 months.

As before, we retrain the reward estimator on the test set for a fair out-of-sample evaluation, and then evaluate the estimated improvement under the new policy:

test_predictions, test_reward_score = IAI.fit_predict!(
    reward_lnr, X_test, treatments_test, y_test, times_test,
    treatment_options,
    outcome_score_criterion=:harrell_c_statistic)
test_rewards = test_predictions[:reward]

evaluate(policy_grid, X_test, treatments_test, test_rewards)
0.0741208358278

Similarly, we observe an increased probability of customer retention in the test set, again giving us confidence this policy will generalize well to other customers.

Extensions

In this case study, we considered a simplified view of the problem where we aimed to maximize tenure or retention probability. In reality, we may care about optimizing other outcomes, such as developing dynamic pricing strategies to maximize Customer Lifetime Value (CLV).

Due to the complexity of this setup and the level of exposition required, we have not demonstrated it here, but the principle of using Reward Estimation and Optimal Policy Trees is the same as in the examples shown above:

  • we can use the full survival curves from our analysis to calculate the probability that a customer is still subscribed at each time interval under a given price, and multiply this by the price at each time to derive the expected CLV under each pricing strategy;
  • these outcomes can then be incorporated into the Reward Estimation process to estimate fair rewards under each pricing policy;
  • we then apply Optimal Policy Trees to these rewards to learn optimal pricing strategies (a simplified sketch of the CLV calculation is given below).
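
As a minimal sketch of the first step, assume that for a given customer and candidate price we have already extracted a vector of survival probabilities at each month from the fitted survival model. The helper below (hypothetical, not part of the IAI API) accumulates the expected CLV under that price, optionally with a monthly discount factor:

# Hypothetical helper: expected CLV for one customer under one candidate price,
# where survival_prob[t] is the estimated probability the customer is still
# subscribed at month t under that price
function expected_clv(survival_prob::AbstractVector{<:Real}, price::Real;
                      monthly_discount::Real=0.0)
    sum(survival_prob[t] * price / (1 + monthly_discount)^t
        for t in eachindex(survival_prob))
end

# Example with assumed survival probabilities over a three-month horizon
expected_clv([0.95, 0.9, 0.85], 70.0)   # (0.95 + 0.9 + 0.85) * 70 = 189.0

Repeating this calculation for each candidate pricing strategy gives the outcome matrix that would feed into the Reward Estimation and Optimal Policy Tree steps described above.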