Predictive analytics: The case of customer retention

May 22, 2019

How to predict customer retention rates in the automotive industry

As markets mature, the ability to retain customers becomes vital to sustaining a profitable business. Thus, a key challenge for any company is to implement a customer retention strategy. The strategy must include a quantifiable goal for customer retention and a deliberate plan for how to reach the retention goal. In this article we present a way to improve customer retention in the automotive industry by applying predictive analytics.

The challenge: Predicting customers’ probability of returning for business
To successfully retain customers, it is often necessary to identify the customers who are less likely to return. That is, to point out the customers who require some kind of marketing action to be retained. To the data scientist, this is an analytical challenge of estimating the retention probability of each customer in the customer base. To do so, the data scientist needs to: (1) define a retention KPI, (2) select a suitable set of predictors, and (3) apply an appropriate statistical model.

Retention KPI in the automotive industry
The applied retention KPI varies across industries. However, a common retention KPI is the retention rate. That is, the number of customers who have returned for business divided by the total number of customers. If all (no) customers return for business, the retention rate is 100% (0%). Often the retention rate also includes a time dimension. For example, whether a customer returned within 12 months. For simplicity, we apply a retention KPI without a time dimension. We will explain our retention KPI through the Automotive Eco System illustrated below.
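As a minimal sketch, the retention rate without a time dimension can be computed as follows. The function name and the sample flags are our own illustrative assumptions, not data from the analysis.

```python
def retention_rate(returned_flags):
    """Share of customers who returned for business (no time dimension)."""
    flags = list(returned_flags)
    if not flags:
        raise ValueError("retention rate is undefined for an empty customer base")
    return sum(1 for flag in flags if flag) / len(flags)


# Hypothetical customer base: True means the customer returned for business.
customers_returned = [True, False, True, True]
print(f"{retention_rate(customers_returned):.0%}")  # 75%
```

Adding a time dimension would only change how each flag is derived, for example by setting it to True only if the customer returned within 12 months.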

The Automotive Eco System
To define a retention KPI for the automotive industry we initially map the typical customer journey of an automotive customer. We name the automotive customer journey the Automotive Eco System. When a customer purchases a new car, the customer is a sales customer from the automotive dealer’s point of view. Thereafter, the customer converts into an aftersales customer who visits a workshop for maintenance. Finally, at some point the aftersales customer re-purchases a car and once again becomes a sales customer. That is, the eco cycle is repeated.

Our first retention KPI is the Sale Conversion Rate. This metric is the number of customers who purchased a car and returned for a service visit at the same dealer divided by the total number of customers who purchased a car at the dealer. Intuitively, the Sale Conversion Rate measures the share of customers who are converted from a sales customer to an aftersales customer.

The second retention KPI is the Service Retention Rate. This rate is the number of aftersales customers who re-visit a workshop for maintenance or an unexpected repair divided by the total number of aftersales customers.

The third and final retention KPI is the Service Conversion Rate. This KPI is the number of aftersales customers who re-purchase a car at the dealer divided by the total number of aftersales customers. Intuitively, the Service Conversion Rate measures the share of customers who are converted from an aftersales customer to a sales customer.
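The three KPIs could be computed from customer records along these lines. This is a sketch under assumptions: the Customer fields are hypothetical, and we assume that the Service Retention Rate uses the total number of aftersales customers as its denominator and counts a second workshop visit as a re-visit.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    bought_car: bool      # purchased a car at the dealer
    service_visits: int   # workshop visits at the same dealer
    repurchased: bool     # bought another car at the dealer after a service visit


def share(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def sale_conversion_rate(customers):
    buyers = [c for c in customers if c.bought_car]
    return share(sum(c.service_visits >= 1 for c in buyers), len(buyers))


def service_retention_rate(customers):
    # Assumption: an aftersales customer is retained after a second visit.
    aftersales = [c for c in customers if c.service_visits >= 1]
    return share(sum(c.service_visits >= 2 for c in aftersales), len(aftersales))


def service_conversion_rate(customers):
    aftersales = [c for c in customers if c.service_visits >= 1]
    return share(sum(c.repurchased for c in aftersales), len(aftersales))


# Hypothetical records for illustration only.
customers = [
    Customer(bought_car=True, service_visits=2, repurchased=True),
    Customer(bought_car=True, service_visits=1, repurchased=False),
    Customer(bought_car=True, service_visits=0, repurchased=False),
    Customer(bought_car=False, service_visits=3, repurchased=False),
]
print(f"Sale Conversion Rate:    {sale_conversion_rate(customers):.1%}")
print(f"Service Retention Rate:  {service_retention_rate(customers):.1%}")
print(f"Service Conversion Rate: {service_conversion_rate(customers):.1%}")
```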

Now that we have our retention KPIs in place, we will proceed to selecting suitable predictors for whether a customer returns for business.

An illustration of the Automotive Eco System and the three customer retention KPIs in the automotive industry

A predictor requires co-movement and prior knowledge
In short, a predictor is a variable which can be used to predict the outcome of another variable. Generally, a stronger co-movement between the predictor and the outcome variable provides better predictions. Moreover, for a predictor to be useful, it must be known prior to the outcome variable. Let’s illustrate with a simple example: Dark clouds commonly appear before and during rainy weather. Thus, whenever we observe dark clouds in the sky, we predict that rainfall is likely to occur. This allows us to take precautions such as bringing along an umbrella. In the words of statistics, a dark cloud is a great predictor of rainy weather. Similarly, a single raindrop is also highly correlated with rainy weather. However, the time between the single raindrop and a heavy rain shower can be very short. Thus, even if a single raindrop is a good predictor, it is not a very useful one, as it does not allow us to take any precautions.
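One simple way to quantify co-movement is the Pearson correlation between a candidate predictor and the outcome. The sketch below uses made-up numbers and assumes that the satisfaction score is recorded before we learn whether the customer returned, so it also meets the timing requirement.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: satisfaction score after a visit (known in advance)
# and whether the customer later returned (1) or not (0).
satisfaction = [90, 80, 85, 40, 30, 60]
returned = [1, 1, 1, 0, 0, 1]
print(round(pearson(satisfaction, returned), 2))
```

A value close to 1 or -1 signals strong co-movement, while a value near 0 suggests that the variable adds little on its own.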

A predictor may not be causing the outcome
Within the discipline of predictive analytics, the objective is to predict an outcome, not to fully understand the causal mechanism leading to it. Put simply, our mission is to predict what will happen, not to explain why it happened. Let’s refer to an actual statistical example to explain this concept. In 2009 a group of data scientists at Google discovered that the aggregated search volume for words related to flu (e.g., headache, sore throat) was highly correlated with the actual record of flu patients. Since the search volume data is available in real time, the data scientists were able to predict a flu epidemic much faster than traditional methods. Obviously, searching for “headache” does not cause one to get the flu. Thus, the search term is not the cause of the flu. Nevertheless, because people who have the flu are more likely to search for these keywords, the aggregated search volume of the flu keywords is a great predictor. Thus, it is useful for predictive analysis even if it tells us nothing about what is causing the flu.

See https://www.npr.org/sections/health-shots/2014/03/13/289802934/googles-flu-tracker-suffers-from-sniffles?t=1558448275881 for a full review

Customer experience is a superior predictor of customer retention
Now that we have the requirements for a predictor in place, it is time to shed light on potential predictors. The commonly applied predictors can be organized into these five categories:

· Customer demographics (e.g., age, gender)
· Customer transactions (e.g., number of purchases, latest purchase)
· Product-related (e.g., type of product purchased, age of the product)
· Customer interactions (e.g., number of calls to help desk)
· Customer experience reviews (e.g., a survey after a transaction)

Generally, the best predictors are context dependent. That is, customer demographics may be excellent predictors in one industry and weak predictors in another. Therefore, the best practice is to test variables within each category to identify the best predictors for your specific context.

That being said, customer experience reviews very often turn out to be a superior predictor. Intuitively, this is not surprising. After all, a customer reveals their satisfaction during a customer experience review. If the customer is very satisfied, it is to be expected that the data show a much higher probability of the customer returning for business. Before we go into more detail on the statistical connection between customer experience reviews and customer retention, we introduce the applied statistical method.

Probit regression: An appropriate choice for predictive analytics
Recall that our objective is to estimate the probability of a customer returning for business. More specifically, we want to estimate the retention probability conditional on customer characteristics; e.g., the probability of a car owner returning to a workshop conditional on the mileage of the car. Most often, data scientists apply a combination of deterministic and probabilistic models to estimate conditional probabilities. Without going into technical details, a probabilistic model is based on the theory of probability. That is, it accepts that randomness plays a role in predicting future events; e.g., the probability of winning the lottery. A deterministic model is based on the premise that a future event can be calculated exactly. If something is deterministic, you have all of the data necessary to predict (determine) the outcome with 100% certainty. A very famous example of a deterministic model is Newton’s universal law of gravitation.

Due to the level of complexity, inconsistency and variance of human behavior, we cannot simply apply a deterministic model. We do not have the necessary data to determine human behavior. Therefore, we implement an element of randomness to account for the variation in human behavior which we are not able to explain with our available data. One such statistical model is the Probit Regression Model. We will not go into technical details about the Probit Regression Model in this article. However, we suggest that interested readers visit this site for a full review and a practical application in R of the Probit Regression Model. We now have everything required to conduct our customer retention analysis. To limit the length of this article, we jump straight to the results of the predictive analysis.
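For readers who want a feel for the mechanics, the sketch below fits a probit model with a single predictor, P(return | score) = Phi(b0 + b1 * score / 100), where Phi is the standard normal distribution function. It is not the model or the data behind our results: the data are made up, the helper names are our own, and the plain gradient ascent is chosen only to keep the example free of external libraries. In practice one would rely on an established statistics package.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, pdf is phi


def clamp(p, eps=1e-12):
    """Keep a probability away from 0 and 1 to avoid log(0) and division by zero."""
    return min(max(p, eps), 1 - eps)


def predict(b0, b1, score):
    """P(return | score) = Phi(b0 + b1 * score / 100)."""
    return STD_NORMAL.cdf(b0 + b1 * score / 100)


def log_likelihood(b0, b1, scores, returned):
    """Probit log-likelihood of the observed return flags."""
    total = 0.0
    for s, y in zip(scores, returned):
        p = clamp(predict(b0, b1, s))
        total += log(p) if y else log(1 - p)
    return total


def fit_probit(scores, returned, steps=5000, learning_rate=0.5):
    """Maximise the log-likelihood with plain gradient ascent."""
    b0 = b1 = 0.0
    n = len(scores)
    for _ in range(steps):
        g0 = g1 = 0.0
        for s, y in zip(scores, returned):
            x = s / 100
            z = b0 + b1 * x
            p = clamp(STD_NORMAL.cdf(z))
            # Derivative of one observation's log-likelihood with respect to z.
            w = (y - p) * STD_NORMAL.pdf(z) / (p * (1 - p))
            g0 += w
            g1 += w * x
        b0 += learning_rate * g0 / n
        b1 += learning_rate * g1 / n
    return b0, b1


# Hypothetical data: satisfaction score (0-100) and whether the customer returned.
scores = [10, 20, 25, 30, 40, 50, 60, 70, 75, 80, 90, 100]
returned = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

b0, b1 = fit_probit(scores, returned)
print(f"P(return | score=20) = {predict(b0, b1, 20):.2f}")
print(f"P(return | score=90) = {predict(b0, b1, 90):.2f}")
```

A positive b1 means that a higher score goes hand in hand with a higher estimated probability of returning.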

The results: Customer experience is an excellent predictor for retention KPIs
We apply data from more than 150,000 automotive customers who purchased a car and/or visited a workshop for maintenance in the period of January 2011 to February 2019. We incorporate data sources from each of the five categories mentioned above. In this article, we only bring forward the predictive power of the customer experience data. Specifically, we illustrate the predictive power of a customer’s rating of a prior customer experience and the customer’s likelihood of returning to the dealer (workshop). Obviously, the retention probability also depends on factors other than the customer experience rating. For example, a customer with a very old car is more likely to purchase a new car than a customer with a very new car. We incorporate these factors by showing the result for an “average customer”, that is, a customer with the average values of the included variables, e.g., the average mileage of the car and the average age of the car owner.
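The "average customer" idea can be sketched as follows: hold every other variable at its sample mean and vary only the satisfaction score. The coefficients, variable names and customer records below are hypothetical values chosen for illustration. They are not our estimates.

```python
from statistics import NormalDist, mean

# Hypothetical probit coefficients (illustrative values only).
COEFFICIENTS = {
    "intercept": -1.0,
    "satisfaction": 0.02,     # per point on the 0-100 scale
    "mileage_10k_km": -0.03,  # per 10,000 km driven
    "owner_age": 0.005,       # per year of age
}


def retention_probability(satisfaction, mileage_10k_km, owner_age):
    """Probit prediction: Phi of the linear index."""
    z = (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["satisfaction"] * satisfaction
        + COEFFICIENTS["mileage_10k_km"] * mileage_10k_km
        + COEFFICIENTS["owner_age"] * owner_age
    )
    return NormalDist().cdf(z)


# Hypothetical customer base used only to compute the averages.
customers = [
    {"mileage_10k_km": 4.0, "owner_age": 30},
    {"mileage_10k_km": 8.0, "owner_age": 45},
    {"mileage_10k_km": 12.0, "owner_age": 60},
]
avg_mileage = mean([c["mileage_10k_km"] for c in customers])
avg_age = mean([c["owner_age"] for c in customers])

# Retention probability of the average customer at different satisfaction scores.
curve = {
    score: retention_probability(score, avg_mileage, avg_age)
    for score in range(0, 101, 25)
}
for score, probability in curve.items():
    print(f"score {score:3d}: {probability:.2f}")
```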

The plot below contains three figures. The left figure shows the Sale Conversion Rate as a function of the rated sale experience. That is, the probability for a sales customer to return to the dealer for service conditional on the customer’s rating of the sale experience. The Customer Satisfaction score ranges from 0–100; 100 being the highest satisfaction score and 0 being the lowest score. As one can see from the figure, the Probit Regression Model estimates a (statistically significant) positive relation. That is, the higher the rating of the sale experience, the higher the probability that the customer returns for service maintenance at the same dealer.

The middle figure shows the Service Retention Rate conditional on the aftersales experience. Once again, the Probit Regression Model estimates that a higher satisfaction score of the prior aftersales experience increases the likelihood for the customer to return to the workshop.

Finally, the right figure shows the Service Conversion Rate conditional on the Customer Satisfaction score of the latest service visit. The Probit Regression Model estimates that a higher satisfaction score of the prior aftersales experience increases the likelihood for the customer to return to the dealer to buy a new car. Hence, customer experience data can help us predict all three of our retention KPIs in the automotive industry. We now proceed to the final section: the implications of our results.

The implications: Segment your retention actions based on customer experience reviews
The biggest implication of our findings is that your company’s customer experience program should be used for more than a simple measure of your customers’ loyalty. Because customer experience data is such a powerful predictor of customer retention, you may not even need to set up a sophisticated statistical model to identify high and low retention probability customers. You can simply segment your customer base by their customer experience ratings and set up specific retention actions for each of the segments. Below we highlight a couple of suggested actions for customers with high and low customer experience ratings, respectively:

· Customers with high experience ratings: Dedicated marketing offerings (e.g., a discount on a product they frequently purchase) and invitations to loyalty programs.
· Customers with low experience ratings: A structured alert system to ensure that every customer with a low experience rating is contacted personally to address their concerns. This is especially important during the onboarding of a new customer.
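A rating-based segmentation of this kind could look like the sketch below. The cutoffs of 50 and 80 and the action labels are hypothetical. Suitable thresholds, and the treatment of the middle segment, depend on your own data.

```python
def retention_action(rating, low_cutoff=50, high_cutoff=80):
    """Map a 0-100 experience rating to a suggested retention action."""
    if not 0 <= rating <= 100:
        raise ValueError("rating must be between 0 and 100")
    if rating < low_cutoff:
        return "personal follow-up"
    if rating >= high_cutoff:
        return "loyalty offer"
    # Assumption: customers in between receive no dedicated action.
    return "no dedicated action"


# Hypothetical ratings keyed by customer id.
ratings = {"A-101": 35, "A-102": 92, "A-103": 65}
for customer_id, rating in ratings.items():
    print(customer_id, retention_action(rating))
```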

If you do not have a customer experience program, you can still obtain reliable retention probabilities by applying your in-house customer data and a probability model such as the Probit Regression Model.

Written by ag analytics