Churn Prediction in the Telecommunications Industry: Model Interpretability

Abstract

The large number of studies published over the last ten years on customers migrating from one telecommunications service provider to a competitor shows that churn has become a major concern for this industry and beyond. The purpose of this paper is to detect which of the many variables in a data set of postpaid clients of a Romanian mobile telecommunications company are important drivers of customer migration to another provider. To understand and address churn in telecommunications, we need tools that can interpret the results of a predictive model. We therefore use a Balanced Random Forest as the churn model and three model-agnostic interpretation tools: Permutation Importance, Partial Dependence Plot and SHAP. Applying them to the churn model, we rank the predictive indicators by their importance, their predictive power and the distribution of each feature's impact on the model. According to Permutation Importance, the main drivers of churn are the number of months since the last offer change on the account, the number of minutes consumed outside the company's network, the invoice value, the customer's age and the customer's tenure with the operator. The Partial Dependence Plot identifies, for each of these indicators, the churn risk areas faced by the Romanian telecommunications company, such as younger clients or clients with outdated offers (unchanged for almost two years). SHAP also shows that a long time since the last offer change, a large share of minutes received from competing networks or a short tenure in the network increases a customer's estimated churn.

Keywords: churn, feature importance, model agnostic features, churn risk area

Introduction

According to the portability report published by the National Authority for Administration and Regulation in Communications (ANCOM) in May 2019, churn has grown considerably in the Romanian telecommunications sector. ANCOM states that, in the decade since the portability service was launched, over five million phone numbers have been ported, more than four million of them mobile numbers and over seven hundred thousand of them landline numbers. The portability service allows consumers to keep their phone number when changing service providers, which makes it easier to migrate from one provider to another and lets them enjoy the benefits of a competitive market. From October 21, 2008, when the service was launched, to mid-October 2018, a total of over 5.1 million numbers were ported, of which 4.4 million were mobile phone numbers. The telecommunications sector is also one of the sectors with a significant influence on the Romanian economy: together with the IT industry, it accounted for 6.2% of Romania's GDP in 2017.

The large number of studies published in recent years on the interpretability of churn prediction models in telecommunications shows that this problem has become a major concern. So far, however, no published study has analyzed Romanian data sets on this topic.

Given the maturity of the telecommunications market, it has become more profitable for companies in this area to invest in their relationships with existing customers than in acquiring new ones. To retain customers successfully, it is essential to identify the main drivers of churn risk, churn being the proportion of subscribers who leave their current provider within a certain period. This means identifying the main indicators of migratory behavior and the factors behind the decision to churn, and determining what measures can be taken to prevent customers from leaving the company for similar services offered by competitors.

The purpose of this paper is to identify the factors that drive churn in the telecommunications industry, using graphical methods that are easy to view and interpret. We use a prediction technique called Balanced Random Forest and three model-agnostic explanation methods. The first, Permutation Importance, ranks variables by their predictive power and thus yields the most important features. The second, the Partial Dependence Plot, can be seen as a directional tool: it shows how the model's prediction changes as the value of a feature changes, and in which direction. Much like a directional antenna, it is more sensitive in one direction than in another, and it shows, for each indicator, which range of values pushes the churn model toward churn. The last method, SHAP, is derived from game theory and is based on Shapley values, which measure how much each player contributes to the outcome of a game. We apply this concept to measure the contribution of each feature to the churn prediction.

In this study, we survey the technologies that support analytical work. We review a range of modern tool classes and show how they support common analytical tasks in combating churn by identifying its drivers, the most important indicators in this process. We study three instruments of the same type that determine the most important drivers in the churn prediction model, and we then focus on the behavior of churning customers in telecommunications.

In this paper, we want to provide a solid basis for understanding how the interpretation of a prediction model's results works in practice, with a focus on the predictive indicators. In forecasting, the most important thing, in our view, is a correct and accessible interpretation, and this requires tools that live up to expectations. We could compare this process of analysis and interpretation to surfing. To surf, you must know which wave is coming and read it correctly. You need to know how it might behave and what rules to follow to share it with others. You also need the right tools for the right wave. Likewise, we want to test whether these three interpretation tools are suited to the wave called churn prediction in the telecommunications sector. Data analysis and interpretation is a process that brings clarity to chaos, and to understand and solve real-world problems with advanced data and predictive methods, we need the most capable tools. These tools are among the most important elements of the analytical environment.

Literature Review

The telecommunications industry faces fierce competition to keep customers and therefore requires an efficient churn prediction model. In the literature, Adnan Idris (2012) studies churn prediction in telecommunications using Random Forest and the k-nearest neighbors method (KNN). In that work, the two techniques are applied to a high-dimensional data set in which the minority class has far fewer instances than the majority class, so the models learn the minority class poorly and the results are unsatisfactory. Random Forest therefore needs adequate training on the minority class to perform well. This paper addresses the class imbalance by applying the Balanced Random Forest (BRF), because this technique undersamples the majority class so that its size matches that of the minority class (Robert O'Brien, Hemant Ishwaran, 2019).

A study that seeks to correct class imbalance is that of Chao Chen (2004). It shows that imbalanced data can be handled with Random Forest variants such as the Weighted Random Forest (WRF) and the Balanced Random Forest (BRF). WRF gives a higher weight to the minority class, thus penalizing misclassification of the minority class more heavily. BRF combines down-sampling of the majority class with ensemble learning, artificially altering the class distribution so that both classes are equally represented in each tree. The article further shows that BRF and WRF perform better than the SHRINK, 1-NN and C4.5 techniques, although there is no clear winner between BRF and WRF. Of the two, BRF is computationally more efficient on large, imbalanced data sets, because each tree uses only a small portion of the training set, whereas WRF must use the entire training set. Moreover, because WRF assigns a higher weight to the minority class, it may be more vulnerable to mislabeled cases than BRF: a majority case wrongly labeled as belonging to the minority class could have a greater effect on the accuracy of majority-class predictions in WRF than in BRF.
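The per-tree sampling that distinguishes BRF from WRF can be sketched in a few lines of Python with NumPy. This is a minimal sketch assuming a binary 0/1 churn label; the function names are hypothetical, and the code shows only how each tree's training sample is drawn and one common way of setting WRF class weights (inversely proportional to class frequency). It does not include the tree learner itself and does not reproduce the exact weighting used by Chen et al.

```python
import numpy as np

def balanced_bootstrap_indices(y, rng):
    """Row indices for one tree of a Balanced Random Forest (binary labels assumed).

    A bootstrap sample is drawn from the minority class, and the same number of
    rows is drawn with replacement from the majority class, so both classes are
    equally represented in the tree's training sample.
    """
    classes, counts = np.unique(y, return_counts=True)
    order = np.argsort(counts, kind="stable")
    minority, majority = classes[order[0]], classes[order[-1]]
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y == majority)
    n = len(min_idx)
    return np.concatenate([
        rng.choice(min_idx, size=n, replace=True),
        rng.choice(maj_idx, size=n, replace=True),
    ])

def wrf_class_weights(y):
    """One common WRF-style weighting: weights inversely proportional to class frequency."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Toy labels: 90 non-churners (0) and 10 churners (1).
rng = np.random.default_rng(0)
y = np.array([0] * 90 + [1] * 10)
idx = balanced_bootstrap_indices(y, rng)
print(np.bincount(y[idx]))   # 10 rows of each class
print(wrf_class_weights(y))  # class 1 weighted 5.0, class 0 about 0.56
```

In a full BRF, one decision tree (with random feature subsets at each split) would be grown on `X[idx], y[idx]` for every such sample, and the trees' votes aggregated. Because each sample holds only twice the minority count, this also illustrates why BRF is cheaper than WRF on large, imbalanced data.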

Understanding why a model makes a certain prediction can be just as crucial as forecasting accuracy in many applications. Complex nonparametric models, such as neural networks, Random Forests and support vector machines, are more common than ever in predictive applications, especially with large databases that do not meet the strict assumptions of traditional statistical techniques. Unfortunately, the results of such models can be difficult for management to understand. The partial dependence plot offers a simple solution. Partial dependence plots display the prediction function graphically, so that the relationship between the outcome and the predictors of interest is easier to understand. These plots are especially useful for explaining the output of black-box models (Brandon M. Greenwell, 2017). The partial dependence plot (PDP for short) shows the marginal effect that one or two features have on the predicted outcome of a machine learning model (J. H. Friedman, 2001). A PDP can show whether the relationship between the target and a feature is linear, monotonic or more complex.
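The averaging behind a one-way PDP is model-agnostic and can be sketched directly: fix the feature of interest at each grid value for every row, predict, and average. In this minimal Python sketch, `predict` stands for any fitted model's probability function, and `toy_churn_prob` is a hypothetical stand-in model, not the churn model of this paper.

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Model-agnostic one-way partial dependence (Friedman, 2001).

    For each grid value v, overwrite column `feature` with v in every row,
    predict, and average. Returns one averaged prediction per grid value.
    """
    pd_values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v
        pd_values.append(predict(X_mod).mean())
    return np.array(pd_values)

def toy_churn_prob(X):
    # Hypothetical model: churn probability rises with column 0, ignores column 1.
    return 1.0 / (1.0 + np.exp(-(X[:, 0] - 12.0) / 3.0))

rng = np.random.default_rng(1)
X = rng.uniform(0, 24, size=(500, 2))
grid = np.linspace(0, 24, 7)
pd0 = partial_dependence(toy_churn_prob, X, 0, grid)  # increasing curve
pd1 = partial_dependence(toy_churn_prob, X, 1, grid)  # flat: feature unused
```

A rising curve such as `pd0` signals that larger values of the feature push the model toward churn, while a flat curve such as `pd1` signals that the model does not use the feature.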

In this paper, to detect the indicators that are most important in the churn model, we propose three tools: Permutation Importance (PI), the Partial Dependence Plot (PDP) and SHAP. In the literature, the first technique is used by Andre Altmann et al. (2010) to select important variables from a medical data set (HIV detection). Their approach corrects the bias of the importance measure with a permutation test and returns a p-value for each feature. SHAP (SHapley Additive exPlanations) breaks a prediction down to show the impact of each feature. It is based on Shapley values, a concept from cooperative game theory used to determine how much each player has contributed to the outcome of a game. Balancing accuracy and interpretability is normally difficult, but SHAP values can provide both. SHAP assigns each feature an importance value for a given prediction. Its novel contributions include the identification of a new class of additive feature importance measures and theoretical results showing that this class contains a unique solution with a set of desirable properties (Scott M. Lundberg, 2017).
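To make the Shapley-value idea concrete, the sketch below computes exact Shapley values for a single prediction by enumerating every feature coalition. It is illustrative only: it assumes that features absent from a coalition are replaced by a single baseline value (SHAP implementations typically average over a background data set, and tree-specific algorithms avoid the exponential cost), and `toy_model` is a hypothetical model. The resulting values satisfy the additivity property: they sum to f(x) minus f(baseline).

```python
from itertools import combinations
from math import factorial

def exact_shapley(f, x, baseline):
    """Exact Shapley values for one prediction by enumerating all coalitions.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^M model evaluations, so this suits only a few features.
    """
    m = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(m)]
        return f(z)

    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight |S|! (M - |S| - 1)! / M! for coalitions of size k
            weight = factorial(k) * factorial(m - k - 1) / factorial(m)
            for subset in combinations(others, k):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi

def toy_model(z):
    # Hypothetical model: an additive term plus an interaction of features 1 and 2.
    return 2.0 * z[0] + z[1] * z[2]

phi = exact_shapley(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
# The interaction term is split equally between features 1 and 2.
```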

Permutation importance (PI) offers a good balance between computational cost and performance for any model (Fisher A. et al., 2018). This broad applicability is precisely why it is recommended in the academic literature (Baptiste G. et al., 2016). PI compares the model's performance before and after the values of each variable are randomly permuted, one indicator at a time (Gregorutti B. et al., 2017). PI is among the current model-agnostic methods for assessing feature importance (Casalicchio G., Molnar C. and Bischl B., 2019).
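A bare-bones version of permutation importance, shuffling one feature at a time and recording the drop in a score, might look like the following. It is a sketch under stated assumptions: accuracy is used as the score for simplicity (any metric, such as AUC, could be plugged in instead), and `threshold_model` is a hypothetical model that uses only the first column.

```python
import numpy as np

def permutation_importance(predict, X, y, score, n_repeats=5, seed=0):
    """Drop in `score` when each column is shuffled, one column at a time.

    Returns the mean and standard deviation of the drop over n_repeats shuffles.
    A value near zero (or negative) means the feature adds little or nothing.
    """
    rng = np.random.default_rng(seed)
    base = score(y, predict(X))
    drops = np.zeros((X.shape[1], n_repeats))
    for j in range(X.shape[1]):
        for r in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j, r] = base - score(y, predict(X_perm))
    return drops.mean(axis=1), drops.std(axis=1)

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

def threshold_model(A):
    # Hypothetical classifier that looks only at column 0.
    return (A[:, 0] > 0).astype(int)

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] > 0).astype(int)
mean_drop, std_drop = permutation_importance(threshold_model, X, y, accuracy)
# Column 0 gets a large drop; columns 1 and 2 get exactly zero.
```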

Methodology

We apply the Balanced Random Forest to a sample of 10,701 customers of a large telecommunications operator in Romania. The target variable, referred to in the literature as Churn, takes the value 1 if the client churned and 0 otherwise. The model is built on a snapshot of the data from March 2018, and each client's status (active or migrated) was recorded at the end of the cohort period, three months later.
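The target definition can be expressed as a small labelling rule. This sketch assumes that churn means porting out after the snapshot date and no later than the end of the three-month window; the concrete dates below (end of March 2018 and end of June 2018) and the function name are assumptions for illustration, since the exact cut-off days are not stated.

```python
from datetime import date
from typing import Optional

# Assumed window: snapshot at the end of March 2018, status checked three months later.
SNAPSHOT = date(2018, 3, 31)
WINDOW_END = date(2018, 6, 30)

def churn_label(churn_date: Optional[date], snapshot: date = SNAPSHOT,
                window_end: date = WINDOW_END) -> int:
    """Return 1 if the client left after the snapshot and by the end of the window, else 0."""
    if churn_date is None:  # no recorded departure: still active
        return 0
    return int(snapshot < churn_date <= window_end)
```

A client who leaves after the window closes is labelled 0 for this cohort, which is the usual consequence of a fixed observation window.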

We group the variables in the collected data set into four categories according to the information they carry: demographic data, such as the client's age and gender; data on the client's life cycle in the company, such as the client's age in the network (tenure), the number of months since the last offer change on the account or the contract change; information about each client's financial power, such as the invoice value and the additional costs paid; and data on the subscribers' interaction with customers of competing networks, such as the number of minutes spent calling other networks and the number of minutes received from other networks.

The first step in our analysis is to apply the Balanced Random Forest prediction technique to the data set, which is divided into a 75% training set and a 25% validation set. On the training set, we apply the three tools. Permutation Importance (PI) detects the indicators that are most important in the churn model. The Partial Dependence Plot (PDP) shows which factors influence the model and to what extent they push customers toward churn or non-churn. With PDP, we also want to detect the safety zone, in our case a high probability of remaining a non-churner, and the risk zone, a high probability of churning, depending on the values of the analyzed indicators. This allows us to set a risk threshold for each factor in the model. We generate PDP charts only for the most important features of the churn model, namely the top 5 indicators determined by PI. Then, on the same training set, we apply SHAP to the same problem: determining the most important drivers of churn. Beyond identifying these drivers, we also examine the impact of each indicator on the churn prediction, since some indicators may add no value to the model or may even harm the prediction.
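The 75/25 split and the PDP-based risk threshold can be sketched as follows. Two assumptions are made that the text does not state: the split is stratified by the churn label, so both parts keep the same churn rate, and the risk zone is defined as the grid values where the partial dependence (the average predicted churn probability) reaches a chosen threshold, here 0.5. Both function names are hypothetical.

```python
import numpy as np

def stratified_split(y, train_frac=0.75, seed=0):
    """Index arrays for a train/validation split that keeps the churn rate in both parts."""
    rng = np.random.default_rng(seed)
    train, valid = [], []
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        cut = int(round(train_frac * len(idx)))
        train.append(idx[:cut])
        valid.append(idx[cut:])
    return np.concatenate(train), np.concatenate(valid)

def risk_zone(grid, pd_values, threshold=0.5):
    """Grid values where the partial dependence (churn probability) is at or above threshold."""
    grid = np.asarray(grid)
    return grid[np.asarray(pd_values) >= threshold]

# Toy usage: 80 non-churners and 20 churners; a hypothetical rising PDP curve.
y = np.array([0] * 80 + [1] * 20)
train_idx, valid_idx = stratified_split(y)
zone = risk_zone([0, 6, 12, 18, 24], [0.1, 0.3, 0.5, 0.7, 0.9])
```

With a monotonic PDP curve, the lowest value in `zone` acts as the risk threshold for that indicator; for non-monotonic curves the zone may consist of several separate ranges.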

All three tools produce output that business users in marketing or management can easily interpret, and they make it easy to identify the variables that matter for managerial decisions.

Results

We measure the performance of the churn model generated by the Balanced Random Forest with the ROC curve and the area under it (AUC) (Fig. 1). The main focus of the paper is the performance, applicability and interpretation of the three feature importance tools, so a relatively modest AUC value is not a concern for our purposes.

Fig. 1: ROC Curve

Source: Authors’ own research
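For reference, the ROC curve and its AUC can be computed from predicted churn scores by sweeping a decision threshold and integrating with the trapezoidal rule. This is a plain NumPy sketch (libraries such as scikit-learn offer `roc_curve` and `roc_auc_score` for the same purpose); the toy labels and scores in the usage lines are not the paper's data.

```python
import numpy as np

def roc_curve_points(y_true, scores):
    """False and true positive rates for every distinct score threshold, highest first."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.unique(scores)[::-1]
    P = y_true.sum()            # number of churners
    N = len(y_true) - P         # number of non-churners
    fpr, tpr = [0.0], [0.0]     # start at the origin (nothing predicted positive)
    for t in thresholds:
        pred = scores >= t
        tpr.append(float((pred & (y_true == 1)).sum() / P))
        fpr.append(float((pred & (y_true == 0)).sum() / N))
    return np.array(fpr), np.array(tpr)

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

fpr, tpr = roc_curve_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
auc = auc_trapezoid(fpr, tpr)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of churners from non-churners.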

Of the 16 indicators included in the model, Permutation Importance (PI) selects only 5 as the most important for the churn problem: MonthsO, MinC, Invoice, Age and Tenure. The rest can be considered unimportant or even harmful to the prediction, namely the indicators located on the left side of the axis (Fig. 2).
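Separating the top-ranked indicators from those with zero or negative permutation importance (shuffling them does not hurt, or even improves, the score) is a simple ranking step. The sketch below uses hypothetical feature names and importance values, not the values behind Fig. 2.

```python
def split_by_importance(names, importances, k=5):
    """Return the top-k features by importance and the features with importance <= 0."""
    ranked = sorted(zip(names, importances), key=lambda p: p[1], reverse=True)
    top_k = [name for name, _ in ranked[:k]]
    non_positive = [name for name, value in ranked if value <= 0]
    return top_k, non_positive

# Hypothetical importances for seven illustrative features.
names = ["f1", "f2", "f3", "f4", "f5", "f6", "f7"]
importances = [0.05, 0.12, 0.0, 0.08, -0.01, 0.03, 0.02]
top5, weak = split_by_importance(names, importances)
```

Features in `weak` are candidates for removal, since the model performs no worse, or even better, without the information they carry.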