Language Model-Enhanced Feature Engineering Framework for Customer Churn Analysis

Maryam SHAHABIKARGAR, Amin BEHESHTI, Saleh AFZOON, Jin FOO, Xuyun ZHANG and Nasrin SHABANI

School of Computing, Macquarie University, Sydney, Australia

https://doi.org/10.5171/2025.4526625

Abstract

Customer churn remains a major concern across industries, especially as customer retention grows in importance relative to acquisition. While prior research has focused heavily on structured data, the potential of customer-generated textual data, such as chat logs and feedback, remains underutilized. This study addresses this gap by introducing a novel feature engineering framework that leverages Language Models (LMs) to extract meaningful insights from unstructured text data for churn prediction. We propose a multi-stage pipeline that combines domain expertise with LM capabilities to generate interaction features, sentiment labels, emotional tone scores, and topic-based features from customer chat data. Additionally, we introduce a new composite metric, the Normalized-weighted Churn Score, which integrates expert-assigned topic weights with language model outputs. The framework was evaluated using a churn dataset containing both structured and unstructured data. Results show that incorporating LM-enhanced features significantly boosts model performance across multiple classifiers. Notably, models using our enriched feature set outperformed traditional baselines, achieving an F1-score increase of over 26%. The findings emphasize the critical role of text analytics and hybrid feature engineering in advancing churn prediction and offer a scalable approach for integrating domain knowledge with modern NLP techniques.

Keywords: Customer Churn Analysis, Feature Engineering, Language Models, Textual Data Processing