Project 1 Q2: Survival Analysis for Telco Customer Churn

A survival-analysis report for IBM Telco Customer Churn data, including Kaplan-Meier, log-rank tests, Cox PH, AFT, and CLV.

survival analysis, spark, mysql, customer churn, project 1

This post summarizes the survival-analysis part of Project 1. The goal is to analyze how long month-to-month internet customers remain active before churn, and to connect survival probabilities with customer lifetime value.

Dataset and Survival Setup

The analysis uses the IBM Telco Customer Churn dataset. Each row represents one customer. I used tenure as the duration variable and Churn as the event indicator.

Concept | Variable | Meaning
Duration | tenure | Customer lifetime in months
Event | Churn = Yes | Churn happened
Censoring | Churn = No | Customer has not churned yet
Population | Contract = Month-to-month and InternetService != No | Selected survival-analysis cohort

After filtering, the survival-analysis cohort contains 3351 customers. Among them, 1556 customers churned, giving a churn rate of 46.43%. The average tenure is 19.43 months.
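The cohort definition can be sketched in a few lines of plain Python. This is only an illustration, not the pipeline used for the report: the column names follow the dataset, but the five rows and the helper name build_cohort are made up for the example.

```python
# Toy rows using the dataset's column names; the values are invented.
rows = [
    {"tenure": 2,  "Churn": "Yes", "Contract": "Month-to-month", "InternetService": "Fiber optic"},
    {"tenure": 34, "Churn": "No",  "Contract": "One year",       "InternetService": "DSL"},
    {"tenure": 8,  "Churn": "Yes", "Contract": "Month-to-month", "InternetService": "DSL"},
    {"tenure": 22, "Churn": "No",  "Contract": "Month-to-month", "InternetService": "No"},
    {"tenure": 45, "Churn": "No",  "Contract": "Month-to-month", "InternetService": "DSL"},
]


def build_cohort(rows):
    """Keep month-to-month internet customers and return (duration, event) pairs."""
    cohort = []
    for r in rows:
        if r["Contract"] == "Month-to-month" and r["InternetService"] != "No":
            event = 1 if r["Churn"] == "Yes" else 0  # 0 means right-censored
            cohort.append((r["tenure"], event))
    return cohort


cohort = build_cohort(rows)
churn_rate = sum(e for _, e in cohort) / len(cohort)
mean_tenure = sum(t for t, _ in cohort) / len(cohort)
print(cohort, round(churn_rate, 4), round(mean_tenure, 2))

# Sanity check on the reported figures: 1556 churned out of 3351 customers.
print(round(1556 / 3351 * 100, 2))
```

The last line reproduces the churn rate quoted above from the two reported counts.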

Kaplan-Meier Survival Curve

The Kaplan-Meier estimator estimates the probability that a customer is still active beyond each month. It accounts for censoring: customers who have not churned stay in the risk set up to their observed tenure and are never counted as churned.

Month | Survival probability
6 | 0.7803
12 | 0.6950
24 | 0.5753
36 | 0.4800
48 | 0.3872
60 | 0.2890

The median survival time is 34 months: the estimated probability that a customer in this cohort is still active falls to 50% at about month 34.
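To make the estimator concrete, here is a minimal from-scratch sketch of the product-limit calculation and the median lookup. The eight durations and events are invented toy data, and the function names are hypothetical; a dedicated survival library would normally be used instead.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    churned = Counter(t for t, e in zip(durations, events) if e == 1)
    exits = Counter(durations)  # every customer leaves the risk set at their own duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = churned.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of not churning at time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Churned and censored customers both drop out of the risk set after t.
        at_risk -= exits[t]
    return curve


def median_survival(curve):
    """First time at which S(t) <= 0.5, or None if the curve never drops that far."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


durations = [1, 3, 3, 5, 8, 8, 12, 20]   # months of tenure (toy data)
events = [1, 1, 0, 1, 1, 0, 1, 0]        # 1 = churned, 0 = censored
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(t, round(s, 4))
print("median:", median_survival(curve))
```

Censored customers never reduce the survival estimate directly; they only shrink the risk set for later months, which is why active customers are not treated as churned.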

Log-Rank Tests

I used log-rank tests to compare survival curves across customer attributes.

Feature | Minimum p-value | Interpretation
gender | 0.153317 | Not significant
seniorCitizen | 0.723174 | Not significant
partner | 2.25e-31 | Significant
dependents | 3.24e-09 | Significant
onlineSecurity | 1.19e-32 | Significant
onlineBackup | 4.12e-43 | Significant
techSupport | 1.92e-21 | Significant
paymentMethod | 1.30e-21 | Significant

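The smallest p-values belong to online backup (4.12e-43) and online security (1.19e-32), followed closely by partner status (2.25e-31); tech support and payment method are also highly significant. Gender and senior-citizen status show no significant difference in survival, so they are far less informative for this survival task than the service and support variables.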
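For readers who want to see what a two-group log-rank test computes, the sketch below implements the standard statistic in plain Python: at every event time it compares the observed churn count in group A with the count expected if both groups shared one hazard. The function name and the toy groups are assumptions for illustration, and a feature with more than two levels would need pairwise or multivariate tests.

```python
import math


def logrank_test(durations_a, events_a, durations_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 degree of freedom."""
    event_times = sorted(
        {t for t, e in zip(durations_a, events_a) if e}
        | {t for t, e in zip(durations_b, events_b) if e}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for d in durations_a if d >= t)  # at risk in group A
        n_b = sum(1 for d in durations_b if d >= t)  # at risk in group B
        d_a = sum(1 for d, e in zip(durations_a, events_a) if d == t and e)
        d_b = sum(1 for d, e in zip(durations_b, events_b) if d == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group-A event count at time t.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy groups: group A churns early, group B churns late.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                       [10, 11, 12, 13, 14], [1, 1, 1, 1, 1])
print(round(chi2, 2), p)
```

A small p-value says the two survival curves differ by more than chance would explain; it does not say how large the difference is, which is what the regression models in the following sections address.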

Cox Proportional Hazards Model

The Cox model estimates how each covariate changes churn hazard. Selected model results are:

Covariate | Coefficient | Hazard ratio | p-value
dependents_Yes | -0.3287 | 0.7199 | < 0.001
internetService_DSL | -0.2173 | 0.8047 | 0.0002
onlineBackup_Yes | -0.7766 | 0.4600 | < 0.001
techSupport_Yes | -0.6392 | 0.5277 | < 0.001

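The Cox concordance index is 0.6409. Online backup and tech support are associated with lower churn hazard: their hazard ratios of 0.4600 and 0.5277 correspond to roughly 54% and 47% lower hazard, holding the other covariates fixed. The proportional-hazards assumption check shows violations for several variables, so these estimates should be interpreted carefully.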
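The hazard-ratio column is simply the exponential of the coefficient column, which the short sketch below checks using the four coefficients reported in the Cox table. The helper names are hypothetical.

```python
import math

# Coefficients copied from the Cox results reported in this post.
cox_coefficients = {
    "dependents_Yes": -0.3287,
    "internetService_DSL": -0.2173,
    "onlineBackup_Yes": -0.7766,
    "techSupport_Yes": -0.6392,
}


def hazard_ratio(coef):
    """Hazard ratio for a one-unit increase in the covariate."""
    return math.exp(coef)


def percent_change_in_hazard(coef):
    """Relative change in hazard, in percent (negative means lower churn hazard)."""
    return (math.exp(coef) - 1.0) * 100.0


for name, coef in cox_coefficients.items():
    print(f"{name}: HR={hazard_ratio(coef):.4f}, "
          f"hazard change={percent_change_in_hazard(coef):+.1f}%")
```

A hazard ratio below 1 means the covariate is associated with a lower instantaneous churn rate at any given tenure, under the assumption that the ratio stays constant over time.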

Accelerated Failure Time Model

I also fitted a log-logistic Accelerated Failure Time model. The AFT model achieved:

Metric | Value
Concordance index | 0.7306
AIC | 13698.72
Log-likelihood | -6838.36

Selected AFT coefficients:

Covariate | Coefficient | exp(coef) | p-value
onlineBackup_Yes | 0.8128 | 2.2542 | < 0.001
onlineSecurity_Yes | 0.8616 | 2.3669 | < 0.001
techSupport_Yes | 0.6893 | 1.9923 | < 0.001
partner_Yes | 0.6768 | 1.9675 | < 0.001
internetService_DSL | 0.3837 | 1.4678 | < 0.001

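Positive AFT coefficients indicate longer expected time before churn. Here exp(coef) is a time ratio: the value of 2.2542 for online backup means the time to churn is stretched by a factor of about 2.25 relative to customers without it, holding the other covariates fixed. The AFT model had a higher concordance index than the Cox model in this analysis (0.7306 versus 0.6409).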
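The sketch below illustrates how a coefficient acts in a log-logistic AFT model, using the common parameterization S(t) = 1 / (1 + (t / alpha)^beta), where alpha is the median survival time. It assumes the coefficient acts on the scale parameter alpha, which is the usual setup. The baseline alpha and beta are hypothetical values picked for illustration, not fitted parameters from this analysis; only the online-backup coefficient comes from the table.

```python
import math


def loglogistic_survival(t, alpha, beta):
    """S(t) = 1 / (1 + (t / alpha) ** beta); alpha is the median survival time."""
    return 1.0 / (1.0 + (t / alpha) ** beta)


# Hypothetical baseline parameters, chosen only for illustration.
baseline_alpha = 20.0   # median months to churn without the feature (assumed)
beta = 1.5              # shape parameter (assumed)

# Reported AFT coefficient for onlineBackup_Yes.
coef_online_backup = 0.8128
time_ratio = math.exp(coef_online_backup)

# In an AFT model the covariate rescales time, so the median is multiplied by exp(coef).
alpha_with_backup = baseline_alpha * time_ratio

for month in (12, 24, 48):
    s0 = loglogistic_survival(month, baseline_alpha, beta)
    s1 = loglogistic_survival(month, alpha_with_backup, beta)
    print(month, round(s0, 4), round(s1, 4))
```

Because the covariate rescales the time axis rather than the hazard, the AFT model does not rely on the proportional-hazards assumption that the Cox model needs.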

Customer Lifetime Value

Finally, I used survival probabilities to estimate customer lifetime value over a 72-month horizon.

Profile | 72-month CLV
Baseline profile | 895.03
Protected profile | 2051.50

This step connects survival modeling with business value. The protected profile, which has stronger retention-related service features, has a much higher estimated CLV.
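One common way to turn a survival curve into CLV is to sum the expected monthly margin, weighted by the probability of still being active and optionally discounted. The post does not state the margin, discount rate, or exact formula behind the figures above, so everything in this sketch (the function, the 30.0 monthly margin, the 10% annual discount rate, and the two geometric survival curves) is an assumption for illustration and will not reproduce the reported values.

```python
def customer_lifetime_value(survival_probs, monthly_margin, annual_discount_rate=0.0):
    """Expected discounted margin; survival_probs[i] is S(month i + 1)."""
    # Convert the annual discount rate to an equivalent monthly rate.
    monthly_rate = (1.0 + annual_discount_rate) ** (1.0 / 12.0) - 1.0
    total = 0.0
    for month, s in enumerate(survival_probs, start=1):
        total += s * monthly_margin / (1.0 + monthly_rate) ** month
    return total


# Hypothetical survival curves over 72 months: 97% vs 99% monthly retention.
baseline_curve = [0.97 ** m for m in range(1, 73)]
protected_curve = [0.99 ** m for m in range(1, 73)]

baseline_clv = customer_lifetime_value(baseline_curve, monthly_margin=30.0,
                                       annual_discount_rate=0.10)
protected_clv = customer_lifetime_value(protected_curve, monthly_margin=30.0,
                                        annual_discount_rate=0.10)
print(round(baseline_clv, 2), round(protected_clv, 2))
```

In practice the survival probabilities would come from a fitted model evaluated for each customer profile rather than from a fixed retention rate.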

Conclusion

Survival analysis is useful for churn because it models not only whether customers churn, but when they are likely to churn. In this Telco dataset, support and service variables such as online backup, online security, and tech support are strongly related to longer customer survival. Kaplan-Meier curves provide interpretable survival probabilities, log-rank tests identify important group differences, Cox and AFT models quantify feature effects, and CLV converts survival probabilities into a business metric.