Business Framework Interactive
Cohort Analysis
Group users by signup date and track their behavior over time. Cohort analysis is essential for understanding retention, seeing how lifetime value (LTV) evolves, and comparing user groups.
What is Cohort Analysis?
Group users by when they joined (cohort), then track a metric over time. This reveals patterns hidden in aggregate data.
Why It Matters
- Aggregate retention can improve while cohort retention worsens (Simpson's paradox)
- Different cohorts have different behaviors
- See true product improvements over time
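The Simpson's paradox point above can be illustrated with a small sketch. The numbers, the `retention_mix` data frame, and the "new" versus "tenured" split are hypothetical, chosen only to show how a shift in user mix can lift the aggregate rate while every segment gets worse.

```r
# Hypothetical numbers chosen only to illustrate the effect
retention_mix <- data.frame(
  period   = c("Q1", "Q1", "Q2", "Q2"),
  segment  = c("new", "tenured", "new", "tenured"),
  users    = c(800, 200, 200, 800),
  retained = c(320, 160, 70, 600)
)

# Retention rate (%) within each segment and period
retention_mix$rate <- retention_mix$retained / retention_mix$users * 100
print(retention_mix)

# Aggregate retention rate (%) per period, ignoring segments
aggregate_rate <- sapply(
  split(retention_mix, retention_mix$period),
  function(d) sum(d$retained) / sum(d$users) * 100
)
print(aggregate_rate)
```

In this toy data each segment's rate falls by five points from Q1 to Q2, yet the aggregate rate rises because the mix moves toward the segment that retains better.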
| Cohort | M1  | M2  | M3  |
|--------|-----|-----|-----|
| Jan    | 70% | 55% | 45% |
| Feb    | 65% | 50% | -   |
| Mar    | 75% | -   | -   |
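One way to work with the cohort table above is to hold it as a matrix, with `NA` for months a cohort has not reached yet. This is a minimal sketch; the variable names are illustrative and not part of the original page.

```r
# Retention (%) from the table above; NA marks months a cohort has not reached yet
retention <- matrix(
  c(70, 55, 45,
    65, 50, NA,
    75, NA, NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(cohort = c("Jan", "Feb", "Mar"), month = c("M1", "M2", "M3"))
)

# Read down a column to compare cohorts at the same age
m1_change <- diff(retention[, "M1"])  # Feb vs Jan, then Mar vs Feb
print(m1_change)

# Read across a row to see how one cohort decays month to month
jan_dropoff <- diff(retention["Jan", ])
print(jan_dropoff)
```

Reading down a column compares cohorts at equal age, which is the comparison that aggregate retention hides.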
Cohort LTV
LTV = Σ (retention × monthly revenue), summed over each month of the cohort's life
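Because newer cohorts have fewer observed months, their LTV is only comparable at an age every cohort has reached. The sketch below applies the formula cumulatively to the retention table, assuming month 0 counts as 100% retention and a hypothetical $25 of revenue per retained user per month.

```r
# Assumptions: month 0 counts as 100% retention, and every retained user
# generates the same hypothetical $25 of revenue per month.
avg_monthly_rev <- 25

retention <- list(
  Jan = c(100, 70, 55, 45),
  Feb = c(100, 65, 50),
  Mar = c(100, 75)
)

# Cumulative LTV per signup after each observed month
cumulative_ltv <- lapply(retention, function(r) cumsum(r / 100 * avg_monthly_rev))

# Compare at the last month that all cohorts have reached (M1 here)
common_horizon <- min(lengths(retention))
ltv_at_horizon <- sapply(cumulative_ltv, function(x) x[common_horizon])
print(ltv_at_horizon)
```

Comparing the full January total against the partial March total would mostly measure cohort age, not cohort quality.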
Retention Curves by Cohort
Each line = one cohort's retention journey. Compare cohorts to identify improvements.
[Chart: Month 3 Retention Comparison]
Use Cases
- Retention Curves: Track how each signup cohort retains over time
- A/B Test Impact: Compare cohorts that received different treatments
- Seasonality: Understand whether summer and winter signups behave differently
- LTV by Acquisition: Identify which channels bring the highest-LTV cohorts
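For the A/B test use case, one possible check is a two-sample test of proportions on retention at a fixed age. The counts below are hypothetical, and the sketch assumes users were randomly assigned to control and treatment; cohorts that merely signed up at different times can differ for other reasons.

```r
# Hypothetical counts: users retained at month 3 out of each cohort's signups
retained <- c(control = 450, treatment = 520)
signups  <- c(control = 1000, treatment = 1000)

# Two-sample test of equal proportions (prop.test from base R's stats package)
result <- prop.test(x = retained, n = signups)

print(result$estimate)  # observed retention rate per cohort
print(result$p.value)   # small values suggest the rates differ
```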
Sports Betting Applications
Acquisition Quality
- Compare LTV by acquisition channel
- Identify which campaigns bring quality users
- Track promo impact on future behavior
Product Improvements
- Did a new feature improve retention?
- Compare pre- and post-launch cohorts fairly
- Seasonality analysis (NFL season vs. off-season)
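The seasonality comparison can be sketched by labeling each signup cohort and averaging a fixed-age metric per label. The retention figures are hypothetical, and treating September through February as "NFL season" is an assumption made for illustration.

```r
# Hypothetical cohort summary: signup month and month-3 retention (%)
cohorts <- data.frame(
  cohort       = as.Date(c("2023-06-01", "2023-07-01", "2023-09-01", "2023-10-01")),
  m3_retention = c(30, 34, 46, 44)
)

# Assumption: treat September through February as "in season"
in_season_months <- c(9, 10, 11, 12, 1, 2)
signup_month <- as.integer(format(cohorts$cohort, "%m"))
cohorts$season <- ifelse(signup_month %in% in_season_months, "NFL season", "off-season")

# Average month-3 retention per season label
season_summary <- aggregate(m3_retention ~ season, data = cohorts, FUN = mean)
print(season_summary)
```

Using the same age (month 3) for every cohort keeps the comparison fair between older and newer signups.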
R Code Equivalent
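# Cohort analysis
library(dplyr)
library(tidyr)
library(lubridate)  # floor_date(), interval(), months()

# users:    one row per user, with user_id and signup_date (Date)
# activity: one row per event, with user_id and activity_date (Date)
create_cohort_table <- function(users, activity) {
  # Assign each user to a monthly signup cohort
  cohorts <- users %>%
    mutate(cohort = floor_date(signup_date, "month"))

  # Cohort sizes are the retention denominators
  cohort_sizes <- cohorts %>%
    count(cohort, name = "cohort_size")

  cohorts %>%
    # inner_join drops users with no activity so they are not counted as active
    inner_join(activity, by = "user_id") %>%
    mutate(months_since = interval(cohort, activity_date) %/% months(1)) %>%
    group_by(cohort, months_since) %>%
    summarise(active = n_distinct(user_id), .groups = "drop") %>%
    left_join(cohort_sizes, by = "cohort") %>%
    mutate(retention = active / cohort_size * 100)
}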
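# Calculate cohort LTV: sum over months of retention (%) times
# average monthly revenue per retained user
cohort_ltv <- function(retention_curve, avg_monthly_rev) {
  sum(retention_curve * avg_monthly_rev / 100)
}

# Example: the curve starts at month 0 (100%) and runs through month 3
retention <- c(100, 70, 55, 45)
ltv <- cohort_ltv(retention, 25)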
cat(sprintf("Cohort LTV: $%.0f\n", ltv))
Key Takeaways
- Group users by signup period (cohort)
- Track metrics over time for each cohort
- Reveals patterns hidden in aggregate data
- Compare cohorts to measure improvement
- Essential for LTV and retention analysis
- Identify acquisition channel quality