Business Framework Interactive

Cohort Analysis

Group users by signup date and track behavior over time. Essential for understanding retention, LTV evolution, and comparing user groups.

📊 What is Cohort Analysis?

Group users by when they joined (cohort), then track a metric over time. This reveals patterns hidden in aggregate data.

Why It Matters

  • Aggregate retention can improve while cohort retention worsens (Simpson's paradox)
  • Different cohorts have different behaviors
  • See true product improvements over time
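
The first point can be made concrete with a small sketch. The numbers below are hypothetical and chosen only to show the mix-shift effect: retention falls at every tenure between two periods, yet the aggregate rate rises because the second period contains far more new users, who are retained at a higher rate than long-tenured ones.

```r
# Hypothetical snapshot data, invented for illustration only.
snap <- data.frame(
  period    = c("P1", "P1", "P2", "P2"),
  tenure    = c("month 1", "month 6", "month 1", "month 6"),
  users     = c(200, 800, 800, 200),
  retention = c(0.70, 0.30, 0.65, 0.25)
)

# Aggregate retention per period = active users / all users
active_by_period    <- tapply(snap$users * snap$retention, snap$period, sum)
users_by_period     <- tapply(snap$users, snap$period, sum)
aggregate_retention <- active_by_period / users_by_period

print(round(aggregate_retention, 2))
```

With these inputs the aggregate rate goes from 38% in P1 to 57% in P2, even though month 1 and month 6 retention each dropped by five points. Only the tenure-matched view shows the decline.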

Cohort   M1    M2    M3
Jan      70%   55%   45%
Feb      65%   50%   -
Mar      75%   -     -
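
One way to hold this triangular table in R is a matrix with NA for months a cohort has not reached yet. The sketch below uses the values from the table; the object names are illustrative.

```r
# Retention table from above; NA marks months not yet observed.
retention <- matrix(
  c(70, 55, 45,
    65, 50, NA,
    75, NA, NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(cohort = c("Jan", "Feb", "Mar"),
                  month  = c("M1", "M2", "M3"))
)

# Average retention at each cohort age, skipping unobserved cells
avg_by_month <- colMeans(retention, na.rm = TRUE)

# How many cohorts contribute to each average
cohorts_observed <- colSums(!is.na(retention))

print(avg_by_month)
print(cohorts_observed)
```

Later columns rest on fewer cohorts (three for M1, one for M3), so averages at higher cohort ages are less reliable.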

Retention Settings

Month 1 Retention (%): 70 (range 40 to 90)
Month 2 Retention (%): 55 (range 30 to 80)
Month 3 Retention (%): 45 (range 20 to 70)
Monthly Decay (%): 8 (range 2 to 15)
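
The page does not say how the monthly decay setting is applied. A plausible reading, taken here as an assumption, is that retention after month 3 shrinks by that percentage each month relative to the previous month. The helper `project_retention` below is hypothetical and sketches that interpretation.

```r
# Assumption: after the last observed month, retention falls by
# monthly_decay_pct percent of its previous value each month.
project_retention <- function(observed, monthly_decay_pct, horizon) {
  stopifnot(horizon >= length(observed))
  extra      <- horizon - length(observed)
  last_value <- observed[length(observed)]
  projected  <- last_value * (1 - monthly_decay_pct / 100)^seq_len(extra)
  c(observed, projected)
}

# Default settings: months 1 to 3 observed, 8% monthly decay afterwards
curve <- project_retention(c(70, 55, 45), monthly_decay_pct = 8, horizon = 6)
print(round(curve, 1))
```

Under this assumption, month 4 retention is 45 × 0.92 = 41.4%. A different decay rule, such as subtracting percentage points, would give a different tail.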

📊 Cohort LTV

LTV = Σ (retention × monthly revenue)
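
Applied to cohorts of different ages, the sum needs care: younger cohorts have months that are not yet observed, and summing over a missing value returns NA unless it is skipped. The sketch below computes LTV to date for the three cohorts from the table, assuming 100% retention in month 0 and $25 average monthly revenue (the figure used in the R example on this page).

```r
# Retention in percent; M0 = 100 is an assumption, NA = not yet observed.
retention <- matrix(
  c(100, 70, 55, 45,
    100, 65, 50, NA,
    100, 75, NA, NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Jan", "Feb", "Mar"), c("M0", "M1", "M2", "M3"))
)
avg_monthly_rev <- 25

# LTV to date: sum only over the months each cohort has reached
ltv_to_date <- rowSums(retention / 100 * avg_monthly_rev, na.rm = TRUE)
print(ltv_to_date)
```

These totals cover different numbers of months, so they are not directly comparable. For a fair comparison, truncate every cohort to the same age first.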

Retention Curves by Cohort

Each line = one cohort's retention journey. Compare cohorts to identify improvements.

Month 3 Retention Comparison

🎯 Use Cases

Retention Curves

Track how each signup cohort retains over time

A/B Test Impact

Compare cohorts with different treatments
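
As a minimal sketch of such a comparison, a two-sample proportion test can check whether the retention gap between two treatment cohorts is larger than chance would explain. The counts below are hypothetical.

```r
# Hypothetical month 1 results for two treatment cohorts
retained    <- c(control = 550, treatment = 600)
cohort_size <- c(control = 1000, treatment = 1000)

# Two-sample test of equal proportions (base R stats package)
test <- prop.test(x = retained, n = cohort_size)

print(test$estimate)  # retention rate in each group
print(test$p.value)   # evidence against equal retention
```

`prop.test` applies a continuity correction by default. The comparison only holds when both cohorts are measured at the same age.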

Seasonality

Check whether summer and winter signups behave differently

LTV by Acquisition

Identify which channels bring the highest-LTV cohorts

🏀 Sports Betting Applications

Acquisition Quality

  • Compare LTV by acquisition channel
  • Identify which campaigns bring quality users
  • Track promo impact on future behavior
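
A channel comparison can be sketched by treating each acquisition channel as its own cohort and summing its retention curve. The channel names, curves, and the $25 monthly revenue below are hypothetical.

```r
# Hypothetical retention curves (percent) for two acquisition channels
curves <- data.frame(
  channel   = rep(c("paid_social", "affiliate"), each = 4),
  month     = rep(0:3, times = 2),
  retention = c(100, 60, 40, 30,
                100, 75, 62, 55)
)
avg_monthly_rev <- 25  # assumed equal across channels

# LTV per channel = sum over months of retention x monthly revenue
ltv_by_channel <- tapply(curves$retention / 100, curves$channel, sum) *
  avg_monthly_rev

print(sort(ltv_by_channel, decreasing = TRUE))
```

Here the affiliate channel comes out at $73 per user against $57.50 for paid social. Setting these figures against acquisition cost per user is a natural next step.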

Product Improvements

  • Did a new feature improve retention?
  • Compare pre/post cohorts fairly
  • Seasonality analysis (NFL season vs off-season)
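
Comparing fairly means comparing cohorts at the same age. The sketch below labels cohorts as pre or post a hypothetical release date and compares them at month 1 only, so that older cohorts are not penalised for having had more time to churn. All dates and values are illustrative.

```r
# Hypothetical cohort table in long form (retention in percent)
cohort_table <- data.frame(
  cohort       = as.Date(rep(c("2024-01-01", "2024-02-01",
                               "2024-03-01", "2024-04-01"), each = 2)),
  months_since = rep(c(1, 2), times = 4),
  retention    = c(70, 55, 65, 50, 75, 58, 74, NA)
)
release_date <- as.Date("2024-03-01")  # assumed feature launch

cohort_table$period <- ifelse(cohort_table$cohort < release_date, "pre", "post")

# Keep one cohort age so pre and post are measured on equal footing
same_age     <- subset(cohort_table, months_since == 1)
m1_by_period <- tapply(same_age$retention, same_age$period, mean)

print(m1_by_period)
```

This uses an unweighted mean across cohorts. Weighting by cohort size would be more accurate when cohort sizes differ a lot.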

R Code Equivalent

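# Cohort analysis
library(dplyr)
library(tidyr)
library(lubridate)  # floor_date(), interval(), months()

# users:    one row per user, with user_id and signup_date
# activity: one row per user action, with user_id and activity_date
create_cohort_table <- function(users, activity) {
  # Assign each user to the month in which they signed up
  users <- users %>%
    mutate(cohort = floor_date(signup_date, "month"))

  # Cohort sizes, including users who never became active
  cohort_sizes <- users %>%
    count(cohort, name = "cohort_size")

  users %>%
    # inner_join drops users with no activity, who would otherwise
    # form a spurious group with months_since = NA
    inner_join(activity, by = "user_id") %>%
    mutate(months_since = interval(cohort, activity_date) %/% months(1)) %>%
    group_by(cohort, months_since) %>%
    summarise(active = n_distinct(user_id), .groups = "drop") %>%
    left_join(cohort_sizes, by = "cohort") %>%
    mutate(retention = active / cohort_size * 100)
}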

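# Calculate cohort LTV
# retention_curve: percent of the cohort still active in each month (month 0 = 100)
# avg_monthly_rev: average revenue per active user per month
cohort_ltv <- function(retention_curve, avg_monthly_rev) {
  sum(retention_curve * avg_monthly_rev / 100)
}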

# Example
retention <- c(100, 70, 55, 45)
ltv <- cohort_ltv(retention, 25)
cat(sprintf("Cohort LTV: $%.0f\n", ltv))
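
A long table with one row per cohort and month, like the one `create_cohort_table` returns, is often easier to read in the wide triangular layout shown earlier. A minimal sketch using `tidyr::pivot_wider`, with a hand-built long table (hypothetical 2024 dates, retention values from the table above):

```r
library(tidyr)

# Long-form cohort table: one row per cohort and month since signup
cohort_long <- data.frame(
  cohort       = as.Date(c("2024-01-01", "2024-01-01", "2024-01-01",
                           "2024-02-01", "2024-02-01", "2024-03-01")),
  months_since = c(1, 2, 3, 1, 2, 1),
  retention    = c(70, 55, 45, 65, 50, 75)
)

# One row per cohort, one column per month; unobserved cells become NA
cohort_wide <- pivot_wider(
  cohort_long,
  names_from   = months_since,
  values_from  = retention,
  names_prefix = "M"
)

print(cohort_wide)
```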

✅ Key Takeaways

  • Group users by signup period (cohort)
  • Track metrics over time for each cohort
  • Reveals patterns hidden in aggregate data
  • Compare cohorts to measure improvement
  • Essential for LTV and retention analysis
  • Identify acquisition channel quality

Pricing Models & Frameworks Tutorial
