Yusuf Musa
Data Science

Headline A/B Test Analysis

30K+ A/B experiments
2013–2015 time period
6 research questions
CTR primary metric

Overview

This project analyzes 30,000+ A/B headline experiments from the Upworthy Research Archive (2013–2015) to determine what drives user engagement and validate experimental reliability. Each experiment tested multiple headline variations — sometimes including image changes — and measured impressions, clicks, and click-through rate (CTR).

The goal: identify headline features that genuinely boost engagement, and distinguish breakthrough insights from statistical anomalies.

Analysis Pipeline

1. Ingest: 30K+ experiments from the archive
2. Engineer: sentiment, length, and question detection
3. Analyze: statistical tests and comparisons
4. Validate: replication across experiment types
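The ingest step above reduces to one key derived column. As a minimal sketch with Pandas (column and variant names are assumed here, not taken from the archive's actual schema), each headline variant carries impressions and clicks, and CTR is computed per variant:

```python
import pandas as pd

# Hypothetical rows; the real archive provides impressions and clicks per
# headline variant within each experiment (column names assumed here).
rows = [
    {"test_id": "t1", "headline": "Variant A", "impressions": 5000, "clicks": 85},
    {"test_id": "t1", "headline": "Variant B", "impressions": 5100, "clicks": 61},
]
df = pd.DataFrame(rows)

# CTR is simply clicks divided by impressions for each variant
df["ctr"] = df["clicks"] / df["impressions"]
print(df[["headline", "ctr"]])
```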

Research Questions

Six hypotheses were tested against the dataset to isolate which headline characteristics drive measurable engagement differences:

H1. Do headlines with numbers outperform those without? (numeric indicators)
H2. Do question-formatted headlines increase CTR? (question detection)
H3. Is there an optimal headline length for engagement? (character and word count)
H4. Do emotional headlines outperform neutral ones? (sentiment scoring)
H5. Does text alone drive performance when images are identical? (image control groups)
H6. Do exploratory findings replicate in confirmatory experiments? (replication testing)

Feature Engineering

Raw headline text was transformed into structured analytical features using NLP techniques. Each headline was scored and categorized across multiple dimensions to enable systematic comparison.

Feature Extraction Pipeline

Raw input

“10 Things That Will Change How You See the World”

Has Number: True
Is Question: False
Word Count: 10
Char Length: 49
Sentiment: +0.12
Polarity: Positive
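The rule-based features above can be sketched with the standard library alone; sentiment and polarity came from TextBlob in the notebook (via `TextBlob(text).sentiment.polarity`) and are omitted from this hypothetical helper:

```python
import re

def extract_features(headline: str) -> dict:
    """Score one headline on the structural features used in the analysis.

    Sentiment/polarity are produced by TextBlob in the actual notebook;
    only the rule-based features are sketched here.
    """
    return {
        "has_number": bool(re.search(r"\d", headline)),
        "is_question": headline.rstrip().endswith("?"),
        "word_count": len(headline.split()),
        "char_length": len(headline),
    }

features = extract_features("10 Things That Will Change How You See the World")
print(features)
```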

Statistical Analysis

Each hypothesis was tested using statistical methods from SciPy. The analysis compared CTR distributions between headline groups, controlling for sample size and multiple testing. Results were validated through replication across exploratory and confirmatory experiment subsets.
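One such comparison might look like the following sketch: a non-parametric two-sample test from SciPy on simulated CTR samples (the exact test and data in the notebook may differ), with a simple Bonferroni correction across the six hypotheses:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical CTR samples for H1: headlines with vs. without numbers.
# Beta draws stand in for observed, skewed CTR distributions.
ctr_with_number = rng.beta(2, 120, size=400)
ctr_without = rng.beta(2, 140, size=400)

# Non-parametric comparison, avoiding a normality assumption on CTRs
u_stat, p_value = stats.mannwhitneyu(
    ctr_with_number, ctr_without, alternative="two-sided"
)

# Bonferroni correction across the six research questions
n_tests = 6
significant = p_value < 0.05 / n_tests
print(f"p = {p_value:.4g}, significant after correction: {significant}")
```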

Validation Framework

Phase A (Exploratory): initial hypothesis testing on the exploratory subset to identify candidate patterns.
Generate → Test → Signal?

Phase B (Confirmatory): replication testing on held-out experiments to validate findings.
Replicate → Confirm → Robust?

Only findings that survive both phases are reported as reliable.
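The two-phase gate can be expressed as a short sketch, again with simulated CTR draws and an assumed significance threshold; a real run would split the archive's experiments into exploratory and confirmatory subsets by experiment ID:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def effect_detected(group_a, group_b, alpha=0.05):
    """Return True if the two CTR samples differ significantly."""
    _, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    return p < alpha

# Hypothetical CTR draws standing in for the two experiment subsets
exploratory_a, exploratory_b = rng.beta(2, 110, 300), rng.beta(2, 130, 300)
confirmatory_a, confirmatory_b = rng.beta(2, 110, 300), rng.beta(2, 130, 300)

signal = effect_detected(exploratory_a, exploratory_b)  # Phase A
robust = signal and effect_detected(confirmatory_a, confirmatory_b)  # Phase B

# Only a finding that survives both phases is reported as reliable
print("reliable finding" if robust else "not replicated")
```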

Headline Feature Comparison

The analysis compared CTR performance across headline characteristics. Each feature was isolated to measure its independent effect on engagement.

CTR by Headline Feature (bar widths represent relative CTR performance; illustrative)

Contains Number: Yes vs. No
Question Format: Question vs. Declarative
Emotional Tone: Emotional vs. Neutral
Headline Length: Optimal (8–12 words) vs. Short/Long
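A per-feature comparison like the one charted above can be computed by pooling clicks and impressions within each feature group, which weights variants by how often they were shown. A minimal Pandas sketch with made-up numbers:

```python
import pandas as pd

# Hypothetical variant-level data with one boolean feature flag
df = pd.DataFrame({
    "has_number": [True, True, False, False],
    "clicks": [90, 72, 55, 48],
    "impressions": [5000, 4800, 5000, 5200],
})

# Pooled CTR per feature value: total clicks over total impressions,
# so high-traffic variants contribute proportionally more
pooled = (
    df.groupby("has_number")[["clicks", "impressions"]]
    .sum()
    .assign(ctr=lambda g: g["clicks"] / g["impressions"])
)
print(pooled["ctr"])
```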

Key Objectives & Outcomes

Maximize Engagement

Identify headline features that genuinely boost CTR across 30K+ experiments

Separate Signal from Noise

Distinguish genuine insights from statistical anomalies and data artifacts

Validate Reliability

Confirm findings replicate across exploratory and confirmatory experiment sets

Dataset & Tools

Source: Upworthy Research Archive
Experiments: 30,000+ A/B tests
Period: 2013–2015
Metrics: Impressions, Clicks, CTR
Variations: Multiple headlines per test, sometimes with image changes
Tools: Python, SciPy, TextBlob, Pandas
Deliverable: upworthy_analysis.ipynb (Jupyter Notebook)