Headline A/B Test Analysis

Overview
This project analyzes 30,000+ A/B headline experiments from the Upworthy Research Archive (2013–2015) to determine what drives user engagement and validate experimental reliability. Each experiment tested multiple headline variations — sometimes including image changes — and measured impressions, clicks, and click-through rate (CTR).
The goal: identify headline features that genuinely boost engagement and distinguish real effects from statistical anomalies.
Analysis Pipeline
1. Ingest: 30K+ experiments from the archive
2. Engineer: sentiment, length, and question detection
3. Analyze: statistical tests and group comparisons
4. Validate: replication across experiment types
Research Questions
Six hypotheses were tested against the dataset to isolate which headline characteristics drive measurable engagement differences:
1. Do headlines with numbers outperform those without? (numeric indicators)
2. Do question-formatted headlines increase CTR? (question detection)
3. Is there an optimal headline length for engagement? (character and word counts)
4. Do emotional headlines outperform neutral ones? (sentiment scoring)
5. Does text alone drive performance when images are identical? (image control groups)
6. Do exploratory findings replicate in confirmatory experiments? (replication testing)
Feature Engineering
Raw headline text was transformed into structured analytical features using NLP techniques. Each headline was scored and categorized across multiple dimensions to enable systematic comparison.
Feature Extraction Pipeline (diagram): an example raw input, “10 Things That Will Change How You See the World”, is scored for numeric content, question format, length, and sentiment.
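The extraction step can be sketched with the standard library alone. `extract_features` below is a hypothetical helper, not the notebook's actual code; sentiment scoring (done with TextBlob in the notebook) is omitted to keep the sketch dependency-free:

```python
import re

def extract_features(headline: str) -> dict:
    """Turn a raw headline into the structured features used for comparison.

    Sentiment (TextBlob polarity in the actual analysis) is left out here.
    """
    return {
        "has_number": bool(re.search(r"\d", headline)),          # numeric indicator
        "is_question": headline.rstrip().endswith("?"),          # question detection
        "char_count": len(headline),                             # length features
        "word_count": len(headline.split()),
    }

features = extract_features("10 Things That Will Change How You See the World")
# → {'has_number': True, 'is_question': False, 'char_count': 48, 'word_count': 10}
```

A trailing question mark is a crude but serviceable proxy for question format; a fuller pipeline might also match leading interrogatives ("Why", "Can", "Do").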
Statistical Analysis
Each hypothesis was tested using statistical methods from SciPy. The analysis compared CTR distributions between headline groups, controlling for sample size and multiple testing. Results were validated through replication across exploratory and confirmatory experiment subsets.
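The notebook's exact test choices aren't reproduced here. One standard SciPy approach for comparing two variants' CTRs is a chi-square test on the click / no-click contingency table, with a simple Bonferroni adjustment for the six hypotheses; all counts below are hypothetical:

```python
from scipy.stats import chi2_contingency

def compare_ctr(clicks_a, imps_a, clicks_b, imps_b):
    """Chi-square test on the 2x2 click / no-click table of two variants."""
    table = [
        [clicks_a, imps_a - clicks_a],
        [clicks_b, imps_b - clicks_b],
    ]
    chi2, p, dof, _ = chi2_contingency(table)
    return chi2, p

# Hypothetical counts: variant A at 1.5% CTR vs. variant B at 1.1%.
chi2, p = compare_ctr(clicks_a=150, imps_a=10_000, clicks_b=110, imps_b=10_000)

# Bonferroni correction for the six hypotheses tested in this project.
p_adjusted = min(p * 6, 1.0)
```

The contingency-table formulation uses raw impression and click counts rather than precomputed CTRs, so sample size is accounted for directly.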
Validation Framework
- Phase A (Exploratory): initial hypothesis testing on the exploratory subset to identify candidate patterns
- Phase B (Confirmatory): replication testing on held-out confirmatory experiments to validate findings

Only findings that survive both phases are reported as reliable.
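The two-phase rule can be sketched as follows, assuming hypothetical column names (`phase`, a boolean feature flag such as `has_number`, and a per-variant `ctr`) and a Mann-Whitney U comparison as the stand-in test:

```python
import pandas as pd
from scipy.stats import mannwhitneyu

def two_phase_test(df: pd.DataFrame, feature: str, alpha: float = 0.05):
    """Run the same CTR comparison in both phases; flag replicated findings.

    Column names here are illustrative, not the notebook's actual schema.
    """
    results = {}
    for phase, group in df.groupby("phase"):
        with_feature = group.loc[group[feature], "ctr"]
        without_feature = group.loc[~group[feature], "ctr"]
        _, p = mannwhitneyu(with_feature, without_feature,
                            alternative="two-sided")
        results[phase] = p
    # A finding counts as reliable only if significant in BOTH phases.
    replicated = all(p < alpha for p in results.values())
    return results, replicated
```

Requiring significance in both the exploratory and confirmatory subsets is what filters out patterns that were only artifacts of one split.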
Headline Feature Comparison
The analysis compared CTR performance across headline characteristics. Each feature was isolated to measure its independent effect on engagement.
CTR by Headline Feature (chart): relative CTR compared across four features (contains number, question format, emotional tone, headline length); bar widths represent relative CTR performance and are illustrative.
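The per-feature comparison underlying the chart can be reproduced as a simple pandas group-by; the frame and CTR values below are made up for illustration:

```python
import pandas as pd

# Hypothetical per-variant frame with engineered boolean features and CTR.
df = pd.DataFrame({
    "has_number":  [True, True, False, False],
    "is_question": [True, False, True, False],
    "ctr":         [0.021, 0.018, 0.015, 0.012],
})

# Mean CTR with vs. without each feature, mirroring the bar comparison.
summary = {
    feature: df.groupby(feature)["ctr"].mean()
    for feature in ["has_number", "is_question"]
}
```

Each entry in `summary` is a two-row Series (False/True), so the with/without lift for a feature is just the difference of its two values.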
Key Objectives & Outcomes
- Maximize Engagement: identify headline features that genuinely boost CTR across 30K+ experiments
- Separate Signal from Noise: distinguish genuine insights from statistical anomalies and data artifacts
- Validate Reliability: confirm findings replicate across exploratory and confirmatory experiment sets
Dataset & Tools
| Attribute | Detail |
|---|---|
| Source | Upworthy Research Archive |
| Experiments | 30,000+ A/B tests |
| Period | 2013 – 2015 |
| Metrics | Impressions, Clicks, CTR |
| Variations | Multiple headlines per test, sometimes with image changes |
| Tools | Python, SciPy, TextBlob, Pandas |
| Deliverable | upworthy_analysis.ipynb (Jupyter Notebook) |
