Yusuf Musa
Data Science

Headline A/B Test Analysis

30K+ A/B experiments
2013–2015 time period
6 research questions
CTR primary metric

Overview

This project analyzes 30,000+ A/B headline experiments from the Upworthy Research Archive (2013–2015) to determine what drives user engagement and validate experimental reliability. Each experiment tested multiple headline variations — sometimes including image changes — and measured impressions, clicks, and click-through rate (CTR).

The goal: identify headline features that genuinely boost engagement, and distinguish breakthrough insights from statistical anomalies.

Analysis Pipeline

1. Ingest: 30K+ experiments from the archive
2. Engineer: sentiment, length, and question detection
3. Analyze: statistical tests and comparisons
4. Validate: replication across experiment types
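The ingest step above reduces to one key derived column. As a minimal sketch with Pandas (column and variant names are assumed here, not taken from the archive's actual schema), each headline variant carries impressions and clicks, and CTR is computed per variant:

```python
import pandas as pd

# Hypothetical rows; the real archive provides impressions and clicks per
# headline variant within each experiment (column names assumed here).
rows = [
    {"test_id": "t1", "headline": "Variant A", "impressions": 5000, "clicks": 85},
    {"test_id": "t1", "headline": "Variant B", "impressions": 5100, "clicks": 61},
]
df = pd.DataFrame(rows)

# CTR is simply clicks divided by impressions for each variant
df["ctr"] = df["clicks"] / df["impressions"]
print(df[["headline", "ctr"]])
```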

Research Questions

Six hypotheses were tested against the dataset to isolate which headline characteristics drive measurable engagement differences:

H1. Do headlines with numbers outperform those without? (numeric indicators)
H2. Do question-formatted headlines increase CTR? (question detection)
H3. Is there an optimal headline length for engagement? (character and word count)
H4. Do emotional headlines outperform neutral ones? (sentiment scoring)
H5. Does text alone drive performance when images are identical? (image control groups)
H6. Do exploratory findings replicate in confirmatory experiments? (replication testing)

Feature Engineering

Raw headline text was transformed into structured analytical features using NLP techniques. Each headline was scored and categorized across multiple dimensions to enable systematic comparison.

Feature Extraction Pipeline

Raw input

“10 Things That Will Change How You See the World”

Has Number: True
Is Question: False
Word Count: 10
Char Length: 49
Sentiment: +0.12
Polarity: Positive
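The rule-based features above can be sketched with the standard library alone; sentiment and polarity came from TextBlob in the notebook (via `TextBlob(text).sentiment.polarity`) and are omitted from this hypothetical helper:

```python
import re

def extract_features(headline: str) -> dict:
    """Score one headline on the structural features used in the analysis.

    Sentiment/polarity are produced by TextBlob in the actual notebook;
    only the rule-based features are sketched here.
    """
    return {
        "has_number": bool(re.search(r"\d", headline)),
        "is_question": headline.rstrip().endswith("?"),
        "word_count": len(headline.split()),
        "char_length": len(headline),
    }

features = extract_features("10 Things That Will Change How You See the World")
print(features)
```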

Statistical Analysis

Each hypothesis was tested using statistical methods from SciPy. The analysis compared CTR distributions between headline groups, controlling for sample size and multiple testing. Results were validated through replication across exploratory and confirmatory experiment subsets.
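One such comparison might look like the following sketch: a non-parametric two-sample test from SciPy on simulated CTR samples (the exact test and data in the notebook may differ), with a simple Bonferroni correction across the six hypotheses:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical CTR samples for H1: headlines with vs. without numbers.
# Beta draws stand in for observed, skewed CTR distributions.
ctr_with_number = rng.beta(2, 120, size=400)
ctr_without = rng.beta(2, 140, size=400)

# Non-parametric comparison, avoiding a normality assumption on CTRs
u_stat, p_value = stats.mannwhitneyu(
    ctr_with_number, ctr_without, alternative="two-sided"
)

# Bonferroni correction across the six research questions
n_tests = 6
significant = p_value < 0.05 / n_tests
print(f"p = {p_value:.4g}, significant after correction: {significant}")
```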

Validation Framework

Phase A (Exploratory): initial hypothesis testing on the exploratory subset to identify candidate patterns.
Generate → Test → Signal?

Phase B (Confirmatory): replication testing on held-out experiments to validate findings.
Replicate → Confirm → Robust?

Only findings that survive both phases are reported as reliable.
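The two-phase gate can be expressed as a short sketch, again with simulated CTR draws and an assumed significance threshold; a real run would split the archive's experiments into exploratory and confirmatory subsets by experiment ID:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def effect_detected(group_a, group_b, alpha=0.05):
    """Return True if the two CTR samples differ significantly."""
    _, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    return p < alpha

# Hypothetical CTR draws standing in for the two experiment subsets
exploratory_a, exploratory_b = rng.beta(2, 110, 300), rng.beta(2, 130, 300)
confirmatory_a, confirmatory_b = rng.beta(2, 110, 300), rng.beta(2, 130, 300)

signal = effect_detected(exploratory_a, exploratory_b)  # Phase A
robust = signal and effect_detected(confirmatory_a, confirmatory_b)  # Phase B

# Only a finding that survives both phases is reported as reliable
print("reliable finding" if robust else "not replicated")
```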

Headline Feature Comparison

The analysis compared CTR performance across headline characteristics. Each feature was isolated to measure its independent effect on engagement.

CTR by Headline Feature (bar widths represent relative CTR performance; illustrative)

Contains Number: Yes vs. No
Question Format: Question vs. Declarative
Emotional Tone: Emotional vs. Neutral
Headline Length: Optimal (8–12 words) vs. Short/Long
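A per-feature comparison like the one charted above can be computed by pooling clicks and impressions within each feature group, which weights variants by how often they were shown. A minimal Pandas sketch with made-up numbers:

```python
import pandas as pd

# Hypothetical variant-level data with one boolean feature flag
df = pd.DataFrame({
    "has_number": [True, True, False, False],
    "clicks": [90, 72, 55, 48],
    "impressions": [5000, 4800, 5000, 5200],
})

# Pooled CTR per feature value: total clicks over total impressions,
# so high-traffic variants contribute proportionally more
pooled = (
    df.groupby("has_number")[["clicks", "impressions"]]
    .sum()
    .assign(ctr=lambda g: g["clicks"] / g["impressions"])
)
print(pooled["ctr"])
```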

Key Objectives & Outcomes

Maximize Engagement

Identify headline features that genuinely boost CTR across 30K+ experiments

Separate Signal from Noise

Distinguish genuine insights from statistical anomalies and data artifacts

Validate Reliability

Confirm findings replicate across exploratory and confirmatory experiment sets

Dataset & Tools

Source: Upworthy Research Archive
Experiments: 30,000+ A/B tests
Period: 2013–2015
Metrics: Impressions, Clicks, CTR
Variations: Multiple headlines per test, sometimes with image changes
Tools: Python, SciPy, TextBlob, Pandas
Deliverable: upworthy_analysis.ipynb (Jupyter Notebook)