Liftboard — Experimentation Lead · DS

Metric Scorecard — click any row to drill in

CUPED-adjusted · peeking-safe

Metric	Control	Treatment	Δ Lift	95% CI	p-value	Significance

CI Whisker Plots — clears zero = win

interval vs 0 · axis −8% to +8%

AI Ship Recommendation

Treatment wins — ship to 100%

Primary metric significant at p = 0.004 with no guardrail regressions. Sequential testing confirms the result is peeking-safe. CUPED reduced variance by ~38%, which narrows the interval by roughly 21%.

+$1.2M

Est. annual impact

+4.7%

Conversion at 100%

Guardrail regressions

Guardrail Metrics

all clear

p99 Latency +0.3% n.s.

Refund Rate −0.1% n.s.

Support Tickets −1.2% n.s.

Page Errors +0.0% n.s.

Checkout Abandon −3.4% n.s.

Stat Engine

CUPED · power

Variance reduced

38%

~5 days faster

Primary lift

+4.7%

p = 0.004

Statistical power

0.91

target 0.80 ✓

Total exposures

84,210

50.1 / 49.9

AI Heterogeneity Finder

segment effects

◆Effect concentration detected

Lift is concentrated in mobile + new users (+9.1%); flat on desktop returners (+0.4%, n.s.). Consider a targeted rollout to the high-lift segment for a cleaner incremental read.

Cumulative Exposures — balanced allocation

14-day window

Control 42,180 Treatment 42,030

Significance Trajectory — p-value over time

crossed α Day 7

p-value · α = 0.05 threshold

Variant Allocation

SRM check passed

Control 50%

Treatment 50%

Control n = 42,180 · Treatment n = 42,030 · SRM χ² = 0.27 · p = 0.60 ✓

Avg daily exposures 9,357

Min detectable effect ±1.8%

Days to significance 7 of 14
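
The SRM check above can be reproduced from the two exposure counts alone. The following is a minimal sketch, not the dashboard's actual implementation: the function name `srm_check` and the 0.001 alert threshold are assumptions chosen for illustration. It uses a one-degree-of-freedom chi-square test against the intended 50/50 split.

```python
import math


def srm_check(n_control, n_treatment, expected_share=0.5, threshold=0.001):
    """Chi-square sample-ratio-mismatch test for a two-arm experiment.

    `threshold` is a hypothetical alerting cutoff, not taken from the dashboard.
    """
    total = n_control + n_treatment
    expected_control = total * expected_share
    expected_treatment = total * (1 - expected_share)
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value >= threshold


# Exposure counts shown on the Variant Allocation card.
chi2, p_value, passed = srm_check(42_180, 42_030)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2f}, passed = {passed}")
```

A large p-value here means the observed split is compatible with the intended allocation, so the lift estimates are not being distorted by uneven assignment.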

Field Guide

How to use this

Read the verdict, not the dashboard. The decision this view drives is binary: ship or don't. Start at the AI Ship card, then sanity-check it against the scorecard and guardrails before you trust it.
Check the whisker plots next. A bar that fully clears the dotted zero line (glowing teal) is a statistically significant win; a bar that crosses zero (grey) is not yet significant, no matter how positive the point estimate looks.
Always scan guardrails before celebrating. A primary-metric win with a latency or refund-rate regression is not shippable. All five here are n.s. — that's what "no regressions" means.
CUPED and sequential testing are your two safety rails. CUPED reduces variance using pre-experiment data, so you reach significance ~5 days sooner; sequential testing keeps daily peeking statistically valid instead of inflating false positives.
Mind heterogeneity before a global rollout. When lift concentrates in one segment (mobile + new users here), a 100% ship can dilute or hide the real effect — the AI finder flags this automatically.
For your own org: if your team ships on raw p-values without sequential correction, you're likely shipping a meaningful share of false winners. What's your peeking discipline today?
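
To make the CUPED point in the guide concrete, here is a minimal sketch of the adjustment on synthetic data. Everything in it is an assumption for illustration: the function name `cuped_adjust`, the simulated pre-period covariate, and the noise levels. In practice the coefficient is usually estimated on data pooled across both arms. The sketch also shows that a variance reduction of r narrows the interval by 1 − sqrt(1 − r), which is smaller than r itself.

```python
import random
from statistics import mean, variance


def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes: y - theta * (x - mean(x))."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)  # regression slope of y on x
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


# Synthetic data (hypothetical): pre-period metric x partly predicts outcome y.
rng = random.Random(0)
x = [rng.gauss(10, 2) for _ in range(5000)]
y = [0.6 * xi + rng.gauss(0, 2) for xi in x]

y_adj = cuped_adjust(y, x)
variance_reduction = 1 - variance(y_adj) / variance(y)
# Interval width scales with the standard deviation, i.e. the square root of variance.
interval_shrink = 1 - (1 - variance_reduction) ** 0.5

print(f"variance reduced by {variance_reduction:.0%}")
print(f"interval narrower by {interval_shrink:.0%}")
```

The adjustment leaves the mean of the metric unchanged, which is why it tightens the interval without biasing the lift estimate.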

A test isn't "done" because the lift is positive — it's done when the interval clears zero AND every guardrail holds AND the result survives sequential correction.
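
The three-part rule above can be written as a small predicate. This is a sketch under stated assumptions: the `MetricResult` type, the function names, and all interval bounds are hypothetical (the dashboard reports point estimates and p-values, not these bounds), and `sequential_p` stands for whatever peeking-safe p-value your platform supplies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricResult:
    name: str
    ci_low: float   # lower bound of the 95% CI on relative lift, in percent
    ci_high: float  # upper bound of the 95% CI on relative lift, in percent
    higher_is_better: bool = True


def is_significant_win(m: MetricResult) -> bool:
    """True when the interval lies entirely on the good side of zero."""
    return m.ci_low > 0 if m.higher_is_better else m.ci_high < 0


def is_regression(m: MetricResult) -> bool:
    """True when the interval lies entirely on the bad side of zero."""
    return m.ci_high < 0 if m.higher_is_better else m.ci_low > 0


def should_ship(primary: MetricResult, guardrails: List[MetricResult],
                sequential_p: float, alpha: float = 0.05) -> bool:
    return (is_significant_win(primary)
            and not any(is_regression(g) for g in guardrails)
            and sequential_p < alpha)


# Illustrative bounds only; they are not taken from the dashboard.
primary = MetricResult("Conversion", 1.5, 7.9)
guardrails = [
    MetricResult("p99 Latency", -0.9, 1.5, higher_is_better=False),
    MetricResult("Refund Rate", -1.0, 0.8, higher_is_better=False),
]
decision = should_ship(primary, guardrails, sequential_p=0.004)

# A latency interval entirely above zero blocks the ship despite the primary win.
slow = [MetricResult("p99 Latency", 0.4, 2.1, higher_is_better=False)]
blocked = should_ship(primary, slow, sequential_p=0.004)
print(decision, blocked)
```

Encoding the direction of each guardrail matters: a significant drop in refund rate is good news, while a significant rise in latency is a regression.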

In context · Sample Feed

Illustrative — wire to your experimentation platform & warehouse feed

Experiments running · across squads, last 30d

▲ 6

Win rate · shipped / concluded

31%

▲ 2.4pt

Median time-to-significance · CUPED-adjusted

8.1d

▼ 1.3d

False-positive rate · sequential vs naive backtest

4.9%

≈ at α

Avg variance reduction · CUPED across active tests

34%

▲ 3pt

Realized impact YTD · shipped winners, modeled

+$8.4M

▲ $1.2M

Agent review

Four AI agents walk this dashboard.