Liftboard
Experiment Scorecard
EXAMPLE · demo data, not live
New Checkout Flow · EXP-2291 · primary: Purchase Conversion
Day 9 of 14 · Confidence 95% · CUPED ON · SEQUENTIAL ON
Metric Scorecard · CUPED-adjusted · peeking-safe
Columns: Metric · Control · Treatment · Δ Lift · 95% CI · p-value · Significance
CI Whisker Plots: each interval plotted against 0 on a −8% to +8% axis (clears zero = win)
AI Ship Recommendation
Treatment wins — ship to 100%
Primary metric is significant at p = 0.004 with no guardrail regressions. Sequential testing confirms the result is peeking-safe. CUPED reduced variance by ~38%, which narrows the interval by roughly 21%.
Est. annual impact: +$1.2M
Conversion at 100%: +4.7%
Guardrail regressions: 0
Guardrail Metrics · all clear
p99 Latency: +0.3% (n.s.)
Refund Rate: −0.1% (n.s.)
Support Tickets: −1.2% (n.s.)
Page Errors: +0.0% (n.s.)
Checkout Abandon: −3.4% (n.s.)
Stat Engine · CUPED · power
Variance reduced: 38% (~5 days faster)
Primary lift: +4.7% (p = 0.004)
Statistical power: 0.91 (target 0.80 ✓)
Total exposures: 84,210 (50.1 / 49.9 split)
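To make the "Variance reduced" figure concrete, here is a minimal sketch of a CUPED-style adjustment on synthetic data. It is not Liftboard's implementation: the function name `cuped_adjust`, the simulated pre-period covariate, and every number in the simulation are assumptions chosen for illustration. The last line uses the panel's 38% figure to show the interval-width arithmetic, assuming interval width scales with the square root of the variance.

```python
import random
from statistics import mean, pvariance

def cuped_adjust(metric, covariate):
    """Return the metric with the part explained by a pre-period covariate removed.

    theta = cov(metric, covariate) / var(covariate) is the usual CUPED coefficient.
    """
    m_bar, c_bar = mean(metric), mean(covariate)
    cov = sum((m - m_bar) * (c - c_bar) for m, c in zip(metric, covariate)) / len(metric)
    theta = cov / pvariance(covariate)
    # Subtracting a mean-zero term leaves the metric's mean unchanged.
    return [m - theta * (c - c_bar) for m, c in zip(metric, covariate)]

# Synthetic users (hypothetical): in-experiment spend partly predicted by pre-period spend.
rng = random.Random(42)
pre_period = [rng.gauss(100, 20) for _ in range(5000)]
in_experiment = [0.6 * p + rng.gauss(40, 20) for p in pre_period]

adjusted = cuped_adjust(in_experiment, pre_period)
variance_reduction = 1 - pvariance(adjusted) / pvariance(in_experiment)

# With the panel's 38% variance reduction, the interval keeps
# sqrt(1 - 0.38) of its width, i.e. it is roughly 21% narrower.
width_ratio_at_38pct = (1 - 0.38) ** 0.5
```

The variance reduction equals the squared correlation between the metric and the covariate, so the benefit depends on how well pre-period behavior predicts the in-experiment metric.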
AI Heterogeneity Finder
segment effects
Effect concentration detected
Lift is concentrated in mobile + new users (+9.1%) and flat on desktop returners (+0.4%, n.s.). Consider a targeted rollout to the high-lift segment for a cleaner read on incremental impact.
Cumulative Exposures — balanced allocation
14-day window
Control 42,180 Treatment 42,030
Significance Trajectory: p-value over time against the α = 0.05 threshold (crossed α on Day 7)
Variant Allocation · SRM check passed
Control 50% (n = 42,180)
Treatment 50% (n = 42,030)
SRM χ² = 0.27 · p = 0.60 ✓
Avg daily exposures: 9,357
Min detectable effect: ±1.8%
Days to significance: 7 of 14
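The SRM (sample ratio mismatch) figures in this panel can be reproduced with a one-degree-of-freedom chi-square goodness-of-fit test against the intended 50/50 split. The sketch below uses only the standard library. The function name `srm_check` and the `alpha=0.001` alert threshold are assumptions, not something the dashboard states.

```python
import math

def srm_check(n_control, n_treatment, expected_control_share=0.5, alpha=0.001):
    """Chi-square test of observed allocation against the intended split."""
    total = n_control + n_treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # Upper tail of a chi-square with 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    passed = p_value >= alpha  # assumed threshold; a tiny p-value signals a mismatch
    return chi2, p_value, passed

# Exposure counts from the panel above.
chi2, p_value, passed = srm_check(42_180, 42_030)
# chi2 is about 0.27 and p_value about 0.6, in line with the panel, so the check passes.
```

A failed SRM check usually points at an assignment or logging bug, in which case the scorecard's lift estimates should not be trusted until the cause is found.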
Field Guide

How to use this

  • Read the verdict, not the dashboard. The decision this view drives is binary: ship or don't. Start at the AI Ship card, then sanity-check it against the scorecard and guardrails before you trust it.
  • Watch the whisker plots first. A bar that fully clears the dotted zero line (glowing teal) is a real win; a bar that crosses zero (grey) is not yet significant no matter how positive the point estimate looks.
  • Always scan guardrails before celebrating. A primary-metric win with a latency or refund-rate regression is not shippable. All five here are n.s. — that's what "no regressions" means.
  • CUPED and Sequential are your two safety rails. CUPED tightens the interval so you reach significance ~5 days sooner; Sequential makes daily peeking statistically valid instead of inflating false positives.
  • Mind heterogeneity before a global rollout. When lift concentrates in one segment (mobile + new users here), a 100% ship can dilute or hide the real effect — the AI finder flags this automatically.
  • For your own org: if your team checks results daily and ships on raw p-values without sequential correction, your false-positive rate is running above the nominal α, so some of your shipped winners are likely noise. What's your peeking discipline today?
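The peeking problem in the last bullet can be seen in a small simulation. This sketch assumes A/A tests with no true effect, one look per day over a 14-day window, and a naive two-sided z-test at each look. It does not reproduce the dashboard's sequential method, which the page does not specify, and the function name `simulate_aa_tests` is hypothetical.

```python
import random

def simulate_aa_tests(n_sims=2000, n_days=14, z_crit=1.96, seed=0):
    """Compare false-positive rates: stop-at-any-significant-peek vs one final look."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_sims):
        running_sum = 0.0
        significant_on_any_day = False
        for day in range(1, n_days + 1):
            # Each day adds a standardized increment with no true effect.
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / day ** 0.5
            if abs(z) > z_crit:
                significant_on_any_day = True
        peeking_hits += significant_on_any_day
        fixed_hits += abs(z) > z_crit  # z here is the final-day statistic
    return peeking_hits / n_sims, fixed_hits / n_sims

peeking_rate, fixed_rate = simulate_aa_tests()
```

Under these assumptions the single final look stays near the nominal 5%, while declaring a winner on any significant day produces false positives several times as often. That gap is what a sequential correction is meant to close.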
A test isn't "done" because the lift is positive — it's done when the interval clears zero AND every guardrail holds AND the result survives sequential correction.
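The three-part rule above can be written as a single predicate. This is a sketch, not the logic behind the AI Ship card: the function name `should_ship` and the primary-metric interval `(1.6, 7.8)` are hypothetical, since the page does not show the interval bounds. The guardrail flags mirror the five n.s. results on the scorecard.

```python
def should_ship(primary_ci, guardrail_regressed, sequentially_valid):
    """Ship only if the interval clears zero, no guardrail regressed, and the
    result holds under sequential correction."""
    low, _high = primary_ci
    interval_clears_zero = low > 0  # win direction: the whole interval sits above zero
    guardrails_hold = not any(guardrail_regressed.values())
    return interval_clears_zero and guardrails_hold and sequentially_valid

# True would mean a statistically significant regression; all five are n.s. here.
guardrails = {
    "p99 Latency": False,
    "Refund Rate": False,
    "Support Tickets": False,
    "Page Errors": False,
    "Checkout Abandon": False,
}

# Hypothetical 95% CI (in percent lift) around the +4.7% point estimate.
decision = should_ship((1.6, 7.8), guardrails, sequentially_valid=True)
```

A positive point estimate with an interval such as `(-0.5, 9.0)` fails the first condition, which matches the whisker-plot reading: a bar that crosses zero is not yet a win.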

In context · Sample Feed

Illustrative — wire to your experimentation platform & warehouse feed
Experiments running (across squads, last 30d): 37 · ▲ 6
Win rate (shipped / concluded): 31% · ▲ 2.4pt
Median time-to-significance (CUPED-adjusted): 8.1d · ▼ 1.3d
False-positive rate (sequential vs naive backtest): 4.9% · ≈ at α
Avg variance reduction (CUPED across active tests): 34% · ▲ 3pt
Realized impact YTD (shipped winners, modeled): +$8.4M · ▲ $1.2M
