PipelinePulse
● Example · demo data, not live
PROD
region us-east-1
scheduler healthy
pool 142/256
Synced 2m ago from Airflow · dbt · Snowflake
build a1b9f2c
On-time DAG Success · 24h
99.2%
vs 99.0% target ▲0.4 · 7d avg 99.0%
Runs Completed
1,438/1,442
4 retried · 0 failed
Avg DAG Duration
6m12s
p95 11m40s
SLA Misses
0
7-day clean streak · MTTR
Queue Wait p95 / Pool
8s
142 / 256 slots (55%)
DAG Run Grid — 24 runs × 18 tasks · newest right · click a cell
432 cells · 426 ✓ · 5 retry · 1 skipped
Legend: success | success · slow | retry | skipped | running
Top-1% Task Timing
dbt model-timing · longest spans, latest run
Latest Run Gantt — Run #4419 · 14 parallel branches · longest path 6m02s · finished 02:11
parallelism · 'now' line
AI · Critical-Path & ETA Predictor
transform_orders is tonight's pacing task
+18% vs 30-day baseline · projected finish 02:14 · SLA 02:30 · on track. No backfill needed; auto-retry advisor armed.
Auto-retry advisor: transient pattern → retry 2× · backoff 90s before escalate
Backfill window planner: cheapest off-peak slot 14:00 · 38-run trough
Anomalous-duration sentinel: 0 tasks above z=2.5 right now
Schedule-collision detector: no pool contention forecast · 0 staggers
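A minimal sketch of how a duration sentinel like the one above could score a task, assuming it compares the latest run against a sample of historical durations using a standard z-score. The function names, the sample durations, and the choice of sample standard deviation are illustrative assumptions, not the dashboard's actual implementation; only the 2.5 threshold and the task name come from the card.

```python
import math
from statistics import mean, stdev

Z_THRESHOLD = 2.5  # threshold shown on the sentinel card


def z_score(latest: float, history: list[float]) -> float:
    """Standard score of the latest duration against historical durations."""
    if len(history) < 2:
        raise ValueError("need at least two historical durations")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation (an assumption)
    if sigma == 0:
        # Flat history: any deviation is infinitely unusual, none is zero.
        return 0.0 if latest == mu else math.copysign(math.inf, latest - mu)
    return (latest - mu) / sigma


def pct_vs_baseline(latest: float, history: list[float]) -> float:
    """Percent change of the latest duration relative to the historical mean."""
    mu = mean(history)
    return (latest - mu) / mu * 100


def anomalous_tasks(latest_by_task: dict[str, float],
                    history_by_task: dict[str, list[float]],
                    threshold: float = Z_THRESHOLD) -> list[str]:
    """Names of tasks whose latest duration exceeds the z-score threshold."""
    return [task for task, latest in latest_by_task.items()
            if z_score(latest, history_by_task[task]) > threshold]


# Hypothetical durations in seconds, invented for illustration only.
history = {"transform_orders": [100, 110, 90, 105, 95]}
print(pct_vs_baseline(118, history["transform_orders"]))
print(anomalous_tasks({"transform_orders": 118}, history))
```

With these made-up numbers a task can run well above its baseline mean and still sit under the z-score threshold, which is how a card can report a slower pacing task and zero anomalies at the same time.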
Run History — click a row
Job · St · Env · SHA · Schema · Dur
Run Intensity Calendar Heatmap — runs per hour, last 24h · peak 02:00–03:00 (212) · trough 14:00 (38)
intensity-shaded
Legend: less → more · 2,047 runs across 24h
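The peak and trough callouts on the heatmap can be derived from a plain list of hourly run counts. The sketch below assumes a 24-element list indexed by UTC hour; the function name and every count except the 212 at 02:00 and the 38 at 14:00 are hypothetical placeholders, not values from the panel.

```python
def peak_and_trough(runs_per_hour: list[int]) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return ((peak_hour, runs), (trough_hour, runs)) for a 24-hour series.

    On ties, the earliest hour wins, because max() and min() keep the first match.
    """
    if len(runs_per_hour) != 24:
        raise ValueError("expected exactly 24 hourly counts")
    hours = range(24)
    peak = max(hours, key=lambda h: runs_per_hour[h])
    trough = min(hours, key=lambda h: runs_per_hour[h])
    return (peak, runs_per_hour[peak]), (trough, runs_per_hour[trough])


# Placeholder counts; only hour 2 (212) and hour 14 (38) mirror the panel.
counts = [150, 190, 212, 170, 120, 90, 80, 70, 65, 60, 55, 50,
          45, 42, 38, 44, 50, 58, 66, 75, 85, 100, 120, 140]

(peak_hour, peak_runs), (trough_hour, trough_runs) = peak_and_trough(counts)
print(f"peak {peak_hour:02d}:00 ({peak_runs}) · trough {trough_hour:02d}:00 ({trough_runs})")
```

A backfill planner could use the trough hour as a first candidate slot, though a real planner would likely also weigh pool capacity and warehouse cost.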
Field Guide · how on-call reads this console

01 How to use this

  • Scan the grid first, not the KPIs. Your eye is a parallel anomaly detector — sweep the 24×18 matrix for the one off-color square (the amber dim_customer @ 03:00 cell). Green-everywhere means stand down before standup.
  • Read columns as runs, rows as tasks. A vertical amber streak = one bad run; a horizontal streak = one flaky task across many runs. The pattern tells you whether to re-run a DAG or quarantine a model.
  • The flame strip is your budget. The top-1% longest tasks are where SLA risk lives — if fct_orders creeps past its bar, the whole critical path slips. Watch it before it cascades.
  • Let the AI predictor pre-triage. The critical-path card flags the task most likely to breach tonight and proposes retry-vs-backfill, so you act on the projection instead of waiting for the red.
  • Drill before you escalate. Click any cell for the live log tail, retry boundary, and upstream lineage — confirm root cause in one click rather than tab-hopping to Airflow.
  • For your own org: if you can't find your worst run in under 5 seconds, your run history isn't dense enough — density is the feature, not the bug.
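The column-versus-row reading described above can be approximated in code. This is a rough sketch, assuming the grid is a list of task rows whose entries are per-run status strings; the function names, the `min_len` cutoff, and the set of statuses treated as "off-color" are all assumptions for illustration.

```python
def longest_streak(cells: list[str], bad: frozenset[str]) -> int:
    """Length of the longest contiguous run of bad statuses in a sequence."""
    best = current = 0
    for status in cells:
        current = current + 1 if status in bad else 0
        best = max(best, current)
    return best


def classify_streaks(grid: list[list[str]],
                     bad: frozenset[str] = frozenset({"retry"}),
                     min_len: int = 2) -> tuple[list[int], list[int]]:
    """Return (bad_runs, flaky_tasks) as column and row indices.

    grid[task][run] holds a status string. A vertical streak (one column,
    several adjacent tasks) marks a bad run; a horizontal streak (one row,
    several adjacent runs) marks a flaky task.
    """
    n_runs = len(grid[0]) if grid else 0
    bad_runs = [run for run in range(n_runs)
                if longest_streak([row[run] for row in grid], bad) >= min_len]
    flaky_tasks = [task for task, row in enumerate(grid)
                   if longest_streak(row, bad) >= min_len]
    return bad_runs, flaky_tasks


ok, rt = "success", "retry"
one_bad_run = [  # 3 tasks x 4 runs, retries stacked in run index 2
    [ok, ok, rt, ok],
    [ok, ok, rt, ok],
    [ok, ok, rt, ok],
]
one_flaky_task = [  # retries spread along task index 1
    [ok, ok, ok, ok],
    [ok, rt, rt, rt],
    [ok, ok, ok, ok],
]
print(classify_streaks(one_bad_run))     # ([2], [])
print(classify_streaks(one_flaky_task))  # ([], [1])
```

A bad run points toward re-running the DAG, while a flaky task points toward quarantining the model, matching the decision the field guide describes.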

02 Watch the walkthrough

Four AI agents walk this dashboard.

03 In context

● Sample feed
Illustrative — wire to your Airflow / dbt Cloud / Snowflake telemetry feed.
Scheduler heartbeat (airflow · us-east-1): 4.0s ▲ healthy
dbt Cloud queue depth (nightly_core project): 0 jobs ▼ 100%
Snowflake warehouse load (TRANSFORM_XL · auto-scale): 62% ▲ 3%
Source freshness · orders API (last successful pull): 9m ago ▼ on SLA
Pool slot utilization (default_pool · 256 slots): 55% ▲ stable
PagerDuty · open incidents (data-oncall rotation): 0 ▼ clean 7d
Pipeline Pulse — a staas.fund dashboard showcase · paradigm: dark ops-console / DAG run-grid · demo data, not live · built by Peter Saddington