Pipeline Pulse — Data Eng

On-time DAG Success · 24h

99.2%

vs 99.0% target ▲0.4 · 7d avg 99.0%

Runs Completed

1,438/1,442

4 retried · 0 failed

Avg DAG Duration

6m12s

p95 11m40s

SLA Misses ⟶

7-day clean streak · MTTR ▼

Queue Wait p95 / Pool

142 / 256 slots (55%)

DAG Run Grid — 24 runs × 18 tasks · newest right · click a cell

432 cells · 426 ✓ · 5 retry · 1 skipped

Legend: success, success · slow, retry, skipped, running

Top-1% Task Timing

dbt model-timing · longest spans, latest run

Latest Run Gantt — Run #4419 · 14 parallel branches · longest path 6m02s · finished 02:11

parallelism · 'now' line

AI · Critical-Path & ETA Predictor

transform_orders is tonight's pacing task

+18% vs 30-day baseline · projected finish 02:14 · SLA 02:30 — on track. No backfill needed; auto-retry advisor armed.
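The "on track" verdict above comes down to comparing the projected finish with the SLA deadline. A minimal sketch of that arithmetic follows; the function name `sla_headroom` is hypothetical, and it assumes both times fall on the same calendar day (no wrap past midnight). It is not the predictor's actual logic.

```python
from datetime import datetime, timedelta


def sla_headroom(projected_finish: str, sla_deadline: str) -> timedelta:
    """Return the time left between a projected finish and the SLA deadline.

    Both arguments are "HH:MM" strings on the same day (an assumption of
    this sketch). A negative result means the projection breaches the SLA.
    """
    fmt = "%H:%M"
    finish = datetime.strptime(projected_finish, fmt)
    deadline = datetime.strptime(sla_deadline, fmt)
    return deadline - finish


# Values taken from the card: projected finish 02:14, SLA 02:30.
headroom = sla_headroom("02:14", "02:30")
status = "on track" if headroom > timedelta(0) else "at risk"
print(headroom, status)  # 0:16:00 on track
```

With the card's numbers the headroom is 16 minutes, which is why no backfill is proposed.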

Auto-retry advisor: transient pattern → retry 2× · backoff 90s before escalating

Backfill window planner: cheapest off-peak slot 14:00 · 38-run trough

Anomalous-duration sentinel: 0 tasks above z=2.5 right now

Schedule-collision detector: no pool contention forecast · 0 staggers
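Two of the advisor cards above describe simple rules: retry twice with a 90-second backoff before escalating, and flag durations above z=2.5. The sketch below is one plausible reading of those rules in plain Python. The function names are hypothetical, every exception is treated as transient (the console's transient-pattern detection is not modeled), and the z-score uses a sample standard deviation over a history window that you would supply from your own telemetry.

```python
import statistics
import time
from typing import Callable, Sequence


def retry_with_backoff(task: Callable[[], object],
                       retries: int = 2,
                       backoff_seconds: float = 90.0,
                       sleep: Callable[[float], None] = time.sleep) -> object:
    """Run `task`; on failure, wait and retry up to `retries` times.

    When retries are exhausted the last exception is re-raised, which is
    the point where a caller would escalate (for example, page on-call).
    """
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: escalate
            attempt += 1
            sleep(backoff_seconds)


def z_score(latest: float, history: Sequence[float]) -> float:
    """Standard score of `latest` against past durations (needs >= 2 points)."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return 0.0  # flat history: nothing to compare against
    return (latest - mean) / stdev


def is_anomalous(latest: float, history: Sequence[float],
                 threshold: float = 2.5) -> bool:
    """True when `latest` sits more than `threshold` deviations above the mean."""
    return z_score(latest, history) > threshold


# Illustrative durations in seconds (made-up numbers, not from the console).
history = [360, 370, 365, 375, 380]
print(is_anomalous(372, history))  # False
print(is_anomalous(400, history))  # True
```

Passing `sleep` as a parameter keeps the retry helper testable without waiting 90 real seconds.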

Run History — click a row

Job	Status	Env	SHA	Schema	Duration

Run Intensity Calendar Heatmap — runs per hour, last 24h · peak 02:00–03:00 (212) · trough 14:00 (38)

intensity-shaded

Scale: less → more · 2,047 runs across 24h

Field Guide · how on-call reads this console

01 How to use this

▸ Scan the grid first, not the KPIs. Your eye is a parallel anomaly detector: sweep the 24×18 matrix for the one off-color square (the amber dim_customer @ 03:00 cell). Green everywhere means you can stand down before standup.
▸ Read columns as runs, rows as tasks. A vertical amber streak = one bad run; a horizontal streak = one flaky task across many runs. The pattern tells you whether to re-run a DAG or quarantine a model.
▸ The flame strip is your budget. The top-1% longest tasks are where SLA risk lives: if fct_orders creeps past its bar, the whole critical path slips. Watch it before it cascades.
▸ Let the AI predictor pre-triage. The critical-path card flags the task most likely to breach tonight and proposes retry-vs-backfill, so you act on the projection instead of waiting for the red.
▸ Drill before you escalate. Click any cell for the live log tail, retry boundary, and upstream lineage, and confirm root cause in one click rather than tab-hopping to Airflow.
▸ For your own org: if you can't find your worst run in under 5 seconds, your run history isn't dense enough. Density is the feature, not the bug.
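The columns-as-runs, rows-as-tasks reading in the second tip can also be checked mechanically. The sketch below is an assumption-laden illustration, not how the console works: `classify_streaks` is a hypothetical helper, the status strings are made up, and it counts amber cells per row and column instead of verifying that they are contiguous.

```python
from typing import Dict, List, Sequence


def classify_streaks(grid: Sequence[Sequence[str]],
                     bad: str = "retry",
                     min_cells: int = 3) -> Dict[str, List[int]]:
    """Find rows and columns with at least `min_cells` cells in state `bad`.

    `grid[r][c]` is the status of task r in run c (rows = tasks,
    columns = runs). A dense column suggests one bad run (re-run the DAG);
    a dense row suggests one flaky task (quarantine the model).
    """
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0
    row_hits = [sum(cell == bad for cell in row) for row in grid]
    col_hits = [sum(grid[r][c] == bad for r in range(n_rows))
                for c in range(n_cols)]
    return {
        "flaky_tasks": [r for r, n in enumerate(row_hits) if n >= min_cells],
        "bad_runs": [c for c, n in enumerate(col_hits) if n >= min_cells],
    }


# Toy 4-task × 5-run grid (illustrative data only).
grid = [
    ["ok",    "ok",    "retry", "ok",    "ok"],
    ["retry", "retry", "retry", "retry", "ok"],
    ["ok",    "ok",    "retry", "ok",    "ok"],
    ["ok",    "ok",    "ok",    "ok",    "ok"],
]
print(classify_streaks(grid))  # {'flaky_tasks': [1], 'bad_runs': [2]}
```

Here task 1 is flagged as flaky across runs and run 2 as a bad run, matching the horizontal and vertical streaks in the toy grid.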

02 Watch the walkthrough

Four AI agents walk this dashboard.

03 In context

● Sample feed

Illustrative — wire to your Airflow / dbt Cloud / Snowflake telemetry feed.

Scheduler heartbeat: airflow · us-east-1

4.0s ▲ healthy

dbt Cloud queue depth: nightly_core project

0 jobs ▼ 100%

Snowflake warehouse load: TRANSFORM_XL · auto-scale

62% ▲ 3%

Source freshness · orders API: last successful pull

9m ago ▼ on SLA

Pool slot utilization: default_pool · 256 slots

55% ▲ stable

PagerDuty · open incidents: data-oncall rotation

0 ▼ clean 7d