HOW WE MEASURE

no vibes. no opinions. just cold, hard latency.

THE PROBE

Every 2 minutes, we send a unique prompt to every AI provider. Each one is different. Each one has a right answer. If the model gets it wrong, we notice. No special treatment, no warm-up calls. Just one honest kick every 2 minutes.
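For the curious, here's roughly what a "unique prompt with a right answer" could look like. This is a minimal sketch, assuming random arithmetic as the question type. The real prompt format isn't described on this page, and make_probe and is_correct are made-up names.

```python
import random
import re

def make_probe(rng: random.Random) -> tuple[str, str]:
    """Build a one-off prompt plus the answer we expect back (hypothetical format)."""
    a, b = rng.randint(100, 999), rng.randint(100, 999)
    prompt = f"What is {a} + {b}? Reply with only the number."
    return prompt, str(a + b)

def is_correct(reply: str, expected: str) -> bool:
    """Accept the reply only if the expected number shows up as a whole number."""
    return expected in re.findall(r"\d+", reply)
```

Because the numbers change every time, a cached reply from an earlier probe would almost never pass the check.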

THE TIMING

We measure wall clock time: from the moment the request leaves our server to the moment the last byte comes back. No cheating with TTFB (time to first byte). If it takes 4 seconds to think about one sentence, that's 4 seconds of your life gone.
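A minimal sketch of timing to the last byte rather than the first. The send callable is a hypothetical stand-in for whatever fires the request and yields response chunks; it is not our actual client code.

```python
import time
from typing import Callable, Iterable

def timed_probe(send: Callable[[], Iterable[bytes]]) -> tuple[float, bytes]:
    """Return (seconds elapsed, full body) for one request.

    `send` is a hypothetical callable that starts the request and
    yields response chunks as they arrive.
    """
    start = time.perf_counter()             # clock starts before the request goes out
    body = b"".join(send())                 # drain every chunk, not just the first
    elapsed = time.perf_counter() - start   # clock stops only after the last byte
    return elapsed, body
```

Stopping the clock at the first chunk would give TTFB instead, which flatters any model that starts talking quickly and then stalls.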

THE LABELS

UP fast & responding (< 3s)

WEARY responding but sluggish (≥ 3s)

DOWN not responding or erroring

STALE data is too old to trust

THE LOGIC

We look at the last 3 probes (~6 minutes of history at the standard cadence). If 2 out of 3 failed → DOWN. If 2 out of 3 were slow → WEARY. Otherwise → UP. One bad request doesn't trigger anything; we're not drama queens. Two out of three? Now we're talking.
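The 2-out-of-3 rule above, sketched in a few lines. The Probe enum and status function are illustrative names, and this assumes each probe has already been sorted into ok, slow (≥ 3s), or failed.

```python
from enum import Enum

class Probe(Enum):
    OK = "ok"      # responded in under 3s
    SLOW = "slow"  # responded, but took 3s or more
    FAIL = "fail"  # no response, or an error

def status(probes: list[Probe]) -> str:
    """Label a provider from its most recent probes (oldest first)."""
    window = probes[-3:]                    # only the last 3 probes count
    if window.count(Probe.FAIL) >= 2:
        return "DOWN"
    if window.count(Probe.SLOW) >= 2:
        return "WEARY"
    return "UP"
```

With only 3 probes in the window, "2 failed" and "2 slow" can't both be true, so the order of the two checks never changes the result. One failure plus one slow probe still reads UP.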

THE SLOW LANE

Most models get probed every 2 minutes. Some models run on a 6-minute cadence instead — they have tighter rate limits that would choke on the standard interval. Same logic, same thresholds, just slower data. Their staleness window scales to match so they don't get falsely marked STALE between normal probe cycles.
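One way a staleness window could scale with cadence, as a sketch only. The page doesn't publish the actual rule, so the multiplier below (max_missed) is a placeholder assumption, and is_stale is a made-up name.

```python
def is_stale(last_probe_age_s: float, interval_s: float, max_missed: float = 3.0) -> bool:
    """True when the newest probe is older than `max_missed` probe intervals.

    `max_missed` is a placeholder value; the real multiplier isn't stated here.
    """
    return last_probe_age_s > interval_s * max_missed
```

Under that assumption, a 7-minute-old result is stale for a 2-minute model (is_stale(420, 120) is True) but perfectly normal for a 6-minute one (is_stale(420, 360) is False).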

THE REGION

All probes are sent from US West (California) on Railway's us-west2 metal infrastructure. One region, full transparency.

WHAT WE DON'T DO

No retries — a failed probe is valid data
No caching — every request is unique, every answer is verified
No special API tiers — same access as you
No manual overrides — the numbers are the numbers

questions? complaints? existential dread about AI reliability?

same tbh