State Machine¶
Entity status is driven by a deterministic state machine. Every transition is recorded in an immutable audit log — transitions are never deleted.
States¶
| State | Meaning |
|---|---|
| unknown | No checks have run yet |
| up | Last check passed |
| degraded | Some checks failing; not yet fully down |
| down | Failure threshold exceeded; entity is considered unavailable |
Transitions¶
                 first success
    unknown ────────────────────────► up
                                     │ ▲
              threshold failures     │ │  recovery (any success)
                  (partial)          ▼ │
                                  degraded
                                     │ ▲
              threshold failures     │ │  recovery (any success)
                   (full)            ▼ │
                                    down
Rules¶
- unknown → up: the first successful check result moves the entity to up.
- up → degraded: consecutive failures reach the threshold and the worker determines the entity is partially impaired.
- up → down / degraded → down: failure threshold exceeded; the entity is fully down.
- Any → up (recovery): a single successful check result transitions the entity back to up from any state.
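The rules above can be read as a small lookup table of permitted transitions. The following is a minimal sketch under that reading, not the project's actual code: the `Status` type, its string values, the `allowed` map, and `CanTransition` are all illustrative names.

```go
package main

import "fmt"

// Status names mirror the States table; the type and string values are assumptions.
type Status string

const (
	StatusUnknown  Status = "unknown"
	StatusUp       Status = "up"
	StatusDegraded Status = "degraded"
	StatusDown     Status = "down"
)

// allowed encodes the listed rules as: from-state -> set of permitted target states.
var allowed = map[Status]map[Status]bool{
	StatusUnknown:  {StatusUp: true},                       // first success
	StatusUp:       {StatusDegraded: true, StatusDown: true}, // threshold failures
	StatusDegraded: {StatusUp: true, StatusDown: true},     // recovery or further failures
	StatusDown:     {StatusUp: true},                       // recovery
}

// CanTransition reports whether the rules permit moving from one state to another.
// Staying in the same state is not a transition and returns false.
func CanTransition(from, to Status) bool {
	return allowed[from][to]
}

func main() {
	fmt.Println(CanTransition(StatusUnknown, StatusUp))    // true
	fmt.Println(CanTransition(StatusDegraded, StatusDown)) // true
	fmt.Println(CanTransition(StatusDown, StatusDegraded)) // false
}
```

A table like this could be used as a guard before writing a transition, so that an unexpected pair is rejected rather than recorded.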
What triggers a transition¶
The worker (cmd/worker) makes this decision after every check execution:
// simplified worker logic
if result.Success {
    // Any success resets the failure streak and recovers the entity.
    failureCount = 0
    if entity.CurrentStatus != StatusUp {
        transition(entity, StatusUp, "recovery")
    }
} else {
    // Count consecutive failures; escalate once the threshold is reached.
    failureCount++
    if failureCount >= check.FailureThreshold {
        if entity.CurrentStatus == StatusUp {
            transition(entity, StatusDegraded, "threshold reached")
        } else if entity.CurrentStatus == StatusDegraded {
            transition(entity, StatusDown, "threshold reached")
        }
    }
}
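To see how the simplified logic above behaves over a run of results, it can be rewritten as a pure function and traced. This is only a sketch of the snippet as shown, not the real worker: `step`, the `Status` type, and the threshold of 3 are assumptions made for illustration.

```go
package main

import "fmt"

// Status and its values are illustrative stand-ins for the worker's own types.
type Status string

const (
	StatusUnknown  Status = "unknown"
	StatusUp       Status = "up"
	StatusDegraded Status = "degraded"
	StatusDown     Status = "down"
)

// step applies the simplified worker logic to a single check result and
// returns the next status together with the updated consecutive-failure count.
func step(cur Status, failures, threshold int, success bool) (Status, int) {
	if success {
		return StatusUp, 0 // recovery (or no change if already up)
	}
	failures++
	if failures >= threshold {
		switch cur {
		case StatusUp:
			return StatusDegraded, failures
		case StatusDegraded:
			return StatusDown, failures
		}
	}
	return cur, failures
}

func main() {
	cur, failures := StatusUnknown, 0
	results := []bool{true, false, false, false, false, true}
	for _, ok := range results {
		next, n := step(cur, failures, 3, ok)
		if next != cur {
			fmt.Printf("%s → %s\n", cur, next)
		}
		cur, failures = next, n
	}
	// Prints:
	// unknown → up
	// up → degraded
	// degraded → down
	// down → up
}
```

Two properties of the simplified form show up in the trace: the counter is not reset on escalation, so the failure right after the threshold moves a degraded entity to down, and failures while the entity is still unknown do not produce a transition.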
The new state and reason are written to state_transitions and published as a StateChangeEvent on NATS. The notify service consumes that event and routes alerts.
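The record-then-publish step can be sketched behind two small interfaces. Everything here is an assumption for illustration: the fields of `StateChangeEvent`, the `TransitionStore` interface, the subject name `state.changed`, and the choice to persist before publishing are not taken from the actual service. The `Publisher` interface is shaped so that a NATS connection's `Publish(subject string, data []byte) error` method would satisfy it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// StateChangeEvent is a hypothetical payload shape; the real fields may differ.
type StateChangeEvent struct {
	EntityID string    `json:"entity_id"`
	From     string    `json:"from"`
	To       string    `json:"to"`
	Reason   string    `json:"reason"`
	At       time.Time `json:"at"`
}

// TransitionStore stands in for the append-only state_transitions table.
type TransitionStore interface {
	Append(ev StateChangeEvent) error
}

// Publisher stands in for the message bus connection.
type Publisher interface {
	Publish(subject string, data []byte) error
}

// recordAndPublish appends the transition to the store, then publishes it as JSON.
func recordAndPublish(store TransitionStore, pub Publisher, subject string, ev StateChangeEvent) error {
	if err := store.Append(ev); err != nil {
		return fmt.Errorf("append transition: %w", err)
	}
	data, err := json.Marshal(ev)
	if err != nil {
		return fmt.Errorf("encode event: %w", err)
	}
	if err := pub.Publish(subject, data); err != nil {
		return fmt.Errorf("publish event: %w", err)
	}
	return nil
}

// In-memory fakes so the sketch runs without a database or a broker.
type memStore struct{ rows []StateChangeEvent }

func (m *memStore) Append(ev StateChangeEvent) error {
	m.rows = append(m.rows, ev)
	return nil
}

type memBus struct {
	subjects []string
	payloads [][]byte
}

func (b *memBus) Publish(subject string, data []byte) error {
	b.subjects = append(b.subjects, subject)
	b.payloads = append(b.payloads, data)
	return nil
}

func main() {
	store, bus := &memStore{}, &memBus{}
	ev := StateChangeEvent{
		EntityID: "api-1",
		From:     "up",
		To:       "degraded",
		Reason:   "threshold reached",
		At:       time.Date(2024, 1, 20, 3, 17, 0, 0, time.UTC),
	}
	if err := recordAndPublish(store, bus, "state.changed", ev); err != nil {
		panic(err)
	}
	fmt.Println(len(store.rows), bus.subjects[0])
	fmt.Println(string(bus.payloads[0]))
}
```

Writing to the store first means a failed publish still leaves the transition in the audit log; whether the real worker orders the two steps this way is not stated here.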
Audit log¶
Every transition is permanent. You can query them per check:
TRANSITION      REASON                 WHEN
up → down       3 consecutive fails    2024-01-20 03:17
unknown → up    first success          2024-01-15 09:23
Or see all recent transitions across your fleet:
Check results retention¶
Raw check results (check_results table) are stored in daily partitions. Retention is governed by your plan's retention_days value:
When a partition ages past retention_days, it is dropped (fast — no row-level deletes). State transitions are never subject to retention — they are kept indefinitely.
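The retention rule amounts to a date comparison per daily partition. The sketch below assumes partitions are identified by their day in `YYYY-MM-DD` form and that a partition is dropped only once its age in days is strictly greater than `retention_days`; the function name `partitionsToDrop`, the identifier format, and the exact boundary are assumptions, not the service's documented behavior.

```go
package main

import (
	"fmt"
	"time"
)

const dayLayout = "2006-01-02"

// partitionsToDrop returns the partition days that are strictly older than
// retentionDays, measured back from today. Input order is preserved.
func partitionsToDrop(partitionDays []string, today string, retentionDays int) ([]string, error) {
	t, err := time.Parse(dayLayout, today)
	if err != nil {
		return nil, fmt.Errorf("parse today: %w", err)
	}
	// Partitions dated before the cutoff have aged past the retention window.
	cutoff := t.AddDate(0, 0, -retentionDays)

	var drop []string
	for _, p := range partitionDays {
		d, err := time.Parse(dayLayout, p)
		if err != nil {
			return nil, fmt.Errorf("parse partition %q: %w", p, err)
		}
		if d.Before(cutoff) {
			drop = append(drop, p)
		}
	}
	return drop, nil
}

func main() {
	days := []string{"2024-01-15", "2024-01-16", "2024-01-17", "2024-01-18"}
	drop, err := partitionsToDrop(days, "2024-01-20", 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(drop) // [2024-01-15 2024-01-16]
}
```

Because whole partitions are selected, the cleanup never needs to scan individual rows, which matches the partition-drop approach described above.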