State Machine

Entity status is driven by a deterministic state machine. Every transition is recorded in an immutable audit log, and recorded transitions are never deleted.

States

State     Meaning
unknown   No checks have run yet
up        Last check passed
degraded  Some checks failing; not yet fully down
down      Failure threshold exceeded; entity is considered unavailable
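
As a rough illustration of the table above, the four states could be modelled as a small string-backed type. The names StatusUp, StatusDegraded and StatusDown appear in the worker snippet on this page; StatusUnknown, the string values and ParseStatus are assumptions made for this sketch, not the project's actual definitions.

```go
package main

import "fmt"

// Status is a hypothetical string-backed type for the four documented states.
type Status string

const (
	StatusUnknown  Status = "unknown"  // no checks have run yet
	StatusUp       Status = "up"       // last check passed
	StatusDegraded Status = "degraded" // some checks failing; not yet fully down
	StatusDown     Status = "down"     // failure threshold exceeded
)

// ParseStatus converts raw text into a Status and rejects anything
// outside the four documented states.
func ParseStatus(s string) (Status, error) {
	switch st := Status(s); st {
	case StatusUnknown, StatusUp, StatusDegraded, StatusDown:
		return st, nil
	default:
		return "", fmt.Errorf("unknown status %q", s)
	}
}

func main() {
	st, err := ParseStatus("degraded")
	fmt.Println(st, err)
}
```

Rejecting unrecognised values at the boundary keeps any later transition logic limited to the four states listed here.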

Transitions

                 first success
  unknown ────────────────────────► up
                                    │  ▲
              threshold failures    │  │  recovery (any success)
              (partial)             ▼  │
                              degraded
                                    │  ▲
              threshold failures    │  │  recovery (any success)
              (full)                ▼  │
                                   down

Rules

  1. unknown → up: the first successful check result moves the entity to up.
  2. up → degraded: consecutive failures reach the threshold and the worker determines the entity is partially impaired.
  3. up → down / degraded → down: failure threshold exceeded; the entity is fully down.
  4. Any → up (recovery): a single successful check result resets the failure count and moves the entity straight back to up from any other state, including down.
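
The rules above can be sketched as a lookup table of permitted transitions. This is only an illustration: the Status type, the allowed map and CanTransition are hypothetical names. The table follows the numbered rules rather than the diagram, so it assumes recovery always lands on up and treats down → degraded as not allowed.

```go
package main

import "fmt"

// Status is a hypothetical type for the documented states.
type Status string

const (
	StatusUnknown  Status = "unknown"
	StatusUp       Status = "up"
	StatusDegraded Status = "degraded"
	StatusDown     Status = "down"
)

// allowed lists, for each state, the targets named in rules 1 to 4.
// Assumption: recovery always goes to up, never down -> degraded.
var allowed = map[Status][]Status{
	StatusUnknown:  {StatusUp},                   // rule 1
	StatusUp:       {StatusDegraded, StatusDown}, // rules 2 and 3
	StatusDegraded: {StatusDown, StatusUp},       // rules 3 and 4
	StatusDown:     {StatusUp},                   // rule 4
}

// CanTransition reports whether the rules permit moving from one state to another.
func CanTransition(from, to Status) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition(StatusUp, StatusDown))
	fmt.Println(CanTransition(StatusDown, StatusDegraded))
}
```

A table like this is handy in tests: a recorded transition that is missing from it points at either a bug or a rule that has not been documented.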

What triggers a transition

The worker (cmd/worker) makes this decision after every check execution:

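// simplified worker logic
if result.Success {
    // Any success clears the failure streak.
    failureCount = 0
    if entity.CurrentStatus == StatusUnknown {
        // Rule 1. StatusUnknown is assumed to exist alongside the other constants.
        transition(entity, StatusUp, "first success")
    } else if entity.CurrentStatus != StatusUp {
        // Rule 4: recovery from degraded or down.
        transition(entity, StatusUp, "recovery")
    }
} else {
    failureCount++
    if failureCount >= check.FailureThreshold {
        // Once the threshold is met, each failing result escalates one step.
        if entity.CurrentStatus == StatusUp {
            transition(entity, StatusDegraded, "threshold reached")
        } else if entity.CurrentStatus == StatusDegraded {
            transition(entity, StatusDown, "threshold reached")
        }
    }
}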

The new state and reason are written to state_transitions and published as a StateChangeEvent on NATS. The notify service consumes that event and routes alerts.

Audit log

Transitions are permanent. You can list them per check:

wnp checks transitions <check-id>
TRANSITION         REASON                 WHEN
up → down          3 consecutive fails    2024-01-20 03:17
unknown → up       first success          2024-01-15 09:23

Or see all recent transitions across your fleet:

wnp status transitions

Check results retention

Raw check results (check_results table) are stored in daily partitions. Retention is governed by your plan's retention_days value:

curl -H "Authorization: Bearer $TOKEN" https://api.wanepia.com/v1/me
# "retention_days": 30

When a partition ages past retention_days, the whole partition is dropped, which is fast because no row-level deletes are needed. State transitions are not subject to retention and are kept indefinitely.