
Health Checks

A check is a probe attached to an entity. The generator schedules checks on their configured intervals; workers execute them, record results, and trigger state transitions when failures accumulate past the threshold.

Check types

HTTP

Sends an HTTP or HTTPS request to a URL. The method defaults to GET and is configurable.

Parameter          Default     Description
target_url         (required)  Full URL including scheme
expected_status    200         HTTP status code that counts as success
body_contains      ""          Substring that must appear in the response body
interval_seconds   60          How often to run, in seconds
timeout_ms         5000        Request timeout, in milliseconds
failure_threshold  3           Consecutive failures before the entity's state changes
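Before creating an HTTP check, it can help to preview how a URL scores against these parameters. The sketch below applies the rules as the table states them: the status must equal expected_status, and a non-empty body_contains must appear in the body as a plain substring. It is a local approximation, not Wanepia's worker code; the function names http_result_passes and http_probe are hypothetical, and because the page does not say whether Wanepia follows redirects, the sketch does not follow them.

```bash
# Hypothetical local approximation of an HTTP check (not Wanepia's worker code).

# http_result_passes STATUS BODY [EXPECTED_STATUS] [BODY_CONTAINS]
# Succeeds when STATUS equals EXPECTED_STATUS and BODY contains BODY_CONTAINS
# as a plain substring. An empty BODY_CONTAINS matches any body.
http_result_passes() {
  local status="$1" body="$2" expected="${3:-200}" needle="${4:-}"
  [ "$status" = "$expected" ] || return 1
  [ -z "$needle" ] && return 0
  [[ "$body" == *"$needle"* ]]
}

# http_probe URL [EXPECTED_STATUS] [BODY_CONTAINS] [TIMEOUT_MS]
# Fetches URL with curl (GET, redirects not followed) and applies the rules above.
http_probe() {
  local url="$1" expected="${2:-200}" needle="${3:-}" timeout_ms="${4:-5000}"
  local secs tmp status rc=0
  # curl's --max-time takes (fractional) seconds, e.g. 5000 ms -> 5.000
  secs="$(( timeout_ms / 1000 )).$(printf '%03d' "$(( timeout_ms % 1000 ))")"
  tmp=$(mktemp)
  # %{http_code} prints 000 when no response was received at all.
  status=$(curl -sS -o "$tmp" -w '%{http_code}' --max-time "$secs" "$url") || status=000
  http_result_passes "$status" "$(cat "$tmp")" "$expected" "$needle" || rc=1
  rm -f "$tmp"
  if [ "$rc" -eq 0 ]; then echo "PASS status=$status"; else echo "FAIL status=$status"; fi
  return "$rc"
}

# Usage:
#   http_probe https://api.example.com/health 200 '"status":"ok"' 3000
```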

TCP

Opens a TCP connection to a host:port pair.

{
  "check_type": "tcp",
  "config": {
    "host": "db.internal",
    "port": 5432
  }
}
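To check by hand whether a host:port pair accepts TCP connections, bash's built-in /dev/tcp redirection is enough. This is a minimal sketch under the assumption that "success" means the connection opens within a timeout; tcp_probe is a hypothetical name, and the 5-second default is an arbitrary choice, not a Wanepia setting.

```bash
# Hypothetical local approximation of a TCP check (not Wanepia's worker code):
# succeed if a TCP connection to HOST:PORT opens within TIMEOUT seconds.
# Relies on bash's /dev/tcp redirection and the coreutils `timeout` command.

# tcp_probe HOST PORT [TIMEOUT_SECONDS]
tcp_probe() {
  local host="$1" port="$2" timeout_s="${3:-5}"
  # The child bash opens the connection and closes it immediately; the
  # redirection fails if the port is closed or the name does not resolve.
  timeout "$timeout_s" bash -c ': < "/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage:
#   tcp_probe db.internal 5432 && echo reachable || echo unreachable
```

Keep in mind that a pull check connects from Wanepia's workers, so a successful run from inside your own network does not prove the target is reachable from them.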

TLS

Like TCP, but also validates the TLS certificate (expiry and hostname).

{
  "check_type": "tls",
  "config": {
    "host": "api.example.com",
    "port": 443
  }
}
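A rough local equivalent uses openssl s_client to fetch the certificate and GNU date to turn its notAfter field into days remaining. The sketch assumes openssl's -verify_hostname and -verify_return_error options make the handshake fail on an untrusted chain or a hostname mismatch, so an expired or mismatched certificate makes tls_days_left fail rather than print a number. Both function names are hypothetical.

```bash
# Hypothetical local approximation of a TLS check (not Wanepia's worker code).
# Requires openssl and GNU date.

# days_left_from_enddate DATE: whole days from now until DATE (negative if past).
# DATE uses openssl's notAfter format, e.g. "Jan 20 12:00:00 2030 GMT".
days_left_from_enddate() {
  local end_s now_s
  end_s=$(date -u -d "$1" +%s) || return 1
  now_s=$(date -u +%s)
  echo $(( (end_s - now_s) / 86400 ))
}

# tls_days_left HOST [PORT]: verify chain and hostname, then print days left.
tls_days_left() {
  local host="$1" port="${2:-443}" out enddate
  # -servername sends SNI; -verify_hostname with -verify_return_error should
  # abort the handshake (non-zero exit) on an untrusted chain or name mismatch.
  out=$(openssl s_client -connect "$host:$port" -servername "$host" \
          -verify_hostname "$host" -verify_return_error </dev/null 2>/dev/null) || return 1
  # openssl x509 reads the first PEM certificate in the output (the leaf).
  enddate=$(printf '%s\n' "$out" | openssl x509 -noout -enddate) || return 1
  days_left_from_enddate "${enddate#notAfter=}"
}

# Usage:
#   tls_days_left api.example.com 443
```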

DNS

Resolves a hostname and optionally validates the record type and expected value.

{
  "check_type": "dns",
  "config": {
    "hostname": "example.com",
    "record_type": "A",
    "expected_value": "93.184.216.34"
  }
}
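The page does not say exactly how expected_value is compared, so the sketch below assumes the simplest rule: the check passes if any returned record equals the expected value exactly, and an empty expected value accepts any answer. It uses dig +short, which prints one record per line; dns_probe and dns_answer_matches are hypothetical names.

```bash
# Hypothetical local approximation of a DNS check (not Wanepia's worker code).
# Requires dig (from the bind-utils / dnsutils package).

# dns_answer_matches EXPECTED: read one answer per line on stdin; succeed if any
# line equals EXPECTED exactly, or if EXPECTED is empty and any answer exists.
dns_answer_matches() {
  local expected="$1" line found=1
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    if [ -z "$expected" ] || [ "$line" = "$expected" ]; then found=0; fi
  done
  return "$found"
}

# dns_probe HOSTNAME [RECORD_TYPE] [EXPECTED_VALUE]
dns_probe() {
  local hostname="$1" rtype="${2:-A}" expected="${3:-}"
  # +short prints only answer data; for A lookups it may list CNAME targets first.
  dig +short "$hostname" "$rtype" | dns_answer_matches "$expected"
}

# Usage:
#   dns_probe example.com A 93.184.216.34 && echo match || echo mismatch
```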

Push

A push check inverts the relationship: instead of Wanepia reaching out to your service, your agent posts results to Wanepia. Use push checks for services that are unreachable from the public internet — databases behind a firewall, internal Kubernetes workloads, services on a private VPN, or anything that cannot accept inbound connections.

Push checks have no target_url; the worker never executes them. Your agent owns the probe logic and calls Wanepia when it has a result.

Push checks

Creating a push check

curl -X POST https://api.wanepia.com/v1/checks \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "entity_id": "<entity-id>",
    "check_type": "push",
    "execution_mode": "push",
    "name": "Postgres reachability",
    "interval_seconds": 60,
    "failure_threshold": 1
  }'

Or with the CLI:

wnp checks create \
  --entity <entity-id> \
  --type push \
  --name "Postgres reachability" \
  --interval 60 \
  --threshold 1

The response includes the check's id. Save it — you will use it in the push endpoint.

Posting a result

Call this endpoint from your agent each time it runs a probe:

POST /v1/checks/{id}/results

Request body:

Field          Required  Description
success        Yes       true if the probe passed, false if it failed
latency_ms     Yes       How long the probe took, in milliseconds
error_message  No        Human-readable failure detail (shown in the UI)
checked_at     No        ISO 8601 timestamp; defaults to now if omitted
status_code    No        Numeric code (e.g. HTTP status), if meaningful

Example:

curl -X POST https://api.wanepia.com/v1/checks/$CHECK_ID/results \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "success": true,
    "latency_ms": 45,
    "error_message": ""
  }'
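Building the JSON body by string concatenation breaks as soon as error_message contains a double quote, a backslash, or a newline. A minimal sketch of helpers that escape those characters, assuming only the fields listed in the table above; json_escape, result_payload, and post_result are hypothetical names.

```bash
# Hypothetical helpers for posting push-check results.

# json_escape STRING: escape backslashes, double quotes, and common control chars.
json_escape() {
  local s="$1" bs='\'
  s=${s//"$bs"/"$bs$bs"}      # \        -> \\
  s=${s//'"'/"$bs\""}         # "        -> \"
  s=${s//$'\n'/"${bs}n"}      # newline  -> \n
  s=${s//$'\r'/"${bs}r"}      # CR       -> \r
  s=${s//$'\t'/"${bs}t"}      # tab      -> \t
  printf '%s' "$s"
}

# result_payload SUCCESS LATENCY_MS [ERROR_MESSAGE] [CHECKED_AT]
# SUCCESS must be "true" or "false"; checked_at is omitted when empty.
result_payload() {
  local success="$1" latency="$2" err="${3:-}" checked_at="${4:-}"
  printf '{"success":%s,"latency_ms":%d,"error_message":"%s"' \
    "$success" "$latency" "$(json_escape "$err")"
  [ -n "$checked_at" ] && printf ',"checked_at":"%s"' "$checked_at"
  printf '}\n'
}

# post_result CHECK_ID PAYLOAD: needs WANEPIA_TOKEN in the environment.
# -f makes curl exit non-zero on HTTP errors such as 409 Conflict.
post_result() {
  curl -fsS -X POST "https://api.wanepia.com/v1/checks/$1/results" \
    -H "Authorization: Bearer $WANEPIA_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$2"
}

# Usage:
#   post_result "$CHECK_ID" "$(result_payload false 12 'connect: "refused"')"
```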

Timestamp constraints

checked_at cannot be more than 1 minute in the future and cannot be older than your account's retention window. Posting a result to a pull-mode check (any check whose execution_mode is not push) returns 409 Conflict.
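An agent that sends its own checked_at can reject an obviously bad timestamp before posting. The sketch mirrors only the future bound (at most 60 seconds ahead); the retention window depends on your account, so it is left out. It uses the agent's own clock, so it cannot detect skew relative to Wanepia's servers. iso_now and checked_at_ok are hypothetical names, and GNU date is assumed for parsing.

```bash
# Hypothetical client-side guard mirroring the documented future-timestamp
# limit. The retention-window bound is account-specific and not checked here.
# Requires GNU date.

# iso_now: current UTC time in ISO 8601, suitable for checked_at.
iso_now() { date -u +%Y-%m-%dT%H:%M:%SZ; }

# checked_at_ok TIMESTAMP [NOW_EPOCH]: succeed if TIMESTAMP parses and is
# no more than 60 seconds after NOW_EPOCH (default: the current time).
checked_at_ok() {
  local ts_s now_s
  ts_s=$(date -u -d "$1" +%s 2>/dev/null) || return 1
  now_s="${2:-$(date -u +%s)}"
  [ "$ts_s" -le $(( now_s + 60 )) ]
}

# Usage:
#   ts=$(iso_now)
#   checked_at_ok "$ts" && echo "ok to send checked_at=$ts"
```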

Staleness detection

If your agent stops posting, Wanepia marks the check as stale when:

now − last_result_time  >  interval_seconds × 2

The dashboard shows a warning banner on stale push checks so you can tell the difference between "the probe ran and passed" and "the probe stopped running entirely".
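The staleness rule is plain arithmetic on epoch seconds. Note that the comparison is strict: a result exactly 2 × interval_seconds old is not yet stale. A minimal sketch; is_stale is a hypothetical name.

```bash
# Hypothetical sketch of the documented staleness rule, using epoch seconds:
# stale when now - last_result_time > interval_seconds * 2.
is_stale() {
  local now="$1" last_result_time="$2" interval_seconds="$3"
  (( now - last_result_time > interval_seconds * 2 ))
}

# Example: with interval_seconds=300 a check goes stale once its last result
# is more than 600 seconds (10 minutes) old, so a 700-second gap is stale.
now=$(date +%s)
last=$(( now - 700 ))
if is_stale "$now" "$last" 300; then echo "stale"; else echo "fresh"; fi
```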

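Match the interval to your agent's schedule

If your agent runs every 5 minutes, set interval_seconds: 300 so that the staleness threshold (2 × 300 s = 10 minutes) fits the expected cadence. An interval shorter than the agent's schedule would flag the check as stale between normal runs.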

Failure threshold guidance

Push agents typically run infrequently. With an agent that runs every 5 minutes, a failure_threshold of 3 means three consecutive failed runs, or up to 15 minutes of real downtime, before an alert fires. For push checks, set failure_threshold: 1 so that a single failed run triggers an alert immediately.
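The worst case comes from an outage that starts just after a successful run: the alert fires on the threshold-th consecutive failed run, so the delay is roughly the run interval times failure_threshold. A small helper makes the trade-off explicit; it ignores probe duration and notification delivery time, and worst_case_detection_seconds is a hypothetical name.

```bash
# Hypothetical helper: worst-case seconds from the start of an outage to the
# alert, ignoring probe duration and notification delivery time.
worst_case_detection_seconds() {
  local run_every="$1" threshold="$2"
  echo $(( run_every * threshold ))
}

# Compare thresholds for an agent that runs every 5 minutes (300 s).
for t in 1 2 3; do
  echo "failure_threshold=$t: up to $(( $(worst_case_detection_seconds 300 "$t") / 60 )) minutes"
done
# Prints 5, 10, and 15 minutes respectively.
```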

Agent implementation patterns

Shell script — Postgres reachability

Drop this script on any host inside your network that can reach the database. Run it from cron or a Kubernetes CronJob.

#!/usr/bin/env bash
set -euo pipefail

# Required settings: fail fast with a clear message if either is unset.
: "${WANEPIA_TOKEN:?WANEPIA_TOKEN must be set}"
: "${CHECK_ID:?CHECK_ID must be set}"
DB_HOST="${DB_HOST:-localhost}"
DB_PORT="${DB_PORT:-5432}"

# Millisecond timestamps; %3N requires GNU date.
start=$(date +%s%3N)
rc=0
pg_isready -h "$DB_HOST" -p "$DB_PORT" -q || rc=$?
end=$(date +%s%3N)
latency=$(( end - start ))

if [ "$rc" -eq 0 ]; then
  payload='{"success":true,"latency_ms":'"$latency"',"error_message":""}'
else
  # pg_isready exit codes: 1 = rejecting connections, 2 = no response, 3 = no attempt made
  payload='{"success":false,"latency_ms":'"$latency"',"error_message":"pg_isready exit code '"$rc"': host unreachable or refusing connections"}'
fi

# -f makes curl exit non-zero on HTTP errors, so cron reports failed posts.
curl -fsS -X POST "https://api.wanepia.com/v1/checks/$CHECK_ID/results" \
  -H "Authorization: Bearer $WANEPIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d "$payload"

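Make the script executable and run it from cron every minute, matching the interval_seconds: 60 used when creating the check. Cron starts jobs with a minimal environment, so set the variables the script reads at the top of the crontab:

WANEPIA_TOKEN=<token>
CHECK_ID=<check-id>
DB_HOST=db.internal
* * * * * /usr/local/bin/wanepia-pg-check.sh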

Kubernetes CronJob

Wrap the script in a lightweight container image and deploy a CronJob with schedule: "* * * * *" to probe services inside your cluster without exposing them to the internet.
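One way to do that is a manifest along these lines, rendered by a small bash function so the image and database host can be passed in. It is a sketch, not a prescribed setup: the image name, the Secret named wanepia-push (assumed to hold WANEPIA_TOKEN and CHECK_ID keys), and the default DB_HOST value are hypothetical placeholders.

```bash
# Hypothetical: render a CronJob manifest that runs wanepia-pg-check.sh every
# minute. Assumes an image containing the script, pg_isready, and curl, plus
# a Secret named wanepia-push with WANEPIA_TOKEN and CHECK_ID keys.
render_cronjob() {
  local image="${1:?usage: render_cronjob IMAGE [DB_HOST]}"
  local db_host="${2:-postgres}"
  cat <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: wanepia-pg-check
spec:
  schedule: "* * * * *"
  concurrencyPolicy: Forbid        # never run two probes at once
  jobTemplate:
    spec:
      backoffLimit: 0              # no retries; the next run posts a fresh result
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: probe
              image: ${image}
              command: ["/usr/local/bin/wanepia-pg-check.sh"]
              envFrom:
                - secretRef:
                    name: wanepia-push
              env:
                - name: DB_HOST
                  value: "${db_host}"
EOF
}

# Usage:
#   kubectl create secret generic wanepia-push \
#     --from-literal=WANEPIA_TOKEN="$WANEPIA_TOKEN" --from-literal=CHECK_ID="$CHECK_ID"
#   render_cronjob registry.example.com/wanepia-pg-check:1.0 postgres.db.svc | kubectl apply -f -
```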

Creating a check

# HTTP check
wnp checks create \
  --entity <entity-id> \
  --type http \
  --url https://api.example.com/health \
  --interval 30 \
  --status 200 \
  --body '"status":"ok"' \
  --threshold 2

# TCP check (the CLI passes the host:port target via --url)
wnp checks create \
  --entity <entity-id> \
  --type tcp \
  --url db.internal:5432 \
  --interval 60

An HTTP check created through the API:

curl -X POST https://api.wanepia.com/v1/checks \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "entity_id": "...",
    "check_type": "http",
    "target_url": "https://api.example.com/health",
    "interval_seconds": 30,
    "timeout_ms": 3000,
    "expected_status": 200,
    "body_contains": "ok",
    "failure_threshold": 2,
    "enabled": true
  }'

Managing checks

# List all checks
wnp checks list

# Inspect a single check (an ID prefix is enough)
wnp checks get a1b2

# Disable without deleting
wnp checks disable a1b2

# Re-enable
wnp checks enable a1b2

# Update interval
wnp checks update a1b2 --interval 120

# Delete
wnp checks delete a1b2

Viewing results

wnp checks results a1b2 --limit 20
    STATUS   LATENCY   CHECKED AT        ERROR
✓   200      42ms      2024-01-20 09:15
✓   200      38ms      2024-01-20 09:14
✗   0        —         2024-01-20 09:13  connection refused

Failure threshold and state transitions

The failure_threshold prevents flapping. A check must fail that many times consecutively before the entity's status changes. A single success resets the failure counter.

attempt 1: fail  (counter: 1/3)
attempt 2: fail  (counter: 2/3)
attempt 3: fail  (counter: 3/3) → entity transitions to down
attempt 4: ok    → entity transitions to up, counter resets

See State Machine for degraded vs. down logic.
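The counter rule can be replayed against a sequence of results to see when transitions would happen. This minimal sketch models only up and down, assumes the entity starts up, and reads one result per line ("ok" or "fail"); simulate_threshold is a hypothetical name.

```bash
# Hypothetical sketch of the consecutive-failure counter described above.
# Prints a line only when the modelled state changes.
simulate_threshold() {
  local threshold="$1" counter=0 state=up result n=0
  while IFS= read -r result; do
    n=$(( n + 1 ))
    if [ "$result" = fail ]; then
      counter=$(( counter + 1 ))
      if [ "$counter" -ge "$threshold" ] && [ "$state" != down ]; then
        state=down; echo "attempt $n: down"
      fi
    else
      counter=0   # a single success resets the counter
      if [ "$state" != up ]; then state=up; echo "attempt $n: up"; fi
    fi
  done
}

# Usage: replays the example above and prints
# "attempt 3: down" then "attempt 4: up".
printf 'fail\nfail\nfail\nok\n' | simulate_threshold 3
```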

Check limits

The number of checks you can attach to each entity is set by your plan. The /v1/me endpoint reports the limit:

curl -H "Authorization: Bearer $TOKEN" https://api.wanepia.com/v1/me
# "check_limit_per_entity": 5