Health Checks¶

A check is a probe attached to an entity. The generator schedules checks at their configured intervals; workers execute them, record the results, and trigger a state transition once consecutive failures reach the check's failure threshold.

Check types¶

HTTP¶

Sends an HTTP or HTTPS request to a URL. The method is GET by default and is configurable.

Parameter	Default	Description
`target_url`	—	Full URL including scheme
`expected_status`	`200`	HTTP status code that counts as success
`body_contains`	`""`	Substring that must appear in the response body
`interval_seconds`	`60`	How often to run
`timeout_ms`	`5000`	Request timeout in milliseconds
`failure_threshold`	`3`	Consecutive failures before state change
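
As a rough illustration of how `expected_status` and `body_contains` combine, the sketch below evaluates a single response. It is not Wanepia's worker code: the function name `http_check_passes` is hypothetical, and treating an empty `body_contains` as "skip the body check" is an assumption based on the `""` default.

```shell
# Decide whether one HTTP probe result counts as a success.
# Arguments: actual status, response body, expected_status, body_contains.
http_check_passes() {
  # The status code must match expected_status exactly.
  [ "$1" = "$3" ] || return 1
  # Assumption: an empty body_contains means the body is not inspected.
  [ -z "$4" ] && return 0
  # Literal substring match via a case pattern (works in any POSIX shell).
  case "$2" in
    *"$4"*) return 0 ;;
    *) return 1 ;;
  esac
}

if http_check_passes 200 '{"status":"ok"}' 200 '"status":"ok"'; then
  echo "pass"
else
  echo "fail"
fi
```

Both conditions have to hold: a 200 response whose body lacks the substring fails in this sketch.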

TCP¶

Opens a TCP connection to a host:port pair.

{
  "check_type": "tcp",
  "config": {
    "host": "db.internal",
    "port": 5432
  }
}

TLS¶

Like TCP, but also validates the TLS certificate (expiry and hostname).

{
  "check_type": "tls",
  "config": {
    "host": "api.example.com",
    "port": 443
  }
}

DNS¶

Resolves a hostname and optionally validates the record type and expected value.

{
  "check_type": "dns",
  "config": {
    "hostname": "example.com",
    "record_type": "A",
    "expected_value": "93.184.216.34"
  }
}

Push¶

A push check inverts the relationship: instead of Wanepia reaching out to your service, your agent posts results to Wanepia. Use push checks for services that are unreachable from the public internet — databases behind a firewall, internal Kubernetes workloads, services on a private VPN, or anything that cannot accept inbound connections.

Push checks have no target_url; the worker never executes them. Your agent owns the probe logic and calls Wanepia when it has a result.

Push checks¶

Creating a push check¶

With curl, followed by the equivalent wnp CLI command:

curl -X POST https://api.wanepia.com/v1/checks \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "entity_id": "<entity-id>",
    "check_type": "push",
    "execution_mode": "push",
    "name": "Postgres reachability",
    "interval_seconds": 60,
    "failure_threshold": 1
  }'

wnp checks create \
  --entity <entity-id> \
  --type push \
  --name "Postgres reachability" \
  --interval 60 \
  --threshold 1

The response includes the check's id. Save it — you will use it in the push endpoint.

Posting a result¶

Call this endpoint from your agent each time it runs a probe:

POST /v1/checks/{id}/results

Request body:

Field	Required	Description
`success`	Yes	`true` if the probe passed, `false` if it failed
`latency_ms`	Yes	How long the probe took, in milliseconds
`error_message`	No	Human-readable failure detail (shown in the UI)
`checked_at`	No	ISO 8601 timestamp — defaults to now if omitted
`status_code`	No	Numeric code (e.g. HTTP status) if meaningful

curl -X POST https://api.wanepia.com/v1/checks/$CHECK_ID/results \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "success": true,
    "latency_ms": 45,
    "error_message": ""
  }'
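
An agent usually has to assemble this body from variables. The helper below is a minimal sketch of one way to do that in shell; `build_result_payload` is a hypothetical name, and it only escapes backslashes and double quotes, so messages containing newlines or other control characters would need a real JSON encoder.

```shell
# Build the JSON body for POST /v1/checks/{id}/results.
# Arguments: success (true|false), latency_ms, optional error_message,
# optional checked_at (ISO 8601). checked_at defaults to the current UTC time.
build_result_payload() {
  # Escape backslashes first, then double quotes, to keep the JSON valid.
  escaped=$(printf '%s' "${3:-}" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  checked_at=${4:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
  printf '{"success":%s,"latency_ms":%s,"error_message":"%s","checked_at":"%s"}\n' \
    "$1" "$2" "$escaped" "$checked_at"
}

# Example: a failed probe with a message that contains quotes.
build_result_payload false 120 'connect: "refused"'
```

The output can be passed to curl with `-d "$(build_result_payload true 45)"`.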

Timestamp constraints

checked_at cannot be more than 1 minute in the future, and cannot be older than your account's retention window. Posting to a pull-mode check returns 409 Conflict.
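
If your agent buffers results and posts them later, it can pre-check the timestamp before sending. The sketch below mirrors the two constraints above using Unix epoch seconds. `checked_at_in_bounds` and its `retention_seconds` argument are hypothetical; the real retention window depends on your account, and accepting a timestamp exactly at either boundary is an assumption.

```shell
# Return 0 if a checked_at value is acceptable under the constraints above.
# Arguments: checked_at, now (both Unix epoch seconds), retention_seconds.
checked_at_in_bounds() {
  # Reject timestamps more than 1 minute (60 s) in the future.
  [ "$1" -le $(( $2 + 60 )) ] || return 1
  # Reject timestamps older than the retention window.
  [ "$1" -ge $(( $2 - $3 )) ] || return 1
  return 0
}

now=$(date +%s)
# 30 seconds ahead of now, with an illustrative 1-day retention window.
if checked_at_in_bounds "$(( now + 30 ))" "$now" 86400; then
  echo "accepted"
else
  echo "rejected"
fi
```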

Staleness detection¶

If your agent stops posting, Wanepia marks the check as stale when:

now − last_result_time  >  interval_seconds × 2

The dashboard shows a warning banner on stale push checks so you can tell the difference between "the probe ran and passed" and "the probe stopped running entirely".

Match the interval to your agent's cadence

If your agent runs every 5 minutes, set interval_seconds: 300 so the staleness threshold (10 minutes) matches the expected cadence.
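
The staleness rule is a single comparison. The sketch below restates it in shell, assuming both timestamps are Unix epoch seconds; `is_stale` is a hypothetical helper, not part of the wnp CLI.

```shell
# Return 0 (stale) when now - last_result_time > interval_seconds * 2.
# Arguments: now, last_result_time (Unix epoch seconds), interval_seconds.
is_stale() {
  [ $(( $1 - $2 )) -gt $(( $3 * 2 )) ]
}

# A 60-second check whose last result arrived 121 seconds ago.
if is_stale 1000121 1000000 60; then
  echo "stale"
else
  echo "fresh"
fi
```

The comparison is strict, so a result that is exactly two intervals old is still treated as fresh.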

Failure threshold guidance¶

Push agents often run infrequently. At a 5-minute cadence, a failure_threshold of 3 means three consecutive failed runs, potentially 15 minutes of real downtime, before an alert fires. For push checks, set failure_threshold: 1 so that a single failed run triggers an alert immediately.
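
The arithmetic behind that guidance can be checked with a one-line helper. This sketch assumes one probe per interval and an outage that starts just after a successful run, which is the worst case. `time_to_alert_seconds` is a hypothetical name.

```shell
# Worst-case seconds between the start of an outage and the alert.
# Arguments: failure_threshold, interval_seconds.
time_to_alert_seconds() {
  echo $(( $1 * $2 ))
}

time_to_alert_seconds 3 300   # prints 900 (15 minutes)
time_to_alert_seconds 1 300   # prints 300 (5 minutes)
```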

Agent implementation patterns¶

Shell script — Postgres reachability¶

Drop this script on any host inside your network that can reach the database. Run it from cron or a Kubernetes CronJob.

#!/usr/bin/env bash
set -euo pipefail

# Required settings: abort with a clear message if either is missing.
: "${WANEPIA_TOKEN:?WANEPIA_TOKEN must be set}"
: "${CHECK_ID:?CHECK_ID must be set}"
DB_HOST="${DB_HOST:-localhost}"
DB_PORT="${DB_PORT:-5432}"

# Millisecond timestamp; the %3N format requires GNU date.
start=$(date +%s%3N)

if pg_isready -h "$DB_HOST" -p "$DB_PORT" -q; then
  success=true
  error_message=""
else
  success=false
  error_message="pg_isready: host unreachable or refusing connections"
fi

latency=$(( $(date +%s%3N) - start ))
payload='{"success":'"$success"',"latency_ms":'"$latency"',"error_message":"'"$error_message"'"}'

# -f makes curl exit non-zero on an HTTP error, so a rejected post is not silently ignored.
curl -fsS -X POST "https://api.wanepia.com/v1/checks/$CHECK_ID/results" \
  -H "Authorization: Bearer $WANEPIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d "$payload"

Run it as a cron job every minute:

* * * * * /usr/local/bin/wanepia-pg-check.sh

Kubernetes CronJob

Wrap the script in a lightweight container image and deploy a CronJob with schedule: "* * * * *" to probe services inside your cluster without exposing them to the internet.

Creating a check¶

Using the wnp CLI, followed by the equivalent API request for the HTTP check:

# HTTP check
wnp checks create \
  --entity <entity-id> \
  --type http \
  --url https://api.example.com/health \
  --interval 30 \
  --status 200 \
  --body '"status":"ok"' \
  --threshold 2

# TCP check (the CLI takes the host:port target via --url)
wnp checks create \
  --entity <entity-id> \
  --type tcp \
  --url db.internal:5432 \
  --interval 60

curl -X POST https://api.wanepia.com/v1/checks \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "entity_id": "...",
    "check_type": "http",
    "target_url": "https://api.example.com/health",
    "interval_seconds": 30,
    "timeout_ms": 3000,
    "expected_status": 200,
    "body_contains": "ok",
    "failure_threshold": 2,
    "enabled": true
  }'

Managing checks¶

# List all checks
wnp checks list

# Inspect a single check (prefix is enough)
wnp checks get a1b2

# Disable without deleting
wnp checks disable a1b2

# Re-enable
wnp checks enable a1b2

# Update interval
wnp checks update a1b2 --interval 120

# Delete
wnp checks delete a1b2

Viewing results¶

wnp checks results a1b2 --limit 20

    STATUS   LATENCY   CHECKED AT        ERROR
✓   200      42ms      2024-01-20 09:15
✓   200      38ms      2024-01-20 09:14
✗   0        —         2024-01-20 09:13  connection refused

Failure threshold and state transitions¶

The failure_threshold prevents flapping. A check must fail that many times consecutively before the entity's status changes. A single success resets the failure counter.

attempt 1: fail  (counter: 1/3)
attempt 2: fail  (counter: 2/3)
attempt 3: fail  (counter: 3/3) → entity transitions to down
attempt 4: ok    → entity transitions to up, counter resets

See State Machine for degraded vs. down logic.
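
The trace above can be reproduced with a small simulation. This is a simplified sketch covering only the up and down states; it ignores degraded and is not the actual state machine. `simulate_checks` is a hypothetical helper.

```shell
# Replay a sequence of results and print the counter and any transition.
# Arguments: failure_threshold, then one "ok" or "fail" per attempt.
simulate_checks() {
  threshold=$1
  shift
  counter=0
  state=up
  attempt=0
  for result in "$@"; do
    attempt=$(( attempt + 1 ))
    if [ "$result" = "fail" ]; then
      counter=$(( counter + 1 ))
      line="attempt $attempt: fail (counter: $counter/$threshold)"
      # Transition only when the threshold is reached while the entity is up.
      if [ "$counter" -ge "$threshold" ] && [ "$state" = "up" ]; then
        state=down
        line="$line -> entity transitions to down"
      fi
    else
      # A single success resets the consecutive-failure counter.
      counter=0
      line="attempt $attempt: ok (counter reset)"
      if [ "$state" = "down" ]; then
        state=up
        line="$line -> entity transitions to up"
      fi
    fi
    echo "$line"
  done
}

simulate_checks 3 fail fail fail ok
```

Running it with `simulate_checks 3 fail fail ok fail` prints no transitions, because the success on the third attempt resets the counter before it reaches the threshold.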

Check limits¶

The number of checks per entity is capped by your plan. The /v1/me response reports the limit as check_limit_per_entity:

curl -H "Authorization: Bearer $TOKEN" https://api.wanepia.com/v1/me
# "check_limit_per_entity": 5