# Notion Webhook Recurrence Prevention Runbook

## Scope
This runbook prevents repeat Notion webhook outages caused by token drift, repeated forwarding auth failures, or silent webhook suspension.

Implemented controls:
1. **Webhook SLO watchdog** (every 10 minutes)
2. **Atomic rotate-and-verify token workflow**
3. **Suspension sentinel** (endpoint reachable but no successful deliveries for an extended window)

## Components

### 1) Watchdog
- Script: `/Users/openclaw/.openclaw/workspace/scripts/notion_webhook_watchdog.py`
- LaunchAgent: `ai.openclaw.notion-webhook-watchdog`
- Schedule: every 600 seconds (10 min)
- State file: `/Users/openclaw/.openclaw/workspace/tmp/notion-webhook-watchdog/state.json`
- Health artifact: `/Users/openclaw/.openclaw/workspace/tmp/notion-webhook-watchdog/health.json`
- Alert artifact: `/Users/openclaw/.openclaw/workspace/tmp/notion-webhook-watchdog/alert.md`

Checks performed:
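- Repeated `401` and `5xx` forwarding failures in the new log window, counted against `REPEATED_FAIL_THRESHOLD`.
- No successful forwards within the `NO_SUCCESS_MINUTES` window.
- Suspension sentinel: fires when the endpoint is reachable but the gap since the last successful delivery exceeds `SUSPENSION_HOURS`.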
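
The repeated-failure check can be sketched in shell as follows. This is an illustration only, not the watchdog's actual implementation (which lives in the Python script). It assumes the HTTP status codes have already been extracted from the log window, one per line. The function names `count_forward_failures` and `repeated_failures` are hypothetical, and treating "repeated" as "count at or above the threshold" is an assumption.

```bash
# Hypothetical helper: count forwarding failures (401 or any 5xx).
# Input: one HTTP status code per line on stdin (assumed pre-extracted).
count_forward_failures() {
  awk '/^(401|5[0-9][0-9])$/ { n++ } END { print n + 0 }'
}

# Hypothetical helper: succeed (exit 0) when the failure count reaches
# REPEATED_FAIL_THRESHOLD (documented default: 3).
repeated_failures() {
  count=$(count_forward_failures)
  [ "$count" -ge "${REPEATED_FAIL_THRESHOLD:-3}" ]
}
```

For example, `printf '200\n401\n502\n401\n' | count_forward_failures` counts three failures, which would trip the default threshold.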

Webhook handling note:
- The Notion event forward payload now explicitly requires webhook-driven work to:
  - update the relevant Notion thread with findings and the action taken,
  - send Braden a concise Telegram summary, and
  - return a Completion Contract.

Default thresholds (overridable by env vars):
- `REPEATED_FAIL_THRESHOLD=3`
- `NO_SUCCESS_MINUTES=90`
- `SUSPENSION_HOURS=6`
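
A minimal shell sketch of the override behavior, assuming the usual "environment value wins, otherwise use the default" pattern. How the Python watchdog actually resolves these variables is not shown in this runbook, and the function names `load_thresholds` and `print_thresholds` are hypothetical.

```bash
# Hypothetical helper: resolve each threshold from the environment,
# falling back to the documented default when the variable is unset or empty.
load_thresholds() {
  REPEATED_FAIL_THRESHOLD="${REPEATED_FAIL_THRESHOLD:-3}"
  NO_SUCCESS_MINUTES="${NO_SUCCESS_MINUTES:-90}"
  SUSPENSION_HOURS="${SUSPENSION_HOURS:-6}"
}

# Hypothetical helper: print the effective values for a quick sanity check.
print_thresholds() {
  load_thresholds
  printf 'fail=%s no_success_min=%s suspension_h=%s\n' \
    "$REPEATED_FAIL_THRESHOLD" "$NO_SUCCESS_MINUTES" "$SUSPENSION_HOURS"
}
```

For a one-shot run, an override can be placed in front of the watchdog command (for example `NO_SUCCESS_MINUTES=30` before the `python3` invocation). A LaunchAgent does not inherit an interactive shell's environment, so persistent overrides would likely need to be set in the agent's own configuration.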

### 2) Token rotate + verify
- Script: `/Users/openclaw/.openclaw/workspace/scripts/notion_webhook_rotate_and_verify.sh`
- Rotates these keys to a single shared token value:
  - `hooks.token`
  - `gateway.auth.token`
  - `gateway.remote.token`
- Restarts:
  - `openclaw gateway restart`
  - `launchctl kickstart -k gui/$(id -u)/ai.openclaw.event-router`
- Verifies:
  - new token → hook endpoint returns `200`
  - old token → hook endpoint returns `401`
  - router forward self-test endpoint returns `200`
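
The three verification outcomes above can be evaluated together, as in the sketch below. It is not taken from `notion_webhook_rotate_and_verify.sh`. The function name `verify_rotation` and its argument order are hypothetical, and the caller is assumed to have collected the three status codes already, for instance with `curl -s -o /dev/null -w "%{http_code}"`.

```bash
# Hypothetical helper: judge a rotation from three HTTP status codes.
#   $1 = hook endpoint status with the NEW token (expected 200)
#   $2 = hook endpoint status with the OLD token (expected 401)
#   $3 = router self-test forward status       (expected 200)
verify_rotation() {
  new_status=$1
  old_status=$2
  selftest_status=$3
  rc=0
  if [ "$new_status" != "200" ]; then
    echo "FAIL: new token returned $new_status, expected 200"; rc=1
  fi
  if [ "$old_status" != "401" ]; then
    echo "FAIL: old token returned $old_status, expected 401"; rc=1
  fi
  if [ "$selftest_status" != "200" ]; then
    echo "FAIL: self-test returned $selftest_status, expected 200"; rc=1
  fi
  if [ "$rc" -eq 0 ]; then
    echo "PASS"
  fi
  return "$rc"
}
```

Checking all three codes before returning, rather than stopping at the first mismatch, gives the operator a complete picture in one pass. An old token that still returns `200` is the case most worth catching, since it means the rotation did not actually revoke access.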

Backup + rollback artifacts:
- Config backup: `tmp/notion-webhook-token-rotation/openclaw.json.<timestamp>.bak`
- Rollback note: `tmp/notion-webhook-token-rotation/ROLLBACK-<timestamp>.md`
- Rotation report: `tmp/notion-webhook-token-rotation/last-rotation-report.md`

### 3) Event router token source hardening
- The router now reads the hook token from `/Users/openclaw/.openclaw/openclaw.json` (`hooks.token`) at forward time.
- The router source no longer contains a hardcoded hook token.
- A local-only self-test route was added: `POST /selftest/forward`.

## Commands

### Watchdog
```bash
# One-shot run
/usr/bin/python3 /Users/openclaw/.openclaw/workspace/scripts/notion_webhook_watchdog.py

# Inspect health/alert artifacts
cat /Users/openclaw/.openclaw/workspace/tmp/notion-webhook-watchdog/health.json
cat /Users/openclaw/.openclaw/workspace/tmp/notion-webhook-watchdog/alert.md

# LaunchAgent status
launchctl print gui/$(id -u)/ai.openclaw.notion-webhook-watchdog | head -40
```

### Rotate and verify
```bash
# Generate random token automatically
/Users/openclaw/.openclaw/workspace/scripts/notion_webhook_rotate_and_verify.sh

# Provide explicit token (if required by policy)
/Users/openclaw/.openclaw/workspace/scripts/notion_webhook_rotate_and_verify.sh '<new-token-value>'
```

### Manual forward-path check (no token printed)
```bash
curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST http://127.0.0.1:8080/selftest/forward \
  -H 'Content-Type: application/json' \
  --data '{"reason":"manual-check"}'
```

## Alert interpretation
- **WARNING**:
  - repeated forwarding auth (`401`) or upstream (`5xx`) errors, or
  - no successful forwards within the `NO_SUCCESS_MINUTES` window.
- **CRITICAL**:
  - endpoint reachable, but no successful deliveries for longer than `SUSPENSION_HOURS`.
  - Action: in Notion, check the webhook status; resume or recreate the webhook if it is suspended.
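
As a rough guide to how the two levels relate, the mapping can be sketched in shell. This is an assumption-laden illustration, not the watchdog's code. The function name `classify_alert`, its inputs, the `OK` level, and the exact comparison operators are all hypothetical. Only the threshold names and defaults come from this runbook.

```bash
# Hypothetical helper: map watchdog observations to an alert level.
#   $1 = number of 401/5xx forwarding failures in the log window
#   $2 = minutes since the last successful forward
#   $3 = "1" if the endpoint is reachable, anything else otherwise
classify_alert() {
  fails=$1
  gap_min=$2
  reachable=$3
  fail_thr=${REPEATED_FAIL_THRESHOLD:-3}
  gap_thr=${NO_SUCCESS_MINUTES:-90}
  susp_min=$(( ${SUSPENSION_HOURS:-6} * 60 ))

  if [ "$reachable" = "1" ] && [ "$gap_min" -gt "$susp_min" ]; then
    echo CRITICAL   # reachable but silent: likely suspended in Notion
  elif [ "$fails" -ge "$fail_thr" ] || [ "$gap_min" -gt "$gap_thr" ]; then
    echo WARNING    # repeated failures or a shorter success gap
  else
    echo OK         # assumed "nothing to report" level
  fi
}
```

With the defaults, `classify_alert 0 120 1` yields `WARNING` (120 minutes exceeds 90 but not 6 hours), while `classify_alert 0 400 1` yields `CRITICAL`.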

## Rollback
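1. Restore the backup config (`openclaw.json.<timestamp>.bak`) from the rotation directory, `tmp/notion-webhook-token-rotation/`.
2. Restart the gateway: `openclaw gateway restart`.
3. Restart the event router: `launchctl kickstart -k gui/$(id -u)/ai.openclaw.event-router`.
4. Confirm:
   - the hook endpoint accepts the restored token (`200`)
   - the token that the rollback replaced is rejected (`401`)
   - the router self-test forward returns `200`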
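
Locating the backup to restore can be scripted. The sketch below picks the most recently modified `openclaw.json.*.bak` in a given directory. The function name `latest_backup` is hypothetical, and choosing by modification time is an assumption (the `<timestamp>` format is not specified here). The matching `ROLLBACK-<timestamp>.md` note remains the authoritative reference for a given rotation.

```bash
# Hypothetical helper: print the newest config backup in a rotation directory.
#   $1 = rotation directory (e.g. tmp/notion-webhook-token-rotation)
# Returns non-zero and prints nothing when no backup exists.
latest_backup() {
  dir=$1
  # ls -t sorts by modification time, newest first.
  newest=$(ls -t "$dir"/openclaw.json.*.bak 2>/dev/null | head -n 1)
  if [ -n "$newest" ]; then
    printf '%s\n' "$newest"
  else
    return 1
  fi
}
```

The printed path can then be copied over the live config before restarting the gateway and event router. Confirm the chosen file against the rollback note first, since the newest backup is not necessarily the one that predates the rotation being undone.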
