monitoring · infrastructure · tailscale

The Stats Sync Pattern: Aggregating Metrics Across Distributed Services

Ryan C

You don't always need a full observability stack. Sometimes you just need to know how many things exist across your infrastructure — and a bash script, a cron job, and a VPN can get you there in an afternoon.

I run a platform with components spread across multiple machines: an agent gateway on one box, MCP tool servers, a package registry, app scaffolding templates, and the client-facing apps on the VPS. Each component has its own metrics — how many agent archetypes exist, how many MCP tools are available, how many packages are published.

I wanted a single dashboard that shows all of these numbers. The obvious answer is a monitoring stack: Prometheus, Grafana, exporters, alert rules. The actual answer is a 60-line bash script that runs hourly.

The Architecture

┌──────────────────────┐           ┌────────────────────────┐
│  Dev Machine (HX90)  │           │    VPS (Production)    │
│                      │           │                        │
│  Gateway configs     │   POST    │  Emergent App          │
│  MCP tool registry   │ ────────→ │  /api/admin/stats/sync │
│  Package manifests   │ Tailscale │          ↓             │
│  Scaffolder templates│   (VPN)   │  PostgreSQL (Stat)     │
│  Agent skill files   │           │          ↓             │
│                      │           │  Admin Dashboard       │
└──────────────────────┘           └────────────────────────┘

A cron job on the dev machine runs every hour. It counts things — files, JSON entries, config keys — and POSTs the results to the Emergent app's stats sync endpoint over Tailscale.
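The schedule is a single crontab entry on the dev machine. The script path and log location below are illustrative; adjust to wherever the script actually lives:

```shell
# m h dom mon dow  command
# Run the stats sync at the top of every hour; keep a log for debugging.
0 * * * * /opt/platform/scripts/sync-stats.sh >> /var/tmp/stats-sync.log 2>&1
```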

The Bash Script

The script is dead simple: count things on the local machine and build a JSON payload. No SDKs, no client libraries — just shell commands you already know.

# Count agent archetypes
ARCHETYPES=$(cat config/archetypes.json | python3 -c "
import sys, json; print(len(json.load(sys.stdin)))
")

# Count MCP tools
MCP_TOOLS=$(find mcp-servers/ -name "*.py" -path "*/tools/*" | wc -l)

# Count packages in registry
PACKAGES=$(ls -d packages/*/package.json 2>/dev/null | wc -l)

# Count scaffolder generators
GENERATORS=$(ls -d generators/*/ 2>/dev/null | wc -l)

Each metric gets a key, value, label (human-readable name), group (for dashboard sections), and sort order. The script assembles them into a JSON array and POSTs it:
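Assembly can be as plain as a heredoc. A sketch with two metrics, using hardcoded values where the real script substitutes the counts gathered above (hand-built JSON is safe here only because every value is an integer):

```shell
# Illustrative values; the real script uses the counts computed above.
ARCHETYPES=12
MCP_TOOLS=47

# Build the JSON array by hand. Keys match what the sync endpoint upserts:
# key, value, label, group, sortOrder.
PAYLOAD=$(cat <<EOF
[
  {"key": "agent_archetypes", "value": ${ARCHETYPES},
   "label": "Agent Archetypes", "group": "Gateway", "sortOrder": 1},
  {"key": "mcp_tools", "value": ${MCP_TOOLS},
   "label": "MCP Tools", "group": "Gateway", "sortOrder": 2}
]
EOF
)

echo "${PAYLOAD}"
```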

curl -s -X POST \
  "http://<tailscale-ip>:<app-port>/api/admin/stats/sync" \
  -H "Authorization: Bearer ${STATS_SYNC_SECRET}" \
  -H "Content-Type: application/json" \
  -d "${PAYLOAD}"

Note the IP: the URL uses the VPS's Tailscale address, not a public IP. This traffic never touches the public internet. The Tailscale mesh VPN means the sync endpoint doesn't need to be publicly accessible, which eliminates an entire category of security concerns.

The Sync Endpoint

On the receiving end, there's a Next.js API route that does two things: verify the request is legit (bearer token), then upsert each stat into the database. "Upsert" means "create if new, update if it already exists" — which makes the whole thing idempotent.

// Assumes a shared Prisma client, e.g.: import { prisma } from '@/lib/prisma'
export async function POST(request: Request) {
  // Reject anything that doesn't carry the shared secret.
  const authHeader = request.headers.get('authorization')
  if (authHeader !== `Bearer ${process.env.STATS_SYNC_SECRET}`) {
    return Response.json({ error: 'Unauthorized' }, { status: 401 })
  }

  const stats = await request.json()

  // Upsert each metric: create it on first sight, update it thereafter.
  for (const stat of stats) {
    await prisma.stat.upsert({
      where: { key: stat.key },
      update: { value: stat.value, label: stat.label },
      create: { key: stat.key, value: stat.value, label: stat.label,
                group: stat.group, sortOrder: stat.sortOrder }
    })
  }

  return Response.json({ synced: stats.length })
}

Upsert semantics make the sync idempotent: run it once or a hundred times and the result is the same. If a metric disappears from the script, the old value stays in the database (stale but harmless). If a new metric is added, it's created on the first sync.
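The same behavior can be seen in miniature with SQLite standing in for the Prisma/Postgres upsert (this demo, including the table and payload, is fabricated for illustration):

```shell
# Demonstrate upsert idempotency with SQLite as a stand-in for Prisma.
# Syncing the same payload twice leaves exactly one row per key; a changed
# count updates the row in place instead of duplicating it.
python3 - <<'EOF'
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stat (key TEXT PRIMARY KEY, value INTEGER, label TEXT)")

def sync(stats):
    for s in stats:
        db.execute(
            "INSERT INTO stat (key, value, label) VALUES (:key, :value, :label) "
            "ON CONFLICT(key) DO UPDATE SET value = :value, label = :label",
            s,
        )

payload = [{"key": "mcp_tools", "value": 47, "label": "MCP Tools"}]
sync(payload)
sync(payload)               # second run: no duplicate row
payload[0]["value"] = 48
sync(payload)               # changed count: row updated in place

print(db.execute("SELECT key, value FROM stat").fetchall())
# prints [('mcp_tools', 48)]
EOF
```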

The Dashboard

The admin stats page groups metrics by category and displays them in a simple grid. Nothing fancy — just numbers with labels. But having "12 agent archetypes, 47 MCP tools, 8 packages, 15 generators" visible in one place tells me the platform's shape at a glance.

The stats are editable from the admin UI too. If I want to manually update a number (say, I know a new package was published but the cron hasn't run yet), I can override it directly. The next sync will overwrite it with the real count.

Why Not Prometheus?

Prometheus is designed for time-series metrics: request latency, error rates, CPU usage. My metrics are inventory counts — how many of a thing exist right now. They change when I add a new agent archetype or publish a package, not on a per-request basis.

Setting up Prometheus + Grafana for 15 slowly-changing integers would be like building a warehouse to store a shoebox. The monitoring stack has more moving parts than the thing it's monitoring.

Why Tailscale?

The dev machine (where the source configs live) and the VPS (where the dashboard runs) are on different networks. Options:

  1. Expose the sync endpoint publicly — Works, but now I need rate limiting, IP allowlisting, and HTTPS on a route that handles auth via a bearer token. Every public endpoint is an attack surface.

  2. VPN between machines — Tailscale gives me a private mesh network with zero configuration. The VPS is reachable at a stable private IP that only my machines can access. The sync endpoint is effectively internal-only.

  3. Push to a shared database — Both machines could write to the same PostgreSQL instance. But the VPS database isn't accessible from the dev machine (by design), and opening it up defeats the purpose.

Tailscale was the 5-minute solution that eliminated the security concerns entirely.

The Pattern, Generalized

If you have metrics or state scattered across multiple machines and you need a dashboard:

  1. Write a script that counts things. Keep it dumb. File counts, JSON array lengths, database row counts. wc -l and jq length solve most cases.

  2. POST to a sync endpoint. Bearer token auth, upsert semantics, idempotent processing.

  3. Run it on a cron. Hourly is fine for inventory metrics. If you need real-time, you need a different pattern (webhooks, event streams). But most "dashboards" are checked once a day — hourly resolution is overkill, not underkill.

  4. Use a private network. Tailscale, WireGuard, or even SSH tunnels. Don't expose internal endpoints to the internet just because it's easier.
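Step 1 rarely needs more than one-liners. A sketch of the two idioms that cover most inventory counts, run against a throwaway layout (the demo paths are fabricated for illustration; jq length works equally well for the JSON case if jq is installed):

```shell
# Throwaway layout to count against (illustration only).
mkdir -p demo/generators/auth demo/generators/crud demo/generators/admin
printf '[1, 2, 3, 4]' > demo/archetypes.json

# Idiom 1: count directories (or files) with find | wc -l.
GENERATORS=$(find demo/generators -mindepth 1 -maxdepth 1 -type d | wc -l)

# Idiom 2: length of a JSON array, using stdlib Python.
ARCHETYPES=$(python3 -c "import json; print(len(json.load(open('demo/archetypes.json'))))")

echo "generators=${GENERATORS} archetypes=${ARCHETYPES}"
```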

The total cost of this system: one cron entry, one bash script, one API route, one database table. No monitoring stack, no time-series database, no alerting rules. Sometimes the right infrastructure is barely any infrastructure at all.

Want us to build something like this for you?

We ship production software in days, not months. Tell us what you need — our AI receptionist is standing by.
