MCP Tools as a DevOps Interface: Managing 8 Production Apps from Claude
What if you could debug a production outage by saying "the site is returning a 502" and having an AI agent check health status, read logs, identify the root cause, and fix it — all in 30 seconds? That's what I built with MCP tools, and it's changed how I think about DevOps.
My old debugging workflow: SSH into the VPS, cd to the project directory, tail -f the logs, grep for the error, check the systemd status, maybe restart the service, check the Nginx config, query the database with psql. Each step is simple. Chaining them together for eight different projects across multiple terminal windows is not.
Now I describe the problem in natural language and an AI agent runs the right commands.
What MCP Tools Are (The Short Version)
Model Context Protocol (MCP) tools are functions that an AI model can call during a conversation. Instead of the model generating text that tells you what to run, it actually runs the commands and processes the results.
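Conceptually, a tool is just a named function paired with a JSON-describable input schema that the server advertises to the model. Here is a minimal sketch, not tied to any particular MCP SDK — the registry, the decorator, and the canned `hostkit_status` stub are all illustrative, not the real implementation:

```python
import json
from typing import Callable

# Hypothetical tool registry: each tool is a named function plus the
# input schema the model sees when deciding what to call.
TOOLS: dict = {}

def tool(name: str, description: str, params: dict):
    """Register a function as a callable tool with its input schema."""
    def register(fn: Callable) -> Callable:
        TOOLS[name] = {"description": description, "params": params, "fn": fn}
        return fn
    return register

@tool(
    name="hostkit_status",
    description="Health check, services, URL, recent logs for a project",
    params={"project": {"type": "string"}},
)
def hostkit_status(project: str) -> dict:
    # A real implementation would shell out to systemctl, curl the
    # health endpoint, and tail logs; this stub returns canned data.
    return {"project": project, "health": "ok", "service": "running"}

# The model picks a tool by name and supplies JSON arguments:
call = {"tool": "hostkit_status", "arguments": {"project": "gilded-tiers"}}
result = TOOLS[call["tool"]]["fn"](**call["arguments"])
print(json.dumps(result))
```

The key shift is in the last three lines: the model emits a structured call, the server executes it, and the result goes back into the conversation as data rather than as text for a human to act on.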
I built 17 MCP tools covering the full lifecycle of managing production apps; the table below lists the core set:
| Tool | Purpose |
|---|---|
| `hostkit_deploy_local` | Deploy pre-built files from local machine |
| `hostkit_status` | Health check, services, URL, recent logs |
| `hostkit_wait_healthy` | Poll health endpoint after deploy |
| `hostkit_execute` | Run arbitrary CLI commands (rollback, restart, etc.) |
| `hostkit_fix` | Auto-diagnose and suggest fixes for common errors |
| `hostkit_solutions` | Cross-project knowledge base of solved problems |
| `hostkit_env_get` / `hostkit_env_set` | Read/write environment variables |
| `hostkit_db_schema` | Inspect database table structure |
| `hostkit_db_query` | Execute SQL queries (read or write mode) |
| `hostkit_validate` | Check project configuration for common issues |
| `hostkit_state` | VPS resource usage (CPU, memory, disk) |
| `hostkit_events` | Deployment and service event history |
| `hostkit_auth_guide` | Auth integration examples for the project |
A Real Debugging Session
Here's what debugging a 502 error looks like now:
Me: "The gilded-tiers site is returning a 502."
The agent calls hostkit_status(project="gilded-tiers") and gets back:
- Health: failing
- Service: stopped
- Last log lines: `Error: Cannot find module '.prisma/client'`
Then calls hostkit_fix(error="502 Bad Gateway", project="gilded-tiers") which:
- Identifies the root cause: Prisma client wasn't copied to the standalone build
- Suggests the fix: copy `node_modules/.prisma` into `.next/standalone/node_modules/`
- Optionally applies the fix and redeploys
The whole interaction takes 30 seconds. The equivalent SSH workflow takes 5 minutes if I remember where everything is, longer if I don't.
The hostkit_fix Pattern
This is the tool I'm most proud of. It takes an error description and a project name, then:
- Checks the project status (health, services, recent logs)
- Matches the error pattern against known solutions
- Suggests a fix with a confidence level
- Can apply the fix automatically if approved
The known solutions database (hostkit_solutions) is cross-project. If I solve a "502 due to missing Prisma client" on one project, the solution is available for every project. The knowledge compounds.
Common patterns it handles:
- 502 Bad Gateway — process crashed, port mismatch, Nginx misconfigured
- Auth failures — expired JWT key, missing `AUTH_URL`, cookie domain mismatch
- Database errors — connection refused, pool exhaustion, missing migration
- Build failures — missing dependencies in standalone output, wrong Node version
- SSL issues — cert expired, wrong domain in Nginx config
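The pattern-matching core of this tool doesn't need to be sophisticated to be useful. A minimal sketch, assuming a hypothetical regex-keyed solutions database (the entries and the `suggest_fix` helper are illustrative, not hostkit's actual code):

```python
import re
from typing import Optional

# Hypothetical cross-project solutions database: each entry pairs an
# error-log pattern with a suggested fix and a confidence level.
SOLUTIONS = [
    {
        "pattern": r"Cannot find module '\.prisma/client'",
        "cause": "Prisma client missing from the standalone build",
        "fix": "cp -r node_modules/.prisma .next/standalone/node_modules/",
        "confidence": "high",
    },
    {
        "pattern": r"EADDRINUSE|ECONNREFUSED",
        "cause": "port mismatch between the app and the Nginx upstream",
        "fix": "check the PORT env var against the Nginx proxy_pass target",
        "confidence": "medium",
    },
]

def suggest_fix(log_excerpt: str) -> Optional[dict]:
    """Return the first known solution whose pattern matches the logs."""
    for entry in SOLUTIONS:
        if re.search(entry["pattern"], log_excerpt):
            return entry
    return None

match = suggest_fix("Error: Cannot find module '.prisma/client'")
print(match["fix"] if match else "no known solution; debug manually")
```

Because the database is keyed on log patterns rather than project names, a solution recorded while debugging one app automatically matches the same failure anywhere else.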
Database Queries Without psql
One of the most common ops tasks is "look up a value in the production database." Before MCP tools, that meant SSH-ing into the server and opening a psql session:
```
ssh my-server
sudo -u myapp psql myapp_db
SELECT * FROM "User" WHERE email = '[email protected]';
```
Now I just ask in plain English:
Me: "How many active subscriptions does gilded-tiers have?"
The agent calls hostkit_db_query(project="gilded-tiers", query="SELECT count(*) FROM \"ProjectService\" WHERE status = 'active'") and returns the answer.
For schema exploration, hostkit_db_schema returns table structures without memorizing column names. For write operations, the query tool has an explicit write_mode parameter that must be set intentionally — no accidental UPDATEs.
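The write-mode guard can be as simple as refusing any statement that doesn't start with a read keyword. A minimal sketch, assuming a hypothetical `db_query` wrapper rather than the real tool:

```python
import re

# Statements starting with these keywords are treated as reads.
# (A production version would also need to catch writes hidden in
# CTEs, e.g. WITH ... AS (UPDATE ...), and multi-statement queries.)
READ_ONLY = re.compile(r"^\s*(select|show|explain)\b", re.IGNORECASE)

def db_query(project: str, query: str, write_mode: bool = False) -> str:
    """Guarded query runner: writes are rejected unless write_mode is set."""
    if not READ_ONLY.match(query) and not write_mode:
        raise PermissionError(
            f"{project}: query looks like a write; pass write_mode=True to run it"
        )
    # A real tool would hand the query to psql/psycopg here; we just echo it.
    return f"would run against {project}: {query.strip()}"

print(db_query("gilded-tiers", 'SELECT count(*) FROM "ProjectService"'))
```

Making the caller (and therefore the agent) pass `write_mode=True` explicitly means a mutation only happens when the conversation explicitly asked for one.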
Environment Variable Management
Env var debugging is the most common ops task that isn't technically "debugging." Did I set the Stripe webhook secret? Is the base URL correct? What's the S3 bucket name?
```
hostkit_env_get(project="emergent")
→ Returns all env vars (secrets redacted)

hostkit_env_set(project="emergent", key="NEXT_PUBLIC_BASE_URL", value="https://emergentaiagency.com")
→ Sets the var and optionally restarts the service
```
No SSH. No editing .env files. No forgetting to restart the service after changing a variable.
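The redaction behavior is worth sketching, because it's what makes it safe to show env output inside an AI conversation at all. A hedged sketch with an in-memory store standing in for the real `.env` handling (the store, the key heuristic, and both function bodies are assumptions, not hostkit's code):

```python
import re

# Hypothetical in-memory stand-in for a project's .env file.
ENV_STORE = {
    "emergent": {
        "NEXT_PUBLIC_BASE_URL": "https://emergentaiagency.com",
        "STRIPE_WEBHOOK_SECRET": "whsec_abc123",
    }
}

# Heuristic: keys that look secret get their values masked.
SECRET_KEY = re.compile(r"SECRET|TOKEN|KEY|PASSWORD", re.IGNORECASE)

def env_get(project: str) -> dict:
    """Return all env vars with secret-looking values redacted."""
    return {
        k: ("<redacted>" if SECRET_KEY.search(k) else v)
        for k, v in ENV_STORE[project].items()
    }

def env_set(project: str, key: str, value: str, restart: bool = True) -> str:
    """Write a var; a real tool would persist it and restart the service."""
    ENV_STORE[project][key] = value
    return f"set {key}" + (" and restarted service" if restart else "")

print(env_get("emergent")["STRIPE_WEBHOOK_SECRET"])  # -> <redacted>
```

Redacting on read means the model can confirm a secret is *set* without the secret itself ever entering the conversation log.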
The Compound Effect
The value of MCP tools isn't any single tool. It's the conversation-level context that ties them together. When I'm debugging, the agent remembers what it already checked. It correlates the health check failure with the log output with the missing env var. It doesn't lose context between steps like I do when I'm switching terminal windows.
And because every tool interaction is logged in the conversation, I have an automatic audit trail of what was checked, what was changed, and why. No more "I think I restarted that service but I'm not sure."
Building Your Own MCP Tools
If you're running any production infrastructure, the pattern is worth adopting. Start with three tools:
- Status check — health, logs, service state in one call
- Execute — run arbitrary commands with output capture
- Env management — read and write environment variables
These three cover 80% of ops tasks. Build specialized tools (database queries, auto-fix, deploy) as the patterns emerge from your actual debugging sessions.
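Of the three, the execute tool is the simplest place to start: it is little more than a subprocess wrapper that captures everything the agent needs to reason about a command. A sketch under my own assumptions (the truncation limit and function shape are illustrative, not any particular tool's API):

```python
import subprocess

def execute(command: list, timeout: int = 30) -> dict:
    """Run a command and capture stdout/stderr/exit code for the agent."""
    # Taking a list of args (not a shell string) avoids shell injection;
    # a production tool would also allow-list which binaries may run.
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    return {
        "exit_code": proc.returncode,
        "stdout": proc.stdout[-4000:],  # truncate so output fits in model context
        "stderr": proc.stderr[-4000:],
    }

print(execute(["echo", "hello"])["stdout"].strip())  # -> hello
```

Returning exit code, stdout, and stderr together in one structured result is the point: the agent can branch on failure without a follow-up call.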
The best DevOps interface is the one that matches how you think about problems — in natural language, with context, without remembering which directory you need to be in.
