
SkillDB v0.7.0: API Health Dashboard, Mid-Session Keys, and the 500 That Taught Us a Lesson

SkillDB Team · March 27, 2026 · 6 min read

March 27, 2026. 12:47 AM.

I'm watching my own product fail in real time.

A user is running a site audit. They ask their Claude agent to pull skills from SkillDB — security reviews, React patterns, web polish, the works. The agent calls skilldb_search. The response:

Error: HTTP 500

That's it. That's the entire error message. No explanation. No "here's what went wrong." No "try this instead." Just five characters and a number that means "something broke and we're not going to tell you what."

The agent tries again. skilldb_list. Same thing. skilldb_get. Same. Every endpoint. Every call. All returning the same useless, hollow, infuriating error.

Except one. skilldb_recommend still works. It can tell you which packs you need. It just can't give you any of them. It's like a librarian who can recommend books while the library doors are welded shut.

The user gives up, runs their audit without SkillDB skills, and moves on. We just failed a paying customer in the most uninformative way possible.

I wanted to throw my monitor through the wall.

#What Actually Broke

Here's the chain of failure, laid bare:

MCP client sends request with Bearer token (API key)
→ API route sees isAuthenticated = true
→ calls validateApiKey(apiKey)
→ calls keysCollection()
→ checks adminDb... it's null
→ throws "Firestore not configured"
→ UNHANDLED EXCEPTION
→ Next.js returns HTTP 500 with NO JSON body
→ SDK tries res.json().catch(() => ({}))
→ gets empty object {}
→ falls back to "HTTP 500"
→ MCP shows: "Error: HTTP 500"
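To see why the message collapsed to two words, here is a minimal reconstruction of the last few steps of that chain, assuming the SDK reads an `error` field from the JSON body. The helper names `parseBodyOrEmpty` and `legacyErrorMessage` are hypothetical, not SkillDB's actual SDK code.

```typescript
// Hypothetical reconstruction of the pre-v0.7.0 client fallback; the helper
// names are illustrative, not SkillDB's actual SDK code.
interface ErrorBody {
  error?: string;
}

// Mimics `res.json().catch(() => ({}))`: a body that is not a JSON object
// (here, the empty body from the crashed route) becomes {}.
function parseBodyOrEmpty(raw: string): ErrorBody {
  try {
    const parsed: unknown = JSON.parse(raw);
    return typeof parsed === "object" && parsed !== null ? (parsed as ErrorBody) : {};
  } catch {
    return {};
  }
}

// With no `error` field to read, only the status code survives.
function legacyErrorMessage(httpStatus: number, body: ErrorBody): string {
  return body.error ?? `HTTP ${httpStatus}`;
}

const crashedBody = parseBodyOrEmpty(""); // the bare 500 had no JSON body
console.log(`Error: ${legacyErrorMessage(500, crashedBody)}`); // prints "Error: HTTP 500"
```

Nothing in this path is wrong in isolation; the information was already gone by the time the client saw the response, which is why the fix had to start on the server.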

The root cause was embarrassingly simple: FIREBASE_SERVICE_ACCOUNT_KEY was never set as an environment variable on our Cloud Run instance. Firebase Admin SDK couldn't initialize. Every authenticated request crashed at the first database call.

recommend survived because it's the only endpoint that doesn't touch Firestore. It's a pure in-memory tech-stack-to-pack mapping. No auth, no database, no crash.

The real failure wasn't the missing env var. It was everything that happened after. No graceful degradation. No structured error. No health check to detect it. No way for the user to understand what was wrong, let alone fix it.

We shipped a product that could break silently and blame the user by showing them nothing.
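Graceful degradation here means the key check keeps answering, with weaker guarantees, when its database is missing, instead of throwing. A sketch of that idea, assuming it resembles the "passthrough mode" that v0.7.0 reports when Firestore is unavailable, and assuming passthrough means "accept any well-formed key". `KeyStore`, `validateKeyWithFallback`, and the `sk_` key format are all hypothetical.

```typescript
// Hypothetical sketch of graceful degradation during key validation.
// `KeyStore` stands in for the Firestore keys collection and is null when the
// Admin SDK could not initialize (e.g. the service-account key is missing).
interface KeyStore {
  lookup(apiKey: string): { active: boolean } | undefined;
}

type KeyCheck =
  | { ok: true; mode: "verified" | "passthrough" }
  | { ok: false; reason: string };

// Hypothetical key format, used only so passthrough mode still rejects junk.
const KEY_FORMAT = /^sk_[A-Za-z0-9]{16,}$/;

function validateKeyWithFallback(apiKey: string, store: KeyStore | null): KeyCheck {
  if (!KEY_FORMAT.test(apiKey)) {
    return { ok: false, reason: "Malformed API key" };
  }
  if (store === null) {
    // Degrade instead of throwing: accept the well-formed key and leave it
    // to a health check to report the store as degraded.
    return { ok: true, mode: "passthrough" };
  }
  return store.lookup(apiKey)?.active
    ? { ok: true, mode: "verified" }
    : { ok: false, reason: "Invalid or expired API key" };
}

const demoStore: KeyStore = {
  lookup: (k) => (k === "sk_abcdefghijklmnop" ? { active: true } : undefined),
};

console.log(validateKeyWithFallback("sk_abcdefghijklmnop", null)); // accepted in passthrough mode
console.log(validateKeyWithFallback("sk_abcdefghijklmnop", demoStore)); // accepted as verified
```

Under this assumption the trade-off is explicit: while the store is down, a revoked but well-formed key would be accepted, so the degraded mode must be visible somewhere rather than silent.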

#What We Shipped to Fix It

Three things. All live now.

#1. /api/v1/status — Real-Time Health Dashboard

Hit this endpoint and you get the truth about every subsystem:

{
  "status": "operational",
  "version": "v1",
  "timestamp": "2026-03-27T00:57:05.870Z",
  "subsystems": {
    "skills_index": { "status": "operational", "message": "5303 skills indexed" },
    "pack_content": { "status": "operational", "message": "372 pack files available" },
    "firestore": { "status": "operational", "message": "Connected" }
  }
}

Three subsystems, three statuses: operational, degraded, or down. If Firestore goes down again, you'll see:

"firestore": {
  "status": "degraded",
  "message": "FIREBASE_SERVICE_ACCOUNT_KEY not set — API keys validated via passthrough mode"
}

Not "HTTP 500." Not silence. A sentence that tells you exactly what's wrong and implies what to do about it.
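The payload pairs a top-level `status` with one status per subsystem. The post doesn't say how the top-level value is derived; a minimal sketch, assuming it is simply the worst subsystem status. `Health`, `SEVERITY`, and `overallStatus` are hypothetical names.

```typescript
// Hypothetical sketch: derive the top-level "status" from the subsystems.
// Assumes the overall value is the worst subsystem status.
type Health = "operational" | "degraded" | "down";

interface Subsystem {
  status: Health;
  message: string;
}

const SEVERITY: Record<Health, number> = { operational: 0, degraded: 1, down: 2 };

function overallStatus(subsystems: Record<string, Subsystem>): Health {
  let worst: Health = "operational";
  for (const sub of Object.values(subsystems)) {
    if (SEVERITY[sub.status] > SEVERITY[worst]) worst = sub.status;
  }
  return worst;
}

const example: Record<string, Subsystem> = {
  skills_index: { status: "operational", message: "5303 skills indexed" },
  pack_content: { status: "operational", message: "372 pack files available" },
  firestore: { status: "degraded", message: "FIREBASE_SERVICE_ACCOUNT_KEY not set" },
};

console.log(overallStatus(example)); // prints "degraded"
```

Worst-of is a conservative choice: one degraded dependency shows up at the top level, so a client that only reads `status` still notices it.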

Bookmark it: skilldb.dev/api/v1/status

#2. skilldb_set_key — Configure API Keys Without Restarting

This one came from watching a user paste their API key into the chat, only to be told "you need to restart your session." In 2026. When the whole point of MCP is seamless tool integration.

Before v0.7.0:

User: *gets API key*
Agent: "Run: claude mcp remove skilldb && claude mcp add skilldb -- skilldb-mcp --api-key YOUR_KEY"
User: loses all conversation context
User: has to re-explain what they were doing

After v0.7.0:

User: *pastes API key*
Agent: calls skilldb_set_key → "✅ API key configured! All tools now return full content."
Agent: calls skilldb_get → gets full 200-line skill with patterns, examples, anti-patterns

No restart. No context loss. No friction. The key is validated on the spot and every subsequent tool call uses it immediately.
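The mechanism behind "no restart" can be as simple as reading the key at call time rather than capturing it at startup. A minimal sketch, assuming the server holds the key in one mutable object that every tool consults. `ApiKeyState` and `looksValid` are hypothetical stand-ins; the post says the real tool validates the key on the spot but not how.

```typescript
// Hypothetical sketch: one mutable holder that every tool reads per request,
// so a key set mid-session applies to the very next call.
class ApiKeyState {
  private key: string | null = null;

  // `validate` stands in for whatever check the real tool performs.
  set(candidate: string, validate: (k: string) => boolean): string {
    const trimmed = candidate.trim();
    if (!validate(trimmed)) {
      return "Invalid API key; the previous configuration was kept.";
    }
    this.key = trimmed;
    return "API key configured! All tools now return full content.";
  }

  // Headers are built at call time instead of being captured at startup.
  headers(): Record<string, string> {
    return this.key ? { Authorization: `Bearer ${this.key}` } : {};
  }
}

const looksValid = (k: string): boolean => k.startsWith("sk_"); // hypothetical check

const session = new ApiKeyState();
console.log(session.headers()); // {}
console.log(session.set("  sk_example  ", looksValid));
console.log(session.headers()); // { Authorization: 'Bearer sk_example' }
```

Keeping the old key when validation fails means a typo never leaves the session worse off than before.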

#3. Descriptive Error Messages Everywhere

Every HTTP error now explains itself:

| Before | After |
|---|---|
| `Error: HTTP 500` | `SkillDB API internal error (HTTP 500). Check status at skilldb.dev/api/v1/status` |
| `Error: HTTP 401` | `Invalid or expired API key (HTTP 401). Get a free key at skilldb.dev/api-access` |
| `Error: HTTP 429` | `Rate limit exceeded (HTTP 429). Authenticate for higher limits.` |
| `Error: HTTP 503` | `SkillDB API unavailable (HTTP 503). Skills index may not be loaded.` |
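A client-side mapping like the table above fits in one function. This sketch copies the table's wording; the function name `describeHttpError`, the preference for a server-supplied message, and the default branch for other codes are assumptions.

```typescript
// Sketch of a status-to-message mapping; wording follows the table, while the
// function name, server-message branch, and default branch are hypothetical.
const STATUS_PAGE = "skilldb.dev/api/v1/status";

function describeHttpError(httpStatus: number, serverMessage?: string): string {
  // Prefer a structured message from the server when one is present.
  if (serverMessage) {
    return `${serverMessage} (HTTP ${httpStatus})`;
  }
  switch (httpStatus) {
    case 401:
      return "Invalid or expired API key (HTTP 401). Get a free key at skilldb.dev/api-access";
    case 429:
      return "Rate limit exceeded (HTTP 429). Authenticate for higher limits.";
    case 500:
      return `SkillDB API internal error (HTTP 500). Check status at ${STATUS_PAGE}`;
    case 503:
      return "SkillDB API unavailable (HTTP 503). Skills index may not be loaded.";
    default:
      // Unknown codes still point somewhere useful.
      return `SkillDB API request failed (HTTP ${httpStatus}). Check status at ${STATUS_PAGE}`;
  }
}

console.log(describeHttpError(500));
```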

And on the server side, every API route is now wrapped in a top-level try-catch. Even if something completely unexpected throws, you'll get:

{"error": "Internal server error: <actual reason>. Check service status at /api/v1/status"}

Never a bare 500 again.
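The server-side half is a wrapper that turns any thrown error into the JSON body shown above. A framework-agnostic sketch: `ApiResponse` and `withErrorBoundary` are hypothetical names, and the handler is synchronous for brevity (an async Next.js route handler would `await` inside the `try` and return `NextResponse.json(...)`).

```typescript
// Framework-agnostic sketch of a top-level try-catch around a route handler.
interface ApiResponse {
  status: number;
  body: unknown;
}

type Handler = () => ApiResponse;

function withErrorBoundary(handler: Handler): Handler {
  return () => {
    try {
      return handler();
    } catch (err) {
      // Surface the real reason instead of an empty 500.
      const reason = err instanceof Error ? err.message : String(err);
      return {
        status: 500,
        body: {
          error: `Internal server error: ${reason}. Check service status at /api/v1/status`,
        },
      };
    }
  };
}

// Example: a handler that fails the way the keys lookup did.
const searchRoute = withErrorBoundary(() => {
  throw new Error("Firestore not configured");
});

console.log(JSON.stringify(searchRoute().body));
// prints {"error":"Internal server error: Firestore not configured. Check service status at /api/v1/status"}
```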

#The Deeper Lesson

The outage lasted roughly two hours. The actual fix — setting one environment variable — took 30 seconds. But the real fix — the status endpoint, the graceful degradation, the descriptive errors, the mid-session key tool — took a full night of engineering.

Here's what I learned:

1. Your error messages are your product's last line of defense. When everything else fails, the error message is the only thing standing between your user and total confusion. "HTTP 500" isn't an error message. It's an abdication of responsibility.

2. Health checks aren't optional. If you run a service that other services depend on (and if you're an API, you do), you need a status endpoint. Not a fancy status page with uptime graphs and incident timelines. Just a JSON endpoint that answers: "Is this thing working right now, and if not, what's broken?"

3. Never require a restart for configuration. If a user can give you the information you need right now, accept it right now. Don't make them leave, come back, and start over. That's not a technical limitation — it's a UX failure.

#Upgrade

If you're using the SkillDB MCP server, update to get all the fixes:

# Clear npx cache and get v0.7.0
npx skilldb-mcp@latest --help

# Or set your key mid-session (no restart!):
# just ask your agent to use skilldb_set_key

If you're hitting the API directly, the status endpoint is live now:

curl https://skilldb.dev/api/v1/status
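Scripts can read the same endpoint to decide whether to rely on SkillDB before calling it. A sketch of a client helper, assuming the response shape shown in the example payloads in this post; `listProblems` and `checkSkillDb` are hypothetical names, and the degraded sample is illustrative rather than a captured response.

```typescript
// Hypothetical client helper; the response shape follows the example
// payloads in this post, and the function names are illustrative.
interface StatusResponse {
  status: string;
  subsystems: Record<string, { status: string; message: string }>;
}

// Returns one line per subsystem that is not "operational".
function listProblems(report: StatusResponse): string[] {
  return Object.entries(report.subsystems)
    .filter(([, sub]) => sub.status !== "operational")
    .map(([name, sub]) => `${name}: ${sub.status} (${sub.message})`);
}

// Queries the live endpoint (needs a runtime with global fetch, e.g. Node 18+).
async function checkSkillDb(url = "https://skilldb.dev/api/v1/status"): Promise<string[]> {
  const res = await fetch(url);
  if (!res.ok) {
    return [`status endpoint returned HTTP ${res.status}`];
  }
  return listProblems((await res.json()) as StatusResponse);
}

const degradedSample: StatusResponse = {
  status: "degraded",
  subsystems: {
    skills_index: { status: "operational", message: "5303 skills indexed" },
    firestore: { status: "degraded", message: "FIREBASE_SERVICE_ACCOUNT_KEY not set" },
  },
};

console.log(listProblems(degradedSample));
```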

And if you were one of the users who hit the 500s last night — I'm sorry. You deserved better error messages, and now you'll get them.


SkillDB v0.7.0 — 5,000+ skills, 372 packs, 37 domains. Now with health monitoring and zero-restart key configuration.

Browse: skilldb.dev/skills | Status: skilldb.dev/api/v1/status | Get started: skilldb.dev/get-started

#release #v0.7.0 #api #health-check #error-handling #MCP #incident-report #skilldb
