SkillDB v0.7.0: API Health Dashboard, Mid-Session Keys, and the 500 That Taught Us a Lesson

March 27, 2026. 12:47 AM.
I'm watching my own product fail in real time.
A user is running a site audit. They ask their Claude agent to pull skills from SkillDB — security reviews, React patterns, web polish, the works. The agent calls skilldb_search. The response:
Error: HTTP 500
That's it. That's the entire error message. No explanation. No "here's what went wrong." No "try this instead." Just five characters and a number that means "something broke and we're not going to tell you what."
The agent tries again. skilldb_list. Same thing. skilldb_get. Same. Every endpoint. Every call. All returning the same useless, hollow, infuriating error.
Except one. skilldb_recommend still works. It can tell you which packs you need. It just can't give you any of them. It's like a librarian who can recommend books but the library doors are welded shut.
The user gives up, runs their audit without SkillDB skills, and moves on. We just failed a paying customer in the most uninformative way possible.
I wanted to throw my monitor through the wall.
#What Actually Broke
Here's the chain of failure, laid bare:
MCP client sends request with Bearer token (API key)
  → API route sees isAuthenticated = true
  → calls validateApiKey(apiKey)
  → calls keysCollection()
  → checks adminDb... it's null
  → throws "Firestore not configured"
  → UNHANDLED EXCEPTION
  → Next.js returns HTTP 500 with NO JSON body
  → SDK tries res.json().catch(() => ({}))
  → gets empty object {}
  → falls back to "HTTP 500"
  → MCP shows: "Error: HTTP 500"
The root cause was embarrassingly simple: FIREBASE_SERVICE_ACCOUNT_KEY was never set as an environment variable on our Cloud Run instance. The Firebase Admin SDK couldn't initialize, so every authenticated request crashed at its first database call.
skilldb_recommend survived because it's the only endpoint that doesn't touch Firestore. It's a pure in-memory tech-stack-to-pack mapping. No auth, no database, no crash.
The real failure wasn't the missing env var. It was everything that happened after. No graceful degradation. No structured error. No health check to detect it. No way for the user to understand what was wrong, let alone fix it.
We shipped a product that could break silently and blame the user by showing them nothing.
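For the curious, the tail end of that chain (empty body, failed JSON parse, fallback to the status code) can be sketched in a few lines. This is a reconstruction under assumptions, not our SDK source: `errorMessageFor` is a made-up name, and the body is handled as plain text so the example stays synchronous, where the real SDK calls `res.json().catch(() => ({}))`.

```typescript
// Hypothetical reconstruction of the SDK's error path; not the actual SDK source.
// `rawBody` is the response body as text. A crashed route sends an empty one.
function errorMessageFor(status: number, rawBody: string): string {
  let body: { error?: unknown } = {};
  try {
    // Synchronous stand-in for res.json().catch(() => ({})).
    const parsed: unknown = JSON.parse(rawBody);
    if (typeof parsed === "object" && parsed !== null) {
      body = parsed as { error?: unknown };
    }
  } catch {
    // Empty or non-JSON body: keep the empty object.
  }
  // With no `error` field to report, only the status code is left.
  return typeof body.error === "string" ? body.error : `HTTP ${status}`;
}

console.log(errorMessageFor(500, "")); // prints "HTTP 500"
console.log(errorMessageFor(500, '{"error":"Firestore not configured"}')); // prints "Firestore not configured"
```

The second call is the point: had the route returned any JSON body at all, the real reason would have reached the user.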
#What We Shipped to Fix It
Three things. All live now.
#1. /api/v1/status — Real-Time Health Dashboard
Hit this endpoint and you get the truth about every subsystem:
{
  "status": "operational",
  "version": "v1",
  "timestamp": "2026-03-27T00:57:05.870Z",
  "subsystems": {
    "skills_index": { "status": "operational", "message": "5303 skills indexed" },
    "pack_content": { "status": "operational", "message": "372 pack files available" },
    "firestore": { "status": "operational", "message": "Connected" }
  }
}
Three subsystems, three statuses: operational, degraded, or down. If Firestore goes down again, you'll see:
"firestore": {
  "status": "degraded",
  "message": "FIREBASE_SERVICE_ACCOUNT_KEY not set — API keys validated via passthrough mode"
}
Not "HTTP 500." Not silence. A sentence that tells you exactly what's wrong and implies what to do about it.
Bookmark it: skilldb.dev/api/v1/status
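If you want to build something similar, here is a minimal sketch of how a response like this could be assembled. Everything in it is an assumption for illustration: the names `checkFirestore` and `buildStatus`, the rule that the top-level status is the worst subsystem status, the shortened degraded message, and the fact that the Firestore check only looks for the environment variable instead of pinging the database.

```typescript
type Health = "operational" | "degraded" | "down";

interface SubsystemReport {
  status: Health;
  message: string;
}

// Severity order used to pick the overall status (an assumption of this sketch).
const SEVERITY: Record<Health, number> = { operational: 0, degraded: 1, down: 2 };

// Hypothetical check: only tests that the variable is present.
// A real check would also try to reach Firestore.
function checkFirestore(env: Record<string, string | undefined>): SubsystemReport {
  return env.FIREBASE_SERVICE_ACCOUNT_KEY
    ? { status: "operational", message: "Connected" }
    : { status: "degraded", message: "FIREBASE_SERVICE_ACCOUNT_KEY not set" };
}

// Combines subsystem reports into one payload shaped like the JSON above.
function buildStatus(subsystems: Record<string, SubsystemReport>, now: Date = new Date()) {
  let worst: Health = "operational";
  for (const name of Object.keys(subsystems)) {
    const current = subsystems[name].status;
    if (SEVERITY[current] > SEVERITY[worst]) worst = current;
  }
  return { status: worst, version: "v1", timestamp: now.toISOString(), subsystems };
}

// Example: the env var is missing, so the overall status drops to "degraded".
const report = buildStatus({
  skills_index: { status: "operational", message: "5303 skills indexed" },
  firestore: checkFirestore({}),
});
console.log(report.status); // prints "degraded"
```

Passing the environment in as an argument keeps the check easy to test without touching real configuration.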
#2. skilldb_set_key — Configure API Keys Without Restarting
This one came from watching a user paste their API key into the chat and being told "you need to restart your session." In 2026. When the whole point of MCP is seamless tool integration.
Before v0.7.0:
User: *gets API key*
Agent: "Run: claude mcp remove skilldb && claude mcp add skilldb -- skilldb-mcp --api-key YOUR_KEY"
User: loses all conversation context
User: has to re-explain what they were doing
After v0.7.0:
User: *pastes API key*
Agent: calls skilldb_set_key → "✅ API key configured! All tools now return full content."
Agent: calls skilldb_get → gets full 200-line skill with patterns, examples, anti-patterns
No restart. No context loss. No friction. The key is validated on the spot and every subsequent tool call uses it immediately.
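The trick that makes this possible is boring: read the key at call time instead of capturing it once at startup. A rough sketch of the idea follows. `SessionConfig` and its methods are hypothetical names, not the actual skilldb-mcp internals, and the server-side validation step is left out (the sketch only rejects an empty key).

```typescript
// Hypothetical session state; the real skilldb-mcp internals are not shown in this post.
class SessionConfig {
  private apiKey: string | undefined;

  constructor(initialKey?: string) {
    this.apiKey = initialKey;
  }

  // What a tool like skilldb_set_key could do: swap the key in place.
  // Real validation against the API is omitted here.
  setKey(key: string): void {
    const trimmed = key.trim();
    if (trimmed.length === 0) throw new Error("API key must not be empty");
    this.apiKey = trimmed;
  }

  // Headers are built per request, so a key set mid-session applies to the next call.
  authHeaders(): Record<string, string> {
    return this.apiKey ? { Authorization: `Bearer ${this.apiKey}` } : {};
  }
}

const session = new SessionConfig(); // started without a key
console.log(session.authHeaders()); // no Authorization header yet
session.setKey("YOUR_KEY"); // user pastes a key mid-session
console.log(session.authHeaders().Authorization); // prints "Bearer YOUR_KEY"
```

Because every tool call asks the session for headers, nothing needs to be torn down or re-registered when the key changes.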
#3. Descriptive Error Messages Everywhere
Every HTTP error now explains itself:
| Before | After |
|---|---|
| `Error: HTTP 500` | `SkillDB API internal error (HTTP 500). Check status at skilldb.dev/api/v1/status` |
| `Error: HTTP 401` | `Invalid or expired API key (HTTP 401). Get a free key at skilldb.dev/api-access` |
| `Error: HTTP 429` | `Rate limit exceeded (HTTP 429). Authenticate for higher limits.` |
| `Error: HTTP 503` | `SkillDB API unavailable (HTTP 503). Skills index may not be loaded.` |
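On the client side, the table above boils down to a lookup. A minimal sketch, assuming a helper named `describeHttpError` (hypothetical) and a generic fallback for status codes the table doesn't cover (also an assumption); the four messages are copied from the table:

```typescript
// Messages copied from the table above; the function and its name are illustrative.
const ERROR_HINTS: Record<number, string> = {
  401: "Invalid or expired API key (HTTP 401). Get a free key at skilldb.dev/api-access",
  429: "Rate limit exceeded (HTTP 429). Authenticate for higher limits.",
  500: "SkillDB API internal error (HTTP 500). Check status at skilldb.dev/api/v1/status",
  503: "SkillDB API unavailable (HTTP 503). Skills index may not be loaded.",
};

function describeHttpError(status: number): string {
  // Fallback wording for unlisted codes is an assumption of this sketch.
  return ERROR_HINTS[status] ?? `SkillDB API request failed (HTTP ${status})`;
}

console.log(describeHttpError(429)); // prints "Rate limit exceeded (HTTP 429). Authenticate for higher limits."
```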
And on the server side, every API route is now wrapped in a top-level try-catch. Even if something completely unexpected throws, you'll get:
{"error": "Internal server error: <actual reason>. Check service status at /api/v1/status"}
Never a bare 500 again.
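The wrapper itself is nothing clever. Here is a sketch of the pattern, kept synchronous for brevity; a real Next.js route handler is async and would `await` inside the `try`. The names `withErrorBoundary`, `JsonResponse`, and `Handler` are hypothetical, and only the error string follows the format shown above.

```typescript
interface JsonResponse {
  status: number;
  body: Record<string, unknown>;
}

type Handler<Req> = (req: Req) => JsonResponse;

// Wraps a handler so that any thrown error becomes a structured JSON 500.
function withErrorBoundary<Req>(handler: Handler<Req>): Handler<Req> {
  return (req) => {
    try {
      return handler(req);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      return {
        status: 500,
        body: { error: `Internal server error: ${reason}. Check service status at /api/v1/status` },
      };
    }
  };
}

// Example: a handler that fails the way the outage did.
const search = withErrorBoundary((_req: { query: string }): JsonResponse => {
  throw new Error("Firestore not configured");
});
console.log(search({ query: "react" }).body.error);
// prints "Internal server error: Firestore not configured. Check service status at /api/v1/status"
```

Wrapping at the outermost layer means a new route can't forget its own error handling; the worst case is now a JSON body with a reason in it.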
#The Deeper Lesson
The outage lasted roughly two hours. The actual fix — setting one environment variable — took 30 seconds. But the real fix — the status endpoint, the graceful degradation, the descriptive errors, the mid-session key tool — took a full night of engineering.
Here's what I learned:
1. Your error messages are your product's last line of defense. When everything else fails, the error message is the only thing standing between your user and total confusion. "HTTP 500" isn't an error message. It's an abdication of responsibility.
2. Health checks aren't optional. If you run a service that other services depend on (and if you're an API, you do), you need a status endpoint. Not a fancy status page with uptime graphs and incident timelines. Just a JSON endpoint that answers: "Is this thing working right now, and if not, what's broken?"
3. Never require a restart for configuration. If a user can give you the information you need right now, accept it right now. Don't make them leave, come back, and start over. That's not a technical limitation — it's a UX failure.
#Upgrade
If you're using the SkillDB MCP server, update to get all the fixes:
# Clear npx cache and get v0.7.0
npx skilldb-mcp@latest --help
# Or set your key mid-session (no restart!)
# Just ask your agent to use skilldb_set_key
If you're hitting the API directly, the status endpoint is live now:
curl https://skilldb.dev/api/v1/status
And if you were one of the users who hit the 500s last night — I'm sorry. You deserved better error messages, and now you'll get them.
SkillDB v0.7.0 — 5,000+ skills, 372 packs, 37 domains. Now with health monitoring and zero-restart key configuration.
Browse: skilldb.dev/skills | Status: skilldb.dev/api/v1/status | Get started: skilldb.dev/get-started