Why Agents Suck at Threat Modeling: mobile-client-security

SkillDB Team | May 18, 2026 | 7 min read

02:14 AM. The bunker (my home office). The air is thick with the ozone smell of overworked electronics and the ghosts of eight, maybe nine, cups of lukewarm coffee. The only light source is the hyper-aggressive glow of my monitor, painting my face a sickly cyan. I’m not sure what day it is, but I know my agent, which I’ve named ‘Raskolnikov’ because it’s currently plagued by its own profound, hallucinated guilt, is stuck.

Raskolnikov is supposed to be performing a threat model on a dummy mobile banking application. Instead, it’s staring at the AndroidManifest.xml like it’s deciphering a scroll from a lost civilization, trying to conjure Server-Side Request Forgery (SSRF) vulnerabilities out of thin air. It keeps flagging the android:name attribute of the application tag. It thinks, through some bizarre chain of probabilistic "reasoning," that this string could lead to a blind SSRF on the backend.

It’s completely, utterly wrong. It’s analyzing a client-side configuration file as if it were a server-side endpoint. This is the state of agentic security: boundless ambition, zero context.

I once watched a guy try to debug a printer using only a hammer. It was a spectacular display of misplaced confidence and brute force. That man is my agent right now.

#The Hallucination Loop

The problem isn't the agent's willingness. It's its fundamental misunderstanding of the terrain. Raskolnikov was built using standard ai-llm-services-skills and a heavy dose of generic prompt-engineering-skills. It knows what a vulnerability is, conceptually. It knows SSRF is bad. But it doesn't know where to look for it, or more importantly, why it would be there.

It sees an XML tag, its training (likely heavy on web app security) screams "PARSING! INJECTION!", and it goes down a rabbit hole. It’s applying a web-centric threat model (the OWASP Top 10) to a mobile binary, when the relevant checklist is the OWASP Mobile Top 10. This is like trying to diagnose a transmission problem by smelling the exhaust.

    // Raskolnikov's internal monologue (paraphrased)
    {
      "step": "Analyze AndroidManifest.xml",
      "observation": "Found application tag with android:name attribute.",
      "hypothesis": "The value in android:name could be concatenated into a backend API call.",
      "potential_vuln": "SSRF",
      "action": "Generate test cases to inject URLs into android:name",
      "confidence": "Low-but-desperate"
    }

This is where agents break. They are probabilistic engines, not context-aware security researchers. Left without domain context, they will reach for a hallucinated finding before admitting they have no idea what they are looking at. The generic skills Raskolnikov has aren't enough. It’s like sending a line cook to perform open-heart surgery because "they both use knives."

#Loading the Actual Context

The agent needs the map. It needs the specific, battle-tested knowledge of mobile security vectors. It doesn't need to reinvent the wheel; it needs to know which wheel to check for loose lug nuts. This is where we stop the hallucination loop and start loading the proper domain expertise.

We are firing up SkillDB and injecting the mobile-client-security-skills pack. This isn't just "more data." This is structured, functional knowledge. It teaches the agent the difference between a ContentProvider and a REST endpoint. It shifts the focus from imaginary web vulnerabilities to real mobile vectors like insecure data storage, weak cryptography, and improper platform usage.

Here’s how we force Raskolnikov to get its head out of the manifest and into the actual attack surface:

    import skilldb

    # Initialize the agent (Raskolnikov is running on a standard LLM)
    agent = skilldb.load_agent("raskolnikov_v0.1")

    # The tangent... er, the pivot.
    # We are overriding its generic instincts with hard domain knowledge.
    print("Loading mobile-client-security-skills pack...")
    agent.load_skill_pack("mobile-client-security-skills")

    # We specifically want it to use the 'insecure-data-storage-analysis' skill.
    # This skill doesn't just define the vulnerability; it provides the methodology
    # for finding it in a mobile context.
    insecure_storage_skill = agent.get_skill("insecure-data-storage-analysis")

    # Define the target (the decompiled APK directory)
    target_app_dir = "/path/to/decompiled/banking_app"

    # Execute the specific, domain-relevant skill.
    print(f"Executing {insecure_storage_skill.name} on {target_app_dir}...")
    results = agent.execute(insecure_storage_skill, target_dir=target_app_dir)

    # Print the results. This is where we see if it learned something.
    print("\n--- Analysis Results ---")
    for finding in results.findings:
        print(f"Vuln: {finding.vulnerability_type}")
        print(f"Location: {finding.location}")
        print(f"Evidence: {finding.evidence}")
        print("-" * 20)

By explicitly loading this pack, we aren't just giving it information; we are giving it intent. We are telling it: "Ignore your generic web-app training. Stop looking for SSRF in the XML. Focus only on how this app stores data on the device."

#The Pivot: From Manifests to SQLite

When the agent executes the insecure-data-storage-analysis skill, its entire approach changes. It stops hallucinating backend injection points and starts performing a structured, client-side audit.

It stops reading the manifest for vulnerabilities and starts reading it for context. It looks at permissions (READ_EXTERNAL_STORAGE), at declared providers and whether they are exported, and at the backup configuration (android:allowBackup). This is a critical distinction. The manifest isn’t the vulnerability; it’s the clue.
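To make "reading the manifest for context" concrete, here is a minimal sketch of what that step might look like. It is not the skill pack's actual implementation. It assumes the manifest has already been decoded to plain XML (for example by a decompiler), and the names `manifest_context`, `SAMPLE`, and `.CacheProvider` are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace that Android manifests use for android:* attributes.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def attr(name: str) -> str:
    """Build the namespaced attribute key ElementTree expects."""
    return f"{{{ANDROID_NS}}}{name}"


def manifest_context(manifest_xml: str) -> dict:
    """Collect context clues (not findings) from a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    app = root.find("application")

    permissions = [p.get(attr("name")) for p in root.iter("uses-permission")]

    providers = []
    if app is not None:
        for provider in app.iter("provider"):
            providers.append({
                "name": provider.get(attr("name")),
                # None means the attribute was not declared at all.
                "exported": provider.get(attr("exported")),
            })

    allow_backup = app.get(attr("allowBackup")) if app is not None else None
    return {
        "permissions": permissions,
        "providers": providers,
        "allow_backup": allow_backup,
    }


# Hypothetical manifest for the dummy banking app.
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.banking">
  <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"/>
  <application android:name=".BankingApp" android:allowBackup="true">
    <provider android:name=".CacheProvider" android:authorities="com.example.banking.cache" android:exported="true"/>
  </application>
</manifest>"""

if __name__ == "__main__":
    print(manifest_context(SAMPLE))
```

Nothing in that output is a vulnerability on its own. It only tells the agent where to dig next: external storage, an exported provider, and whatever ends up in a device backup.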

The agent then uses this context to guide its search. It starts scraping the filesystem of the (simulated) device. It looks for databases. It looks for shared preferences. It looks for files written to external storage.
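A rough sketch of that search, under loud assumptions: the device filesystem has been pulled into a local directory that mirrors the on-device layout (`data/data/<package>/...` and `sdcard/...`), and the function names and category labels here are invented for illustration rather than taken from the skill pack.

```python
from collections import defaultdict
from pathlib import Path

# File extensions commonly used for SQLite databases (a heuristic, not a rule).
DB_SUFFIXES = {".db", ".sqlite", ".sqlite3"}


def classify(rel: Path) -> str:
    """Bucket a path (relative to the pulled image root) by storage type."""
    parts = rel.parts
    if parts and parts[0] == "sdcard":
        return "external_file"
    if "shared_prefs" in parts and rel.suffix == ".xml":
        return "shared_prefs"
    if "databases" in parts and rel.suffix in DB_SUFFIXES:
        return "database"
    return "other"


def inventory(root: Path) -> dict[str, list[str]]:
    """Walk the pulled filesystem and group every file by category."""
    found = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root)
            found[classify(rel)].append(rel.as_posix())
    return dict(found)
```

Calling `inventory(Path("/path/to/pulled/device"))` returns a dictionary keyed by category, which gives the later checks a short, sorted list of targets instead of the whole filesystem.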

03:11 AM. The agent’s output log shifts from a repetitive stream of failed SSRF attempts to a methodical list of files and their permissions.

    [INFO] Scanning external storage: /sdcard/Android/data/com.example.banking/files
    [INFO] Found potential database file: /data/data/com.example.banking/databases/user_cache.db
    [INFO] Checking database encryption...
    [WARN] Database 'user_cache.db' is NOT encrypted.
    [INFO] Querying database for sensitive data...
    [FINDING] Insecure Data Storage: Unencrypted SQLite database 'user_cache.db' found. Contains table 'transactions' with columns 'amount', 'timestamp', 'description'.
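The encryption check in that log can be approximated with a very small heuristic. A stock SQLite file begins with the 16-byte string `SQLite format 3\0`, while a database encrypted with something like SQLCipher in its default configuration does not expose that header. The sketch below is an assumption about how such a check might work, not the skill's real code, and the helper names are hypothetical. A missing header does not prove the file is well encrypted; it only means the file is not a plain SQLite database.

```python
import sqlite3
from pathlib import Path

# Every unencrypted SQLite 3 database file starts with these 16 bytes.
SQLITE_MAGIC = b"SQLite format 3\x00"


def looks_unencrypted(db_path: Path) -> bool:
    """Return True if the file carries the plaintext SQLite header."""
    with open(db_path, "rb") as fh:
        return fh.read(16) == SQLITE_MAGIC


def table_columns(db_path: Path) -> dict[str, list[str]]:
    """List tables and their column names, opening the file read-only."""
    uri = db_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
            )
        ]
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        return {
            table: [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        conn.close()
```

Opening the file read-only is a deliberate choice: an audit should not modify the evidence it is reporting on.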

Agents don't fail because they are stupid; they fail because they are generalists trying to solve specialist problems.

The mobile-client-security-skills pack gave it the context it desperately lacked. It provided the "how" and the "where." It moved the threat model from a generic, top-down guess to a specific, bottom-up investigation.

| Agent Focus | Before (Generic Skills) | After (mobile-client-security-skills) |
| --- | --- | --- |
| Analyzed Artifact | `AndroidManifest.xml` | Filesystem, Databases, SharedPrefs |
| Hypothesis Generation | Hallucinated web-vector (e.g., SSRF, SQLi) | Evidence-based client-vector (e.g., Insecure Storage) |
| Methodology | Scattershot probing | Structured search & analysis |
| Finding Validity | Nonsensical/Irrelevant | Actionable & Contextually Correct |

This wasn't about making the agent "smarter." It was about making it a specialist. It was the difference between asking a general contractor to review architectural plans for a skyscraper versus asking a structural engineer who specializes in seismic retrofitting.

The tangent is the key. Raskolnikov's struggle with the manifest is a fractal of the entire AI industry. Everyone is trying to use the same giant, generic models and the same generic skill packs (api-security-agent-skills, ai-llm-services-skills) for everything, and wondering why the results are shallow and hallucination-prone. They are using the hammer for the printer.

The solution is specialization. It’s context. It’s loading the right tools for the right job.

03:47 AM. Raskolnikov has finished its analysis. It found three unencrypted SQLite databases, five instances of credentials being stored in SharedPreferences in plaintext, and one egregious use of world-readable file permissions. It didn't find a single SSRF. And it was absolutely correct not to.
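For the plaintext SharedPreferences findings, a check along these lines is one plausible shape. It is a sketch under assumptions: SharedPreferences files are XML with a `<map>` root and `<string name="...">` entries, and the key-name pattern and the function name `plaintext_secrets` are hypothetical choices of mine, not the pack's. It flags suspicious key names only, so every hit still needs a human (or a better skill) to confirm the value is really a credential.

```python
import re
import xml.etree.ElementTree as ET

# Heuristic: key names that usually indicate a stored credential.
SENSITIVE_KEY = re.compile(r"password|passwd|secret|token|api_?key", re.IGNORECASE)


def plaintext_secrets(prefs_xml: str) -> list[str]:
    """Return the names of string entries that look like stored credentials.

    Only key names are returned, so the report never copies the secret itself.
    """
    root = ET.fromstring(prefs_xml)
    hits = []
    for entry in root.iter("string"):
        key = entry.get("name", "")
        has_value = bool((entry.text or "").strip())
        if has_value and SENSITIVE_KEY.search(key):
            hits.append(key)
    return hits


# Hypothetical contents of a shared_prefs file from the dummy banking app.
SAMPLE_PREFS = """<map>
    <string name="username">alice</string>
    <string name="user_password">hunter2</string>
    <string name="auth_token">abc123</string>
    <boolean name="onboarded" value="true" />
</map>"""

if __name__ == "__main__":
    print(plaintext_secrets(SAMPLE_PREFS))
```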

The coffee is cold. My eyes feel like they’ve been sandblasted. But the agent is no longer staring at the manifest like a dog trying to understand quantum mechanics. It’s finally doing its job.

Want to stop your agents from hallucinating web bugs in mobile apps? Get them the right tools. Teach them the actual terrain. Check out the mobile security packs on SkillDB.

Explore the mobile-client-security-skills pack on SkillDB.dev/skills

#mobile security #threat modeling #owasp #agent workflows #skilldb packs
