Deep Dive · Security

Is That Skill Safe?

We install AI agent skills the way we used to npm install a random package — except now they carry instructions your agent will actually follow. A skill isn't a library. It's a set of directives you're handing to something that acts on your behalf.

The numbers nobody's looking at

Most teams building skills don't have an SDLC for them yet. Skills get built, dropped in a repo, and installed on implicit trust — no review, no agreed structure, no security evals. The discipline we spent years building for application code isn't being applied to the artifacts our agents execute.

42,000+
skills analyzed across popular marketplaces
NVIDIA, 2026
26%
contained at least one vulnerability
NVIDIA, 2026
5%
showed likely malicious intent
NVIDIA, 2026
more vulnerable when the skill ships executable scripts
NVIDIA, 2026
If you're using skills today — and you should be — someone on your team owns that risk whether they know it or not.

How a skill scanner actually works

A scanner ingests the skill (repo, zip, directory, or a single SKILL.md), then runs two passes. Stage 1 is fast, deterministic pattern-matching. Stage 2 is an optional LLM that reasons about intent to cut false positives. Then it scores the risk and makes an install / don't-install call.

1
Input
Skills come from anywhere.
🐙Git repo 🌐URL 🗜️ZIP archive 📁Local directory 📄SKILL.md file
2
Ingestion & Normalization
Understand and structure the skill.
Parse structureRead the skill layout
ExtractScripts, metadata, prompts
Discover depsMap dependencies
Collect assetsResources & files
3
Analysis
Deep multi-layer analysis of code, deps & behavior.
Stage 1 — Static · fast & deterministic

Pattern detection

  • Prompt injection
  • Data exfiltration
  • Privilege escalation
  • Excessive agency
  • Tool misuse
  • Trigger abuse
  • Memory poisoning

Code & behavior

  • AST inspection
  • exec() / eval() detection
  • subprocess use
  • Dynamic imports
  • Dangerous exec chains

Supply chain

  • Dependency inspection
  • OSV vuln lookup
  • Typosquatting
  • Unpinned packages
  • License risk
Stage 2 — LLM semantic review · deeper reasoning
  • Intent vs. actual behavior
  • Hidden instructions & triggers
  • Tool & memory poisoning
  • Excessive permissions
  • Agentic risk & impact
  • Prunes false positives
4
Findings Aggregation
Consolidate and prioritize what matters.
AggregateDeduplicate · correlate · normalize
Score & classifySeverity · confidence · exploitability
5
Risk Scoring
Quantify overall risk, 0–100.
Install not recommended
Install with caution
Safe to install
6
Outputs & Governance
Actionable results in multiple formats.
⌨️Terminal 📋JSON 📝Markdown 🔗SARIF for CI/CD
Governance decisionUse the results to allow, block, or review the skill before it installs.

The reference engine here is NVIDIA SkillSpector (open-source, Apache-2.0). Static analysis won't catch runtime behavior — but it gives teams a starting pattern, which is more than most have today.

But a scanner can cry wolf — here's a real one

We ran SkillSpector (static-only) against clipify, a perfectly benign video-clipping skill. The raw engine flagged it CRITICAL, 100/100, DO NOT INSTALL. Six of its seven findings were false positives. This is exactly why a raw scanner score is a starting point, not a verdict.

Raw engine
100
CRITICAL · DO NOT INSTALL
MED
MCP Least Privilege — no permissions field
MED
Rogue Agent — matched README prose
LOW
Excessive Agency — "not limited to" in LICENSE
HIGH
Prompt Injection — bytes of preview.png
HIGH
Tool Misuse — bytes of preview.png
HIGH
Tool Misuse — bytes of preview.png
HIGH
Tool Misuse — bytes of preview.png
After a policy gate
40
MEDIUM · PASS (review)
MED
MCP Least Privilege — real: add a permissions field
MED
Rogue Agent — worth a human glance
✓ 5 false positives suppressed:
• 4 HIGH = regex over the raw bytes of a PNG (binary scanned as text)
• 1 = boilerplate language in the LICENSE file
The lesson isn't "scanners are bad." It's that the raw score is the input, not the decision. The value is in the gate on top: suppress the structural noise, encode your threat model, and make the call a human can stand behind.

Scan a skill yourself — right here

Paste a SKILL.md (or any skill prompt) below. This runs a handful of the same pattern categories a real scanner uses — entirely in your browser, nothing is uploaded. It's a teaching toy, not a production gate, but you'll feel the problem in about three seconds.

0 · Safe50 · Caution100 · Critical

Runs 100% client-side — your text never leaves the page. Detects ~8 illustrative patterns; a real engine like SkillSpector runs 64 across 16 categories plus an LLM pass. Don't make trust decisions on this widget.

This is a software-development discipline, not a checkbox

A scanner is the easy 20%. The hard 80% is the program around it: a policy that fits your risk tolerance, a CI gate that blocks the right things, custom rules for your secrets and endpoints, and the review habit that makes it stick. That's what we build with teams.

Assess your agent-stack readiness →
← Workshop Hub