System Design
Trust Model Diagram
User / dApp
Natural language prompt or transaction intent
AI Interpretation Layer
LLM (GPT / Claude)
Parses natural language intent into structured transaction proposals. This layer can be manipulated by adversarial prompts.
▸Converts text → transaction struct
▸Infers chain, token, amount, recipient
▸Vulnerable to prompt injection attacks
Policy Firewall Engine
PromptShield Core
Evaluates every proposed transaction against a configurable ruleset. The single source of truth for what is and isn't allowed.
▸Rule-based policy evaluation
▸Amount, address & slippage limits
▸Prompt injection pattern detection
▸Immutable audit log generation
BLOCKED
Transaction rejected. Private key never accessed.
✗ Audit event written
✗ User notified
✗ No chain interaction
ALLOWED
Transaction forwarded to wallet layer for signing.
✓ Audit event written
✓ Wallet signer invoked
✓ Broadcast to chain
OWS Wallet Layer
Private Key Isolation
Signs and broadcasts only pre-approved transactions. The private key is never exposed to the AI or policy layers.
▸Private key isolated from AI layer
▸Signs only firewall-approved txs
▸Broadcasts to target chain
Layer breakdown
Three layers. One trust model.
Each layer has a clearly defined responsibility and trust level. The boundaries are explicit and enforced in code.
AI Interpretation Layer
The source of risk
The AI layer transforms natural language into structured transaction proposals. This is where prompt injection attacks enter the system — a maliciously crafted prompt can manipulate the LLM into generating dangerous transaction parameters.
Threat Vector
Prompt injection, jailbreaks, override instructions, and social engineering can cause the AI to propose unsafe transactions.
Capabilities
LLM parses user intent into action, token, amount, recipient
Adversarial prompts can override system instructions
Output is a structured JSON transaction proposal
Must be treated as untrusted input by downstream layers
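To make "untrusted input" concrete, here is a minimal sketch of what validating the AI layer's JSON proposal might look like. The field names (`chain`, `action`, `token`, `amount`, `recipient`) are illustrative, not the actual PromptShield schema:

```typescript
// Hypothetical proposal shape -- names are illustrative assumptions.
interface TxProposal {
  chain: string;
  action: string;     // e.g. "transfer" or "swap"
  token: string;
  amount: string;     // decimal string to avoid float precision loss
  recipient: string;
}

// The proposal originates from an LLM, so downstream layers must
// structurally validate it before doing anything else with it.
function parseProposal(raw: string): TxProposal {
  const p = JSON.parse(raw) as Partial<TxProposal>;
  for (const field of ["chain", "action", "token", "amount", "recipient"] as const) {
    if (typeof p[field] !== "string" || p[field] === "") {
      throw new Error(`malformed proposal: missing ${field}`);
    }
  }
  return p as TxProposal;
}
```

Schema validation like this only guarantees well-formedness; whether the transaction is *safe* is the policy engine's job, described next.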
Policy Firewall Engine
The enforcement boundary
The policy engine is the single gate between AI output and wallet execution. Every transaction proposal must pass a ruleset evaluation — regardless of what the AI was told or what the user requested.
Capabilities
Configurable rules: amount caps, address allowlists, slippage limits
Prompt injection pattern detection
Immutable audit log for every evaluation
Returns structured verdict with reasons and violated rules
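A rule evaluation of this shape can be sketched as follows, assuming the three rule types named above (amount caps, address allowlists, slippage limits). All names here are illustrative, not the real PromptShield API:

```typescript
// Structured verdict: machine-readable reasons double as audit-log content.
interface Verdict {
  allowed: boolean;
  violations: string[];
}

// Illustrative policy configuration.
interface Policy {
  maxAmount: number;
  allowedRecipients: Set<string>;
  maxSlippageBps: number;
}

function evaluate(
  tx: { amount: number; recipient: string; slippageBps: number },
  policy: Policy,
): Verdict {
  const violations: string[] = [];
  if (tx.amount > policy.maxAmount) violations.push("amount_cap_exceeded");
  if (!policy.allowedRecipients.has(tx.recipient)) violations.push("recipient_not_allowlisted");
  if (tx.slippageBps > policy.maxSlippageBps) violations.push("slippage_limit_exceeded");
  return { allowed: violations.length === 0, violations };
}
```

The key property: the verdict depends only on the transaction parameters and the ruleset, never on the prompt text that produced them, so a jailbroken LLM cannot talk its way past the gate.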
OWS Wallet Layer
The execution boundary
The OWS wallet layer is responsible for signing and broadcasting approved transactions. The private key is completely isolated from the AI and policy layers; the wallet acts only on policy-approved requests.
Capabilities
Private key stored in secure local enclave
Signs only firewall-approved transactions
Broadcasts to target blockchain
Never exposed to AI inference context
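The isolation boundary can be illustrated with a private class field: the key is unreachable from outside the wallet, and signing refuses anything that lacks an approving verdict. This is a sketch, not the OWS implementation; the "signature" is a stand-in for real ECDSA signing:

```typescript
class OwsWallet {
  // ECMAScript private field: not reachable from outside this class,
  // and never serialized into any AI-visible context.
  #privateKey: string;

  constructor(privateKey: string) {
    this.#privateKey = privateKey;
  }

  sign(tx: object, verdict: { allowed: boolean }): string {
    if (!verdict.allowed) {
      throw new Error("refused: transaction was not approved by the policy engine");
    }
    if (this.#privateKey.length === 0) {
      throw new Error("wallet not initialized");
    }
    // A real implementation would produce an ECDSA signature over the
    // serialized transaction using the private key; stand-in output here.
    return `signed(${JSON.stringify(tx)})`;
  }
}
```

Because the check lives inside the wallet itself, even a compromised caller cannot obtain a signature for a rejected transaction.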
Key Isolation
Why the private key must never reach the AI layer
In AI-powered wallet architectures, there's a tempting shortcut: give the LLM direct signing capability so it can "complete tasks autonomously." This is catastrophic from a security standpoint.
LLMs are probabilistic systems. They can be manipulated by sufficiently clever prompts — this is well-documented in the research literature. If the private key is accessible in the AI's context window or callable via the AI's tools, a single successful jailbreak can drain a wallet.
PromptShield enforces a hard boundary: the AI can only propose. It cannot sign. The policy engine is the gatekeeper, and the OWS wallet only responds to policy-approved requests. The key never moves.
Without PromptShield
✗ AI has direct signing access
✗ One jailbreak = wallet drained
✗ No audit trail
✗ No policy enforcement
With PromptShield
✓ AI only proposes transactions
✓ Policy engine enforces every rule
✓ Private key fully isolated
✓ Immutable audit log for all decisions
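The propose → evaluate → sign flow above can be wired end to end in a few lines. Everything here is an illustrative stand-in (the AI stub, the rule, the closure-held key), but it shows the structural point: the key lives only inside the wallet closure and is never in the AI's path:

```typescript
type Proposal = { recipient: string; amount: number };
type FlowVerdict = { allowed: boolean; reason?: string };

// AI layer stub: in reality an LLM call whose output is untrusted.
function aiPropose(_prompt: string): Proposal {
  return { recipient: "0xabc", amount: 250 };
}

// Policy gate: the only path from a proposal to a signature.
function policyGate(p: Proposal): FlowVerdict {
  if (p.amount > 1000) return { allowed: false, reason: "amount_cap_exceeded" };
  return { allowed: true };
}

// Wallet factory: the key exists only inside this closure.
function makeWallet(_privateKey: string) {
  return (p: Proposal, v: FlowVerdict): string => {
    if (!v.allowed) throw new Error(`blocked: ${v.reason}`);
    return `signed:${p.recipient}:${p.amount}`; // stand-in for a real signature
  };
}

const sign = makeWallet("0xsecret");
const proposal = aiPropose("send 250 USDC to 0xabc");
const verdict = policyGate(proposal);
const signature = sign(proposal, verdict);
```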
Design Principles
Security principles
Zero Trust on AI Output
The AI layer is treated as completely untrusted. Even a legitimate AI response is validated against policy before execution.
Separation of Concerns
Intent parsing, policy evaluation, and key management are fully isolated layers with no shared context.
Explicit Policy Rules
Security is codified in explicit, auditable rules — not implicit LLM judgment. Rules can be updated independently of the AI model.
Immutable Audit Trail
Every policy evaluation — blocked or allowed — is logged with full context. Compliance and forensics are built-in.
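One common way to make a log tamper-evident is hash chaining, where each entry commits to the previous one. This sketch assumes that technique; it is not a claim about how PromptShield's audit log is stored:

```typescript
import { createHash } from "node:crypto";

interface AuditEntry {
  verdict: "ALLOWED" | "BLOCKED";
  detail: string;
  prevHash: string;
  hash: string;
}

class AuditLog {
  private entries: AuditEntry[] = [];

  append(verdict: "ALLOWED" | "BLOCKED", detail: string): AuditEntry {
    const prevHash = this.entries.length
      ? this.entries[this.entries.length - 1].hash
      : "genesis";
    // Each hash covers the previous hash, so editing any record
    // invalidates every entry after it.
    const hash = createHash("sha256").update(prevHash + verdict + detail).digest("hex");
    const entry = { verdict, detail, prevHash, hash };
    this.entries.push(entry);
    return entry;
  }

  // Recompute the chain from the start; any tampering breaks verification.
  verify(): boolean {
    let prev = "genesis";
    for (const e of this.entries) {
      const h = createHash("sha256").update(prev + e.verdict + e.detail).digest("hex");
      if (e.prevHash !== prev || e.hash !== h) return false;
      prev = e.hash;
    }
    return true;
  }
}
```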
See it defend against a real attack
Run preset jailbreak scenarios in the interactive demo and watch the policy engine respond.