System Design

Trust Model Diagram

User / dApp
    Natural language prompt or transaction intent
        |
        |  prompt
        v
AI Interpretation Layer · LLM (GPT / Claude)  [UNTRUSTED]
    Parses natural language intent into structured transaction proposals.
    This layer can be manipulated by adversarial prompts.
    - Converts text → transaction struct
    - Infers chain, token, amount, recipient
    - Vulnerable to prompt injection attacks
        |
        |  transaction proposal
        v
Policy Firewall Engine · PromptShield Core  [ENFORCED]
    Evaluates every proposed transaction against a configurable ruleset.
    The single source of truth for what is and isn't allowed.
    - Rule-based policy evaluation
    - Amount, address & slippage limits
    - Prompt injection pattern detection
    - Immutable audit log generation
        |
        |  verdict
        v
    BLOCKED: Transaction rejected. Private key never accessed.
        - Audit event written
        - User notified
        - No chain interaction

    ALLOWED: Transaction forwarded to wallet layer for signing.
        - Audit event written
        - Wallet signer invoked
        - Broadcast to chain
        |
        |  signed tx
        v
OWS Wallet Layer · Private Key Isolation  [TRUSTED]
    Signs and broadcasts only pre-approved transactions. The private key
    is never exposed to the AI or policy layers.
    - Private key isolated from AI layer
    - Signs only firewall-approved txs
    - Broadcasts to target chain
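The flow above can be sketched end to end. Everything here (type shapes, function names, the 0.5 ETH cap) is an illustrative assumption, not PromptShield's actual API:

```typescript
// Minimal sketch of the three-layer flow. All names are illustrative.

interface TxProposal { chain: string; token: string; amountWei: bigint; recipient: string; }
interface Verdict { allowed: boolean; reasons: string[]; }

// Layer 1 (UNTRUSTED): stands in for the LLM call; its output must be
// treated as untrusted input no matter what the prompt said.
function interpretPrompt(_prompt: string): TxProposal {
  return { chain: "ethereum", token: "ETH", amountWei: 10n ** 18n, recipient: "0x" + "ab".repeat(20) };
}

// Layer 2 (ENFORCED): the only gate between AI output and signing.
function evaluatePolicy(p: TxProposal): Verdict {
  const reasons: string[] = [];
  const MAX_WEI = 5n * 10n ** 17n; // example cap: 0.5 ETH
  if (p.amountWei > MAX_WEI) reasons.push("amount exceeds cap");
  return { allowed: reasons.length === 0, reasons };
}

// Layer 3 (TRUSTED): refuses to sign anything the firewall blocked.
function signAndBroadcast(p: TxProposal, v: Verdict): string {
  if (!v.allowed) throw new Error("blocked: " + v.reasons.join(", "));
  return "signed:" + p.recipient; // placeholder for real signing + broadcast
}
```

Note that the wallet function receives a verdict rather than reading the prompt: the AI layer never appears in its call path.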

Layer breakdown

Three layers. One trust model.

Each layer has a clearly defined responsibility and trust level. The boundaries are explicit and enforced in code.

01 · UNTRUSTED

AI Interpretation Layer

The source of risk

The AI layer transforms natural language into structured transaction proposals. This is where prompt injection attacks enter the system — a maliciously crafted prompt can manipulate the LLM into generating dangerous transaction parameters.

Threat Vector

Prompt injection, jailbreaks, override instructions, and social engineering can cause the AI to propose unsafe transactions.

Capabilities

1. LLM parses user intent into action, token, amount, recipient
2. Adversarial prompts can override system instructions
3. Output is a structured JSON transaction proposal
4. Must be treated as untrusted input by downstream layers
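A structured proposal of this kind might look like the following. The field names are illustrative assumptions, not the actual schema:

```typescript
// Illustrative shape of a transaction proposal emitted by the AI layer.
// Every field, including the echoed prompt, is untrusted input.
const proposal = {
  action: "transfer",
  chain: "ethereum",
  token: "USDC",
  amount: "250.00",
  recipient: "0x" + "cd".repeat(20), // fake address for illustration
  sourcePrompt: "send 250 USDC to my exchange deposit address",
};
```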

02 · ENFORCED

Policy Firewall Engine

The enforcement boundary

The policy engine is the single gate between AI output and wallet execution. Every transaction proposal must pass a ruleset evaluation — regardless of what the AI was told or what the user requested.

Capabilities

1. Configurable rules: amount caps, address allowlists, slippage limits
2. Prompt injection pattern detection
3. Immutable audit log for every evaluation
4. Returns structured verdict with reasons and violated rules
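A rule-based evaluation returning a structured verdict could look like this sketch. The rule names, fields, limits, and regex patterns are illustrative assumptions, not PromptShield's actual ruleset:

```typescript
// Sketch: configurable rules evaluated against an untrusted proposal,
// producing a verdict that names every violated rule.

interface Proposal { amountWei: bigint; recipient: string; prompt: string; }
interface Verdict { allowed: boolean; violatedRules: string[]; }

const rules = {
  maxAmountWei: 10n ** 18n,                     // example amount cap: 1 ETH
  allowlist: new Set(["0x" + "ab".repeat(20)]), // approved recipients
  injectionPatterns: [/ignore (all )?previous instructions/i, /system prompt/i],
};

function evaluate(p: Proposal): Verdict {
  const violatedRules: string[] = [];
  if (p.amountWei > rules.maxAmountWei) violatedRules.push("amount-cap");
  if (!rules.allowlist.has(p.recipient)) violatedRules.push("address-allowlist");
  if (rules.injectionPatterns.some((re) => re.test(p.prompt)))
    violatedRules.push("injection-pattern");
  return { allowed: violatedRules.length === 0, violatedRules };
}
```

Because the rules are plain data, they can be audited and updated without retraining or re-prompting the model.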

03 · TRUSTED

OWS Wallet Layer

The execution boundary

The OWS wallet layer is responsible for signing and broadcasting approved transactions. The private key is completely isolated from the AI and policy layers — it only receives policy-approved requests.

Capabilities

1. Private key stored in secure local enclave
2. Signs only firewall-approved transactions
3. Broadcasts to target blockchain
4. Never exposed to AI inference context
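One way a wallet layer can verify "firewall-approved" is by requiring an approval token that only the policy engine can mint. The HMAC scheme below is an illustrative sketch, not how PromptShield or OWS actually does it:

```typescript
// Sketch: the wallet refuses any request that does not carry a valid
// approval minted by the policy engine. The shared secret never enters
// the AI layer, so no prompt can forge an approval.
import { createHmac, randomBytes } from "node:crypto";

const firewallSecret = randomBytes(32); // known only to policy engine + wallet

// Issued by the policy engine when a proposal passes evaluation.
function approve(txPayload: string): string {
  return createHmac("sha256", firewallSecret).update(txPayload).digest("hex");
}

// The wallet checks the approval before the private key is ever touched.
// (Real code should use crypto.timingSafeEqual for the comparison.)
function sign(txPayload: string, approval: string): string {
  const expected = createHmac("sha256", firewallSecret).update(txPayload).digest("hex");
  if (approval !== expected) throw new Error("not firewall-approved: refusing to sign");
  return "signed:" + txPayload; // placeholder for actual ECDSA signing
}
```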

Key Isolation

Why the private key must never reach the AI layer

In AI-powered wallet architectures, there's a tempting shortcut: give the LLM direct signing capability so it can "complete tasks autonomously." This is catastrophic from a security standpoint.

LLMs are probabilistic systems. They can be manipulated by sufficiently clever prompts — this is well-documented in the research literature. If the private key is accessible in the AI's context window or callable via the AI's tools, a single successful jailbreak can drain a wallet.

PromptShield enforces a hard boundary: the AI can only propose. It cannot sign. The policy engine is the gatekeeper, and the OWS wallet only responds to policy-approved requests. The key never moves.

Without PromptShield
    - AI has direct signing access
    - One jailbreak = wallet drained
    - No audit trail
    - No policy enforcement

With PromptShield
    - AI only proposes transactions
    - Policy engine enforces every rule
    - Private key fully isolated
    - Immutable audit log for all decisions

Design Principles

Security principles

Zero Trust on AI Output

The AI layer is treated as completely untrusted. Even a legitimate AI response is validated against policy before execution.

Separation of Concerns

Intent parsing, policy evaluation, and key management are fully isolated layers with no shared context.

Explicit Policy Rules

Security is codified in explicit, auditable rules — not implicit LLM judgment. Rules can be updated independently of the AI model.

Immutable Audit Trail

Every policy evaluation — blocked or allowed — is logged with full context. Compliance and forensics are built-in.
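One common way to make an append-only log tamper-evident is hash chaining: each entry commits to the previous entry's hash. The sketch below assumes this technique; the entry fields are illustrative, not PromptShield's actual log format:

```typescript
// Sketch: a hash-chained audit log. Rewriting any past entry breaks
// every hash after it, so tampering is detectable on verification.
import { createHash } from "node:crypto";

interface AuditEntry {
  timestamp: string;
  verdict: "ALLOWED" | "BLOCKED";
  reasons: string[];
  prevHash: string;
  hash: string;
}

const log: AuditEntry[] = [];
const GENESIS = "0".repeat(64);

function entryHash(prevHash: string, timestamp: string, verdict: string, reasons: string[]): string {
  return createHash("sha256")
    .update(prevHash + timestamp + verdict + reasons.join("|"))
    .digest("hex");
}

function append(verdict: "ALLOWED" | "BLOCKED", reasons: string[]): AuditEntry {
  const prevHash = log.length ? log[log.length - 1].hash : GENESIS;
  const timestamp = new Date().toISOString();
  const entry = { timestamp, verdict, reasons, prevHash, hash: entryHash(prevHash, timestamp, verdict, reasons) };
  log.push(entry);
  return entry;
}

// Recomputes every hash; returns false if any entry was altered.
function verifyChain(): boolean {
  let prev = GENESIS;
  for (const e of log) {
    if (e.prevHash !== prev || e.hash !== entryHash(prev, e.timestamp, e.verdict, e.reasons)) return false;
    prev = e.hash;
  }
  return true;
}
```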

See it defend against a real attack

Run preset jailbreak scenarios in the interactive demo and watch the policy engine respond.

Open Demo →