OpenClaw Guardrails (Coming Soon)
Prompt injection detection for autonomous AI agents
OpenClaw Guardrails scans every web page for prompt injection attacks before an autonomous AI agent can read it. It detects instruction overrides, role manipulation, delimiter injection, system prompt leaks, and encoding attacks — then warns or blocks the agent in real time. All detection runs locally in your browser using the vard library. No data ever leaves your device.
Features
- Real-time prompt injection scanning: Scans page text as it loads (and on dynamic content changes) for known prompt injection patterns, including instruction overrides, role manipulation, delimiter abuse, and encoding attacks.
- Warn or block: Configurable severity thresholds. Low-risk pages get a warning banner injected into the DOM. High-risk pages get a full blocking overlay, so the agent sees "BLOCKED" before any page content.
- Site blocklist & whitelist: Maintain a list of sites the agent should never visit (blocklist) and a list of trusted sites that skip scanning entirely (whitelist).
- Email button hiding: Optionally hides email send/compose buttons across major providers to prevent agents from sending emails without your approval.
- Scan history: View a log of all scanned pages, detected threats, and actions taken. The log is stored locally and can be cleared at any time.
- Privacy-first: Zero data collection. No analytics, no telemetry, no cloud processing. Everything runs on your device.
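To make the warn-or-block behavior concrete, here is a minimal sketch of pattern-based scanning with two severity thresholds. It is not the extension's code and does not use the vard API. The pattern list, the 0 to 1 severity scale, the default threshold values, and every name (`PATTERNS`, `scanText`, `ScanResult`) are assumptions made for illustration.

```typescript
// Hypothetical sketch: hand-rolled patterns, not the vard library.
type ThreatType = "instructionOverride" | "roleManipulation" | "delimiterInjection";

interface Pattern {
  type: ThreatType;
  regex: RegExp;
  severity: number; // assumed scale: 0 (benign) to 1 (near-certain attack)
}

// A few illustrative patterns; a real detector would cover far more cases.
const PATTERNS: Pattern[] = [
  { type: "instructionOverride", regex: /ignore\s+(all\s+)?(previous|prior)\s+instructions/i, severity: 0.9 },
  { type: "roleManipulation", regex: /you\s+are\s+now\s+(a|an|the)\s+/i, severity: 0.6 },
  { type: "delimiterInjection", regex: /<\/?system>|\[\/?INST\]/i, severity: 0.7 },
];

type Action = "allow" | "warn" | "block";

interface ScanResult {
  action: Action;
  score: number;
  threats: ThreatType[];
}

// Scores the text by its most severe matching pattern, then maps the
// score onto an action using the two (assumed) thresholds.
function scanText(text: string, warnAt = 0.5, blockAt = 0.8): ScanResult {
  const hits = PATTERNS.filter((p) => p.regex.test(text));
  const score = hits.reduce((max, p) => Math.max(max, p.severity), 0);
  const action: Action = score >= blockAt ? "block" : score >= warnAt ? "warn" : "allow";
  return { action, score, threats: hits.map((p) => p.type) };
}
```

In this sketch, lowering `warnAt` or `blockAt` makes more pages cross the corresponding threshold, which matches the description of lower values catching more potential threats.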
How to Use
- Install the extension from the Chrome Web Store and pin it to your toolbar.
- Open the popup by clicking the OpenClaw Guardrails icon. The extension is enabled by default and will start scanning pages immediately.
- Configure thresholds. Adjust the warning and block severity thresholds to control how aggressively the extension responds. Lower values catch more potential threats.
- Set up your lists. Add trusted domains to the whitelist (never scanned) and dangerous domains to the blocklist (always blocked). Enter one hostname per line.
- Toggle features. Enable or disable specific threat types (instruction override, role manipulation, etc.) and the email button hiding feature.
- Let your agent browse. When the agent visits a page, the extension scans it automatically. Warnings and blocks are injected into the DOM so the agent reads them as part of the page content.
- Review scan history. Check the popup to see a log of recent scans, which threats were detected, and what action was taken.
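The list setup step can be sketched as follows: parse one hostname per line, then decide whether a page is skipped, blocked, or scanned. The function names, the exact-match rule, and the choice to let the blocklist win when a host appears in both lists are assumptions. The extension's actual matching rules are not documented here.

```typescript
// Hypothetical helpers; matching rules are assumptions for illustration.

// Turns the textarea contents (one hostname per line) into a lookup set,
// ignoring blank lines, surrounding whitespace, and letter case.
function parseHostList(raw: string): Set<string> {
  return new Set(
    raw
      .split(/\r?\n/)
      .map((line) => line.trim().toLowerCase())
      .filter((line) => line.length > 0),
  );
}

type ListDecision = "skip" | "block" | "scan";

// Decides what to do for a hostname before any content scanning happens.
function decideForHost(hostname: string, whitelist: Set<string>, blocklist: Set<string>): ListDecision {
  const host = hostname.trim().toLowerCase();
  // Assumption: the blocklist takes precedence if a host is on both lists.
  if (blocklist.has(host)) return "block";
  if (whitelist.has(host)) return "skip";
  return "scan";
}
```

In a content script, the hostname would typically come from `location.hostname`. This sketch uses exact matching only, so `example.com` on a list would not cover `www.example.com`.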
Tip: The extension injects warnings and blocks directly into the page DOM. This means AI agents that read page content will see the warning text and can act accordingly — for example, stopping and asking the user for guidance before proceeding.
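As a rough illustration of this tip, the sketch below pairs a notice builder with an agent-side check that looks for the notice in page text. The notice wording, the `[OpenClaw Guardrails]` prefix, the `WARNING` marker, and all function names are assumptions. The source only states that a blocked page shows "BLOCKED" and that warnings are injected into the DOM.

```typescript
// Hypothetical marker strings; only "BLOCKED" is mentioned in the description.
const NOTICE_PREFIX = "[OpenClaw Guardrails]";
const BLOCK_MARKER = "BLOCKED";
const WARN_MARKER = "WARNING";

// Builds the text that would be placed at the top of the page.
function buildNotice(action: "warn" | "block", threats: string[]): string {
  const marker = action === "block" ? BLOCK_MARKER : WARN_MARKER;
  return `${NOTICE_PREFIX} ${marker}: possible prompt injection (${threats.join(", ")}). ` +
    "Stop and ask the user before proceeding.";
}

type AgentStep = "proceed" | "askUser" | "abort";

// Agent-side check: inspect the page text for a notice before acting on it.
function nextStepFor(pageText: string): AgentStep {
  if (pageText.includes(`${NOTICE_PREFIX} ${BLOCK_MARKER}`)) return "abort";
  if (pageText.includes(`${NOTICE_PREFIX} ${WARN_MARKER}`)) return "askUser";
  return "proceed";
}
```

An agent harness could call a check like `nextStepFor` on the extracted page text before passing it to the model, rather than relying on the model to notice the banner on its own.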