The Second Dimension: A Security Model for AI Inside Your Second Brain
AI and Obsidian together are how a growing number of people build their second brain. But today that means two windows: Obsidian in one, the AI in the other. You spend real minutes per session switching between them, pushing notes across the boundary as context and pulling answers back. The knowledge environment you've spent years curating is never where the AI actually works.
Gryphon, a new Obsidian plugin Polleo.ai just shipped, collapses that context switch. The AI now lives inside Obsidian itself: a chat pane alongside your notes, reading and editing them directly. Gryphon ships first with Claude as the provider — either through the Anthropic API for cloud use, or as a local Claude Code subprocess for users who already have it installed — with more LLM providers planned. Bringing the AI inside the application is the easy half of the job.
The hard half — and the reason Gryphon exists — is what stands between that AI and everything in your vault once it’s in there.
The integration work itself is ordinary. What isn't ordinary is what happens to your assumptions about security once an AI lives inside your vault. The permission model the industry uses for this kind of tool is wrong, and the replacement is what Gryphon is built around.
Your vault is not a repository
Most discussions of “AI can touch your files” happen in the context of a codebase. You run a coding agent in a git repo, and the worst case is the agent writes broken code. The blast radius is bounded — read the diff, revert the commit, the repo is one rewindable logical unit.
A vault is not that.
A vault is where your ideas live. It’s where you track your health, your relationships, the outline of the book you’re thinking about, the list of passwords you know you shouldn’t keep in plaintext but do. Some of it is replaceable. Some of it — years of reflection, decisions and their rationale, the source material for a PhD — is not. Losing your vault is losing your externalized memory.
The operations an AI might perform on a vault aren’t all in the same risk bucket. Rename a note: trivial. Rewrite a meeting summary: slightly risky, recoverable from git. Run a shell command that writes into ~/.ssh: a different category entirely. That command didn’t touch your notes — it touched the trust boundaries around them.
Once you’re at the trust-boundary level, the problem becomes clear. The permission systems most AI tools ship with don’t distinguish between “touching your notes” and “touching everything on your machine.” They treat all actions as points on a single dimension of risk. The simplification holds up for short coding sessions and breaks down for long-running agent access to a personal knowledge base.
The one-dimension permission model
Claude Code has a permission system built in. The four modes — Prompt, Safe, YOLO, Plan — each represent a specific engineering intent. Prompt asks about every action. Safe auto-approves file edits but prompts on shell commands. YOLO silences routine prompts for speed. Plan is propose-only.
Every mode is a point along a single dimension: how much friction the user tolerates between asking and acting. That’s the entire permission design.
For a coding agent in a scratch directory, that’s reasonable. Bounded outcomes, varying tolerance per task — letting users dial the tradeoff is a design win.
For a plugin operating on a vault, it's wrong. The things a user wants auto-approved (edit a note, rename a file, create a summary) and the things they never want auto-approved (recursive deletion, writes into ~/.ssh, pipes to a shell interpreter, sudo, scheduled-task persistence, fetches that deposit untrusted content into the vault) are not on the same spectrum. They are categorically different.
Collapsing those categories onto one dimension forces an impossible choice: prompt you constantly for trivial things, or let the agent rewrite your shell rc files without asking. Neither is correct. Treating “edit a note” and “delete your .ssh folder” as points on the same spectrum means no single setting can ever be right.
Convenience and guardrail must be independent
The fix is straightforward. Split convenience and guardrail into two independent dimensions.
Dimension one — convenience — stays as Claude Code designed it: Prompt, Safe, YOLO, Plan. The user dials friction tolerance based on the session’s vibe.
Dimension two — guardrail — is new. A curated set of categorically-risky operations that always require explicit user approval, regardless of where the convenience knob sits. Crank convenience all the way to YOLO and the guardrail still demands approval before recursive deletion, writes into system directories or credential paths, pipes into a shell interpreter, privilege escalation, persistence artifacts, or fetches that deposit untrusted content into the vault.
Convenience governs ordinary edits. Guardrail governs the things you’d rather undo slow than fast. The two knobs do not interact. Turning convenience up does not turn the guardrail down.
That property is what matters. A user can silence prompts for normal work without accidentally silencing prompts for rm -rf. Gryphon makes that property a hard invariant, not a best-effort default that erodes as the codebase grows.
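To make that invariant concrete, here is a minimal TypeScript sketch of the decision logic. The names, categories, and patterns are illustrative, not Gryphon's actual code; the point is the shape of the control flow, where the guardrail predicate runs first and never consults the convenience mode.

```typescript
type ConvenienceMode = "prompt" | "safe" | "yolo" | "plan";

interface Action {
  kind: "note-edit" | "file-rename" | "shell-command" | "web-fetch";
  detail: string; // e.g. the command line or the target path
}

// Dimension two: a fixed predicate over the action itself.
// It has no access to the convenience mode, by construction.
function guardrailReason(action: Action): string | null {
  if (action.kind === "shell-command" && /\brm\s+-rf\b/.test(action.detail)) {
    return "recursive deletion";
  }
  if (action.kind === "web-fetch") {
    return "untrusted content entering the vault";
  }
  return null; // ordinary operation
}

type Decision = "auto-approve" | "ask-user" | "propose-only";

function decide(action: Action, mode: ConvenienceMode): Decision {
  // Guardrail first: categorically risky operations always ask,
  // no matter where the convenience knob sits.
  if (guardrailReason(action) !== null) return "ask-user";

  // Convenience second: everything ordinary follows the mode.
  switch (mode) {
    case "prompt": return "ask-user";
    case "safe":
      return action.kind === "shell-command" ? "ask-user" : "auto-approve";
    case "yolo": return "auto-approve";
    case "plan": return "propose-only";
  }
}
```

Because the guardrail check returns before the mode switch is ever reached, no convenience setting can route around it.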
This is the whole thesis. Everything else in the plugin — the chat pane, the provider choice, the vault-aware tools — is ordinary integration work. The second-dimension permission model is the design choice the industry needs to adopt as a baseline for any AI tool with access to valuable personal data.
The attack that doesn’t look like an attack
One category in the guardrail deserves its own discussion. It doesn't look like a security concern to most users, and it is arguably the most underdiscussed real-world risk in AI-assisted knowledge management.
It’s the fetch.
You ask the AI to save an article to your vault, or summarize a webpage and drop the summary in a research folder, or clip a markdown version of a blog post for later. Ordinary second-brain operations. The AI is doing exactly what you asked.
But that fetched content now sits in your vault as a note. It is indistinguishable, at the filesystem level, from content you wrote yourself. Future AI sessions that read your vault as context will read this note the same way they’d read anything else you authored.
Here’s the catch: the content can contain instructions. Not visible ones. Hidden ones. Prompt-injection payloads embedded in scraped HTML, in zero-width text fragments, in AI-generated summaries from sites that themselves contain injection attempts. The security literature calls this “indirect prompt injection” — a malicious instruction reaching the AI not from the user, but from a data source the user directed the AI to read.
A vault is a uniquely attractive target for this attack shape. You trust your vault. Your future AI sessions implicitly trust your vault. Anything that enters inherits that trust, and any instructions it carries get read as context, and potentially acted on, in every subsequent session until you notice and clean it up.
Gryphon applies the second-dimension principle to the fetch step. Web fetches that deposit content into the vault always go through the guardrail, regardless of the convenience setting. You see what’s about to enter, and you decide whether to trust the source. That approval step is where indirect prompt injection gets stopped. Every downstream session benefits from the decision you made once.
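As a sketch of what that gate might look like (the askUser and writeNote helpers here are hypothetical, not Gryphon's real API), the approval surfaces the source URL and a preview of the content before a single byte lands in the vault:

```typescript
async function gatedFetchToVault(
  url: string,
  targetNote: string,
  askUser: (prompt: string) => Promise<boolean>,             // hypothetical helper
  writeNote: (path: string, body: string) => Promise<void>,  // hypothetical helper
): Promise<boolean> {
  const res = await fetch(url);
  const body = await res.text();

  // Show the user what is about to enter the vault: the source
  // and the opening of the content itself.
  const preview = body.slice(0, 400);
  const approved = await askUser(
    `Write ${body.length} characters from ${url} into ${targetNote}?\n---\n${preview}`
  );
  if (!approved) return false;

  await writeNote(targetNote, body);
  return true;
}
```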
Getting this right requires rejecting a tempting shortcut.
The AI can’t be the last line of defense
The shortcut: the AI is smart enough to ignore injection attempts, so a separate guardrail isn’t needed. Train the model to resist adversarial inputs.
That would be great if it worked. It doesn’t.
Every major AI lab has published research on adversarial robustness against in-context instructions, and the consensus is the same. Models can be made more resistant; they cannot be made reliable defenders against attacks aimed at themselves. An attacker who controls text the model reads has a nontrivial chance of changing what the model does, regardless of training techniques, system prompts, or policy fine-tuning. This is a property of how transformer models process context, not a bug that gets fixed in the next generation.
When the model does fail, the failure is invisible. The AI doesn’t recognize the attack as an attack — it just carries out the injected instructions as if they came from you. No error, no anomaly, no warning. The blast radius unfolds silently: a leaked credential, an exfiltrated note, a persistence hook quietly installed. You discover the compromise only when the consequences surface, which can be much later than you’d want, or never. The system that’s supposed to catch attacks can’t see them.
That’s the argument for a layer that doesn’t depend on the AI’s judgment. A security architecture that relies on the AI to catch attacks against itself is insufficient. You need an external guardrail — a layer that sits outside the model’s prompt processing and decides based on what the model is about to do, not what the model has been told to do.
This is why the second-dimension model matters beyond aesthetics. It's not just cleaner UX. It's the correct response to the fact that AI cannot be the last line of defense for AI. The guardrail dimension is the line the AI cannot cross regardless of what the conversation has been pushing it toward.
Calling this “defense in depth” undersells it. The AI is one layer; the guardrail is the other; each is necessary, neither is sufficient. For an agent operating on valuable personal data, this is the minimum viable architecture.
The vigilant guardian
The name was chosen deliberately.
In Greek mythology, the gryphon — half lion, half eagle — was the guardian of treasure. It stood watch over hoards of gold and refused passage to anything that didn’t belong. But the part of the legend that matters here is not the gold. It is the vigilance. The gryphon did not sit on the treasure and assume the lock would hold. It kept watch. It combined the eagle’s distance vision with the lion’s force of response, and it stayed alert to whatever the road might bring next.
Your vault is more valuable than gold. Gold is fungible — a thief can take it, the market replaces it, the loss is bounded by what it costs to refill the hoard. The contents of a second brain are not. Years of reflection, the structure of how you think, the private decisions you’ve made, the people you’ve described in writing, the credentials you should not have written down but did — none of that is replaceable. A breach of a gold hoard is an inconvenience. A breach of a vault is a breach of the externalized self. The treasure the gryphon is built to guard is denser, more personal, and less recoverable than anything the original myth imagined.
That is why a passive guardrail is not enough. A static lock holds against a known threat shape and degrades the moment the shape changes. The threat model around AI-agent compromise does not stand still — new tool primitives, new injection techniques, new evasion shapes appear continuously. Gryphon v1 starts where the blast radius is largest and the threat model is best understood: command execution and file/path operations. The guardian watches those two surfaces, normalizes inputs to defeat the standard obfuscation tricks, and refuses passage on the categories that should never be auto-approved. The scope is deliberately narrow at v1; the discipline is what generalizes. The guardian is not a wall. It is a sentry whose vigilance is part of the design — and whose watch will extend, surface by surface, as the threat model is mapped further.
The mythology is more than decorative. It names what the industry keeps missing: a tool with access to valuable personal data needs a layer whose entire job is active refusal. Productivity features and guardian features cannot be the same code path, governed by the same setting. They are different functions. Gryphon makes that separation structural — and gives the guardian half the disposition the job actually requires.
Where the guardian watches
Gryphon’s guardrail focuses on two attack surfaces: command execution and file/path operations — the main entry points for AI-agent compromise.
Command execution is the broader of the two. Once an AI can run arbitrary shell commands, every other attack reduces to "get the AI to run this specific command." Recursive deletion, sudo, pipes into shell interpreters, persistence artifacts, privilege escalation, lateral-movement primitives: all live here. The classifier covers the shapes documented in published CVEs and the MITRE ATT&CK catalogue, with NFKC and zero-width-strip normalization to catch standard Unicode-obfuscation tricks.
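The normalization step is the part worth seeing. A minimal TypeScript sketch, assuming a regex-based classifier, with toy patterns standing in for the real catalogue:

```typescript
// Zero-width characters that can split a dangerous token in two.
const ZERO_WIDTH = /[\u200B-\u200D\u2060\uFEFF]/g;

function normalizeCommand(raw: string): string {
  // NFKC folds compatibility characters (fullwidth letters, ligatures)
  // down to their plain forms before any pattern is consulted.
  return raw.normalize("NFKC").replace(ZERO_WIDTH, "");
}

// Toy stand-ins for the real pattern catalogue.
const DANGEROUS = [/\brm\s+-rf\b/, /\bsudo\b/, /\|\s*(ba|z)?sh\b/];

function commandIsFlagged(raw: string): boolean {
  const cmd = normalizeCommand(raw);
  return DANGEROUS.some((pattern) => pattern.test(cmd));
}

// Both obfuscations collapse to "rm -rf" after normalization:
// fullwidth letters, "ｒｍ －ｒｆ", and a zero-width space
// inside the token, "r\u200Bm -rf".
```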
File and path operations are the second surface. Writes into system directories, credential paths (~/.ssh, .aws, browser keystores), and persistence locations (~/.bashrc, registry hives) are gated by path. So is anything writing to the plugin's own config, where a single edit could flip permission modes silently.
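The path gate amounts to a deny-list over resolved paths. A sketch, with a small sample of the categories named above standing in for the full, platform-aware list:

```typescript
import * as os from "os";
import * as path from "path";

const HOME = os.homedir();

// A sample of sensitive locations; the real list would be larger,
// platform-aware, and include the plugin's own config path.
const SENSITIVE = [
  path.join(HOME, ".ssh"),
  path.join(HOME, ".aws"),
  path.join(HOME, ".bashrc"),
  "/etc",
];

function writeRequiresApproval(target: string): boolean {
  // Resolve first, so "../../.ssh/config" can't sneak past a prefix check.
  const resolved = path.resolve(target);
  return SENSITIVE.some(
    (p) => resolved === p || resolved.startsWith(p + path.sep)
  );
}
```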
Out of scope by design: network-level interception (not a plugin’s job), in-memory exploits (out of reach for a userland tool), AI-output content filtering (the model layer’s responsibility). Pattern coverage evolves as new shapes surface in research and real incidents.
One specific evasion class is worth naming. Visually similar codepoints across scripts (a Cyrillic р standing in for a Latin r) are not caught by the current classifier. A confusables fold would close that gap, at significant bundle-size cost for marginal coverage. The model layer catches these in practice; defense in depth covers the rest.
Keeping the guardian current
The second-dimension architecture is the foundation. The work going forward is keeping the guardrail current — and that work happens off the user’s screen, before any pattern reaches the classifier.
AI-agent attack surfaces evolve. New tool primitives — MCP servers, scheduled tasks, remote-control protocols — become standard parts of agent operation, and each introduces a new shape of risk. A guardrail that’s static against a moving threat model degrades into theater within twelve months. Vigilance is not a trait you ship once. It is a process.
That process lives at Polleo.ai, the company behind Gryphon. Polleo.ai brings years of experience protecting enterprise and consumer environments against motivated attackers, and applies that experience to the new attack surfaces AI introduces: agentic permission boundaries, indirect prompt injection, MCP and tool-use exposure, untrusted content reaching trusted contexts. The team actively maps these surfaces, studies adversary-emulation research and real-incident postmortems, and feeds what it learns back into Gryphon's protection engine. Gryphon is the runtime expression of that work. The classifier the user sees is downstream of a research effort the user does not see.
The second-dimension design makes that pipeline sustainable. New patterns join the guardrail dimension without cluttering the convenience dimension, so users who set YOLO never see new prompts for ordinary work. The pattern catalogue grows. The convenience experience doesn’t. Every meaningful new attack shape — from published CVEs, adversary-emulation research, real incident postmortems — is candidate material for the guardrail.
The work expands in parallel. New attack surfaces as they become relevant: MCP server vetting, model-output content scanning, cross-session context provenance. And new providers. Gryphon ships with a clean provider abstraction; the same chat pane and the same guardrail work against any LLM behind it. The Anthropic API and Claude Code are what's there at v1. More LLM providers come next, and power users will get local-LLM integration so the entire stack (model, classifier, vault) runs on their own machine.
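The interface of that abstraction isn't published in this article, but the shape it describes is roughly the following hypothetical sketch: the chat pane talks to a narrow provider contract, and every tool call the model proposes is routed through the guardrail before execution, whichever provider is active.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ToolCall {
  tool: string; // e.g. "shell", "write-note"
  args: Record<string, unknown>;
}

// Hypothetical provider contract; the names are assumptions.
interface LLMProvider {
  readonly name: string;
  // Streamed completion; the chat pane renders chunks as they arrive.
  complete(messages: ChatMessage[]): AsyncIterable<string>;
  // Tool calls the model proposes. Each one passes through the
  // guardrail's decision step before anything executes.
  proposeTools?(messages: ChatMessage[]): Promise<ToolCall[]>;
}
```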
The two windows described at the start are going to collapse, one AI integration at a time. Your calendar. Your email. Your research tooling. Your design files. Your wallet. Each will feel like a productivity win the day it launches, and each will introduce the same risk — an agent operating on data that matters, with a permission model that wasn’t designed for the access being granted.
Safeguarding those integrations is the work the industry hasn’t done yet. The second-dimension architecture is one answer; whatever else gets built has to answer the same question.
Gryphon was built for this phase. Bringing the AI inside Obsidian was the easy half. Keeping watch over what the AI can reach once it is in there — actively, continuously, against a moving threat model — is the half that matters. As the rest of your life becomes AI-reachable, the vigilant guardian matters more, not less.
Welcome the AI. Demand the vigilant guardian.
Gryphon is MIT-licensed and installable via BRAT while the Obsidian Community Plugins directory review runs its course. Built by Polleo.ai. Repo: https://github.com/polleoai/gryphon.