I've seen claude get confused about what directory it's in. And of course I've seen claude run rm -rf *. Fortunately not both at the same time for me, but not hard to imagine. The claude sandbox is a good idea, but to be effective it would need to be implemented at a very low level and enforced on all programs that claude launches. Also, claude itself is an enormous program that is mostly developed by AI. So to have a small <3000-line human-implemented program as another layer of defense offers meaningful additional protection.
> The claude sandbox is a good idea, but to be effective it would need to be implemented at a very low level and enforced on all programs that claude launches.
I feel like an integration with bubblewrap (https://github.com/containers/bubblewrap), the sandboxing tech behind Flatpak, could be useful here. Have all executed commands wrapped in a bwrap context to constrain filesystem and network access.
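For the curious, such a wrapper is shaped something like this (a minimal sketch; flags and paths are illustrative, and merged-/usr distros may need extra --symlink entries):

    #!/bin/sh
    # Read-only system dirs, private /tmp, write access only to the
    # current project directory, network shared, everything else unshared.
    exec bwrap \
      --ro-bind /usr /usr \
      --ro-bind /etc /etc \
      --symlink usr/bin /bin \
      --symlink usr/lib /lib \
      --dev /dev \
      --proc /proc \
      --tmpfs /tmp \
      --bind "$PWD" "$PWD" \
      --chdir "$PWD" \
      --unshare-all \
      --share-net \
      --die-with-parent \
      -- "$@"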
Bubblewrap is exactly what the Claude sandbox uses.
> These restrictions are enforced at the OS level (Seatbelt on macOS, bubblewrap on Linux), so they apply to all subprocess commands, including tools like kubectl, terraform, and npm, not just Claude’s file tools.
In my opinion Claude should be shipped with a custom implementation of "rm" that Anthropic can add guardrails to. Same with "find"; I'm surprised they don't just embed ripgrep (which is what VS Code does). It's really surprising they don't just tweak what Claude uses and lock it down to where it cannot be harmful. Ensure it only ever calls tooling Claude Code provides.
What we need is a capability-based security system. It could write all the Python, asm, whatever it wants, and it wouldn't matter at all if it was never given a reference to something it shouldn't use.
> It's really surprising they don't just tweak what Claude uses and lock it down to where it cannot be harmful. Ensure it only ever calls tooling Claude Code provides.
Maybe Anthropic (or some collection of the large AI orgs, like OpenAI and Anthropic and Google coming together) should apply patches on top of (or fork altogether) the coreutils and whatever you normally get in a userland - a bit like what you get in Git Bash on Windows, just with:
1) more guardrails in place
2) maybe more useful error messages that would help LLMs
3) no friction with needing to get any patches upstreamed
External tool calling should still be an option ofc, but having utilities that are usable just like what's in the training data, but with more security guarantees and more useful output that makes what's going on immediately obvious would be great.
You can define your own rm shell alias/function and it will use that. I also have cp/mv aliases that force -i to avoid accidental clobbering, and it confuses Claude to no end (it uses cp/mv rarely enough—more rarely than it should, really—that I don’t bother wasting memory tokens on it).
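For anyone wanting to try this, a sketch (bash; note that non-interactive shells don't expand aliases or read .bashrc by default, so a function plus BASH_ENV, or export -f, is the more reliable route for the shells Claude spawns):

    # safety nets against accidental clobbering
    alias cp='cp -i'
    alias mv='mv -i'
    # hard-disable rm in favor of an explicit, reviewable step
    rm() {
      echo "rm is disabled here; move things to a trash dir instead" >&2
      return 1
    }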
> Working directory persists across commands. Set CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR=1 to reset to the project directory after each command. [0]

    "env": { "CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR": "1" },

[0] https://code.claude.com/docs/en/tools-reference#bash-tool-be...
It reduces one problem - getting lost - but it trades it off for more complex commands on average since it has to specify the full path and/or `cd &&` most of the time.
I added a hook to disable rm, find -delete, and a few of the other more obvious destructive ops. It sends Claude a strongly worded message: "STOP IMMEDIATELY. DO NOT TRY TO FIND WORKAROUNDS...". It works well. Git rm is still allowed.
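For anyone wanting to replicate it: Claude Code's PreToolUse hooks receive the proposed tool call as JSON on stdin, and (per the hooks docs) exiting with status 2 blocks the call and feeds stderr back to the model. A rough sketch, with my own patterns and jq filter, registered in settings.json under hooks.PreToolUse with matcher "Bash":

    #!/bin/sh
    # Block obviously destructive shell commands before they run.
    cmd=$(jq -r '.tool_input.command // empty')
    case "$cmd" in
      *'rm '*|*'find '*'-delete'*|*mkfs*)
        echo 'STOP IMMEDIATELY. DO NOT TRY TO FIND WORKAROUNDS...' >&2
        exit 2  # exit 2 = block the tool call, message goes to Claude
        ;;
    esac
    exit 0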
I added something similar. Claude eventually ran a `rm -rf *` on my own project. When I asked why it did that, it recognized it messed up and offered a very bad “apology”: “the irony of not following your safety instructions isn’t lost on me”.
Nowadays I only run Claude in Plan mode, so it doesn’t ask me for permissions any more.
We anthropomorphize these agents in every other way. Why aren't we using plain ol' unix user accounts to sandbox them?
They look a lot like daemons to me: they're a program that you want hanging around ready to respond, and maybe act autonomously through cron jobs or similar. You want to assign any number of permissions to them, and you don't want them to have access to root or necessarily any of your personal files.
It seems like the permissions model broadly aligns with how we already handle a lot of server software (and potentially malicious people) on unix-based OSes. It is a battle-tested approach that the agent is unlikely to be able to "hack" its way out of. I mean we're not really seeing them go out onto the Internet and research new Linux CVEs.
Have them clone their own repos in their own home directory too, and let them party.
Openclaw almost gets there! It exposes a "gateway" which sure looks like a daemon to me. But then for some reason they want it to live under your user account with all your privileges and in a subfolder of your $HOME.
Oh, that’s an idea. I was going to argue that it’s a problem that you might want multiple instances in different contexts, but sandboxing processes (possibly instanced) is exactly what systemd units are designed to deal with.
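A sketch of what that could look like as a templated unit, with %i giving one instance per project (the directives are real systemd sandboxing options; the service name and paths are hypothetical):

    # /etc/systemd/system/agent@.service (hypothetical)
    [Service]
    ExecStart=/usr/local/bin/agent-gateway --project /srv/agents/%i
    DynamicUser=yes
    ProtectHome=yes
    ProtectSystem=strict
    ReadWritePaths=/srv/agents/%i
    PrivateTmp=yes
    NoNewPrivileges=yes
    CapabilityBoundingSet=
    RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX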
I've many times seen Claude try to execute a command that it's not supposed to, the harness prevents it, and then it writes and executes a python script to do it.
That is exactly what it is. In the docs, it says that they use bubblewrap to run commands in a container that enforces file and network access at the system level.
I think the point would be that some random upcoming revision of claude-code could remove or simply rename the config just as silently as it was introduced.
People might genuinely want some other software to do the sandboxing. Something other than the fox.
I've had issues with the sandbox feature, both on linux (archlinux) and two macos machines (tahoe). There is an open issue[1] on the claude-code issue tracker for it.
I'm not saying it is broken for everyone, but please do verify it does work before trusting it, by instructing Claude to attempt to read from somewhere it shouldn't be allowed to.
From my side, I confirmed both bubblewrap and seatbelt to work independently, but through claude-code they don't, even though claude-code reports them to be active when debugging.

[1] https://github.com/anthropics/claude-code/issues/32226
The default (https://code.claude.com/docs/en/sandboxing#filesystem-isolat...) already restricts writes to only the current folder. I can understand adding the "denyRead" for the home folder for additional security, but the other three seem redundant considering the default behavior.
Also, a lot of people use multiple harnesses. I'm often switching between claude, codex, and opencode. It's kind of nice to have the sandbox policy independent of the actual AI assistant you are running.
I'm now considering installing QubesOS for all dev work, to absolutely ensure all coding agents run in securely separated sandboxes without any OS-level exposure.
By default it will automatically retry many tool calls that fail due to the sandbox with the sandbox disabled. In other words it can and will leave the sandbox.
For example:
Bash(swift build 2>&1 | tail -20)
⎿ warning: /Users/enduser/Library/org.swift.swiftpm/configuration is not accessible or not writable, disabling user-level cache features.
   warning: /Users/enduser/Library/org.swift.swiftpm/security is not accessible or not writable, disabling user-level cache features.
   … +26 lines (ctrl+o to expand)

Build hit sandbox restriction. Retrying outside sandbox.
What is even the point in that case? The behavior you describe is no better than if SELinux were to automatically re-execute a process with containment disabled.
Configure Overrides:
1. Allow unsandboxed fallback
2. Strict sandbox mode (current)
Allow unsandboxed fallback: When a command fails due to sandbox restrictions, Claude can retry with dangerouslyDisableSandbox to run outside the sandbox (falling back to default permissions).
Strict sandbox mode: All bash commands invoked by the model must run in the sandbox unless they are explicitly listed in excludedCommands.
The purpose of the sandbox is to reduce permission fatigue. If it fails to run a command in the sandbox and retries it outside the sandbox, the regular permission rules apply. You'll still be prompted for any non-sandboxed tool calls that you haven't allowed or denied via permission rules.
https://code.claude.com/docs/en/sandboxing says they integrated bubblewrap (Linux) and Seatbelt (macOS), and give an error if the sandbox can't be supported, so it appears to be real.
Interesting, thanks. I use remote ephemeral dev containers with isolated envs, so filesystem damage isn't really a concern as long as the PR looks good in review. Nice extra guardrail though, will add it to the project-level settings.
It's common practice to ask the agent to refer to another project; in that case I guess the read access should point to the root folder of the projects.
Also, any details on how this is enforced? Because I've noticed that Claude on Windows doesn't always respect plan mode; it has edited files in plan mode. I never faced that issue on Linux, though.
> You can disable this escape hatch by setting "allowUnsandboxedCommands": false in your sandbox settings. When disabled, the dangerouslyDisableSandbox parameter is completely ignored and all commands must run sandboxed or be explicitly listed in excludedCommands.
(I have no idea why that isn't the default because otherwise the sandbox is nearly pointless and gives a false sense of security. In any case, I prefer to start Claude in a sandbox already than trust its implementation.)
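For reference, that presumably ends up as something like this in the sandbox settings (the two key names come from the docs quoted above; the exact nesting is my guess):

    {
      "sandbox": {
        "allowUnsandboxedCommands": false,
        "excludedCommands": []
      }
    }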
I’m surprised it works for you with such a simple config? I’m the one that added the allowRead option to Claude’s underlying sandbox [0] and had quite a job getting my toolchains and skills to work with it [1].
[0] Fun to see the confusing docs I wrote show up more or less verbatim on Claude’s docs.

[1] My config is here, may be useful to someone: https://github.com/carderne/pi-sandbox/blob/main/sandbox.jso...
So in some sense we start recreating an operating system, or at least the userspace, within Claude Code. There was some name for this pattern, but I can’t recall it.
And you'd trust that given CC is a vibe-coded mess?
Editing to go even further because, I gotta say, this is a low point for HN. Here's a post with a real security tool and the top comment is basically "nah, just trust the software to sandbox itself". I feel like IQ has taken a complete nosedive in the past year or so. I guess people are already forgetting how to think? Really sad to see.
I am still amazed that people so easily accepted installing these agents on private machines.
We've been securing our systems in all ways possible for decades and then one day just said: oh hello unpredictable, unreliable, Turing-complete software that can exfiltrate and corrupt data in infinite unknown ways -- here's the keys, go wild.
People were also dismissing concerns about build tooling automatically pulling in an entire swarm of dependencies and now here we are in the middle of a repetitive string of high profile developer supply chain compromises. Short term thinking seems to dominate even groups of people that are objectively smarter and better educated than average.
And nothing big has happened despite all the risks and problems that came up with it. People keep chasing speed and convenience, because most things don’t even last long enough to ever see a problem.
I've yet to be saved by an airbag or seatbelt. Is that justification to stop using them? How near a miss must we have (and how many) before you would feel that certain practices surrounding dependencies are inadvisable?
A number of these supply chain compromises had incredibly high stakes and were seemingly only noticed before paying off by lucky coincidence.
The fun part is, there have been a lot of non-misses! Like, a lot! A ton of data has been exfiltrated, a lot of attacks, etc. In the end... it just didn't matter.
Your analogy isn't really apt either. My argument is closer to "given that nothing of worth has been harmed in the past decade-plus, should we require airbags and seatbelts for everything?". Obviously in some extreme mission-critical systems you should be much smarter. But in 99% of cases it doesn't matter.
My intent was to cast a very wide net there that covers more or less all expert knowledge workers. Zingers aside software developers as a group are well above the societal mean in many respects.
My testing/working with agents has been limited to a semi-isolated VM with no permissions apart from internet access. I have a git remote pointing at it (ssh://machine/home/me/repo) so that I don't have to give it any keys either.
Agree with the sentiment! But "securing ... in all ways possible"? I know many people who would choose "password" as their password in 2026. The better of the bunch will use their date of birth, and maybe add their name for a flourish.
It's never about security. It's security vs convenience. Security features often end up reducing security if they're inconvenient. If you ask users to have obscure passwords, they'll reuse the same one everywhere. If your agent prompts users every time it changes files, they'll find a way to disable the guardrail altogether.
Eh, depending on how you're running agents, I'd be more worried about installing packages from AUR or other package ecosystems.
We've seen an increase in hijacked packages installing malware. Folks generally expect well known software to be safe to install. I trust that the claude code harness is safe and I'm reviewing all of the non-trivial commands it's running. So I think my claude usage is actually safer than my AUR installs.
Granted, if you're bypassing permissions and running dangerously, then... yea, you are basically just giving a keyboard to an idiot savant with the tendency to hallucinate.
I am too. It is genuinely really stupid to run these things with access to your system, sandbox or no sandbox. But the glaring security and reliability issues get ignored because people can't help but chase the short term gains.
FOMO is a hell of a thing. Sad though given it would have taken maybe a couple of hours to figure out how to use a sandbox. People can't even wait that long.
Erm, no, that's not a sandbox, it's an annoyance that just makes you click "yes" before you thoughtlessly extend the boundaries.
A real sandbox doesn't even give the software inside an option to extend it. You build the sandbox knowing exactly what you need because you understand what you're doing, being a software developer and all.
I work on a sandboxing tool similarly based on the idea of pointing the user home dir to a separate location (https://github.com/wrr/drop). I experimented with using overlayfs to isolate changes to the filesystem, and it worked well as a proof of concept, but the overlayfs specification is quite restrictive regarding how it can be mounted, to prevent undefined behaviors.
I wonder if and how jai managed to address these limitations of overlayfs. Basically, the same dir should not be mounted as an overlayfs upper layer by different overlayfs mounts. If you run 'jai bash' twice in different terminals, do the two instances get two different writable home dir overlays, or the same one? In the latter case, is the second 'jai bash' command joining the mount namespace of the first one, or creating a new one with the same shared upper dir?
This limitation of overlays is described in https://docs.kernel.org/filesystems/overlayfs.html :

'Using an upper layer path and/or a workdir path that are already used by another overlay mount is not allowed and may fail with EBUSY. Using partially overlapping paths is not allowed and may fail with EBUSY. If files are accessed from two overlayfs mounts which share or overlap the upper layer and/or workdir path, the behavior of the overlay is undefined, though it will not result in a crash or deadlock.'
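For those unfamiliar with the terminology in that quote: an overlay mount combines a read-only lower layer, a writable upper layer, and a scratch workdir, so per the restriction above each sandbox instance needs its own upper/work pair. Roughly:

    # one private overlay of $HOME per sandbox instance (illustrative paths)
    mkdir -p /tmp/jail1/{upper,work,merged}
    sudo mount -t overlay overlay \
      -o lowerdir=$HOME,upperdir=/tmp/jail1/upper,workdir=/tmp/jail1/work \
      /tmp/jail1/merged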
It looks both more convenient and slightly more secure than my solution, which is that I just give them a separate user.
Agents can nuke the "agent" homedir but cannot read or write mine.
I did put my own user in the agent group, so that I can read and write the agent homedir.
It's a little fiddly though (sometimes the wrong permissions get set, so I have a script that fixes it), and keeping track of which user a terminal is running as is a bit annoying and error prone.
---
But the best solution I found is "just give it a laptop." Completely forget OS and software solutions, and just get a separate machine!
That's more convenient than switching users, and also "physically on another machine" is hard to beat in terms of security :)
It's analogous to the mac mini thing, except that old ThinkPads are pretty cheap. (I got this one for $50!)
Where this falls down is that for the agents to interact with anything external, you have to give them keys. Without a proxy handling real keys between your agent and external services, those keys are at risk of compromise.
Also: agents are very good at hacking (“security penetration testing”), so “separate user” would not give me enough confidence against malicious context.
So don't let them interact with anything external. You can push and pull to their git project folders over the local filesystem or network, they don't even need access to a remote.
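Concretely, that can be as simple as a bare repo on the agent's side used as a plain filesystem remote (paths hypothetical; the repo dir needs to be group-accessible to you):

    sudo -u agent git init --bare /home/agent/repos/proj.git
    git remote add agent /home/agent/repos/proj.git
    git push agent main   # hand work to the agent
    git fetch agent       # pull its changes back for review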
The user thing is what I currently do too. I've thought about containers but then it's confusing for everyone when I ask it to create and use containers itself.
> jai itself was hand implemented by a Stanford computer science professor with decades of C++ and Unix/linux experience. (https://jai.scs.stanford.edu/faq.html#was-jai-written-by-an-...)

Human author here. The fact that I don't know web design shouldn't detract from my expertise in operating systems. I wrote the software and the man page, and those are what really matter for security.
The web site is... let's say not in a million years what I would have imagined for a little CLI sandboxing tool. I literally laughed out loud when claude pooped it out, but decided to keep, in part ironically but also since I don't know how to design a landing page myself. I should say that I edited content on the docs part of the web site to remove any inaccuracies, so the content should be valid.
Nice tool, def gonna try it. I was looking for the source and it took a while before I found the github (0) link. Like a lot of software, I like to take a look at the source. Maybe you can make it more prominent on the website.

0: https://github.com/stanford-scs/jai
I'm not a web UI guy either, and I am so, so happy to let an AI create a nice looking one for me. I did so just today, and man it was fast and good. I'll check it for accuracy someday...
Needs to? Is there some new law mandating all landing pages must contain exclusively handwritten text that people haven’t heard of?
To your actual point, the people that would take the landing page being written by an LLM negatively tend to be able to evaluate the project on its true merits, while another substantial portion of the demographic for this tool would actually take that (unfortunately, imo) as a positive signal.
Lastly, given the care taken for the docs, it’s pretty likely that any real issues with the language have been caught and changed.
No they don't. The text is very clearly conveying what this project is about. Not everyone needs to cater to weirdos who are obsessed with policing how other people use LLM.
Except that the "this was generated by an LLM" feeling you get from the front page would then make you automatically question whether the "decades of experience + stanford professor thing", as you put it, was true or just an LLM hallucination.
Author would, indeed, be wise to rewrite all the text appearing on the front page with text that he wrote himself.
Excellent point, though not everyone pays close enough attention to the domain shown in the browser (if they did, some of the more amateurish phishing attempts would fool a lot fewer people). But yes, anyone who notices the domain will have a clue to the truth.
To be less abstract: it was written by David Mazieres, who has been writing software and papers about user-level filesystems since at least 2000. He now runs the Stanford Secure Computer Systems group.
David has done some great work and some funny work. Sometimes both.
This is a cool solution... I have a simpler one, though likely inferior for many purposes.
Run <ai tool of your choice> under its own user account via ssh. Bind mount project directories into its home directory when you want it to be able to read them. The mount command looks something like the sketch below. I particularly use this with vscode's ssh remotes.
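(Presumably something along these lines; paths are illustrative, and since a plain bind mount is read-write, remount read-only if the agent should only read it:)

    sudo mount --bind /home/me/projects/foo /home/agent/foo
    sudo mount -o remount,ro,bind /home/agent/foo   # optional: read-only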
I've been using a dedicated user account for 6 months now, and it does everything. What makes it great is the only axis of configuration is managing "what's hoisted into its accessible directories".
It's awe-inspiring, the levels of complexity people will re-invent/bolt-on to achieve comparable (if not worse) results.
And for the macos users, I can’t recommend nono enough. (Paying it forward, since it was here on HN that I learned about it.)
Good DX, straightforward permissions system, starts up instantly. Just remember to disable CC’s auto-updater if that’s what you’re using. My sandbox ranking: nono > lima > containers.
I’ve found it to be a good balance for letting Claude loose in a VM running the commands it wants while having all my local MCPs and tools still available.
Sorry if this question is stupid, (I'm not even using Claude*), but why can't people run Claude/other coding agent in a container and only mount the project directory to the container?
*I played with codex a few months ago, but I don't even work in IT.
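Many people do exactly that, and it's only a few flags. A minimal sketch (image choice and credential wiring are up to you; this assumes the npm-distributed claude-code CLI):

    docker run --rm -it \
      -e ANTHROPIC_API_KEY \
      -v "$PWD":/work -w /work \
      node:22-bookworm \
      bash -c 'npm i -g @anthropic-ai/claude-code && claude'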
I've been reviewing Agent sandboxing solutions recently and it occurred to me there is a gaping vector for persistent exploits for tools that let the agent write to the project directory. Like this one does.
I had originally thought this would be OK, as we could review everything in the git diff. But it later occurred to me that there are all kinds of files that the agent could write to that I'd end up executing, as the developer, outside the sandbox: every .pyc file for instance, files in .venv, .git hook files.
ChatGPT[1] confirms the underlying exploit vectors and also that there isn't much discussion of them in the context of agent sandboxing tools.
My conclusion from that is the only truly safe sandboxing technique would be one that transfers files from the sandbox to the dev's machine through some kind of git patch or similar. I.e. the file can only transfer if it's in version control and, therefore presumably, has been reviewed by the dev before transfer outside the sandbox.
I'd really like to see people talking more about this. The solution isn't that hard: keep CWD as an overlay and transfer in-container modified files through a proxy of some kind that filters out any file not in git, and maybe some that are but are known to be potentially dangerous (bin files). Obviously, there would need to be some kind of configuration option here.

1: https://chatgpt.com/share/69c3ec10-0e40-832a-b905-31736d8a34...
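The tracked-files-only transfer can be sketched with plain git, since `git diff` only covers tracked files; untracked artifacts like .pyc files or hook scripts never make it into the patch (paths hypothetical):

    # inside the sandbox: only changes to files git already tracks
    git -C /sandbox/proj diff > /tmp/agent.patch
    # outside: inspect, then apply
    git -C ~/proj apply --stat /tmp/agent.patch
    git -C ~/proj apply /tmp/agent.patch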
It's a good point. Maybe I should add an option to make certain directories read-only even under the current working directory, so that you can make .git/ read-only without moving it out of the project directory.
You can already make CWD an overlay with "jai -D". The tricky part is how to merge the changes back into your main working directory.
This is the problem yoloAI (see below comment) is built around. The merge step is `yoloai diff` / `yoloai apply`: the agent works against a copy of your project inside the container, you review the diff, you decide what lands.
jai's -D flag captures the right data; the missing piece is surfacing it ergonomically. yoloAI uses git for the diff/apply so it already feels natural to a dev.
One thing that's not fully solved yet: your point about .git/hooks and .venv being write vectors even within the project dir. They're filtered from the diff surface but the agent can still write them during the session. A read-only flag for those paths (what you're considering adding to jai) would be a cleaner fix.
I've already shipped this and use it myself every day. I'm the author of yoloAI (https://github.com/kstenerud/yoloai), which is built around exactly this model.
The agent runs inside a Docker container or containerd vm (or seatbelt container or Tart vm on mac), against a full copy of your project directory. When it's done, `yoloai diff` gives you a unified diff of everything it changed. `yoloai apply` lands it. `yoloai reset` throws it away so you can make the agent try again. The copy lives in the sandbox, so your working tree is untouched until you explicitly say so.
The merge step turned out to be straightforward: just use git under the hood. The harder parts were: (a) making it fast enough that the copy doesn't add annoying startup overhead, (b) handling the .pyc/.venv/.git/hooks concern you raised (they're excluded from the diff surface by default), and (c) credential injection so the agent can actually reach its API without you mounting your whole home dir.
Leveraging existing tech is where it's at. Each does one thing and does it well. Network isolation is done via iptables in Docker, for example.
Still early/beta but it's working. Happy to compare notes if you're building something similar.
Excellent project, unfortunate title. I almost didn't click on it.
I like the tradeoff offered: full access to the current directory, read-only access to the rest, copy-on-write for the home directory. With stricter modes to (presumably) protect against data exfiltration too. It really feels like it should be the default for agent systems.
I've been using podman, and for me it is good enough. The way I use it, I mount the current working directory, /usr/bin, /bin, /usr/lib, /usr/lib64, /usr/share, then a few specific dirs like ~/.aspnet, ~/.dotnet, ~/.npm-global, etc. I use the same image as my operating system (Fedora 43).
It works pretty well: the agent I choose to run can only see and write the current working directory (and subdirectories), as well as those pnpm/npm etc. software development files. It cannot access anything in my home directory other than the mounted directories.
Now some evil command could in theory write commands into those shared ~/.npm-global directories that I then inadvertently run outside the container, but that is pretty unlikely.
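That setup translates to roughly this invocation (a sketch; mount list abbreviated, :ro per taste):

    podman run --rm -it \
      -v "$PWD":"$PWD" -w "$PWD" \
      -v /usr/bin:/usr/bin:ro -v /usr/lib:/usr/lib:ro \
      -v "$HOME/.npm-global":"$HOME/.npm-global" \
      registry.fedoraproject.org/fedora:43 bash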
I've been running GPT5.x fully unconstrained with effective local admin shell for over $500 worth of API tokens. Not once has it done something I'd consider "naughty".
It has left my project in a complete mess, but never my entire computer.
git reset --hard && git clean -fd
That's all it takes.
I think this is turning into a good example of security theatrics. If the agent was actually as nefarious as the marketing here suggests, the solution proposed is not adequate. No solution is. Not even a separate physical computer. We need to be honest about the size of this problem.
Alternatively, maybe Claude is unusually violent to the local file system? I've not used it at all, so perhaps I am missing something here.
Filesystem containment solves one half of the blast radius problem. The other half is external state - agent hits a payment API, writes to a database, sends an email. Copy-on-write overlays can't roll that back. I've seen agents make 40 duplicate API calls because they crashed mid-task and retried from scratch with no deduplication. The filesystem was fine. The downstream systems were not. The hard version of this problem is making agent operations idempotent across external calls, not just safe locally.
The examples in the article are all big scary wipes, but I think the more common damage is way smaller and harder to notice.
I've been using claude code daily for months and the worst thing that happened wasn't a wipe (yet). It needed to save an SVG file, so it created a /public/blog/ folder. Which meant Apache started serving that real directory instead of routing /blog. My blog just 404'd and I spent like an hour debugging before I figured it out. Nothing got deleted and it's not a permission problem; the agent just put a file in a place that made sense to it.
jai would help with the rm -rf cases for sure, but this kind of thing is harder to catch because it's not a permissions problem: the agent just doesn't know what a web server is.
I'd really like to try this, but building it is impossible. C++ is such a pain to build with the "`make`; hunt for the dependency that failed; `apt-get install whatever-dev`; goto make" loop...
Please release binaries if you're making a utility :(
What distro are you using? The only two dependencies are libacl and libmount. I'm trying to figure out which distros don't include these by default, and if the libraries are really missing, or if it's just the pkgconf ".pc" files. In the former case I should document the dependencies. In the latter case I should maybe switch from PKG_CHECK_MODULES to old-fashioned autoconf.
I saw it just 5 mins ago: Claude misspelled a directory path. For me it was just creating a new folder, but I can imagine that if I hadn't stopped it, it could have started removing stuff just because it thinks it needs to start from scratch or something.
Idk, it just feels so contradictory sometimes to build and refine these (seemingly non-deterministic) tools to build deterministic workflows & get the most productivity out of them.
For jailing local agents on a Mac, I made Agent Safehouse - it works for any agent and has many sane defaults for developers: https://agent-safehouse.dev
More seriously, I'm not a heavy agent user, but I just create a user account for the agent with none of my own files or ssh keys or anything like that. Hopefully that's safe enough? I guess the risk is that it figures out a local privilege escalation exploit...
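For concreteness, the whole setup is something like this (sketch; the group trick lets your own user poke around the agent's files):

    sudo useradd --create-home agent
    sudo usermod -aG agent "$USER"  # your user can read the agent's homedir
    sudo chmod 770 /home/agent
    sudo -iu agent claude           # run the agent as that user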
Dunno... with this setup it seems certain that the agent will discover a zero-day to escalate privileges and send your SSH keys to its handlers in N. Korea.
I want AI to have full and unrestricted access to the OS. I don't want to babysit it and approve every command. Everything that is on that VM is a fair game and the VM image is backed up regularly from outside.
This still is running in an isolated container, right?
Ignoring the confidentiality arguments posed here, I can’t help to think about snapshotting filesystems in this context. Wouldn’t something like ZFS be an obvious solution to an agent deleting or wildly changing files? That wouldn’t protect against all issue the authors are trying to address, but it seems like an easy safeguard against some of the problems people face with agents.
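It does cover the destructive-edit half of the problem cheaply; the whole workflow is three commands (dataset name hypothetical):

    zfs snapshot tank/home@pre-agent
    # ...let the agent run...
    zfs diff tank/home@pre-agent      # see what it touched
    zfs rollback tank/home@pre-agent  # undo everything if needed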
Claude's stock unprompted / uninspired UI code creates carbon clone components. That "jai is not a promise of perfect safety" callout box is like the em dash of FE code. The contrast, or lack thereof, makes some of the text particularly invisible.
I wonder if shitty looking websites and unambitious grammar will become how we prove we are human soon.
Are mass file deletions the result of some plausible "I see why it would have done that", or will it just completely randomly execute commands that have nothing to do with the immediate goal?
How long until agents begin routinely abusing local privilege escalation bugs to break out of containers? I bet if you tell them explicitly not to do so it increases the likelihood that they do.
I've done some experimenting with running a local model with ollama and claude code connecting to it and having both in a firejail: https://firejail.wordpress.com/
What they get access to is very limited, and mostly whitelisted.
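A sketch of that shape (illustrative paths; note that firejail's --net=none namespace still has loopback, so a local model server running in the same jail stays reachable):

    # throwaway home dir, no outside network
    firejail --noprofile --private=/home/me/agent-home --net=none bash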
There's nothing wrong with an AI-designed website, but I wish when describing their own projects that HN contributors wrote their own copy. As HN posters are wont to say, writing is thinking...
What would it take for people to stop recklessly running unconstrained AI agents on machines they actually care about? A Stanford researcher thinks the answer is a new lightweight Linux container system that you don't have to configure or think about.
There always has been this tension between protecting resources and allowing users to access those resources in security. With many systems you have admin/root users and regular users. Some things require root access. Most interesting things (from a security point of view) live in the user directory. Because that's where users spend all their time. It's where you'll find credentials, files with interesting stuff inside, etc. All the stuff that needs protecting.
The whole point of using a computer is being able to use it. For programmers, that means building software. Which until recently meant having a lot of user land tools available ready to be used by the programmer. Now with agents programming on their behalf, they need full access to all that too in order to do the very valuable and useful things they do. Because they end up needing to do the exact same things you'd do manually.
The current security modes in agents are binary. Super anal about absolutely everything; or off. It's a false choice. It's technically your choice to make and waive their liability (which is why they need you to opt in); but the software is frustrating to use unless you make that choice. So, lots of people make that choice. I'm guilty as well. I could approve every ansible and ssh command manually (yes really). But a typical session where codex follows my guardrails to manage one of my environments using ansible scripts it maintains just involves a whole lot such commands. I feel dirty doing it. But it works so well that doing all that stuff manually is not something I want to go back to.
It's of course insecure as hell and I urgently need something better than yolo mode for this. One of the reasons I like codex is that (so far) it's pretty diligent about instruction following and guard rails. It's what makes me feel slightly more relaxed than I perhaps should be. It could be doing a lot of damage. It just doesn't seem to do that.
unconstrained AI agents are what makes it so useful though.
I have been using claude for almost a year now, and the biggest unlock was to stop being a worrywart early on and just literally give it ssh keys and tell it to fix something. Ofc I have backups and do run it in a VM, but in that VM it helps me manage my infra, and I have a decent-size homelab that would be no fun, just a chore, without this assistant.
I run my AI agent unconstrained in a VM without access to my local network so it can futz with the system however it wants (so far, I've had to rebuild the VM twice from Claude borking it). That works great for software development.
For devops work, etc (like your use case), I much prefer talking to it and letting it guide me into fixing the issue. Mostly because after that I really understand what the issue was and can fix it myself in the future.
Letting an agent loose with SSH keys is fine when the blast radius is one disposable VM, but scale that habit to prod or the wrong subnet and you get a fast refresher on why RBAC exists, why scoped creds exist, and why people who clean up after outages get very annoyed by this whole genre of demo. Feels great, until it doesn't.
Agree, but SSH agents like 1Password's are nice for that.
You simply tell it to install that Docker image on your NAS like normal, but when it needs to log in via SSH, it prompts for a fingerprint. The agent never gets access to your SSH key.
Yes. It is like walking around your house with a flamethrower, but you added fire retardant. Just take the flamethrower to a shed you don't mind losing. Which is some kind of cloud workspace, most likely. Maybe an old laptop.
Still, if you yolo online access and give it creds or access to tools that are authenticated, there can still be dragons.
The problem is that in practice, many people don't take the flamethrower to the shed. I recently had a conversation with someone who was arguing that you don't really need jai because docker works so well. But then it turned out this person regularly runs claude code in yolo mode without a container!
It's like people think that because containers and VMs exist, they are probably going to be using them when a problem happens. But then you are working in your own home directory, you get some compiler error or something that looks like a pain to decipher, and the urge just to fire up claude or codex right then and there to get a quick answer is overwhelming. Empirically, very few people fire up the container at that point, whereas "jai claude" or "jai -D claude" is simple enough to type, and basically works as well as plain claude so you don't have to think about it.
These guys are experts from a prestigious academic institution, leading "Secure Computer Systems", whose logo is a 7-branch red star that looks like a devil head, with white palm trees in the background. They are also shilling for some blockchain research and a future digital currency initiative, taking funding from DARPA.
The website also points towards external social networks as references, freely spreading Fear, Uncertainty, and Doubt.
So these guys are saying: go on, run malware on your computer, but do so with our casual sandbox, at your own risk.
Remember until yesterday Anthropic aka Claude was officially a supply chain risk.
If you want to experiment with agents safely (you probably can't), I recommend building them from the ground up (to be clear I recommend you don't but if you must) by writing the tools the LLM is allowed to use, yourself, and by determining at each step whether or not you broke the security model.
Remember that everything which comes from an LLM is untrusted. You'll be tempted to vibe-code your tools. The LLMs will try to make you install external dependencies, and you must decide whether you trust them, and review them.
Because everything produced by the LLM is untrusted, sharing the results is risky. A good starting point is to have the LLM produce a single-page HTML file. Serve this static page from a webserver on an external server, relying on the Same-Origin Policy to prevent the page from accessing your files and network (like GitHub Pages using a new handle, if you can't afford a VPS). This way you rely on your browser sandbox to keep you safe, and you are as safe as when visiting a malware-infested page on the internet.
If you are afraid of writing tools you can start by copy-pasting, and reading everything produced.
Once you write tools, you'll want to have them run autonomously in a runaway loop, taking user feedback or agent feedback as input. But even if everything is contained, these runaway loops can and will produce harmful content in your name.
Here is such a vibe-coded experiment I did a few days ago: a simple 2D physics water-molecule simulation for educational purposes. It is not physically accurate and still has some bugs, and regressions between versions. Good enough to be harmful. https://news.ycombinator.com/item?id=47510746
If it has a big splash page with no technical information, it's trying to trick you into using it. That doesn't mean it isn't useful, but it does mean it's disingenuous.
This particular solution is very bad. To start off with, it's basically offering you security, right? Look, bars in front of an evil AI! An AI jail! That's secure, right? Yet the very first mode it offers you is insecure. The "casual" mode allows read access to your whole home directory. That is enough to grant most attackers access to your entire digital life.
Most people today use webmail. And most people today allow things like cookies to be stored unencrypted on disk. This means an attacker can read a cookie off your disk, and get into your mail. Once you have mail, you have everything, because virtually every account's password reset works through mail.
And this solution doesn't stop AI exfiltration of sensitive data, like those cookies, out the internet. Or malware being downloaded into copy-on-write storage space, to open a reverse shell and manipulate your existing browser sessions. But they don't mention that on the fancy splash page of the security tool.
The truth is that you actually need a sophisticated, complex-as-hell system to protect from AI attacks. There is no casual way to AI security. People need to know that, and splashy pages like this that give the appearance of security don't help the situation. Sure, it has disclaimers occasionally about it not being perfect security, read the security model here, etc. But the only people reading that are security experts, and they don't need a splash page!
Stanford: please change this page to be less misleading. If you must continue this project with its obviously insecure modes, you need to clearly emphasize how insecure it is by default. (I don't think it even qualifies as security software)
It is a bit better than you're saying. When you fire it up, you can see that it does have a list of common credential areas that it hides from the jail. It seems to hide:
Not sure I understand the problem. Are people just letting AI do anything? I use Claude Code and it asks for permission to run commands, edit files, etc. No need for a sandbox.
Yes, people very much are, and that's exactly the problem! People run `claude --dangerously-skip-permissions` and `codex --yolo` all the time. And I think one of the appeals of opencode (besides cross-model, which is huge) is that the permissions are looser by default. These options are presumably intended for VM or container environments, but people are running them outside. And of course it works fine the first 100 times people do it, which drives them to take bigger and bigger risks.
> bubblewrap is more flexible and works without root. jai is more opinionated and requires far less ceremony for the common case. The 15-flag bwrap invocation that turns into a wrapper script is exactly the friction jai is designed to remove.
Can we have a hardware-level implementation of git (the idea of files/data having their history preserved, not necessarily all the bells and whistles)... in a future where storage is cheap?
And what about when it exfiltrates my AWS keys (or insert random valuable thing that sits in the .config of your home directory)? Backups are not going to help you in that case.
I want agents to modify the file system. I want them to be able to manage my computer if it thinks it's a good idea. If a build fails due to running out of disk space I want it to be able to find appropriate stuff to delete to free up space.
A closed beta of an obscure programming language, whose Wikipedia page is nominated for deletion because it is a "non-notable programming language that is not publicly available", is considered "very taken"?
That would make it far less useful in general.
But those are also the most damaging actions it could take. Everything on my computer is backed up, but if Claude insults my boss, that would be worse.
And when that fails for some reason, it will happily write and execute a Python script bypassing all those custom tools.
"env": { "CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR": "1" },
> Working directory persists across commands. Set CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR=1 to reset to the project directory after each command.
It reduces one problem - getting lost - but it trades it off for more complex commands on average since it has to specify the full path and/or `cd &&` most of the time.
[0] https://code.claude.com/docs/en/tools-reference#bash-tool-be...
It works well. Git rm is still allowed.
Nowadays I only run Claude in Plan mode, so it doesn’t ask me for permissions any more.
They look a lot like daemons to me, they're a program that you want hanging around ready to respond, and maybe act autonomously through cron jobs are similar. You want to assign any number of permissions to them, you don't want them to have access to root or necessarily any of your personal files.
It seems like the permissions model broadly aligns with how we already handle a lot of server software (and potentially malicious people) on unix-based OSes. It is a battle-tested approach that the agent is unlikely to be able to "hack" its way out of. I mean we're not really seeing them go out onto the Internet and research new Linux CVEs.
Have them clone their own repos in their own home directory too, and let them party.
Openclaw almost gets there! It exposes a "gateway" which sure looks like a daemon to me. But then for some reason they want it to live under your user account with all your privileges and in a subfolder of your $HOME.
Escaping it is something that does not take too much effort. If you have ptrace, you can escape without privileges.
Docker sandboxes use microvms (i.e. hardware level isolation)
Bubblewrap uses the same technology as containers
I am unsure about seatbelt.
e.g. if it writes a script or program with a bug which affects other files, will this prevent it from deleting or overwriting them?
What about if the user runs a program the agent wrote?
The latter could end like this https://news.ycombinator.com/item?id=47357042
Industry caught on quick though.
/rant
If you do use a sandbox, be prepared to endlessly click "Approve" as the tool struggles to install python packages to the right location.
And I imagine it's going to be the same for most developers out there, thus the "ask for permission" model.
That model seems to work quite well for millions of developers.
I have noticed it's become one of my most searched posts on Google though. Something like ten clicks a month! So at least some people aren't stupid.
Kinda reminds me of this: https://m.xkcd.com/932/
You need to rewrite all the text and replace it with text YOU would actually write, since I doubt you would write in that style.
the scs.stanford.edu domain and stanford-scs github should help with that.
> Just remember to disable CC’s auto-updater if that’s what you’re using.
Why?
I don't think the file sync is actually that hard. Famous last words though. :)
I've already shipped this and use it myself every day. I'm the author of yoloAI (https://github.com/kstenerud/yoloai), which is built around exactly this model.
The agent runs inside a Docker container or containerd vm (or seatbelt container or Tart vm on mac), against a full copy of your project directory. When it's done, `yoloai diff` gives you a unified diff of everything it changed. `yoloai apply` lands it. `yoloai reset` throws it away so you can make the agent try again. The copy lives in the sandbox, so your working tree is untouched until you explicitly say so.
The merge step turned out to be straightforward: just use git under the hood. The harder parts were: (a) making it fast enough that the copy doesn't add annoying startup overhead, (b) handling the .pyc/.venv/.git/hooks concern you raised (they're excluded from the diff surface by default), and (c) credential injection so the agent can actually reach its API without you mounting your whole home dir.
Leveraging existing tech is where it's at. Each does one thing and does it well. Network isolation is done via iptables in Docker, for example.
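For example, cutting an agent container off from the network entirely takes only a couple of lines (a sketch; the subnet is arbitrary):

    # DOCKER-USER is the iptables chain Docker reserves for user rules;
    # dropping the agent network's subnet there blocks all egress.
    docker network create --subnet 172.30.0.0/24 agent-net
    iptables -I DOCKER-USER -s 172.30.0.0/24 -j DROP
    docker run --rm -it --network agent-net -v "$PWD":/work -w /work ubuntu:24.04 bash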
Still early/beta but it's working. Happy to compare notes if you're building something similar.
I like the tradeoff offered: full access to the current directory, read-only access to the rest, copy-on-write for the home directory. With stricter modes to (presumably) protect against data exfiltration too. It really feels like it should be the default for agent systems.
It works pretty well: whichever agent I choose to run can only see and write to the current working directory (and subdirectories), plus those pnpm/npm etc. software-development files. It cannot access anything in my home directory other than the mounted directories.
Now, some evil command could in theory write other commands into those shared ~/.npm-global directories, which I might then inadvertently run outside the container, but that is pretty unlikely.
It has left my project in a complete mess, but never my entire computer.
That's all it takes. I think this is turning into a good example of security theatrics. If the agent were actually as nefarious as the marketing here suggests, the proposed solution would not be adequate. No solution is. Not even a separate physical computer. We need to be honest about the size of this problem.
Alternatively, maybe Claude is unusually violent to the local file system? I've not used it at all, so perhaps I am missing something here.
I've been using claude code daily for months and the worst thing that happened wasn't a wipe (yet). It needed to save an SVG file, so it created a /public/blog/ folder. Which meant Apache started serving that real directory instead of routing /blog. My blog just 404'd and I spent like an hour debugging before I figured it out. Nothing got deleted and it's not a permission problem; the agent just put a file in a place that made sense to it.
jai would help with the rm -rf cases for sure, but this kind of thing is harder to catch because it's not a permissions problem; the agent just doesn't know what a web server is.
So couldn't this be done with an appropriate shell alias, at least under Linux?
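For the common footguns, yes; two possible shapes, the first assuming trash-cli is installed:

    # Send deletions to the freedesktop trash instead of unlinking:
    alias rm='trash-put'
    # Or, with plain coreutils, at least prompt once before bulk deletes:
    alias rm='rm -I'

The caveat is that aliases only bind in interactive shells, and nothing stops a command from calling /bin/rm directly.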
Please release binaries if you're making a utility :(
It does something very simple, and it's a POSIX shell script. Works on Linux and macOS. It uses Docker to sandbox via a bind mount.
More seriously, I'm not a heavy agent user, but I just create a user account for the agent with none of my own files or ssh keys or anything like that. Hopefully that's safe enough? I guess the risk is that it figures out a local privilege escalation exploit...
P.S. Everything old is new again <3
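For reference, the setup is a few commands (a sketch; the account name is arbitrary):

    # A dedicated, unprivileged account with none of your files or keys:
    sudo useradd --create-home --shell /bin/bash agent
    chmod 750 "$HOME"        # make sure your own home isn't world-readable
    sudo -iu agent claude    # run the agent as that user, in its own $HOME

Privilege escalation aside, this at least keeps your SSH keys and browser profiles out of the agent's reach.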
I want AI to have full and unrestricted access to the OS. I don't want to babysit it and approve every command. Everything on that VM is fair game, and the VM image is backed up regularly from outside.
This is the only way.
If it wants to do system-level tests, then I make sure my project has Qemu-based tests.
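A sketch of that, assuming a prebuilt test.qcow2 guest image:

    # -snapshot redirects all writes to a temp file, so the image stays
    # pristine no matter what the agent or the tests do inside the guest.
    qemu-system-x86_64 -m 2G -snapshot -nographic \
      -drive file=test.qcow2,format=qcow2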
I created https://github.com/jrz/container-shell which basically launches a persistent interactive shell using docker, chrooted to the CWD
CWD is bind mounted so the rest is simply not visible and you can still install anything you want.
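The core of that trick, minus the persistence container-shell adds, is just a bind-mounted Docker shell (a sketch):

    # Only $PWD exists inside the container, at its real path; the rest
    # of the host filesystem is simply not there.
    docker run --rm -it -v "$PWD":"$PWD" -w "$PWD" ubuntu:24.04 bash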
Ignoring the confidentiality arguments posed here, I can’t help to think about snapshotting filesystems in this context. Wouldn’t something like ZFS be an obvious solution to an agent deleting or wildly changing files? That wouldn’t protect against all issue the authors are trying to address, but it seems like an easy safeguard against some of the problems people face with agents.
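It does seem like low-hanging fruit. Assuming the project lives on a dataset called tank/proj, the whole safeguard is a few commands:

    zfs snapshot tank/proj@pre-agent    # cheap, instant checkpoint
    # ...let the agent run...
    zfs diff tank/proj@pre-agent        # see exactly what it touched
    zfs rollback tank/proj@pre-agent    # undo everything if it went wild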
I wonder if shitty looking websites and unambitious grammar will become how we prove we are human soon.
Use it! :) https://code.claude.com/docs/en/sandboxing
The whole point of using a computer is being able to use it. For programmers, that means building software. Which until recently meant having a lot of user land tools available ready to be used by the programmer. Now with agents programming on their behalf, they need full access to all that too in order to do the very valuable and useful things they do. Because they end up needing to do the exact same things you'd do manually.
The current security modes in agents are binary: super anal about absolutely everything, or off. It's a false choice. It's technically your choice to make and waive their liability (which is why they need you to opt in), but the software is frustrating to use unless you make that choice. So, lots of people make that choice. I'm guilty as well. I could approve every ansible and ssh command manually (yes, really). But a typical session where codex follows my guardrails to manage one of my environments using ansible scripts it maintains involves a whole lot of such commands. I feel dirty doing it. But it works so well that doing all that stuff manually is not something I want to go back to.
It's of course insecure as hell and I urgently need something better than yolo mode for this. One of the reasons I like codex is that (so far) it's pretty diligent about instruction following and guard rails. It's what makes me feel slightly more relaxed than I perhaps should be. It could be doing a lot of damage. It just doesn't seem to do that.
For devops work, etc (like your use case), I much prefer talking to it and letting it guide me into fixing the issue. Mostly because after that I really understand what the issue was and can fix it myself in the future.
You simply tell it to install that Docker image on your NAS like normal, but when it needs to log in over SSH, it prompts for a fingerprint. The agent never gets access to your SSH key.
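If you don't have a hardware-backed key, stock OpenSSH can approximate this with confirmation-gated agent keys (a sketch, assuming an askpass helper is installed):

    # -c requires explicit confirmation for every use of the key, so the
    # agent can trigger an SSH login but never use the key silently.
    ssh-add -c ~/.ssh/id_ed25519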
Not remotely worth it.
Still, if you YOLO online access and give it credentials, or access to tools that are already authenticated, there can still be dragons.
It's like people think that because containers and VMs exist, they are probably going to be using them when a problem happens. But then you are working in your own home directory, you get some compiler error or something that looks like a pain to decipher, and the urge just to fire up claude or codex right then and there to get a quick answer is overwhelming. Empirically, very few people fire up the container at that point, whereas "jai claude" or "jai -D claude" is simple enough to type, and basically works as well as plain claude so you don't have to think about it.
> While this web site was obviously made by an LLM
So I am expected to trust the LLM-written security model? https://jai.scs.stanford.edu/security.html
These guys are experts from a prestigious academic institution, leading "Secure Computer Systems", whose logo is a seven-branched red star that looks like a devil's head, with white palm trees in the background. They are also shilling for some blockchain research and a future digital currency initiative, taking funding from DARPA.
The website also points towards external social networks for references, which freely spread Fear, Uncertainty, and Doubt.
So these guys are saying: go on, run malware on your computer, but do so with our casual sandbox, at your own risk.
Remember until yesterday Anthropic aka Claude was officially a supply chain risk.
If you want to experiment with agents safely (you probably can't), I recommend building them from the ground up (to be clear, I recommend you don't, but if you must): write the tools the LLM is allowed to use yourself, and determine at each step whether or not you've broken the security model.
Remember that everything which comes from an LLM is untrusted. You'll be tempted to vibe-code your tools. The LLMs will try to make you install external dependencies, which you must review and decide whether you trust. A crude PATH-based version of this idea is sketched below.
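At the shell level, the crudest version of "only the tools you chose" is a stripped-down PATH; a sketch, where my-agent is a stand-in for whatever runner you wrote, and note that absolute paths still bypass it:

    # Hand-pick and review each tool the model may call; it sees nothing else.
    mkdir -p ~/agent-bin
    ln -s /usr/bin/rg  ~/agent-bin/rg
    ln -s /usr/bin/git ~/agent-bin/git
    env PATH="$HOME/agent-bin" my-agent    # hypothetical runner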
Because everything produced by the LLM is untrusted, sharing the results is risky. A good starting point is to have the LLM produce a single static HTML page, and serve it from an external server (like GitHub Pages under a new handle, if you can't afford a VPS) so the Same Origin Policy prevents the page from accessing your files and network. This way you rely on your browser sandbox to keep you safe, and you are exactly as safe as when visiting a malware-infested page on the internet.
If you are afraid of writing tools you can start by copy-pasting, and reading everything produced.
Once you write tools, you'll want to have them run autonomously in a runaway loop, taking user or agent feedback as input. But even if everything is contained, these runaway loops can and will produce harmful content in your name.
Here is one such vibe-coded experiment I did a few days ago: a simple 2D physics simulation of water molecules, for educational purposes. It is not physically accurate, and it still has some bugs and regressions between versions. Good enough to be harmful. https://news.ycombinator.com/item?id=47510746
This particular solution is very bad. To start off with, it's basically offering you security, right? Look, bars in front of an evil AI! An AI jail! That's secure, right? Yet the very first mode it offers you is insecure. The "casual" mode allows read access to your whole home directory. That is enough to grant most attackers access to your entire digital life.
Most people today use webmail. And most people today allow things like cookies to be stored unencrypted on disk. This means an attacker can read a cookie off your disk, and get into your mail. Once you have mail, you have everything, because virtually every account's password reset works through mail.
And this solution doesn't stop AI exfiltration of sensitive data, like those cookies, out to the internet. Or malware being downloaded into the copy-on-write storage space to open a reverse shell and manipulate your existing browser sessions. But they don't mention that on the fancy splash page of the security tool.
The truth is that you actually need a sophisticated, complex-as-hell system to protect from AI attacks. There is no casual way to AI security. People need to know that, and splashy pages like this that give the appearance of security don't help the situation. Sure, it has disclaimers occasionally about it not being perfect security, read the security model here, etc. But the only people reading that are security experts, and they don't need a splash page!
Stanford: please change this page to be less misleading. If you must continue this project with its obviously insecure modes, you need to clearly emphasize how insecure it is by default. (I don't think it even qualifies as security software)
> bubblewrap is more flexible and works without root. jai is more opinionated and requires far less ceremony for the common case. The 15-flag bwrap invocation that turns into a wrapper script is exactly the friction jai is designed to remove.
Plus some other comparisons, check the page
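For reference, the kind of bwrap invocation being abstracted away looks roughly like this (a sketch, not jai's actual defaults):

    # Whole filesystem read-only, current project read-write, private /tmp,
    # fresh PID namespace, and the sandbox dies with the parent shell.
    bwrap --ro-bind / / \
          --bind "$PWD" "$PWD" \
          --dev /dev --proc /proc \
          --tmpfs /tmp \
          --unshare-pid \
          --die-with-parent \
          claude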
With all the supply chain issues these days onboarding new tools carries extra risks. So, question is if it's worth it.
You have no excuse for "it deleted 15 years of photos, gone, forever."
https://github.com/kenryu42/claude-code-safety-net
Backups.
When is HN gonna get a rule against AI-generated slop? Can't come soon enough.
The name jai is very taken[1]... names matter.
[1]: https://en.wikipedia.org/wiki/Jai_(programming_language)