Highlight from The Security Paradox of Local LLMs
The software community lacks a safe, standard way to test AI assistant security. Unlike traditional software, where penetration testing is routine, our only “safe” labs are the most vulnerable local models.
This new threat requires a new mindset. We must treat all AI-generated code with the same skepticism as any untrusted dependency and adopt defensive strategies suited to this new wave of LLM-assisted software development. Here are four critical defenses to start with:
- All generated code must be statically analysed for dangerous patterns (e.g., `eval()`, `exec()`) before execution, with certain language features potentially disabled by default (a sketch of such a check follows this list).
- Initial execution of code should happen in a sandbox (e.g., a container or WebAssembly runtime); see the container sketch below.
- The assistant’s inputs, outputs, and any resulting network traffic must be monitored for anomalous or malicious activity, as in the audit-log sketch below.
- A simple, stateless “second look” could prevent many failures. A secondary review by a much smaller, simpler model, tasked only with checking the final output for policy violations, could be a highly effective safety layer. For example, a small model could easily flag the presence of `eval()` in the generated code, even if the primary model was tricked into generating it (see the final sketch below).
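To make the first defense concrete, here is a minimal sketch of a pre-execution check built on Python’s `ast` module. The set of flagged names is an illustrative assumption, not a complete policy.

```python
import ast

# Call names we treat as dangerous by default. Illustrative only, not exhaustive.
DANGEROUS_CALLS = {"eval", "exec", "compile", "__import__"}

def flag_dangerous_calls(source: str) -> list[str]:
    """Return a list of dangerous calls found in the generated source."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Code that does not even parse should never reach execution.
        return ["<unparseable source>"]

    findings = []
    for node in ast.walk(tree):
        # Matches direct calls like eval(...); aliased or attribute-based calls
        # would need a deeper analysis.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DANGEROUS_CALLS:
                findings.append(f"{node.func.id}() at line {node.lineno}")
    return findings

if __name__ == "__main__":
    generated = "data = eval(input('expr: '))"
    print(flag_dangerous_calls(generated))  # ['eval() at line 1']
```

A linter-grade tool would go further (aliases, getattr tricks, string-built imports), but even this cheap gate catches the obvious cases before anything runs.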
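For the sandboxing step, one minimal approach is to run each snippet in a throwaway, network-less container. This sketch assumes Docker is installed and a stock `python:3.12-slim` image is available; the resource limits are illustrative choices.

```python
import subprocess
import tempfile
from pathlib import Path

def run_in_sandbox(source: str, timeout: int = 10) -> subprocess.CompletedProcess:
    """Execute untrusted generated code inside a disposable container."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(source)
        return subprocess.run(
            [
                "docker", "run", "--rm",
                "--network", "none",          # no outbound traffic
                "--memory", "256m",           # cap memory
                "--cpus", "0.5",              # cap CPU
                "--read-only",                # immutable filesystem
                "-v", f"{workdir}:/code:ro",  # mount the snippet read-only
                "python:3.12-slim",
                "python", "/code/snippet.py",
            ],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
```

A WebAssembly runtime or gVisor-style sandbox would tighten this further; the point is that the first run never happens on the host.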
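Monitoring can start as simply as an append-only audit log of the assistant’s inputs and outputs for offline review. The log path below is a placeholder, and a real deployment would also capture network traffic, for example through an egress proxy.

```python
import json
import time
from pathlib import Path

AUDIT_LOG = Path("assistant_audit.jsonl")  # placeholder location

def log_interaction(prompt: str, output: str) -> None:
    """Append one prompt/response pair to a JSONL audit log."""
    record = {"ts": time.time(), "prompt": prompt, "output": output}
    with AUDIT_LOG.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```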
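And for the “second look”, a sketch of handing the generated code to a much smaller local model for a binary policy verdict. The endpoint, model name, and review prompt assume an Ollama-style local server and are purely illustrative.

```python
import json
import urllib.request

REVIEW_PROMPT = (
    "You are a code reviewer. Answer with exactly one word: ALLOW or BLOCK.\n"
    "BLOCK if the code below uses eval(), exec(), or performs network access.\n\n"
    "{code}"
)

def second_look(code: str,
                endpoint: str = "http://localhost:11434/api/generate",  # assumed local server
                model: str = "llama3.2:1b") -> bool:                     # assumed small model
    """Ask a small, stateless reviewer model whether the output violates policy."""
    payload = json.dumps({
        "model": model,
        "prompt": REVIEW_PROMPT.format(code=code),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        endpoint, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        verdict = json.loads(resp.read())["response"]
    return verdict.strip().upper().startswith("ALLOW")
```

Because the reviewer sees only the final output and a fixed policy prompt, a prompt injection that fooled the primary model has far less surface to work with here.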