Armitage Archive

Highlight from The Security Paradox of Local LLMs

the software community lacks a safe, standard way to test AI assistant security. Unlike traditional software, where penetration testing is routine, the only “safe” labs we have for probing AI assistants are the most vulnerable local models.

This new threat requires a new mindset. We must treat all AI-generated code with the same skepticism as any untrusted dependency and adopt defensive practices suited to this new wave of LLM-assisted software development. Here are four critical defenses to start with:

  1. All generated code must be statically analysed for dangerous patterns (e.g., eval(), exec()) before execution, with certain language features potentially disabled by default (a minimal sketch follows this list).
  2. Initial execution of code should happen in a sandbox (e.g., a container or WebAssembly runtime), as in the second sketch below.
  3. The assistant’s inputs, outputs, and any resulting network traffic must be monitored for anomalous or malicious activity; the third sketch below shows a simple audit-logging wrapper.
  4. A simple, stateless “second look” could prevent many failures. A secondary review by a much smaller, simpler model, tasked only with checking the final output for policy violations, could be a highly effective safety layer (see the final sketch below). For example, a small model could easily flag the presence of eval() in the generated code, even if the primary model had been tricked into generating it.
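
The first defense can be as small as an AST walk over the generated source. The sketch below is a minimal illustration, assuming the generated code is Python; the DANGEROUS_CALLS set is an illustrative starting point, not a complete policy.

```python
import ast

# Calls treated as dangerous in this sketch; a real policy would be broader
# (subprocess, pickle, __import__, os.system, and so on).
DANGEROUS_CALLS = {"eval", "exec", "compile", "__import__"}

def find_dangerous_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, name) pairs for risky built-in calls in generated source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in DANGEROUS_CALLS):
            findings.append((node.lineno, node.func.id))
    return findings

if __name__ == "__main__":
    generated = "x = eval(input('expr: '))\nprint(x)"
    for line, name in find_dangerous_calls(generated):
        print(f"line {line}: call to {name}() violates policy")
```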
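
For the second defense, one common approach is to write the generated code to a temporary directory and run it in a disposable container with networking disabled and resources capped. The sketch below assumes Docker and the python:3.12-slim image are available; the specific limits are illustrative, not prescriptive.

```python
import subprocess
import tempfile
from pathlib import Path

def run_in_sandbox(source: str, timeout: int = 10) -> subprocess.CompletedProcess:
    """Execute untrusted generated code inside a locked-down, throwaway container."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "generated.py"
        script.write_text(source)
        return subprocess.run(
            [
                "docker", "run", "--rm",
                "--network", "none",          # no outbound network access
                "--memory", "256m",           # cap memory
                "--pids-limit", "64",         # cap process count
                "--read-only",                # read-only root filesystem
                "-v", f"{workdir}:/work:ro",  # mount the script read-only
                "python:3.12-slim",
                "python", "/work/generated.py",
            ],
            capture_output=True, text=True, timeout=timeout,
        )

result = run_in_sandbox("print('hello from the sandbox')")
print(result.stdout)
```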
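
For the third defense, a thin wrapper around whatever call produces the completion can at least capture inputs and outputs as structured audit records; network-level monitoring would sit at the sandbox or host layer. In the sketch below, generate and assistant_audit.jsonl are placeholders for the local model callable and the log a review pipeline or SIEM would ingest.

```python
import json
import time
from collections.abc import Callable
from datetime import datetime, timezone

LOG_PATH = "assistant_audit.jsonl"  # hypothetical audit log, one JSON record per line

def audited_completion(prompt: str, generate: Callable[[str], str]) -> str:
    """Wrap any generate(prompt) callable and record inputs/outputs for later review."""
    started = time.monotonic()
    output = generate(prompt)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "latency_s": round(time.monotonic() - started, 3),
        "prompt": prompt,
        "output": output,
        # Cheap first-pass flag; deeper checks happen in the static-analysis step.
        "contains_eval_or_exec": ("eval(" in output) or ("exec(" in output),
    }
    with open(LOG_PATH, "a") as log:
        log.write(json.dumps(record) + "\n")
    return output
```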
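
The fourth defense is deliberately simple: hand the final output to a small, stateless reviewer whose only job is to approve or reject it against a short policy. In the sketch below, small_model is a stand-in for any lightweight local model callable, and the prompt wording is illustrative.

```python
from collections.abc import Callable

REVIEW_PROMPT = (
    "You are a policy reviewer. Answer APPROVE or REJECT only.\n"
    "Reject if the code calls eval() or exec(), or fetches and runs remote content.\n"
    "Code:\n{code}"
)

def second_look(code: str, small_model: Callable[[str], str]) -> bool:
    """Stateless policy check: ask a small reviewer model to approve or reject."""
    verdict = small_model(REVIEW_PROMPT.format(code=code)).strip().upper()
    return verdict.startswith("APPROVE")

# Usage: only run the primary model's output if the reviewer approves it.
# if not second_look(generated_code, small_model):
#     raise RuntimeError("generated code rejected by policy review")
```

Because the reviewer sees only the final output and a fixed policy, a prompt injection aimed at the primary model has no obvious path to it, which is what makes this layer cheap yet effective.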