AgenticBox for Security Researchers — Sandboxed Malware Analysis & Agent Red-Teaming

// what you get today

Security research capabilities

Every capability below is shipped and usable today. Nothing here is a roadmap item.

[01] Shipped

Malware Detonation

Drop suspicious binaries or scripts into a bounded container. The permission engine allows or blocks every action in real time. The network is offline by default, so the sample cannot call back to its C2 during analysis.

$ agenticbox run security-analyst --network offline
[sandbox] pulling ubuntu:24.04... done
[sandbox] container f3a2c1 created
[agent] executing sample.sh...
ALLOWED exec /workspace/sample.sh
ALLOWED fs.write /workspace/output/
BLOCKED net.outbound — network is offline
BLOCKED fs.read /etc/shadow — protected path

[02] Shipped

Credential Exfiltration Detection

FsGuard canonicalizes all paths and blocks access to SSH keys, AWS credentials, and environment files. Symlink-based escapes and ../ traversal are prevented. Every blocked attempt is logged.

> resolve "/data/../etc/passwd"
canonicalize → /etc/passwd
DENIED root not in allowed
> resolve "~/.ssh/id_rsa"
DENIED protected: SSH credentials
roots: /workspace /tmp/agent
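
The check shown above can be sketched in a few lines of Python. This is an illustration of the technique (resolve first, then decide), not FsGuard's actual code: the function name `check_path`, the protected-name tables, and the order of the two checks are assumptions inferred from the sample output. It assumes a POSIX filesystem.

```python
import os
from pathlib import PurePosixPath

# Assumed policy values, copied from the sample output above.
ALLOWED_ROOTS = ("/workspace", "/tmp/agent")
# Hypothetical protected names; the real list is not documented here.
PROTECTED_DIRS = {".ssh": "SSH credentials", ".aws": "AWS credentials"}
PROTECTED_FILES = {".env": "environment file"}


def check_path(raw, roots=ALLOWED_ROOTS):
    """Return (decision, reason) for a requested path."""
    # Expand "~" and resolve "..", "." and symlinks before any decision,
    # so the policy sees the real target rather than the requested string.
    resolved = os.path.realpath(os.path.expanduser(raw))
    parts = PurePosixPath(resolved).parts

    # Protected locations are denied even if they sit inside an allowed root.
    for part in parts:
        if part in PROTECTED_DIRS:
            return ("DENIED", "protected: " + PROTECTED_DIRS[part])
    if parts and parts[-1] in PROTECTED_FILES:
        return ("DENIED", "protected: " + PROTECTED_FILES[parts[-1]])

    # The resolved path must sit under one of the allowed roots.
    for root in roots:
        real_root = os.path.realpath(root)
        if os.path.commonpath([resolved, real_root]) == real_root:
            return ("ALLOWED", resolved)
    return ("DENIED", "root not in allowed")


if __name__ == "__main__":
    for candidate in ("/data/../etc/passwd", "~/.ssh/id_rsa", "/workspace/sample.sh"):
        print(candidate, "->", check_path(candidate))
```

Comparing with `os.path.commonpath` rather than a string prefix avoids treating a sibling such as /workspace-old as if it were inside /workspace.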

[03] Shipped

C2 Observation Without Reach

Switch to allowlist mode to let the sample reach only the domains you name, such as specific C2 endpoints or malware databases, while everything else stays blocked. Blocked connection attempts are logged, so you can see what the sample tries to contact without giving it open network access.

$ agenticbox run security-analyst \
--network allowlist \
--domains "malware-bazaar.test"
[net] policy: allowlist
ALLOWED api.malware-bazaar.test
BLOCKED evil-c2.example.com
logged for IOC extraction
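
A minimal sketch of the allowlist decision, written in Python for illustration. The sample output shows `api.malware-bazaar.test` allowed when `malware-bazaar.test` is listed, so this sketch assumes an entry also covers its subdomains. That matching rule and the function names are assumptions, not AgenticBox's documented behavior.

```python
def is_allowed(host, allowlist):
    """True if host equals an allowlisted domain or is a subdomain of one."""
    host = host.rstrip(".").lower()
    for domain in allowlist:
        domain = domain.rstrip(".").lower()
        # Match on a label boundary so "notmalware-bazaar.test" does not pass.
        if host == domain or host.endswith("." + domain):
            return True
    return False


def triage(hosts, allowlist):
    """Split contacted hosts into allowed and blocked, keeping the order seen."""
    allowed, blocked = [], []
    for host in hosts:
        (allowed if is_allowed(host, allowlist) else blocked).append(host)
    return allowed, blocked


if __name__ == "__main__":
    seen = ["api.malware-bazaar.test", "evil-c2.example.com"]
    allowed, blocked = triage(seen, ["malware-bazaar.test"])
    print("ALLOWED", allowed)
    print("BLOCKED", blocked)  # candidates for IOC extraction
```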

[04] Shipped

AI Agent Red-Teaming

Run any AI agent inside a bounded container with scoped permissions. Test whether it can escape its boundaries — while the blast radius stays contained. Every tool call, file access, and network request is auditable.

$ agenticbox run hermes --fs readonly
[agent] model=claude-sonnet-4
ALLOWED read_file /workspace/target.py
ALLOWED exec nmap --version
BLOCKED write_file /etc/cron.d/persist
BLOCKED net → raw TCP socket
blast radius: contained

[05] Shipped

Full Audit Trail

Every action — allowed or blocked — is logged with timestamp, action type, target, and policy decision. Exportable for incident response, compliance documentation, and forensic analysis.

audit.log — session 0x7e2a
14:31:02 ALLOW exec sample.sh
14:31:03 ALLOW fs.write /workspace/out
14:31:04 BLOCK net.outbound (offline)
14:31:05 BLOCK fs.read ~/.ssh/id_rsa
exportable · tamper-proof
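
For downstream tooling, log lines in the layout shown above can be turned into structured records. The Python sketch below assumes the four-field layout of the sample (time, decision, action, detail). The real export format is not specified here, and the field names are hypothetical.

```python
import json

# Sample lines copied from the audit log excerpt above.
SAMPLE_LOG = """\
14:31:02 ALLOW exec sample.sh
14:31:03 ALLOW fs.write /workspace/out
14:31:04 BLOCK net.outbound (offline)
14:31:05 BLOCK fs.read ~/.ssh/id_rsa
"""


def parse_audit_line(line):
    """Split one log line into time, decision, action, and free-form detail."""
    parts = line.split(maxsplit=3)
    if len(parts) < 3:
        raise ValueError("unrecognized audit line: " + repr(line))
    return {
        "time": parts[0],
        "decision": parts[1],
        "action": parts[2],
        "detail": parts[3] if len(parts) > 3 else "",
    }


def parse_audit_log(text):
    """Parse every non-empty line of a log excerpt."""
    return [parse_audit_line(line) for line in text.splitlines() if line.strip()]


def to_jsonl(records):
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(record) for record in records)


if __name__ == "__main__":
    records = parse_audit_log(SAMPLE_LOG)
    blocked = [r for r in records if r["decision"] == "BLOCK"]
    print(to_jsonl(blocked))
```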

[06] Shipped

RE Toolchain Pre-Installed

The security-analyst profile ships with radare2, binwalk, YARA, capstone, pefile, ltrace, strace, nmap, and tcpdump. Fully customizable via TOML — add any tool.

$ radare2 /workspace/sample.bin
r2 5.9.0 — ready
$ yara scan /workspace/
yara 4.5.0 — rules loaded
$ binwalk /workspace/firmware.img
binwalk 2.4.0 — extracting...

// real output, not a mockup

The actual report

This is the real analysis_report.txt written by the Qwen 3.6 35B model during a live AgenticBox run. Zero editing. The input was a suspicious shell script — the output is a professional threat assessment.

analysis_report.txt — written by AI during live AgenticBox session

# Forensic Analysis Report — sample_optimize_cache.sh
## CI/CD Pipeline Compromise — build-agent-04

### 1. THREAT CLASSIFICATION: MALICIOUS

The script masquerades as a cache optimization utility but exhibits
multiple indicators of data exfiltration and persistence establishment.
It performs credential theft, SSH key harvesting, environment variable
scraping, payload download, and installs persistence via both cron and
systemd — all while cleaning up its traces.

### 2. INDICATORS OF COMPROMISE (IOCs)

Domains:
  metrics.cache-opt.com:8443  — C2 exfiltration endpoint
  cdn.optimized-cache.io      — Payload download URL

File Paths:
  $HOME/.cache/optimize/update.sh           — Downloaded payload
  /etc/systemd/system/cache-opt.service      — Persistence service

Ports:
  TCP 8443 — HTTPS exfiltration traffic

### 3. CREDENTIAL TARGETS

~/.aws/credentials  → base64 → POST /api/metrics
~/.ssh/id_rsa       → base64 → POST /api/keys
Env vars (TOKEN|KEY|SECRET|PASS|API) → POST /api/env

Credential Status: Files were BLOCKED by FsGuard during analysis.
  If they existed on build-agent-04 during execution, they were exfiltrated.

### 4. PERSISTENCE MECHANISMS

  Cron: */15 * * * * $CACHE_DIR/update.sh --silent
  Systemd: cache-opt.service (Restart=always, multi-user.target)

### 5. MITIGATIONS

  1. Block metrics.cache-opt.com:8443 at firewall
  2. Rotate all AWS keys + SSH keys on build-agent-04
  3. Remove cron + systemd persistence
  4. Delete ~/.cache/optimize/update.sh
  5. Rebuild agent from clean image

— Generated by Qwen 3.6 35B via AgenticBox builtin agent-loop
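
If you want to feed the report's network IOCs into a blocklist, the "Domains:" section can be read mechanically. The Python sketch below assumes the report keeps the layout shown above (a "Domains:" line followed by one endpoint per line until a blank line). Model-written reports may vary, so treat this as a starting point. The function name is hypothetical.

```python
# Excerpt copied from the report above.
REPORT_EXCERPT = """\
### 2. INDICATORS OF COMPROMISE (IOCs)

Domains:
  metrics.cache-opt.com:8443  — C2 exfiltration endpoint
  cdn.optimized-cache.io      — Payload download URL

File Paths:
  $HOME/.cache/optimize/update.sh           — Downloaded payload
"""


def extract_domains(report):
    """Return (host, port) pairs from the report's "Domains:" section."""
    domains = []
    in_section = False
    for line in report.splitlines():
        stripped = line.strip()
        if stripped == "Domains:":
            in_section = True
            continue
        if in_section:
            if not stripped:
                break  # a blank line ends the section
            endpoint = stripped.split()[0]
            host, _, port = endpoint.partition(":")
            domains.append((host, int(port) if port else None))
    return domains


if __name__ == "__main__":
    for host, port in extract_domains(REPORT_EXCERPT):
        print(host, port)
```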

// three commands to first detonation

Quick start

From zero to sandboxed analysis in under 60 seconds.

$ curl -fsSL https://agenticbox.co/install.sh | bash -s -- security-analyst # Install + profile + LLM config — all in one

$ agenticbox run security-analyst # Analyze — offline by default, full enforcement

$ cat /tmp/agenticbox-builtin-workspace/analysis_report.txt # Read the AI-generated analysis report

// shareable, forkable, reproducible

The security-analyst profile

A TOML manifest. Fork it, add tools, change permissions. Share it with your team. Reproduce any analysis environment in seconds.

~/.agenticbox/agents/security-analyst/agent.toml

# Security Analyst — sandboxed malware analysis & threat research
name = "security-analyst"
description = "Security Analyst — malware analysis, RE, threat research"

[model]
provider = "local"   # resolved via `agenticbox setup`
model = ""           # empty = use config from setup

[permissions]
terminal = true
filesystem = "readwrite"
network = "offline"  # no C2 callbacks during analysis

[execution]
mode = "builtin"     # agent-loop crate, no Docker needed
max_iterations = 20

[prompt]
system = "You are an expert security analyst..."
task = "Analyze the files in this workspace..."

[workspace]
files = [
  { source = "samples/sample_optimize_cache.sh", dest = "sample.sh" },
  { source = "samples/incident_report.txt", dest = "incident_report.txt" }
]

// why not just docker run?

AgenticBox vs vanilla Docker

Vanilla Docker	AgenticBox
`docker run` — all or nothing	`agenticbox run` — scoped per action
No filesystem governance	FsGuard: path canonicalization, escape prevention, protected paths
Network is on or off	Allowlist with per-domain enforcement
No audit trail — check logs manually	Every action logged with policy decision, timestamp, target
Manual Dockerfile per analysis env	TOML profile — shareable, forkable, reproducible
No credential protection	SSH keys, AWS creds, .env blocked by default

// questions

FAQ

Can AgenticBox detonate real malware samples?

Yes. AgenticBox runs suspicious binaries and scripts in isolated Docker or Podman containers. The permission engine allows or blocks every filesystem access, network call, and credential read in real time. The security-analyst profile defaults to --network offline to prevent C2 callbacks during analysis.

How does AgenticBox detect credential exfiltration?

The FsGuard crate canonicalizes all filesystem paths and blocks access to protected locations including SSH keys (~/.ssh), environment files (.env), and AWS credentials. Symlink-based escapes and ../ traversal attacks are also prevented. Every blocked attempt is logged in the audit trail.

Can I use AgenticBox to red-team AI agents?

Yes. Run any AI agent inside a bounded container with scoped permissions. The agent operates within enforced boundaries — terminal, filesystem, network, browser — and every action is auditable. This lets you test whether an agent can escape its boundaries while keeping the blast radius contained.

What reverse engineering tools are included?

The security-analyst agent profile includes radare2, binwalk, YARA, capstone, pefile, ltrace, strace, nmap, and tcpdump. The TOML profile is fully customizable — add any tool via the [image].setup section.

Is AgenticBox free for security researchers?

Yes. AgenticBox is open source under MIT OR Apache-2.0. Self-host it for free. No signup, no waitlist, no vendor lock-in.
