Autonomous penetration testing · v0 · pre-release

The intelligence of a hacker. The patience of a machine.

Pentest Genie is an autonomous penetration tester. It thinks like an attacker, chains real exploits, and verifies every finding with the same tools your red team uses.

Request early access · See how it works

tools wielded by a single agent

[Figure: Pentest Genie dashboard. 192 findings surfaced, 60% hit rate, top vulnerabilities listed by class.]

engine · live

▸ critical · cvss 9.8 · Database Password Exposed in Redis

Built on the best of offensive security

Claude Opus · Reasoning model

GPT-4o · Reasoning model

Gemini 2.5 · Reasoning model

Ollama · Local model runtime

Browserbase · Headless browser

Playwright · Browser automation

Interactsh · Out-of-band callbacks

nmap · nuclei · sqlmap · Recon & probing

HackerOne format · Report export

Bugcrowd format · Report export

The status quo

Scanners match patterns. Attackers don't.

Manual pentests are slow and expensive — six figures and six weeks for a snapshot in time. Scanners are fast but shallow, blind to chained logic flaws and authenticated paths. Pentest Genie is what happens when the LLM is the hacker, not a wrapper around one.

No signatures. No playbooks. Decisions made from evidence.

Exploits chained autonomously — SQLi to credential dump to privilege escalation.

Every finding proven. Every proof reproducible.

Mechanism

Four ideas that make it work.

01 · Autonomy

LLM as the operator

The model picks what to test, in what order, based on what it just learned. Not a scanner with an LLM bolted on — the LLM is the loop.
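The loop described above can be sketched in a few lines. This is a minimal illustration, not Pentest Genie's implementation: `State`, `run_loop`, `stub_policy`, and the action names are all hypothetical, and the stub policy stands in for the model so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class State:
    """Everything the operator has learned so far (hypothetical structure)."""
    observations: list[str] = field(default_factory=list)


# A policy maps the current state to the next action name, or None to stop.
Policy = Callable[[State], Optional[str]]


def run_loop(policy: Policy, actions: dict[str, Callable[[State], str]],
             max_steps: int = 10) -> State:
    """Ask the policy for the next action, run it, record the result, repeat."""
    state = State()
    for _ in range(max_steps):
        action = policy(state)
        if action is None:  # the policy decides it is done
            break
        result = actions[action](state)
        state.observations.append(f"{action}: {result}")
    return state


def stub_policy(state: State) -> Optional[str]:
    """Stand-in for the model: chooses the next step from what was just learned."""
    if not state.observations:
        return "fingerprint"
    if state.observations[-1].endswith("login form found"):
        return "test_auth"
    return None


# Canned actions so the sketch never touches a network.
actions = {
    "fingerprint": lambda s: "login form found",
    "test_auth": lambda s: "no weakness observed",
}

final = run_loop(stub_policy, actions)
```

The point of the shape is that ordering lives in the policy, not in a fixed pipeline: swap the stub for a model call and the same loop picks different steps for different evidence.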

02 · Parallelism

Multi-agent coordinator

A planner decomposes the target into specialists — SQLi, auth bypass, IDOR — running in parallel with a shared budget and ledger.
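One way to picture specialists sharing a budget and a ledger is the `asyncio` sketch below. It is an assumption-laden illustration: the `Shared` class, the one-unit cost per step, and the specialist names as dictionary keys are invented for the example, and each "step" is a no-op rather than a real test.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Shared:
    """Budget and ledger visible to every specialist (hypothetical structure)."""
    budget: int  # remaining units of work across all specialists
    ledger: list[tuple[str, str]] = field(default_factory=list)

    def spend(self, cost: int) -> bool:
        """Reserve budget; return False when the cap would be exceeded."""
        if cost > self.budget:
            return False
        self.budget -= cost
        return True


async def specialist(name: str, steps: int, shared: Shared) -> None:
    """Run up to `steps` units of work, stopping early if the budget runs out."""
    for step in range(steps):
        if not shared.spend(1):
            shared.ledger.append((name, "stopped: budget exhausted"))
            return
        await asyncio.sleep(0)  # yield so the other specialists interleave
        shared.ledger.append((name, f"step {step} done"))


async def coordinate(plan: dict[str, int], budget: int) -> Shared:
    """Launch one specialist per plan entry and wait for all of them."""
    shared = Shared(budget=budget)
    await asyncio.gather(*(specialist(n, s, shared) for n, s in plan.items()))
    return shared


# Nine steps are requested in total, but only five units of budget exist.
result = asyncio.run(coordinate({"sqli": 3, "auth_bypass": 3, "idor": 3}, budget=5))
done = [entry for entry in result.ledger if entry[1].endswith("done")]
```

Because `asyncio` runs on a single thread, `spend` cannot be interrupted halfway, so no lock is needed in this sketch. A design with threads or separate processes would need real synchronization around the budget.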

03 · Escalation

Chain-mode exploitation

A confirmed finding triggers the next move. The agent pivots from a leaked credential to a full privilege escalation, on its own.
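The "confirmed finding triggers the next move" rule can be sketched as a small worklist. This is a deliberate simplification: in the product the agent decides the pivot, whereas the static `FOLLOW_UPS` table, `build_chain`, and the `confirm` callback here are hypothetical names used only to show the gating.

```python
from collections import deque
from typing import Callable

# Hypothetical follow-up map: a confirmed finding class unlocks the next steps.
FOLLOW_UPS: dict[str, list[str]] = {
    "sqli": ["credential_dump"],
    "credential_dump": ["privilege_escalation"],
    "privilege_escalation": [],
}


def build_chain(start: str, confirm: Callable[[str], bool]) -> list[str]:
    """Walk follow-ups breadth-first, extending the chain only on confirmation."""
    chain: list[str] = []
    queue = deque([start])
    seen: set[str] = set()
    while queue:
        step = queue.popleft()
        if step in seen:
            continue
        seen.add(step)
        if not confirm(step):
            continue  # unconfirmed findings never trigger a follow-up
        chain.append(step)
        queue.extend(FOLLOW_UPS.get(step, []))
    return chain


# Every step confirms: the full chain is recorded.
full = build_chain("sqli", lambda step: True)
# The credential dump fails to confirm: the chain stops after the first finding.
blocked = build_chain("sqli", lambda step: step != "credential_dump")
```

Gating on confirmation is what keeps a chain honest: a step that was only suspected never appears in it, and nothing downstream of it is attempted.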

04 · Proof

Verified, not claimed

Browser-confirmed XSS. OOB-callback-confirmed blind SQLi. Code that ran, not patterns that matched.
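The idea behind out-of-band confirmation is that a finding counts only if a unique token comes back through a separate channel. The sketch below shows that logic with everything mocked: `CallbackLog` is an in-memory stand-in for a callback listener, the two target functions are fakes, and none of it reflects the Interactsh API or the product's verifier.

```python
import secrets
from typing import Callable


class CallbackLog:
    """In-memory stand-in for an out-of-band listener (hypothetical)."""

    def __init__(self) -> None:
        self._hits: set[str] = set()

    def record(self, token: str) -> None:
        self._hits.add(token)

    def seen(self, token: str) -> bool:
        return token in self._hits


def verify_oob(probe: Callable[[str], None], log: CallbackLog) -> bool:
    """Send a probe tagged with a fresh token; confirm only if that token calls back."""
    token = secrets.token_hex(8)  # unique per probe, so stale hits cannot confirm
    probe(token)
    return log.seen(token)


log = CallbackLog()


def vulnerable_target(token: str) -> None:
    """Fake target that triggers the callback, as an exploitable one would."""
    log.record(token)


def safe_target(token: str) -> None:
    """Fake target that ignores the probe entirely."""


confirmed = verify_oob(vulnerable_target, log)
rejected = verify_oob(safe_target, log)
```

The fresh token per probe is the important part: a hit recorded for an earlier probe cannot confirm a later one, so a pattern match alone never produces a finding.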

Inside the loop

Watch the hack unfold.

20 event types streamed live over WebSocket. Strictly monotonic sequence numbers per scan. Operators see every phase transition, every mission, every finding the agent decides to keep — in real time.
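On the consuming side, strictly monotonic sequence numbers per scan make it cheap to detect replayed or out-of-order frames. The sketch below assumes a JSON frame with `scan_id`, `seq`, and `type` fields; those field names and the event type strings are illustrative guesses, not the product's wire format.

```python
import json


class SequenceChecker:
    """Tracks the last sequence number seen for each scan."""

    def __init__(self) -> None:
        self._last: dict[str, int] = {}

    def accept(self, raw: str) -> bool:
        """Return True if the frame advances its scan's sequence, else False."""
        event = json.loads(raw)
        scan_id, seq = event["scan_id"], event["seq"]
        last = self._last.get(scan_id)
        if last is not None and seq <= last:
            return False  # duplicate or out-of-order frame
        self._last[scan_id] = seq
        return True


def frame(scan_id: str, seq: int, event_type: str) -> str:
    """Build a JSON frame in the assumed shape."""
    return json.dumps({"scan_id": scan_id, "seq": seq, "type": event_type})


checker = SequenceChecker()
results = [
    checker.accept(frame("scan-a", 1, "phase.started")),
    checker.accept(frame("scan-a", 2, "finding.kept")),
    checker.accept(frame("scan-b", 1, "phase.started")),  # separate scan, own counter
    checker.accept(frame("scan-a", 2, "finding.kept")),   # replayed frame
]
```

This checker tolerates gaps, since strictly increasing does not imply contiguous. A client that needs to detect dropped events would also compare `seq` against `last + 1`.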

agent.events


Simulated here; in the product, the live stream runs over WebSocket.


Outcomes

What changes when the pentester is autonomous.

01 · Speed

Hours, not weeks

A target that a manual pentest team needs two weeks to scope is fully covered before the kickoff call.

02 · Predictable

Cost-capped

Set a budget. The agent respects it and terminates gracefully when it hits the cap. No surprise bills.
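A cost cap with graceful termination can be as simple as checking the next step's cost before running it and returning whatever was finished. The sketch below is illustrative only: `run_capped`, the step names, the status strings, and the dollar figures are made up, and `Decimal` is used so the money arithmetic stays exact.

```python
from decimal import Decimal


def run_capped(steps: list[tuple[str, Decimal]], cap: Decimal) -> dict:
    """Run steps in order; stop cleanly before any step that would exceed the cap."""
    spent = Decimal("0")
    completed: list[str] = []
    for name, cost in steps:
        if spent + cost > cap:
            # Graceful exit: report what finished instead of overspending.
            return {"status": "budget_cap_reached",
                    "completed": completed, "spent": spent}
        spent += cost
        completed.append(name)
    return {"status": "finished", "completed": completed, "spent": spent}


report = run_capped(
    [("recon", Decimal("0.40")),
     ("probe", Decimal("0.50")),
     ("exploit", Decimal("0.30"))],
    cap=Decimal("1.00"),
)
```

Checking before spending, rather than after, is what keeps the final bill at or under the cap; the partial result still lists everything completed up to that point.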

03 · Portable

Bug-bounty ready

Reports export to HackerOne and Bugcrowd formats out of the box — with reproducible PoC scripts attached.

04 · Reproducible

Real exploit chains

Custom Python proofs, not just CVE references. The artifacts that come out are the artifacts you'd submit.

Research · Public release · later this year

The hardest pentest benchmark in the industry. In progress.

Public benchmarks for autonomous pentesters don't exist yet. So we're building one — a suite of black-box targets across web, API, and infrastructure, each with a known-best human result.

Early access

Built for teams that take offense.

Rolling access for bug bounty hunters, red teams, and AppSec leaders. Tell us what you want to test.
