Pentest Genie · v0 · pre-release
Autonomous penetration testing

The intelligence of a hacker. The patience of a machine.

Pentest Genie is an autonomous penetration tester. It thinks like an attacker, chains real exploits, and verifies every finding with the same tools your red team uses.

30 tools wielded by a single agent

Pentest Genie dashboard — 192 findings surfaced, 60% hit rate, top vulnerabilities listed by class
Built on the best of offensive security
Claude Opus: Reasoning model
GPT-4o: Reasoning model
Gemini 2.5: Reasoning model
Ollama: Local model runtime
Browserbase: Headless browser
Playwright: Browser automation
Interactsh: Out-of-band callbacks
nmap · nuclei · sqlmap: Recon & probing
HackerOne format: Report export
Bugcrowd format: Report export
The status quo

Scanners match patterns. Attackers don't.

Manual pentests are slow and expensive: six figures and six weeks for a snapshot in time. Scanners are fast but shallow, blind to chained logic flaws and authenticated paths. Pentest Genie is what happens when the LLM is the hacker, not a wrapper around a scanner.

  • No signatures. No playbooks. Decisions made from evidence.
  • Exploits chained autonomously — SQLi to credential dump to privilege escalation.
  • Every finding proven. Every proof reproducible.

Mechanism

    Four ideas that make it work.

    01 Autonomy

    LLM as the operator

    The model picks what to test, in what order, based on what it just learned. Not a scanner with an LLM bolted on — the LLM is the loop.

    02 Parallelism

    Multi-agent coordinator

    A planner decomposes the target into specialists — SQLi, auth bypass, IDOR — running in parallel with a shared budget and ledger.
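One way to picture a shared budget and ledger is the minimal sketch below. It is not Pentest Genie's implementation: `SharedState`, `run_specialist`, and `coordinate` are hypothetical names, the specialists are placeholders that do no real testing, and the budget is counted in abstract units.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class SharedState:
    """Budget and ledger shared by all specialists (hypothetical structure)."""
    budget: int  # remaining abstract units, e.g. model calls
    ledger: list = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def try_spend(self, cost: int) -> bool:
        """Atomically reserve `cost` units; refuse if the budget cannot cover it."""
        with self._lock:
            if cost > self.budget:
                return False
            self.budget -= cost
            return True

    def record(self, specialist: str, note: str) -> None:
        """Append one entry to the shared ledger."""
        with self._lock:
            self.ledger.append((specialist, note))


def run_specialist(name: str, steps: int, state: SharedState) -> int:
    """Placeholder specialist: spends one unit per step until refused."""
    done = 0
    for i in range(steps):
        if not state.try_spend(1):
            state.record(name, "stopped: budget exhausted")
            break
        state.record(name, f"step {i}")
        done += 1
    return done


def coordinate(specialists: dict, budget: int):
    """Run every specialist in parallel against one shared budget and ledger."""
    state = SharedState(budget=budget)
    with ThreadPoolExecutor(max_workers=len(specialists)) as pool:
        futures = [
            pool.submit(run_specialist, name, steps, state)
            for name, steps in specialists.items()
        ]
        completed = sum(f.result() for f in futures)
    return completed, state
```

Calling `coordinate({"sqli": 5, "auth_bypass": 5, "idor": 5}, budget=10)` lets the three workers request fifteen steps in total, but only ten are granted. The lock makes each reservation atomic, so parallel specialists cannot overspend.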

    03 Escalation

    Chain-mode exploitation

    A confirmed finding triggers the next move. The agent pivots from a leaked credential to a full privilege escalation, on its own.
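The idea that a confirmed finding triggers the next move can be sketched as a simple worklist. This is an illustration under assumptions, not the product's logic: the `FOLLOW_UPS` map and the `confirm` callback are hypothetical, and no real testing happens here.

```python
from collections import deque

# Hypothetical follow-up map: a confirmed finding class -> next steps to try.
FOLLOW_UPS = {
    "sqli": ["credential_dump"],
    "credential_dump": ["privilege_escalation"],
}


def run_chain(start: str, confirm) -> list:
    """Walk the follow-up map; only confirmed findings enqueue their follow-ups.

    `confirm` is a caller-supplied callable (an assumption of this sketch)
    that returns True when a step has been verified.
    """
    chain = []
    queue = deque([start])
    seen = {start}
    while queue:
        step = queue.popleft()
        if not confirm(step):
            continue  # unconfirmed steps end that branch of the chain
        chain.append(step)
        for nxt in FOLLOW_UPS.get(step, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return chain
```

If every step is confirmed, the chain runs from `sqli` through `credential_dump` to `privilege_escalation`. If the credential dump is not confirmed, the chain stops at `sqli`.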

    04 Proof

    Verified, not claimed

    Browser-confirmed XSS. OOB-callback-confirmed blind SQLi. Code that ran, not patterns that matched.

    Inside the loop

    Watch the hack unfold.

    20 event types streamed live over WebSocket. Strictly monotonic sequence numbers per scan. Operators see every phase transition, every mission, every finding the agent decides to keep — in real time.
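A consumer of such a stream could check the ordering guarantee itself. The sketch below assumes each event carries a scan identifier and a sequence number; the `scan_id` and `seq` parameter names are hypothetical, since the actual event schema is not given here.

```python
class SequenceChecker:
    """Tracks the last sequence number seen for each scan."""

    def __init__(self):
        self._last = {}  # scan_id -> highest sequence number accepted so far

    def accept(self, scan_id: str, seq: int) -> bool:
        """Return True if `seq` is strictly greater than the last one for this scan."""
        last = self._last.get(scan_id)
        if last is not None and seq <= last:
            return False  # duplicate or out-of-order event
        self._last[scan_id] = seq
        return True
```

Because the numbers are tracked per scan, two scans can reuse the same sequence values without conflict. A repeated or lower number within a single scan is rejected.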

    Outcomes

    What changes when the pentester is autonomous.

    01 Speed

    Hours, not weeks

    A target that a manual pentest team needs two weeks to scope is fully covered before the kickoff call.

    02 Predictable

    Cost-capped

    Set a budget. The agent respects it and terminates gracefully when it hits the cap. No surprise bills.

    03 Portable

    Bug-bounty ready

    Reports export to HackerOne and Bugcrowd formats out of the box — with reproducible PoC scripts attached.

    04 Reproducible

    Real exploit chains

    Custom Python proofs, not just CVE references. The artifacts that come out are the artifacts you'd submit.

    Research · Public release later this year

    The hardest pentest benchmark in the industry. In progress.

    Public benchmarks for autonomous pentesters don't exist yet. So we're building one — a suite of black-box targets across web, API, and infrastructure, each with a known-best human result.

    Early access

    Built for teams that take offense.

    Rolling access for bug bounty hunters, red teams, and AppSec leaders. Tell us what you want to test.