Is OpenHack a replacement for pentesters?

No. It can structure and speed up reviews, but scope, risk and evidence still require human judgment.

Why do files matter here?

Files make the run traceable. A team can later see which hints, scenarios and triage decisions led to a finding.

Is a Semgrep hit automatically a vulnerability?

No. According to the project description, Semgrep hits are treated as hints. A vulnerability has to be supported through scenario work and triage.

OpenHack: AI bug hunting with files, checkpoints and triage

What tools does OpenHack need?

It runs inside a coding harness such as Claude Code, Codex, Cursor or a custom runner. The harness provides model access, a terminal and repository access.

What this is about

Hadrian released OpenHack on May 25, 2026 as an MIT-licensed open-source workspace system for source-guided vulnerability research. It is not another chatbot that reads code and then announces a verdict. Instead, OpenHack records every step of a security review as files: recon results, routing units, scenarios, scenario results, finding candidates, triage decisions, final findings and logs.
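To make the "everything is a file" idea concrete, here is a minimal Python sketch. It is not OpenHack's actual layout or API: the directory names, the `init_run` and `record` helpers and the JSON fields are all hypothetical, chosen only to mirror the artifact types listed above.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical artifact kinds, named after the list in the article.
# The real OpenHack directory layout may differ.
ARTIFACT_KINDS = (
    "recon", "routing_units", "scenarios", "scenario_results",
    "finding_candidates", "triage_decisions", "findings", "logs",
)


def init_run(root: Path) -> Path:
    """Create one sub-directory per artifact kind under a run directory."""
    for kind in ARTIFACT_KINDS:
        (root / kind).mkdir(parents=True, exist_ok=True)
    return root


def record(root: Path, kind: str, artifact_id: str, payload: dict) -> Path:
    """Write one artifact as a JSON file so it can be inspected later."""
    if kind not in ARTIFACT_KINDS:
        raise ValueError(f"unknown artifact kind: {kind}")
    path = root / kind / f"{artifact_id}.json"
    path.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
    return path


# Illustrative usage with made-up data in a temporary directory.
run_dir = init_run(Path(tempfile.mkdtemp()) / "run-001")
recon_path = record(
    run_dir, "recon", "recon-001",
    {"surface": "POST /upload", "source": "manual", "note": "parser entry point"},
)
```

Because each step lands on disk as its own file, a reviewer can open, diff or version-control the run without replaying the agent session.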

That matters because AI in security often fails on trust. When a model claims it has found a critical flaw, teams need to know: where did that claim come from? Was it checked? Who approved the next step? OpenHack tries to make that chain visible.

What OpenHack actually does

OpenHack runs inside a coding harness such as Claude Code, Codex, Cursor or a custom runner. The harness provides model access, a terminal, repository access and the execution environment. OpenHack provides the durable workflow: it creates a run structure, collects review surfaces, generates scenarios, has expert agents test those scenarios and then routes the results through separate triage.

The core is a narrow state chain: recon item → routing unit → scenario → scenario result → finding candidate → triage decision → finding. A human approves the phase transitions. That turns a loose agent loop into an auditable process. Optional Semgrep rules can enrich the recon phase; according to the project description, those hits are treated as hints, not automatically as proven vulnerabilities.
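The state chain with human approval can be sketched as a small gated state machine. This is an illustration under assumptions, not OpenHack's implementation: the `Stage` enum, the `advance` function and the `approve` callback are hypothetical names, and the sketch gates every single transition, whereas the source only says that a human approves phase transitions without describing how states group into phases.

```python
from enum import IntEnum
from typing import Callable


class Stage(IntEnum):
    """The seven states of the chain, in the order the article gives them."""
    RECON_ITEM = 0
    ROUTING_UNIT = 1
    SCENARIO = 2
    SCENARIO_RESULT = 3
    FINDING_CANDIDATE = 4
    TRIAGE_DECISION = 5
    FINDING = 6


def advance(current: Stage, approve: Callable[[Stage, Stage], bool]) -> Stage:
    """Move exactly one step along the chain, but only if the approver agrees."""
    if current is Stage.FINDING:
        raise ValueError("chain already complete")
    nxt = Stage(current + 1)
    if not approve(current, nxt):
        raise PermissionError(f"transition {current.name} -> {nxt.name} not approved")
    return nxt


# Illustrative usage: an approver that says yes to everything.
# In practice this callback would stand in for a human decision.
history = [Stage.RECON_ITEM]
while history[-1] is not Stage.FINDING:
    history.append(advance(history[-1], lambda src, dst: True))
```

The design point is that no state can be skipped: an item can only reach `FINDING` by passing through every earlier stage, and a refused approval stops it where it is.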

Why it matters

Security teams are caught between two extremes. On one side, coding agents can inspect large codebases faster than individual analysts. On the other side, they create false positives, miss context or hallucinate exploit paths. OpenHack targets the place where real security teams live: not the flashy demo, but the reproducible proof.

The published workflow names twelve expert families aligned with categories from the OWASP Top 10, MITRE terminology and CWE classes. That does not prove OpenHack will automatically produce better findings. But it is an important signal: agent work is mapped into known security categories instead of being sold as a proprietary black box. For open-source projects and smaller teams, the MIT license also matters because it allows experimentation without a large budget.

In plain language

Think of an apartment inspection. A weak inspector walks through the rooms and says, “There are problems here.” A good inspector takes photos, notes the room, describes the damage, flags open questions and separates suspicion from evidence. OpenHack tries to bring AI bug hunting closer to the second version: every hint gets a place, a reason and a decision.

A practical example

A team runs an internal web app with 140,000 lines of code, 38 API routes and three upload features. An OpenHack run could first collect routes, authentication boundaries and parser entry points. The router might then produce 18 scenarios: access without the right role, unsafe file extensions, missing size limits or injection paths.

An expert agent checks one upload scenario and finds a possible path traversal issue. That is not immediately published as a vulnerability. First, it becomes a finding candidate with evidence. Then a separate triage step decides whether the issue is accepted, downgraded, marked as a duplicate or rejected. Later, a security lead can reconstruct why 18 scenarios produced perhaps only two real findings.
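The triage step in this example can be sketched in Python as follows. All names here (`Triage`, `Candidate`, `Decision`, `final_findings`) and the sample data are hypothetical and only illustrate the four outcomes named above. The sketch also assumes that a downgraded candidate still counts as a finding at lower severity, which the source does not state.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Triage(Enum):
    ACCEPTED = "accepted"
    DOWNGRADED = "downgraded"
    DUPLICATE = "duplicate"
    REJECTED = "rejected"


@dataclass(frozen=True)
class Candidate:
    candidate_id: str
    scenario_id: str  # link back to the scenario that produced the candidate
    evidence: str


@dataclass(frozen=True)
class Decision:
    candidate: Candidate
    outcome: Triage
    reason: str  # recorded so the decision can be reconstructed later


def final_findings(decisions):
    """Keep only candidates that triage accepted or downgraded (assumption)."""
    keep = {Triage.ACCEPTED, Triage.DOWNGRADED}
    return [d for d in decisions if d.outcome in keep]


# Made-up sample decisions, for illustration only.
decisions = [
    Decision(Candidate("cand-01", "scn-07", "filename reaches the storage path unnormalized"),
             Triage.ACCEPTED, "reproduced in review"),
    Decision(Candidate("cand-02", "scn-07", "same sink via a second route"),
             Triage.DUPLICATE, "same root cause as cand-01"),
    Decision(Candidate("cand-03", "scn-12", "size limit not found in handler"),
             Triage.REJECTED, "limit enforced upstream"),
]
findings = final_findings(decisions)
summary = Counter(d.outcome.value for d in decisions)
```

Keeping the `scenario_id` and the `reason` on every decision is what lets a security lead later explain why most scenarios did not end as findings.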

Scope and limits

OpenHack does not replace experienced security reviewers. It structures the work, but a human still has to judge scope, risk and evidence.
Quality depends heavily on the harness, model, repository access and prompts. A bad run remains a bad run even if the files look tidy.
The approach fits source-code review better than pure black-box pentesting. Runtime behavior, product logic and production data can still be missing from the review.

The sober takeaway is this: OpenHack does not make AI security work automatically true. It makes it easier to audit. For security-critical agents, that can be more valuable than another loud model claim.

OpenHack makes AI bug hunting auditable instead of magical