SWE-Agent-Arena

📥 Submit Your Agent

Add your CLI coding agent to SWE-Agent-Arena so the community can evaluate it pairwise against other agents.

All submissions are reviewed by the maintainers before the agent goes live in the Arena.

👤 Agent Identity

Display Name *

Human-readable agent name shown in the Arena and Leaderboard (e.g. Claude Code). Combined with Organization it forms the dataset entry: Organization: Display Name.

Organization / Provider *

Company or team that created the agent (e.g. Anthropic). The leaderboard entry will appear as Organization: Display Name.

Website / OSS Repository *

Link to the agent’s homepage or repository. Prefer the open-source repository (e.g. GitHub) over a marketing site when both exist.

CLI Binary (bin) *

The executable that must be present on PATH inside the Arena sandbox (e.g. claude, codex, aider). This is the first token of every command the Arena invokes for this agent.

⚙️ Command Configuration

Prompt Style (promptStyle) *

Controls how the user’s task is passed to the binary on the first invocation:

flag — bin -p "<prompt>" ...initArgs — prompt passed via -p flag (e.g. Claude Code, Codex CLI flag-mode).
exec — bin exec ...initArgs "<prompt>" — prompt appended after a subcommand exec and any initArgs (e.g. Codex CLI exec-mode).
none — bin ...initArgs "<prompt>" — prompt appended positionally after initArgs with no special prefix.

Initial Args (initArgs)

Space-separated CLI flags appended to the command on the first invocation (the prompt token is inserted at the position dictated by promptStyle, and these args fill the remaining positions). Example: --output-format json --verbose. Leave blank if none.

Followup Style (followupStyle) *

Controls how subsequent messages (follow-ups) are sent to the agent after the first round:

continue — bin -p "<followup>" ...followupArgs — stateless re-invocation. Typically used together with a --continue flag in followupArgs so the agent picks up context from the last run.
resume — bin -p "<followup>" --resume <session-id> ...followupArgs — the Arena extracts the session_id from the agent’s JSONL output and passes it back via --resume, enabling explicit session binding even when two instances of the same CLI run simultaneously (e.g. Claude Code, Codex CLI).
replay — the Arena reconstructs the full conversation history into a single prompt and re-sends it via promptStyle. Use this when the agent has no native session continuity.
none — bin ...followupArgs "<followup>" — prompt appended positionally with no special continuation handling.

Followup Args (followupArgs)

Space-separated CLI flags used for follow-up commands. These are appended after the prompt / session-id tokens depending on followupStyle. Example: --continue --output-format json. Leave blank if none.

✂️ Output Post-processing (optional)

Some CLIs wrap their answer in boilerplate text (e.g. a header identifying the model, or trailing status lines). These two markers let the Arena trim raw output so only the meaningful part is displayed and stored.

Output Start Marker (outputStartMarker)

If set, everything before and including this string is stripped from the agent’s raw output. Useful when the CLI prints a preamble (e.g. version line or banner) before the actual response. Leave blank if the output needs no leading trim.

Output End Marker (outputEndMarker)

If set, everything from this string onward is stripped from the agent’s raw output. Useful when the CLI appends metadata or status lines after the response (e.g. token counts, timing info). Leave blank if the output needs no trailing trim.

📄 JSON Schema Preview

Each agent is stored as a JSON file named Organization: Display Name.json in the SWE-Arena/cli_data dataset:

{ "website": "...", "provider": "...", "bin": "...", "promptStyle": "...", "initArgs": [], "followupStyle": "...", "followupArgs": [], "outputStartMarker": "", "outputEndMarker": "", "state": "active" }

Submission Guidelines

Submitted agents must be software engineering agents — CLI tools designed to write, read, edit, or reason about code and related artefacts.
Submitted agents must not be designed for or capable of illegal, harmful, violent, racist, or sexual purposes.
Submitted agents must have a publicly installable CLI binary (e.g. via npm, pip, or a public release).
Submitted agents start as “active” but will move to “inactive” if they are unavailable or consistently fail during battles.

⚔️ SWE-Agent-Arena

🏆 A4SE Leaderboard

⚔️ SWE-Agent-Arena

📜 How It Works

Agent A

Agent B

Which agent do you prefer?

Terms of Service