commitbench · a coding-agent benchmark
Are you using the right coding agent?
Point it at any GitHub repository. commitbench replays real commits, has each agent reproduce the change, and scores the result against what actually shipped.
 def greet(name):
-    return "hi " + name
+    return "hello, " + name.title() + "!"
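To make "scores the result against what actually shipped" concrete, here is a minimal sketch of one way such a comparison could work, using the greet change as the shipped version. The source does not say how commitbench computes its score, so the `similarity` function and the line-based metric are assumptions for illustration only, not commitbench's actual method.

```python
import difflib


def similarity(candidate: str, shipped: str) -> float:
    """Hypothetical metric: 0..1 line-based similarity between two file versions."""
    matcher = difflib.SequenceMatcher(
        None, candidate.splitlines(), shipped.splitlines()
    )
    return matcher.ratio()


# What actually shipped in the commit (the "+" side of the diff).
shipped = 'def greet(name):\n    return "hello, " + name.title() + "!"\n'

# Two hypothetical agent outputs: one reproduces the change, one leaves the old code.
agent_a = 'def greet(name):\n    return "hello, " + name.title() + "!"\n'
agent_b = 'def greet(name):\n    return "hi " + name\n'

score_a = similarity(agent_a, shipped)
score_b = similarity(agent_b, shipped)

print(f"agent_a: {score_a:.2f}")
print(f"agent_b: {score_b:.2f}")
```

A textual metric like this rewards matching the shipped lines exactly; a real benchmark might instead (or also) run the repository's tests, which this sketch does not attempt.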