PaperBench: Evaluating AI’s Ability to Replicate AI Research | Rome

May 30, 2025 · Rome

PaperBench: AI Research Replication

This talk presents PaperBench, a benchmark for evaluating AI agents’ ability to replicate state-of-the-art AI research through code development and experiment execution.
