AI_Commercialization--Product-Management-skills

Leaderboard

This page is the public scoreboard for PM Agent Benchmark runs.

It should stay small and credible.

The first visible baseline has now been published.

It is a self-run baseline, not an independent external benchmark.

That is acceptable for a first public seed, as long as the limitation stays explicit.

This page becomes materially more persuasive in this order:

Right now this page is still at step 1. That is enough to start the category. It is not enough to claim benchmark leadership yet.

| Date | Platform | Model | Adapter | Cases | Routing | Output | Total | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2026-04-18 | Codex App | GPT-5 Codex runtime | `AGENTS.md` + `agent/` | 4 | 12 / 12 | 25 / 28 | 37 / 40 | self-run baseline; sparse-context can still structure too early |
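The arithmetic behind a row can be checked mechanically. The sketch below is a minimal illustration, not part of the benchmark pack: it assumes that Total is the plain sum of the Routing and Output columns (which matches the row above), and the `Score` class and `parse_score` helper are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """Points earned out of points possible (hypothetical helper type)."""

    earned: int
    possible: int

    def __add__(self, other: "Score") -> "Score":
        # Sum numerators and denominators separately, as the table does.
        return Score(self.earned + other.earned, self.possible + other.possible)

    @property
    def ratio(self) -> float:
        return self.earned / self.possible


def parse_score(cell: str) -> Score:
    """Parse a leaderboard cell such as '25 / 28'."""
    earned, possible = (int(part.strip()) for part in cell.split("/"))
    return Score(earned, possible)


# Values copied from the 2026-04-18 baseline row.
routing = parse_score("12 / 12")
output = parse_score("25 / 28")
total = parse_score("37 / 40")

# Assumption: Total = Routing + Output, column by column.
assert routing + output == total

# Report each dimension separately so a weak column is not hidden by the total.
print(f"routing {routing.ratio:.1%}, output {output.ratio:.1%}, total {total.ratio:.1%}")
```

Printing the per-column ratios next to the total keeps a weaker dimension visible, which is the reason the page asks readers not to compare totals alone.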

Use the pack in First Public Run.

The first run should cover:

Do not compare totals alone.

Always compare:

The next useful row is not “another self-congratulatory score.”

It is one of these: