
I'm never sure how much faith to put in benchmarks like these, but in any case the picture seems to shift once you look at pass@2 and pass@3.

Still, the more interesting comparison would be against something like Codex.
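
For anyone unfamiliar with pass@k: as far as I know it is usually computed with the unbiased estimator 1 - C(n-c, k) / C(n, k), where n is the number of samples generated per problem and c is how many of them pass the tests. Here is a minimal sketch of that, assuming the benchmark uses this estimator (it may not). The function name and the counts are made up for illustration and are not taken from any benchmark.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k: the chance that at least one of k samples,
    drawn without replacement from n generations of which c are correct,
    passes the tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k incorrect samples exist,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts: 10 samples per problem, 3 of them correct.
for k in (1, 2, 3):
    print(f"pass@{k} = {pass_at_k(10, 3, k):.3f}")
```

The score climbs quickly with k even when the per-sample success rate stays the same, which is one reason a ranking based on pass@1 can look different at pass@2 or pass@3.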
