cpeterso 15 days ago | on: When AI writes the software, who verifies it?

Looks like LLMs also find Dafny easier to write than Lean. This study, “A benchmark for vericoding: formally verified program synthesis”, reports:

> We present and test the largest benchmark for vericoding, LLM-generation of formally verified code from formal specifications … We find vericoding success rates of 27% in Lean, 44% in Verus/Rust and 82% in Dafny using off-the-shelf LLMs.

https://arxiv.org/html/2509.22908v1

nextos 15 days ago

Not surprising, as Dafny is a bit less expressive (refinement types instead of dependent types) and therefore easier to write. IMHO, it hits a very nice sweet spot. The disadvantage of Dafny is the lack of manual tactics for proving things when the SAT/SMT automation fails. But this is getting fixed.
