Hacker News | peterweisz's comments

A Vibe Coder’s Tale

As there are so many claims out there about agents that are built in 10 minutes, 2 hours, or 2 weeks, here is the truth of what it takes to build something solid, at least according to our test users.

We started out 14 months ago here to change the way funding decisions are taken by VCs, PE, Business Angels, and Accelerator programs because, as serial entrepreneurs, we felt that there was a lot of bias involved, which has had a negative effect on the asset class as a whole.

So, in the beginning, we teamed up with a fellow Accelerator alumnus who claimed to be an experienced AI developer, and we put a lot of trust in his capabilities. After 8 months of back and forth, endless excuses, and iterations without improvement, we received an MVP that was too poor to mention: not at all according to the mutually agreed specifications, full of drift and hallucinations, with unreliable scoring and a word count that exceeded that of the pitch decks it was supposed to analyse.

So... another painful parting of ways, and still no product.

In the light of super-confident claims by ChatGPT that it could help me to develop it on my own despite not being an experienced coder myself, I set out on the journey with OpenAI. At about the same time, I thought that I should probably gain theoretical knowledge about generative AI and enrolled in a Johns Hopkins University postgraduate course.

The former was a disappointment, the latter a game changer.

ChatGPT 5 simply does not have the memory span to keep the context of a complex, deterministic agent. So I went to Gemini 3, with which I started to keep local copies of my scripts and a record of session documentation. But that model too, despite my loading the docs ahead of each development milestone, sent me from one post-mortem report to another. Things became even more complex when I started to apply guardrails, such as LangChain, RAG, or MCP, that I had learned about in my course.

As I became more experienced with the Python code the model blurted out, someone on YouTube told me to give Claude’s Opus 4.5 a try, and that turned out to be my salvation.

I will not engage in any advertisement for Anthropic (also they really don't need me for that), but let me tell you this:

if you put your mind to prompt engineering,

if you put an emphasis on keeping local copies of scripts and documentation (do not use Claude’s projects),

if you build a solid foundation of guardrails on how deep research should reach and what the definitions of judged decisions are, you will make it far.

Never allow it to deploy from its own sandbox, challenge it with an LLM-as-a-judge and an MCP layer, force it to work in mode absolute, and you may develop something that exceeds your own expectations.
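To make the LLM-as-a-judge idea concrete, here is a minimal sketch in Python, not the actual implementation. It assumes two checks: a deterministic guardrail (the analysis must be shorter than the deck it analyses, the failure mentioned above) and a judge score. The names `check_output`, `Verdict`, `stub_judge`, and the 0.7 threshold are all hypothetical; `stub_judge` is a word-overlap placeholder standing in for a call to a second model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    reasons: list[str]


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def stub_judge(source_text: str, draft: str) -> float:
    """Hypothetical stand-in for a second model acting as judge.

    Returns the fraction of draft words that also occur in the source,
    a crude proxy for "grounded in the source". Replace with a real
    model call in practice.
    """
    source_words = set(source_text.lower().split())
    draft_words = draft.lower().split()
    if not draft_words:
        return 0.0
    return sum(w in source_words for w in draft_words) / len(draft_words)


def check_output(
    source_text: str,
    draft: str,
    judge: Callable[[str, str], float],
    min_score: float = 0.7,  # assumed threshold, tune for your own agent
) -> Verdict:
    """Run a deterministic guardrail first, then the judge."""
    reasons = []
    # Deterministic guardrail: the analysis must be shorter than its source.
    if word_count(draft) >= word_count(source_text):
        reasons.append("draft is not shorter than the source")
    # Judge: a second opinion on the draft, scored between 0 and 1.
    score = judge(source_text, draft)
    if score < min_score:
        reasons.append(f"judge score {score:.2f} below {min_score:.2f}")
    return Verdict(passed=not reasons, reasons=reasons)


deck = "our startup sells solar panels to schools in europe"
print(check_output(deck, "startup sells solar panels", stub_judge))
```

The point of the split is that the cheap, deterministic check runs regardless of what any model says, while the judge only ever gates output and never produces it.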


USS Tripoli will solve this

Great article. I'd recommend making guardrails and benchmarking an integral part of prompt engineering. Think of it as a kind of system prompt for your Opus 4.6 architect: LangChain, RAG, LLM-as-a-judge, MCP. When I think about benchmarks, I always ask it to look for external databases or other resources to serve as a reference guardrail.

Would be grateful for a pointer on how to sign up to this.



The first link looks very suspicious


Appears to be where the actual link, http://partnerportal.anthropic.com/s/partner-registration, redirects. Site.com is some Salesforce related domain.


Huh, so you got http; I'm now getting linked to: https://partnerportal.anthropic.com/s/partner-registration

Which Firefox warns me has an untrusted cert.


Classic vibe coding, everyone involved in AI has blinders when it comes to their dogfood.

Yes, that’s why I linked where I found it. Anyone suspicious can click through to it from the anthropic.com page. It’s the correct link though.


This appears to have McKinsey's brand ID.


what is a brand ID?

Brand identity. In retrospect, it's a shibboleth that I incorrectly assumed to be more widely known, as I spent most of my early career making and selling them. If curious, the original idea is closer to systems engineering than marketing.

Probably the canonical examples:

https://www.nasa.gov/wp-content/uploads/2015/01/nasa_graphic...

https://dn720005.ca.archive.org/0/items/nycta-gs-manual/NYCT...


Would be happy to use it but didn't see a promo code or voucher. Do I need more coffee?

