Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I seem to remember doing it in SQL (EDIT_DISTANCE) 20ish years ago. While I wouldn't say it worked beautifully, I also didn't need to make a single line of Rust :) also no more than 2 line s of SQL were needed.
 help



Edit_distance uses pure levenstein which is quadratic, so for tables of 500k rows and 20+ columns each it will slowdown to a crawl. Without going into a lot of detail, I needed this to work for datasets of that size. So a lot of "trick" optimization and pre-processing has to be done.

Otherwise simple merges in pandas or sql/duckdb would had sufficed.


And how many years of experience you needed to know what to write, and what if you can replace that time with how long prompting takes?

It's an interesting question.

Years of school (reading, calculus etc) to get to the point of learning basics of set theory. One day to learn basic SQL based on understanding the set theory. Maybe few weeks of using SQL at work for ad hoc queries to be proficient enough (the query itself wasn't really complex).

For the domain itself I was consulting experts to see what matters.

I'm not sure that time it would take to know what to prompt and verify the results is much different.

Fun fact - management decided that SQL solution wasn't enerprisely enough so they hired external consultants to build a system doing essentialy that but in Java + formed an 8 people internal team to guide them. I heard they finished 2 years later with a lot of manual matching.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: