Hacker News | avabuildsdata's comments

yeah this is the part that got me excited honestly. we're not google-scale by any stretch but we have ~8 internal Go modules and deprecating old helper functions is always this awkward dance of "please update your imports" in slack for weeks. even if it doesn't let you delete the function immediately for external consumers, having the tooling nudge internal callers toward the replacement automatically is huge. way better than grep + manual PRs


it could be better than a nudge -- get a mandatory `go fix` step into internal teams' CI pipelines that either fixes in place (perhaps risky) or fails the build if the code isn't already identical to the fixed output.
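a sketch of what that gate could look like as a CI config fragment (GitHub Actions syntax; job/step names are hypothetical, and it assumes `go fix ./...` rewrites files in place the way `gofmt -w` does):

```yaml
# Hypothetical CI job: fail the build unless the tree already matches `go fix` output.
fix-check:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-go@v5
    - run: go fix ./...
    - run: git diff --exit-code   # non-empty diff => code wasn't already "fixed"
```

the nice property is that the diff check is cheap and the failure message is the diff itself, so the fix for the team is just "run `go fix` locally and commit".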


fwiw I work on data ingestion pipelines and I've found that starting with just boxes-and-arrows in something like Excalidraw gets you 80% of the way to knowing what you actually want. The gap between "I can picture it" and "I can build it on a webpage" is mostly a d3 learning curve problem, not a design problem.

xyflow, which the creator mentioned, is probably the right call for pipeline DAGs though -- we use it internally for visualizing our scraping workflows and it was surprisingly painless to get running


the Go choice makes a lot of sense for this. i've been wiring up government data sources for a different project and honestly the format inconsistency between agencies is always the real headache, not the actual processing

curious about the 23 tools though -- are those all invoked through one Gemini orchestration pass or is there a routing layer picking which subset to call per detection? feels like that'd stack up fast latency-wise


Not all 23 get invoked in one pass. The system runs 4 different types of cycles, each with its own Gemini call, and within a cycle the model picks a subset of tools based on the context rather than fanning out to everything.

Over the last week, the median ends up being about 6 tool calls across 4 distinct tools per cycle.

Latency-wise, median completed cycle time is about 37s overall. The heavy path is FIRMS: about 135s median / 265s p90 over the same window.

It runs asynchronously in the background, so the web UI isn’t blocked on a cycle finishing, though cycle latency still affects how quickly new detections get enriched.


honestly the thing that trips me up is when codegen makes me feel productive but I haven't actually validated anything. like I'll have claude write a whole data pipeline in 20 minutes and then spend 2 hours debugging edge cases it didn't think about because it doesn't know our data

the speed is real but it mostly just moves where I spend my time. less typing, more reading and testing. which is... fine? but it's not the 10x thing people keep claiming


Would getting to the same edge-case-free outcome have taken you less than 2h20min if you didn't have AI?

I think it would typically have taken you longer.


> I think it would typically have taken you longer.

That's actually highly doubtful to me.

There are tons of studies and writing about how reading and debugging code is wildly more time-consuming than writing it. That time goes up even more when you're not the one who wrote the code in the first place. It's why we've spent decades figuring out how to write readable/maintainable code.

So either all this shit about reading/maintaining code being difficult was lies and we've spent decades wasting our time or AIs can only improve productivity if you stop verifying/debugging code.

So I find it very unlikely that it would have taken more than a couple hours to just write it the first time.


The unfair scheduling point resonates. I run a lot of concurrent HTTP workloads in Go (scraping, data pipelines) and the scheduler is honestly fine for throughput-oriented work where you don't care about tail latency. But the moment you need consistent response times under load it becomes a real problem. GOMAXPROCS tuning and runtime.LockOSThread help in narrow cases but they're band-aids. The lack of priority or fairness knobs is a deliberate design choice but it does push certain workloads toward other runtimes.


If the server cannot keep up with the given workload because of some bottleneck (CPU, network, disk IO), then it cannot guarantee any response times -- incoming queries will either be rejected or queued in a long wait queue, which leads to awful response times. This doesn't depend on the programming language or the framework the server is written in.

If you want response time guarantees, make sure the server has enough free resources for processing the given workload.

