Hacker News | lmeyerov's comments

We have this issue in GFQL right now. We wrote the first OSS GPU Cypher query language implementation, where we build a query plan of GPU-friendly collective operations... But today the steps are coordinated via Python, which has high constant overheads.

We are looking to shed some of the Python<->C++<->GPU overheads by pushing macro steps out of Python and into C++. However, it'd probably be way better to skip all the CPU<->GPU back-and-forth by coordinating the task queue on the GPU to begin with. It's 2026, so ideally we can use modern tools and type safety for this.
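To make the overhead shape concrete, here's a minimal pure-Python sketch (hypothetical names, not GFQL's actual API): stepping through the plan from the host pays a fixed dispatch cost per op, while handing the whole plan to one native/GPU-side executor pays it once.

```python
# Illustrative sketch only: simulates coordinating a query plan step-by-step
# from Python (one host round trip per op) vs handing the whole plan to a
# single fused executor call. Names here are hypothetical, not GFQL APIs.

def run_stepwise(plan, data):
    # One Python round trip per step: constant overhead * len(plan)
    for op in plan:
        data = op(data)  # control returns to Python after every op
    return data

def run_fused(plan, data):
    # One dispatch: the executor walks the whole queue without returning
    # to Python between steps (what a GPU-resident task queue would do)
    def fused(x):
        for op in plan:
            x = op(x)
        return x
    return fused(data)

plan = [lambda xs: [x + 1 for x in xs],   # map
        lambda xs: [x * 2 for x in xs],   # map
        lambda xs: [x for x in xs if x > 4]]  # filter

assert run_stepwise(plan, [1, 2, 3]) == run_fused(plan, [1, 2, 3]) == [6, 8]
```

Same result either way; the difference is only where the coordination loop lives, which is exactly the constant-overhead term being shed.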

Note: I looked at the company's GitHub and didn't see any relevant OSS, which changes the calculus for a team like ours. Sustainable infra is hard!


We are the maintainers of https://github.com/rust-gpu/rust-gpu and https://github.com/Rust-GPU/Rust-CUDA FWIW. We haven't upstreamed the VectorWare work yet as it is still being cleaned up and iterated on.

This is great work by Dawn Song's team. A huge part of botsbench.com for comparing agents & models for investigation has been protecting against this kind of thing. As AI & agents keep getting more effective & tenacious, some of the things we've had to add protections against:

- Contamination: AI models knowing the answers out of the gate because of pretraining on the internet and everything big teams can afford to touch. At RSAC, for example, we announced Anthropic's 4.6 series is the first frontier model to have serious training set contamination on Splunk BOTS.

- Sandboxing: Agents attacking the harness, as is done here. So run the agent in a sandbox, and keep the test harness's code & answer set outside.

- Isolation: Frontier agent harnesses persist memory all over the place, where work done on one question might be used to accelerate the next. To protect against that, we do fresh sandboxing per question. This is a real feature for our work in unlocking long-horizon AI for investigations, so stay tuned for what's happening here :)
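As a sketch of the per-question isolation idea (illustrative only, not our actual harness): run each question's agent in a fresh subprocess with a clean temp directory, so any "memory" it writes cannot carry over to the next question.

```python
# Minimal sketch of fresh-sandbox-per-question isolation, assuming a toy
# harness: each run gets a new process and a new empty working dir. A real
# harness would also isolate network and filesystem; everything here is
# hypothetical and for illustration only.
import subprocess
import sys
import tempfile

AGENT = """
import os
marker = os.path.join(os.getcwd(), "memory.txt")
# Any 'memory' the agent writes lives only inside this sandbox
seen = os.path.exists(marker)
open(marker, "w").write("answered")
print("leaked" if seen else "fresh")
"""

def run_question(question_id: str) -> str:
    # New temp dir per question => no persisted state between questions
    with tempfile.TemporaryDirectory() as sandbox:
        out = subprocess.run([sys.executable, "-c", AGENT],
                             cwd=sandbox, capture_output=True, text=True)
        return out.stdout.strip()

# Two questions: neither sees the other's state
assert run_question("q1") == "fresh"
assert run_question("q2") == "fresh"
```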

"You cannot improve what you cannot measure" - Lord Kelvin


Instead of scanning more code, afaict what you want is to scan the same small area and compare how many FPs are found there. A common measure here is what % of the reported issues got labeled as security issues and fixed. I don't see Mythos publishing on relative FP rate, so dunno how to compare those. Maybe something substantively changed?

At the same time, I'm not sure that really changes anything because I don't see a reason to believe attacks are constrained by the quality of source code vulnerability finding tools, at least for the last 10-15 years after open source fuzzing tools got a lot better, popular, and industrialized.

This might sound like a grumpy reply, but as someone on both sides here, it's easy to maintain two positions:

1. This stuff is great, and doing code reviews has been one of my favorite claude code use cases for a year now, including security review. It is both easier to use than traditional tools, and opens up higher-level analysis too.

2. Finding bugs in source code was sufficiently cheap already for attackers. They don't need the ease of use or high-level thing in practice, there's enough tooling out there that makes enough of these. Likewise, groups have already industrialized.

There's an element of vuln-pocalypse that may be coming as the ease of use goes further than what's already happening with existing out-of-the-box blackbox & source code scanning tools. That's not really what I worry about though.

Scarier to me, instead, is what this does to today's reliance on human response. AI rapidly industrializes how attackers escalate access and wedge in once they're in. Even without AI, that's been getting faster and more comprehensive, and with AI, the higher-level orchestration can get much more aggressive for much less capable people. So the steady stream of existing vulns & takeovers turning into much more industrialized escalations is what worries me more. As coordination keeps moving to machine speed, the current reliance on human response is becoming less and less of an option.


We find it true in Louie.ai evals (AI for investigations), about a 10-20% lift, which is meaningful. It's measured here: botsbench.com .

Unfortunately, it's undesirable in practice due to people being token-constrained even before. One case is retrying only on failure, but even that is a bit tricky...


I've found value in architectural research before R&D-tier projects like big changes to GFQL, our OSS GPU Cypher implementation. It ends up multistage:

- deep research for papers, projects, etc. I prefer ChatGPT Pro Deep Research here, as it can quickly survey hundreds of sources for overall relevance

- deep dives into specific papers and projects, where an AI coding agent downloads relevant papers and projects for local analysis loops, performs technical breakdowns into essentially a markdown wiki, and then reduces over all of them into a findings report. Claude Code is a bit nicer here because it supports parallel subagents well.

- an iterative design phase where the agent iterates between the papers, repos, and our own project to refine suggestions and ideas
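The deep-dive step above has a simple map/reduce shape. A toy sketch under hypothetical names (in practice these are agent calls, not local functions):

```python
# Illustrative only: per-paper breakdowns run in parallel (the "parallel
# subagents" part), then reduce into one findings report. breakdown() is a
# stand-in for an agent writing a markdown wiki page per source.
from concurrent.futures import ThreadPoolExecutor

papers = ["paper_a.pdf", "paper_b.pdf", "paper_c.pdf"]  # hypothetical inputs

def breakdown(paper: str) -> str:
    # stand-in for a subagent's technical breakdown of one source
    return f"## {paper}\n- key ideas extracted"

with ThreadPoolExecutor() as pool:           # map: parallel subagents
    wiki_pages = list(pool.map(breakdown, papers))

findings = "\n\n".join(wiki_pages)           # reduce: one findings report
assert findings.count("##") == len(papers)
```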

Fundamentally, this is both exciting and limiting: it's an example of 'Software Collapse', where we get to ensure best practices and good ideas from relevant communities, but the LLM is not doing the creativity here, just mashing up and helping pick.

Tools to automate this stuff seem nice. I'd expect it to be trained into the agents soon, as it's not far from their existing capabilities already. E.g., 'iteratively optimize function foobar, prefer GPU literature for how.'


I'm not too familiar with etsy, but presumably most etsy sellers are closer to being lemonade stands than they are to being ikea

And yes, sometimes it's nice to support a local lemonade stand. For my family's income, I know which segment I'd feel more confident working for.


Quality indie software in a niche that Ikea is not addressing can make a decent income unlike a lemonade stand.

And unlike at (this hypothetical) Ikea, you wouldn't have to maintain the impression of 20x AI-augmented output to avoid being fired. Well, you could still use AI as much as you want, but you wouldn't have to keep proving you're not underusing it.


Evals or GTFO

5x productivity boost in merged PRs (lots of open PR & merge rate goes down, but net positive)

Starting to build custom tooling around new "friction" points in dev cycle

(eng IC perspective)


Evals let us agree on the baseline, measurement, etc., and compare whether simple things others do perform just as well. For the same reason, instead of 'works on my box' and 'my coding style', use one of the many community evals vs making up your own benchmark.

That helps head off many of the unfalsifiable discussions & claims happening and moves everyone forward.


a rust version of that compiler (that the project runs on) ran at 480k claims/sec and it was able to deterministically resolve 83% of conflicts across 1 million concurrent agents (also 393,275x compression reduction @ 1m agents on input vs output, but different topics can make the compression vary)

natively claude (and other LLMs) will resolve conflicting claims at about a 51% rate (based on internal research)

the built in byzantine fault tolerance (again, in the compiler) is also pretty remarkable, it can correctly find the right answer even if 93% of the agents/data are malicious (with only 7% of agents/data telling us the correct information)

basically the idea here is if you want to build autonomous at scale, you need to be able to resolve disagreement at scale and this project does a pretty nice job at doing that


My question was on claims like "5x productivity boost in merged PRs (lots of open PR & merge rate goes down, but net positive)", eg, does this change anything on swe-bench or any other standard coding eval?

The ecosystem is 8 tools plus a claude code plugin, the unlock was composing those tools (I don't regularly use all 9). The 5x claim was from /insights (claude code)

Not for everyone, but it radically changed how I build. Senior engineer, 10+ years

Now it's trivial to run multiple projects in parallel across claude sessions (this was not really manageable before using wheat)

Genuinely don't remember the last time I opened a file locally


It sounds like the answer is "No, there is no repeatable eval of the core AI coding productivity claim, definitely not on one of the many AI coding benchmarks in the community used for understanding & comparison, and there will not be"

My data is from Anthropic

Not sure how it works under the hood, probably a better question for them

Perhaps you are misunderstanding the entire premise of this project, this is not an LLM


Maybe there's a fundamental miscommunication here of what evals are?

Evals apply not just to LLMs but to skills, prompts, tools, and most things changing the behavior of compound AI systems, and especially like the productivity claims being put forth in this thread.

The features in the post relate directly to heavily researched areas of agents that are regularly benchmarked and evaluated. They're not obscure, eg, another recent HN frontpage item benchmarked on research and planning.


your question makes sense, it's just not in current scope

we are still benchmarking the compiler at scale and the LLM tools that were made were created as functional prototypes to showcase a single example of the compiler's use case

since much of the unlock here is finding different applications for the compiler itself, we simply don't have the bandwidth to do much benchmarking on these projects on top of maintaining the repos themselves

all the code is open source and there is nothing stopping anyone from running their own benchmarks if they were curious

btw

https://news.ycombinator.com/item?id=47733217


Speaking of embeddable, we just announced Cypher syntax for GFQL, so the first OSS CPU/GPU Cypher query engine you can use on dataframes

Typically used with scale-out DBs like Databricks & Splunk for analytical apps: security/fraud/event/social data analysis pipelines, ML+AI embedding & enrichment pipelines, etc. We originally built it for the compute-tier gap here, to help Graphistry users making embeddable interactive GPU graph viz apps and dashboards who don't want to add an external graph DB phase into their interactive analytics flows.

A single GPU can do 1B+ edges/s, there's no need for a DB install, and it can work straight on your dataframes / Apache Arrow / Parquet: https://pygraphistry.readthedocs.io/en/latest/gfql/benchmark...

We took a multilayer approach to GPU & vectorization acceleration, including a more parallelism-friendly core algorithm. This makes fancy features pay-as-you-go vs dragging everything down, as in most columnar engines that are appearing. Our vectorized core already conforms to over half of the TCK, and we are working to add trickier bits on different layers now that the flow is established.

The core GFQL engine has been in production for a year or two now with a lot of analyst teams around the world (NATO, banks, US gov, ...) because it is part of Graphistry. The open-source Cypher support is us starting to make it easy for others to use directly as well, including LLMs :)
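For intuition on what such a query lowers to (illustrative stdlib sketch only; not the real GFQL or pygraphistry API): a Cypher pattern like MATCH (a {type:'account'})-[]->(b) becomes set-at-a-time hops over edge tables, which is the shape that vectorizes well on GPUs.

```python
# Toy data standing in for node/edge dataframes; all names are hypothetical.
edges = [("alice", "acct1"), ("acct1", "acct2"), ("bob", "acct2")]
nodes = {"alice": "person", "bob": "person",
         "acct1": "account", "acct2": "account"}

def hop(frontier, edges):
    # One forward hop: a set-at-a-time expansion (a join/filter over the edge
    # table in the real engine), not per-node pointer chasing
    return {dst for src, dst in edges if src in frontier}

# MATCH (a {type:'account'})-[]->(b) RETURN b, as filter-then-hop:
accounts = {n for n, t in nodes.items() if t == "account"}
one_hop = hop(accounts, edges)
assert one_hop == {"acct2"}
```

Each hop is one collective operation over columns, which is why the same plan runs on pandas on CPU or cudf on GPU.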


*legal in the US


Apple and Google are facilitating the data sales

Specifically, these big companies revenue-share with app companies who in turn increase monetization by selling your private information, esp. via free apps. In exchange for Apple's etc. super-high app store rake, they claim to run security vetting programs and ToS that vet who they do business with, and they tell users & courts that things are safe, even when they know they're not.

It's not rocket science for phone OSes to figure out who these companies are and, as iOS/Android users already get tracked by Apple/Google/etc, triangulate which apps are participating


I'm game for throwing rocks at Apple and Google, but I don't get this one.

> consumer apps embed ad SDKs → those SDKs feed location signals into RTB ad exchanges → surveillance-oriented firms sit in the RTB pipeline and harvest bid request data even without winning auctions

Would you ban ad supported apps? Assuming the comment you're responding to is realistic, I'm not sure how the OS is to blame.


Neither big player has refined enough permissions. These set users up to give away more data than they think.

Maybe one clear example: needing a permission once for setup, which then remains persistent.

An easy demonstration is just looking at what Graphene has done. It's open source, and you wanna say Google can't protect their users better? Certainly Graphene has some advanced features, but not everything can be dismissed so easily. Besides, just throw advanced features behind a hidden menu (which they already have!). There's no reason you can't make most users happy while also catering to power users (they'll always complain, but that's their job)

https://grapheneos.org/features


> Would you ban ad supported apps?

There's no need to ban ad supported apps when you can just ban the practice of using ads targeting users based on individual characteristics.


You trust the adtech companies to pinky promise to totally not do that anymore?


how about jailing CEO's of companies who do this?


I’m not sure that’s how corporate blame works. The ceo signed off on the CIOs proposal to streamline data analytics logs via WeTotallyWontSiphonOffYourDataAndSellIt incorporated for user improvement purposes, which happens to be owned by the CFO’s brother in law. How were the CIO and CEO to know that a third party was selling off the data, and how was that third party to know that the sale of the data to another party who then onsold the data to the fbi would be illegal?


> How were the CIO and CEO to know that a third party was selling off the data, and how was that third party to know that the sale of the data to another party who then onsold the data to the fbi would be illegal?

Ask yourself the same question about personal health data and the answer reveals itself: the CEO and CIO know (or should know) that the vendor needs to be HIPAA-compliant or it's their necks (the CEO's and CIO's), so they look for a vendor who advertises as being HIPAA-compliant.

Pass legislation to the same effect for all PII and the CEO and CIO will then make requirements of the vendor. If the vendor lies, they get fired because the company hiring them is culpable. The vendor may also be subject to civil and/or criminal penalties. It seems simple, other than the fact that we have a federal legislature with no apparent interest in solving this problem, alongside a populace which either doesn't notice or doesn't care about that.

To answer the question more pithily: communication.


> I’m not sure that’s how corporate blame works.

In regulated industries, like finance and taxation, regulators deliberately assign responsibility to individuals, so misconduct doesn’t get lost inside the company or within its corporate stakeholder network. That removes a lot of friction once you want to hold someone liable.

I've read our parents comment as an implicit proposal to establish similar structures in tech.


I would ban apps using unsafe ad platforms

If I was simultaneously also the owner of the ad platform, I'd fix it & knock out the bad players, or get ready to be sued for a decade+ of knowing malpractice

And if I was a US citizen seeing these companies sued for being monopolies and abusing their position, and then seeing them cry security in court yet knowingly do this for a decade+, I'd feel frustrated by successive left + right US administrations & voters


They are all unsafe. It’s a huge source of revenue for ad companies.


You can trace the big players

If Google & Apple & friends refused to take a rake and opened distribution, then I'd agree, net neutrality etc, not their problem

But they own so much, and so deep into the pipeline, and explain their fees to courts because "security"... and then don't do investigations. They employ some of the best security analysts in the world and have $10-30B/yr revenue tied to just the app store fees, so they very much can take a big bite out of this if they wanted.


  > They employ some of the best security analysts in the world and have $10-30B/yr revenue
I'll never not be impressed by how many people will defend trillion dollar organizations and say that things are too expensive. Especially when open source projects (including forks!) implement such features.

I'm completely with you, they could do these things if they wanted to. They have the money. They have the manpower. It is just a matter of priority. And we need to be honest, they're spending larger amounts on slop than actual fixes or even making their products better (for the user).


“Priorities” is far too soft a term in this context. These are anti-priorities: not just things they choose not to work on, but things they’ll spend big money to prevent, up to and including bribing, uh I mean lobbying, lawmakers.


This is really simple to explain:

Apple does not let you restrict app network access[1]

You have no ability to know who your app is connecting to, and you cannot select or prevent it.

[1] except maybe the cellular data toggle


Settings > Privacy & Security > App Privacy Report will at least show domains contacted by each app.


But you cannot block them.


The only way I'm aware of is if you do it through Settings > Cellular and always use cellular data for internet on your phone


Ultimately the fact that ad sdks have such wide access to location information is a choice by the platforms. I've long wanted meaningful process isolation between the app and its ad sdks, but right now there's oodles of them that just squat on location data when the app requests it.


Apple supposedly does this with the privacy report cards.

However, I'd be shocked if a cursory audit comparing SDKs embedded in apps and disclosed data sales showed they were effectively enforcing anything at all.


> Would you ban ad supported apps?

Yes, I absolutely would. Advertisements are a scourge upon people's wellbeing on top of being ugly and intrusive.

If you want to build a free product, that's great. Build a free product.

If you want to make money from your product, then charge for your product.


>Yes, I absolutely would.

And then you will get fired by the end of day.


Luckily I don't work for an ad-supported business.


How did your company and its customers find each other?


Do people really still think advertising has a legitimate function?

Really, these days it's 95% psychological manipulation to get people to buy inferior-quality stuff they don't need, and 5% people actually finding what they're looking for.

Don't forget, most advertising can work fine in a "pull" mode. I need something so I go out and look for it. These days something like Google (not ideal because results also manipulated by the highest bidder). Or I look for dedicated forums or a subreddit for real people's experiences. In the old days it would have been yellow pages or ask a friend.


> I'm not sure how the OS is to blame.

Read the TOS.


If I have a free app that hits location services on the device and I sell this data, how does Apple and Google make money from me?


Apple doesn't even allow apps to know whose device they are running on without the user's explicit opt-in permission.

Just as importantly, apps aren't allowed to remove functionality if the user says no.

You need additional permissions to do things like access location data or scan local networks for device fingerprinting.


And Facebook/Meta. Their trackers are everywhere.


It's everyone. Especially Google, but all the big tech companies play in the same pool. Amazon, Google, Apple, Meta, etc. make money selling ads, which ultimately enables the tools that result in data harvesting from everyone across the internet. I wrote a little data investigation [1] (mostly finished) that showcases how every major news organization across the globe I scanned had some level of data collection integrated. This is just one industry, but it's important (as it connects back to the incentives these media organizations have, which is to make money by selling ads at any cost). The EFF also released an angle on how the bidding process to buy ads is itself a massive privacy nightmare[2]

[1] https://quickthoughts.ca/autotracko/ [2] https://www.eff.org/deeplinks/2026/03/targeted-advertising-g...


cloudflare is more everywhere than facebook


Yeah, but unlike facebook, they weren't just caught making videos of people having sex then paying people to watch the videos.

Also, unlike facebook, they also weren't just caught running a dark money lobbyist network with the goal of forcing more collection of minors' private information.


facebook is evil for many different reasons, but for a government looking to spy on its own citizens, cloudflare is a much more attractive target. That said, I have no doubt that they're collecting copious amounts of data from both companies, either by sale or by force.


Not Experian, TransUnion, and Equifax?

Or for location, the cellular providers?


There are plenty of bad actors

The interesting part is Google & Apple: as part of explaining to courts why their large app store fees are legit and not proof of monopoly positions, they hid behind the security argument that they need to be the clearinghouse for what software runs on the devices. Except... they've knowingly punted on this one for 10+ years.

I would 100% agree that losing privacy through any utility-level carrier (credit cards, phone, OS provider, etc) should be default disallowed, and any opt-ins have a clear transparency mode with easy opt-out. At least two areas the US can learn from the EU on digital policy is digital marketplaces and consumer privacy protection, and this topic is at the intersection of both.

