That's not a very convincing argument. "I use it because it works" without refut...

trgn · on Nov 14, 2022

There's a somewhat tangentially related heuristic to your points imho. When writing code, when in doubt, task CPU over memory, it will generally yield leaner, better software. Easier to profile, debug, and optimize. Much easier to scale in production environments too.

Code which optimizes for fast running time, tends to lean heavily into using memory-bloating data structures and caches. These are often poison for aggregate performance in real world environments, and like you said among other things, reduce visibility in the workings of system.

There's some implications of this stance; e.g. optimizations to reduce memory footprint are generally never wasted, while optimizations to squeeze out a few ns of runtime may have many more unexpected consequences.

cogman10 · on Nov 14, 2022

The issue with caches is they are easy to apply. You can easily drop a cache in any place you load a resource without significant code changes.

I tend to think they are ok weapons of sometimes first resort (rarely changing resource with heavy system impact). But you'll definitely get better results simply by rethinking and rearchitecting stuff. (Did you really need to load the database just to lookup a single value?)

hinkley · on Nov 14, 2022

> You can easily drop a cache in any place you load a resource without significant code changes.

Only when the cache is brand new. The problem with people with cache addiction is that they think they're free. So once they know a cache is there, they begin to lean on it, and lean on it. I've seen it with many different teams in a number of verticals. Sometimes while I'm there, sometimes before I arrived. It's the people not the company.

This illusion that you can always turn it off later is dangerous. That's only true if you have an impeccable information architecture when you start. And if you have that, you've already done a lot of heavy lifting before caching becomes your only option.

ilyt · on Nov 14, 2022

> How do you teach people to effectively use a profiler on code with pernicious caching?

Profile the function that is called if object is not found in cache ? Have pluggable cache with plugin that just doesn't cache? That is not hard problem to solve.

> They get really uncomfortable six months later when I'm reporting 2-3x in code that can't be optimized further. And in each case everything was made more difficult by the people ahead of me chumming the waters

Well, they delivered (I'm guessing) 10x improvement in a month by adding cache. Caching is entirely reasonable first step and it can be good enough to be last one.

And frankly ability to use profiler and knowing some architectural optimizations seems to be pretty rare skill.

TeMPOraL · on Nov 14, 2022

> Profile the function that is called if object is not found in cache ? Have pluggable cache with plugin that just doesn't cache? That is not hard problem to solve.

The more the dynamic behavior of the profiled system and the real system deviate, the harder it is to isolate what's causing the slowdowns.

> Well, they delivered (I'm guessing) 10x improvement in a month by adding cache.

Yes, but by doing this they also tremendously increased the complexity of the project, and (unless the team has someone like GP in it) prevented further optimizations. It may still be a good trade-off, but the overall sorry state of pretty much all software everywhere suggests it may not be a good one usually.

> And frankly ability to use profiler and knowing some architectural optimizations seems to be pretty rare skill.

That's kind of reinforcing the parent's point, though.

ilyt · on Nov 15, 2022

>> Profile the function that is called if object is not found in cache ? Have pluggable cache with plugin that just doesn't cache? That is not hard problem to solve.

>The more the dynamic behavior of the profiled system and the real system deviate, the harder it is to isolate what's causing the slowdowns.

Whose quote you repeat without thinking ? It's not that relevant if you know how profiling works...

>> Well, they delivered (I'm guessing) 10x improvement in a month by adding cache.

>Yes, but by doing this they also tremendously increased the complexity of the project

Not really in most cases its pretty simple to add it, and if you profiled correctly you'd start with only adding extra code in hottest part of the app

> and (unless the team has someone like GP in it) prevented further optimizations.

Uh, just no. It might've fixed problem well enough to not be touched but it ain't stopping anything, just a different solution.

> It may still be a good trade-off, but the overall sorry state of pretty much all software everywhere suggests it may not be a good one usually.

The assumption here that the same developers that fucked up caching would somehow manage to achieve same speedup with "just" optimizations and less bugs is bold one.

Hell, many low level optimizations are also basically "using cache" (but CPU one) well, althought compiler usually gets you out of most egregious mistakes (unless you decide to write multithreaded C/C++ carelessly I guess.

>> And frankly ability to use profiler and knowing some architectural optimizations seems to be pretty rare skill.

> That's kind of reinforcing the parent's point, though.

What point exactly ? That average developer is bad at their job ? Poster above explicitly said that the worse developers claimed it can't be optimized more in the first place, what else you expect from them ? Cache was the most cost-efficient solution most likely.

zasdffaa · on Nov 14, 2022

> The more the dynamic behavior of the profiled system and the real system deviate

Maybe. I've never found it a problem, but maybe. When I've added a cache, the cache comes with instrumentation for exactly that.

> Yes, but by doing this they also tremendously increased the complexity of the project

It depends. EG if you don't need to worry too much about cache invalidation, adding caching can be as simple as adding a single function call eg like python's lru_cache. It depends on the use case.

hinkley · on Nov 14, 2022

> Profile the function that is called if object is not found in cache ? Have pluggable cache with plugin that just doesn't cache? That is not hard problem to solve

Okay, now I have a profile that says 90% of the time is spent in the code the cache avoids. Now what? Narrator's note: When the cache was put in place it accounted for 40% of run time, but now people use it because it's 'free' and so the code has grown from looking up the data twice to looking it up five to seven times depending on conditional logic.

> Well, they delivered (I'm guessing) 10x improvement in a month by adding cache.

No, they didn't. That's the problem. No cache gets you 10x. 10x for one step of a 20 step process is a positive improvement but rarely more than 2x for the first one, 4x at the outside. If it does more that that then someone is being titanically stupid. From then on each cache is going to net you less, and less, and less. Only if someone measured wrong or you are blocked by some architecture change that's needed so that caches even make sense, will you see a bigger improvement this year than you did last.

But the pathology I'm discussing above didn't even have anything at all to do with caching. It's the psychology of (ir)rationalizing around performance, the assertion that you've done everything that can be done because you've done everything you know to do. It's a bizarre love child of learned helplessness and ego self defense. It's gotten way worse in a world where people reach for caches first (and thus, I assert, last), but it was always there.

I have a couple coworkers who are trying to be clever with a new cache architecture. Irritatingly by taking a statement of fact made by me 3+ years ago and jumping to the wrong conclusion. They promised 50% in 18 months (I was suggesting we target 90% in three years). Now we are somewhere past 30 months and so far they've delivered lower numbers than I did in 5 man weeks spread out from Christmas to mid March, while juggling other priorities and suffering from Seasonal Affective Disorder. Another coworker, usually cache averse, was making noises about adding a cache for some other pain point, and I started pulling on the thread for a bit until he was satisfied. It'll be next March before they surpass the 20% I turned in with about 8 modest changes. And the architectural improvement from both efforts will be roughly on a par. Top down is sexy, bottom up is not. But it's hard to do the former without the latter. Eat your vegetables.

ilyt · on Nov 15, 2022

>> Well, they delivered (I'm guessing) 10x improvement in a month by adding cache

>No, they didn't. That's the problem. No cache gets you 10x.

I was actually being conservative. We had almost 100x increase in some cases.

It was pretty particular app, site took few hundred (IIRC closer to second mark) ms to generate before we started at all.

First it was simple Varnish cache but that had problems, some components never changed, other were updated pretty often (stuff like "current users here")

First it was split to various components, each with different TTLs, then all of that was merged in Varnish ESI.

So you could say invalidate a part of the site and varnish would

* load this component from app server

* load the template it contains from its own cache

* render it.

So

* no refresh would be just direct from cache, single digit ms or lower latency

* refreshing one part of site would have latency of rendering only that tiny part of site on app server

later we also updated that with "grace" feature of Varnish where you could define time interval where cache would be valid but generate request to the backend to refresh

So if said low TTL component was refreshed often

* the component will always go from cache with lowest latency

* if it was on its last seconds varnish would send the request in the background and refresh cache

Now the app itself wasn't very well written in the first place (there is a book of horror stories to write about it) but they (we did ops and testing, they did the app stuff) spent less time than "fixing it properly" and instead rewrote it to a bit more modern architecture. They still use same cache to cache JSON responses to API endpoints tho.

That not even to say how much of DB performance is actually clever caching, after all index is just pre-generated partial cache...

But sure, some apps are better for it than other, 10x is definitely not universal.

> From then on each cache is going to net you less, and less, and less. Only if someone measured wrong or you are blocked by some architecture change that's needed so that caches even make sense, will you see a bigger improvement this year than you did last.

Well, doh, caching something you cached already ain't gonna be that effective. But if you see same request with same done 100 times in a minute, making it take 20ms instead of 100m ain't gonna be as effective as making it take 100us to get from cache. You might want to address it eventually to have lower jitter, but same cache can allow you to make 20 other methods faster on average and that is less code and effort than optimizing 20 other functions..

> I have a couple coworkers who are trying to be clever with a new cache architecture. Irritatingly by taking a statement of fact made by me 3+ years ago and jumping to the wrong conclusion. They promised 50% in 18 months (I was suggesting we target 90% in three years). Now we are somewhere past 30 months and so far they've delivered lower numbers than I did in 5 man weeks spread out from Christmas to mid March, while juggling other priorities and suffering from Seasonal Affective Disorder.

And do you think they would be able to instead go and optimize it better in that time ? That just sounds like incompetent co-worker and not anything to do with cache...

zasdffaa · on Nov 15, 2022

That guy just keeps repeating "it's hard" then 'explaining' why in obtuse terms and in ways that don't stand up (I don't think he understands database behaviour). Frankly I don't think he has much experience in this area. I'm going to give up now, he's spent a fair bit of your and my time and still can't accept he's not right.

zasdffaa · on Nov 15, 2022

> No cache gets you 10x

I've done much better. Think sticking a a cache in front of a database with seriously complex queries. Yes you need to be be aware of staleness, no it's not a magic wand, but it delivered massive improvements.

> When the cache was put in place it accounted for 40% of run time

If looking shit up in a hash table, at a few hundred or thousand cpu ticks, cost 40% of your runtime then I smell PEBKAC[1], not cache problems.

[1] Not you personally but someone

zasdffaa · on Nov 14, 2022

Exactly! People here seem to be making a complex thing out of a simple thing.

hinkley · on Nov 15, 2022

What makes caching one of the hardest parts of computing is this assertion that it's easy. Everything is easy when you oversimplify it. Some things explode in complexity when you try to force them to be simple.

andrekandre · on Nov 15, 2022

  > Some things explode in complexity when you try to force them to be simple.

do you mean to say iow "simple in the small but when assembled (in the large) is more complex" whereas what we should be aiming for is perhaps "a little more complex in the small but with the benefit that it enables a simpler (and probably more scalable) architecture once assembled (in the large)"??

Aeolun · on Nov 14, 2022

I kinda feel that your convincing counterargument is just ‘I have a different experience’.

I’ve never seen people reach for caches first, and then discard all further optimizations.

Or rather, I’ve seen them do that, but not out of some misplaced feeling that it was the only solution, just the one that was easiest to accomplish in very little time.

zasdffaa · on Nov 14, 2022

It's dificult to refute your argument because I don't understand your points:

> Because of the locality aspect, people will assume that it's 'free' to look up a value multiple times instead of passing these data dependencies between the relevant methods

what does that even mean? What does 'data dependencies' mean specicially?

> then you can end up evicting data in the middle of a transaction

So in the middle of a transaction, instead of looking up data from a key - very cheap - you should do ... what? Recalculate it instead?

> The less obvious consequence is now you have concurrency bugs, because the code assumes that decisions made about the data will go the same way on each call, but due to eviction, two halves of the transaction may make conflicting decisions with undefined behavior

Assuming I even understand this, you are saying where there's concurrencty there's a risk of pulling stale data out of a cache. Well, true. That's just something you have to be careful of. Add a Tx reference to the cache key.

> Once you introduce caching you poison the performance well.

Not my experience at all. I can't understand why you're saying this.

> Flame charts lie when cache is used

How so?

> and after people rely on the cache it is lying in the other direction when the cache is off, because you're calling that path 5 times now and it's swamping other bottlenecks

I've no idea what this means

> This is what I really mean when I say it's a weapon of last resort. Because once you use it, it attempts to take any other options off the table

Heh! Absolutely not! Other optimisations are orthogonal to caching IME:

Better algo - orthogonal

Go lower level - orthogonal

Go multicore - orthogonal

Noobish SQL query is begging for a rewrite - orthogonal

Use L1/2/3/ chip cache better, or RAM, by careful data layout - orthogonal

Use a better library - orthogonal

Use better data structure eg. than linked list - orthogonal

svieira · on Nov 14, 2022

>> Because of the locality aspect, people will assume that it's 'free' to look up a value multiple times instead of passing these data dependencies between the relevant methods

> what does that even mean? What does 'data dependencies' mean specicially?

It means that instead of having a method like `entryPoint(int)` that looks up a `Foo` by id and then passes the `Foo` around (`verify(Foo)`, `Foo update(Foo)`) you instead wind up with a system that passes around an `int` (`verify(int)`, `/* Unused */ Foo update(int)`) and each layer then looks up `Foo` in the `FooService` (because `FooService` has a cache and subsequent lookups should be essentially-free).

Even worse when `FooService` is abstracted behind a thread-local like `ThreadLocal<Foo> currentFoo`, the implementation of which is `fooService.findById(currentFooId.get()) /* cheap, because fooService has a cache */`.

My own take on this is that caches are global-by-default when you really want local-by-default (e. g. dynamic programming instead of a `static final Cache theCacheForTheEntireProgram`). Local-by-default caches are easy to make transactional (when the call stack returns the cache is gone) and don't grow in an unbounded manner in the general case.

hinkley · on Nov 14, 2022

Yes!

Local by default also has the feature/problem of making you look at the scale of the data involved in your query up front, before it's having a 100-1000 req/s hitting it in production. It's harder to pretend this large dependency graph will just magically take care of itself (or worse, be someone else's problem after you've declared victory) when it's staring you in the face.

throwaway74829 · on Nov 15, 2022

Interested in any resources that you've found vital in your journey.

I'm angling to go deeper into the "Cleaner" path, and it's a very slow "figure it out as you go" experience.