You are ignoring that even single-threaded code has a lot of sources of concurrency and side effects: memory/caches, branch prediction, prefetching...
Watch the talk and look at the examples. Most of them are single-threaded: a bad hash function causing bucket collisions and linear inserts, and SQLite using an indirection table that kills speculative execution and code prefetching.
Those two wouldn't really show up in a sampling profiler, because the offending code itself still takes up only a tiny amount of time.
Sampling profilers show you where time is spent; causal profilers show you what performance side effects every line of code has.
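
To make the hash-function case concrete, here is a rough sketch (my own illustration, not code from the talk; `Key` and `count_comparisons` are made-up names). The bad `__hash__` is a one-liner that costs almost nothing to run, yet it forces every insert to compare against all earlier keys, so the cost lands somewhere other than the line that causes it:

```python
class Key:
    """Hypothetical key type that counts how often equality is checked."""
    eq_calls = 0

    def __init__(self, value, bad_hash):
        self.value = value
        self.bad_hash = bad_hash

    def __hash__(self):
        # The "bad" variant maps every key to the same bucket chain.
        return 42 if self.bad_hash else hash(self.value)

    def __eq__(self, other):
        Key.eq_calls += 1
        return self.value == other.value


def count_comparisons(n, bad_hash):
    """Insert n distinct keys into a dict and return the number of __eq__ calls."""
    Key.eq_calls = 0
    table = {}
    for i in range(n):
        table[Key(i, bad_hash)] = i
    return Key.eq_calls


if __name__ == "__main__":
    print("good hash:", count_comparisons(200, bad_hash=False))
    print("bad hash: ", count_comparisons(200, bad_hash=True))
```

The hash function is where the problem is, but the extra work is done in the comparisons during insertion, which is the kind of indirect effect the comment is talking about.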