The most common modern cases I've run across are handwritten inner loops that make careful use of SIMD instructions, since compiler auto-vectorization still doesn't reliably match hand-tuned code. The x264 encoder, for example, has a lot of assembly in it. The x264 developers even wrote their own assembly IR, though it targets only x86 variants: http://x264dev.multimedia.cx/archives/191
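
To give a rough idea of what such a hand-vectorized inner loop looks like, here is a minimal sketch of a sum-of-absolute-differences (SAD) kernel, the kind of operation a video encoder's motion search runs constantly. It is my own illustration, not code from x264: the names `sad_scalar` and `sad_simd` are hypothetical, it uses C intrinsics rather than raw assembly, and it assumes SSE2 is available (falling back to the scalar loop otherwise).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Scalar reference: sum of |a[i] - b[i]| over n bytes. */
uint32_t sad_scalar(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (uint32_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return sum;
}

/* Hand-vectorized version (hypothetical name, illustration only).
 * Processes 16 bytes per iteration with SSE2 when available, then
 * finishes any leftover bytes with the scalar loop.
 * Assumes the total fits in 32 bits. */
uint32_t sad_simd(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    size_t i = 0;
    uint32_t sum = 0;
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (; i + 16 <= n; i += 16) {
        /* Unaligned loads, so callers need no special alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* PSADBW: one partial sum per 8-byte half, in two 64-bit lanes. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
    }
    /* Add the low lane and the high lane together. */
    sum = (uint32_t)_mm_cvtsi128_si32(acc)
        + (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
#endif
    /* Scalar tail (or the whole buffer when SSE2 is unavailable). */
    return sum + sad_scalar(a + i, b + i, n - i);
}
```

The point of writing it by hand is the `_mm_sad_epu8` step: a single instruction handles 16 byte pairs, and a compiler may or may not recognize that pattern in the scalar loop on its own.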