I'm disappointed that neither the article nor the comments go into the actual differences between how these adapters work and the overhead incurred by USB.
At a high level, I'm pretty sure Thunderbolt will be significantly better in all situations:
Thunderbolt is PCIe; depending on the way the network card driver works, the PCIe controller will usually end up doing DMA straight into the buffers the SKB points to, and with io_uring or AF_XDP, these buffers can even be sent down into user space without ever being copied. Also, usually these drivers can take advantage of multiple txqueues and rxqueues (for example, per core or per stream) since they can allocate whatever memory they want for the NIC to write into.
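On Linux, one quick way to check the multi-queue point is to count the per-queue directories the kernel exposes in sysfs. This is a minimal sketch, assuming the usual `/sys/class/net/<iface>/queues/rx-N` and `tx-N` layout; the function name and the `sysfs_root` parameter are made up for illustration.

```python
import os
import re

def count_queues(iface, sysfs_root="/sys/class/net"):
    """Return (rx_queues, tx_queues) for a network interface.

    Counts the rx-N / tx-N directories under <sysfs_root>/<iface>/queues.
    `sysfs_root` is a hypothetical parameter that exists only so the
    function can be pointed at a fake tree; it is not a kernel concept.
    """
    queues_dir = os.path.join(sysfs_root, iface, "queues")
    names = os.listdir(queues_dir)
    rx = sum(1 for n in names if re.fullmatch(r"rx-\d+", n))
    tx = sum(1 for n in names if re.fullmatch(r"tx-\d+", n))
    return rx, tx
```

Calling `count_queues("eth0")` (substitute your own interface name) on the Thunderbolt-attached NIC and on the USB one would show whether the drivers actually differ here. I'd expect the USB adapter to report fewer queues, but that is an expectation, not a measurement.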
USB is USB; the controller can DMA USB packet data into URBs but they need to be set up for each transaction, and once the data arrives, it's encapsulated in NCM or some other USB format and the kernel usually has to copy or move the frames to get SKBs. The whole thing is sort of fundamentally pull based rather than push based.
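To make the encapsulation step concrete, here is a rough sketch of unpacking one 16-bit NCM transfer block (NTB) into Ethernet frames. The header layout (an `NCMH` block header followed by `NCM0`/`NCM1` datagram pointer tables) is my reading of the CDC NCM format. The function name is made up, there is no hardening against malformed input, and the Python copy only stands in for whatever copy or move a real driver does.

```python
import struct

NTH16_SIG = b"NCMH"
NDP16_SIGS = (b"NCM0", b"NCM1")  # without / with a CRC-32 on each datagram

def unpack_ntb16(ntb):
    """Extract the Ethernet frames packed into one NTB16 buffer.

    Every returned frame is a new bytes object, i.e. a copy out of the
    buffer the USB transfer landed in.
    """
    # NTH16: signature, header length, sequence, block length, first NDP offset
    sig, hdr_len, _seq, block_len, ndp_index = struct.unpack_from("<4sHHHH", ntb, 0)
    if sig != NTH16_SIG or hdr_len != 12:
        raise ValueError("not an NTH16 header")
    if block_len > len(ntb):
        raise ValueError("truncated NTB")

    frames = []
    while ndp_index:
        # NDP16: signature, table length, offset of the next NDP (0 = none)
        ndp_sig, ndp_len, next_ndp = struct.unpack_from("<4sHH", ntb, ndp_index)
        if ndp_sig not in NDP16_SIGS:
            raise ValueError("bad NDP16 signature")
        offset = ndp_index + 8          # pointer entries follow the 8-byte header
        end = ndp_index + ndp_len
        while offset + 4 <= end:
            dg_index, dg_len = struct.unpack_from("<HH", ntb, offset)
            if dg_index == 0 or dg_len == 0:
                break                   # a zero entry terminates the table
            frames.append(bytes(ntb[dg_index:dg_index + dg_len]))
            offset += 4
        ndp_index = next_ndp
    return frames
```

The point of the sketch is that several frames arrive in a single USB transfer and each one has to be located and pulled out before the network stack sees it, whereas a PCIe NIC can write each frame straight into its own buffer.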
But this is just scratching the surface; I'm sure there are neat tricks that some USB 3.2 NIC drivers can do to reduce overhead, and I'd love to read an article that taught me more about that, or even showed benchmarks analyzing memory controller utilization, kernel CPU time, and performance counters (like cache utilization) in particular. Especially at 10G and beyond, a lot of processing becomes memory bandwidth limited and the difference can be extremely significant.
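As a starting point for the kernel CPU time part, the aggregate `cpu` line of `/proc/stat` can be sampled before and after a transfer. This is a minimal sketch, assuming Linux and an otherwise idle machine (the counters are system-wide); the function names are made up for illustration.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of ticks."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    names = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")
    return dict(zip(names, map(int, fields[1:9])))

def kernel_share(before, after):
    """Fraction of elapsed CPU time spent in system + irq + softirq."""
    delta = {k: after[k] - before[k] for k in before}
    total = sum(delta.values())
    if total == 0:
        return 0.0
    return (delta["system"] + delta["irq"] + delta["softirq"]) / total

def read_cpu():
    """Take one sample from the running system (Linux only)."""
    with open("/proc/stat") as f:
        return parse_cpu_line(f.readline())
```

Take `read_cpu()` once before and once after a fixed-length throughput test on each adapter and compare the `kernel_share` values. That says nothing about memory controller or cache behaviour, which would need hardware performance counters.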
ACK. From some cursory experimentation, my laptop can roughly saturate 1G via USB, but on 2.5G things get wonky above roughly 1.9G unidirectional or 2.9G bidirectional.
> Thunderbolt is PCIe
Nit: Thunderbolt isn't PCIe, it tunnels PCIe. Depending on the chips used, there are bandwidth limits; I vaguely remember 22.5G on older 40G TB Intel chips.
Thunderbolt allows PCIe tunneling, but it has some overhead over raw PCIe. That's why Thunderbolt eGPU setups don't perform as well as plugging the GPU directly into a PCIe slot.
> USB is USB
Until you get to USB4, when USB 4 supports Thunderbolt 4.
Fair; I should have said "from the standpoint of the driver."
> USB 4 supports Thunderbolt 4
It's the opposite! I hate to get into it as I saw the USB naming argument pretty thoroughly enumerated in the comments here already, but the pedantic interpretation is "Thunderbolt 4 is a superset of USB4 which requires implementation of the USB4 PCIe tunneling protocol which is an evolution of the Thunderbolt 3 PCIe tunneling protocol."
From the standpoint of the USB-IF, a "USB4" host doesn't need to support PCIe tunneling, but Microsoft also (wisely, IMO) threw a wrench into this classic USB confusion nightmare by requiring "USB4" ports to support PCIe tunneling for Windows Logo.
> That's why Thunderbolt eGPU setups don't perform as well as plugging the GPU directly into a PCIe slot.
The bigger factor is probably that PCIe tunnelling provides at most a ×4 link, while a GPU normally goes into a ×16 or at least ×8 slot, and very few GPUs target ×4.
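For a sense of scale, the raw link arithmetic can be sketched as below. The PCIe generation of the tunnelled link is an assumption on my part (3.0, as on Thunderbolt 3), and the figures are line rate after encoding only, before any packet or tunnelling overhead.

```python
# Per-lane signalling rate (GT/s) and line-code efficiency by PCIe generation.
PCIE_GENS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),
}

def pcie_gbps(gen, lanes):
    """Raw one-direction link bandwidth in Gbit/s after line encoding."""
    rate, efficiency = PCIE_GENS[gen]
    return rate * efficiency * lanes

if __name__ == "__main__":
    for lanes in (4, 8, 16):
        print(f"PCIe 3.0 x{lanes}: {pcie_gbps(3, lanes):.1f} Gbit/s")
```

Under that assumption a ×4 link tops out at roughly 31.5 Gbit/s against roughly 126 Gbit/s for ×16, so the slot has four times the raw bandwidth before tunnelling overhead even enters the picture.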