An old trick for bytecode interpreters is to use indirect jumps to go to the next opcode’s implementation (gcc has a ‘computed goto’ extension that was used for this. For Rust I guess you would want function pointers plus something to force tail-call elimination?) and to duplicate those indirect jumps at the end of every opcode implementation. This means the indirect-jump predictor (which CPUs have because of virtual dispatch in OOP languages) keeps separate state for the ends of different opcodes and so is more likely to predict correctly (maybe you have a hard time predicting the next instruction in general, but it’s much more likely that a test is followed by a branch).
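A minimal sketch of what this might look like in Rust, using a table of function pointers where each handler re-dispatches at its end instead of returning to a central loop. The opcodes, the `Vm` struct, and all names here are made up for illustration; note that Rust does not guarantee tail-call elimination, so each dispatch is a real recursive call and a long program could overflow the native stack:

```rust
// Hypothetical opcode handlers; opcode byte indexes into HANDLERS.
type Handler = fn(&mut Vm);

struct Vm {
    pc: usize,
    code: Vec<u8>,
    stack: Vec<i64>,
    running: bool,
}

// Dispatch table: 0 = PUSH, 1 = ADD, 2 = MUL, 3 = HALT.
const HANDLERS: [Handler; 4] = [op_push, op_add, op_mul, op_halt];

// The dispatch step. In the threaded style each handler calls this at
// its end, so the indirect call site is duplicated per-opcode (when
// inlined) rather than shared in one central loop.
fn dispatch(vm: &mut Vm) {
    if !vm.running {
        return;
    }
    let op = vm.code[vm.pc] as usize;
    vm.pc += 1;
    HANDLERS[op](vm);
}

fn op_push(vm: &mut Vm) {
    let imm = vm.code[vm.pc] as i64; // one-byte immediate operand
    vm.pc += 1;
    vm.stack.push(imm);
    dispatch(vm); // dispatch duplicated in the handler's postlude
}

fn op_add(vm: &mut Vm) {
    let b = vm.stack.pop().unwrap();
    let a = vm.stack.pop().unwrap();
    vm.stack.push(a + b);
    dispatch(vm);
}

fn op_mul(vm: &mut Vm) {
    let b = vm.stack.pop().unwrap();
    let a = vm.stack.pop().unwrap();
    vm.stack.push(a * b);
    dispatch(vm);
}

fn op_halt(vm: &mut Vm) {
    vm.running = false;
}

fn main() {
    // (2 + 3) * 4
    let code = vec![0, 2, 0, 3, 1, 0, 4, 2, 3];
    let mut vm = Vm { pc: 0, code, stack: Vec::new(), running: true };
    dispatch(&mut vm);
    println!("{}", vm.stack.last().unwrap()); // prints 20
}
```

Whether the compiler actually produces distinct indirect-jump sites per handler depends on inlining, which is why people reach for computed goto (or, on nightly Rust, experiments around guaranteed tail calls) to control this.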
I think other tricks (e.g. storing top of stack in registers for a stack machine) matter more though, and I don’t know if the trick described above is still relevant.
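The top-of-stack-in-a-register trick can be sketched like this: keep the current top of stack in a local variable (which the compiler will typically keep in a register) and touch the in-memory stack only when spilling or refilling. All opcodes and names here are hypothetical:

```rust
// Hypothetical opcodes: 0 = PUSH imm, 1 = ADD, 2 = MUL, anything else = HALT.
fn run(code: &[u8]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    let mut tos: i64 = 0; // cached top of stack, held in a local
    let mut pc = 0;
    loop {
        let op = code[pc];
        pc += 1;
        match op {
            0 => {
                // PUSH: spill the old top to memory, load the immediate.
                stack.push(tos);
                tos = code[pc] as i64;
                pc += 1;
            }
            1 => {
                // ADD: second operand comes from memory; result stays cached.
                tos = stack.pop().unwrap() + tos;
            }
            2 => {
                // MUL: same shape as ADD.
                tos = stack.pop().unwrap() * tos;
            }
            _ => return tos, // HALT: the cached value is the result
        }
    }
}

fn main() {
    // (2 + 3) * 4
    println!("{}", run(&[0, 2, 0, 3, 1, 0, 4, 2, 3])); // prints 20
}
```

The win is that binary ops do one memory access instead of two pops and a push; real interpreters sometimes generalize this to caching the top two slots.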
Just to add some jargon for folks who want to do more research into this:
The technique of having the interpreter dispatch duplicated in the postlude of each handler function is called "threading". If your dispatch uses an indirect jump, it's called "indirect threading". The indirect-jump predictor is the branch target predictor; the hardware structure backing it is the branch target buffer (BTB).
Yeah, I've spent time on this topic before when working in C++, but haven't invested much energy yet in thinking about how one would do it efficiently in Rust. Luckily performance is not my #1 concern, at least not yet. And maybe if I went further down that road I'd just invest in going JIT with cranelift or similar.