Die area and power consumption have very little to do with AVX-512 itself.
AVX-512 is a better instruction set: many tasks can be done with fewer instructions than with AVX or SSE, resulting in lower energy consumption even at the same data width.
The increase in size and power consumption is due almost entirely to the fact that, compared with AVX, AVX-512 doubles both the number of registers (16 to 32) and the register width (256 to 512 bits). Moreover, current implementations widen the execution units and datapaths correspondingly.
Had SSE or AVX been widened for higher performance instead, they would have incurred the same increases in size and power, while remaining less efficient instruction sets.
Even in the worst AVX-512 implementation, Skylake Server, doing a computation in AVX-512 mode reduces energy consumption considerably.
The problem with AVX-512 in Skylake Server and its derivatives, e.g. Cascade Lake, is that those Intel CPUs had worse methods of limiting power consumption than the contemporaneous AMD Zen. Whatever method Intel used reacted too slowly to consumption peaks. Because of that, the CPUs had to reduce the clock frequency preemptively whenever a large power draw might occur in the near future, e.g. upon seeing a sequence of AVX-512 instructions that suggests more will follow.
While this does not matter for programs that perform long AVX-512 computations, where the clock frequency genuinely needs to drop, it handicaps programs that execute only a few AVX-512 instructions, yet enough to trigger the frequency reduction, which then slows down the non-AVX-512 instructions that follow.
This was a serious problem on all Skylake-derived Intel CPUs, where you must take care not to use AVX-512 instructions unless you intend to use many of them.
However, it was not really a problem of AVX-512 but of Intel's methods for power and die-temperature control. Those can be improved, and Intel did improve them in later CPUs.
AVX-512 is not the only thing that causes such undesirable behavior. Even on much older Intel CPUs, the same kind of problem appears when you want maximum single-thread performance but some random background process starts on a previously idle core. Even if that background process consumes negligible power, the CPU anticipates that it might start to consume a lot and drastically reduces the maximum turbo frequency compared with the single-active-core case, slowing down the program you actually care about.
This is exactly the same kind of problem, and it is especially visible on Windows, which has a huge number of enabled system services that may start executing unexpectedly, even when you believe the computer should be idle. Nevertheless, people got used to this behavior, mostly because there was little they could do about it, so it was discussed much less than the AVX-512 slowdown.