Disclaimer: I barely read 10% of the thread, but I think I can share some useful data.
First of all, the PS2 is a dual-processor machine (you can think of it as a NoC, network on chip). There are two independent CPUs that communicate through an interface called the SIF (a bit like two machines linked by an ethernet cable). The EE runs at around 300MHz, the IOP at around 37MHz. Every time the EE wants to do IO (except graphics), it needs to communicate with the IOP. The IOP is very slow, so you talk to it asynchronously. So, roughly, the EE creates a thread that fires off the IOP transfer and then sleeps while waiting for the response.
I see two or three ways for the EE to check the transfer status:
1/ wake up later and poll the SIF/DMA status register
2/ wait for a SIF interrupt
3/ set a timer and wait for the timer interrupt
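To give a feeling for why the choice matters, here is a toy model of the EE cycles consumed while waiting for one IOP reply under each approach. Every number and function name is made up for illustration; nothing here is taken from the PS2 kernel or from PCSX2.

```python
# Toy model: EE cycles burned while waiting for one IOP reply.
# All costs are illustrative assumptions, not measured PS2 or PCSX2 values.

def cycles_busy_poll(wait_cycles, poll_cost):
    """Option 1 taken to the extreme: spin on the status register until done."""
    polls = -(-wait_cycles // poll_cost)  # ceiling division
    return polls * poll_cost

def cycles_periodic_poll(wait_cycles, sleep_cycles, poll_cost):
    """Options 1/3: wake up every `sleep_cycles`, check the status, sleep again."""
    wakeups = -(-wait_cycles // sleep_cycles)  # ceiling division
    return wakeups * poll_cost

def cycles_interrupt(wake_cost):
    """Option 2: sleep until the SIF interrupt fires, pay a single wake-up."""
    return wake_cost

if __name__ == "__main__":
    wait = 100_000  # hypothetical length of the IOP transfer, in EE cycles
    print("busy poll    :", cycles_busy_poll(wait, poll_cost=20))
    print("periodic poll:", cycles_periodic_poll(wait, sleep_cycles=10_000, poll_cost=200))
    print("interrupt    :", cycles_interrupt(wake_cost=200))
```

In this model the spinning thread burns the whole wait, while the other two only pay for their wake-ups. Which one a given game really uses is exactly what I don't know.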
Asynchronous stuff is nice, but often you need the data to continue (multithreaded gameplay code is difficult), so I'm sure the EE spends a lot of time waiting for data from the IOP (and potentially from other DMA transfers). Besides, all this management costs a lot of extra cycles, which doesn't help. So if you overclock the EE, it will spend more cycles waiting for data, the emulator has to execute all of those cycles, and therefore games will run slower (due to the limits of your PC). It would also explain why both "INTC Spin Detection" and "Wait Loop Detection" improve emulation speed. In the end it might be better to overclock the IOP, but that will probably break all the timings...
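A back-of-the-envelope version of that argument, as a sketch. It assumes the worst case where the EE spins for the entire IOP wait, and the workload numbers are invented, so take it as an illustration of the trend rather than a measurement.

```python
# Toy model: EE cycles the emulator must execute for one frame when the
# game is bound by the IOP. Assumes the EE spins during the whole wait
# (worst case). All numbers are hypothetical.

def emulated_ee_cycles(useful_cycles, iop_wait_us, ee_mhz):
    """Useful work is a fixed cycle count; waiting scales with the EE clock."""
    wait_cycles = iop_wait_us * ee_mhz  # microseconds * MHz = cycles
    return useful_cycles + wait_cycles

if __name__ == "__main__":
    useful = 2_000_000   # cycles of real game code per frame (made up)
    iop_wait_us = 5_000  # time spent waiting for the IOP per frame (made up)

    stock = emulated_ee_cycles(useful, iop_wait_us, ee_mhz=300)
    overclocked = emulated_ee_cycles(useful, iop_wait_us, ee_mhz=450)

    print("stock      :", stock)        # 3500000
    print("overclocked:", overclocked)  # 4250000
    print("extra host work: %.0f%%" % (100 * (overclocked - stock) / stock))
```

The useful part does not grow with the clock, only the waiting part does, so in this model all the extra host work goes into polling.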
In the past, while hacking/playing with PCSX2, I found on SotC startup (before any menu) that:
* setting the EE kernel timing close to 0 (which is equivalent to overclocking the EE when running kernel code) => very slow external fps!
* setting a very big EE kernel timing (which is equivalent to underclocking the EE when running kernel code) => fast external fps!
I don't know what happens, but potentially some threads do some heavy polling. We want to avoid emulating that at all costs, but I really don't know.
Overclocking the EE will surely make the useful code run faster, but it will cost a lot on silly polling. It would be a big project, but it would be useful to be able to profile the game internally to measure the effect of this change properly. I'm thinking about EE time per thread, number of thread switches, and number of interrupts, because I'm not sure there is an easy way to measure real EE usage or internal fps.
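Something like the following is what I have in mind for the counters. It is only a sketch of the bookkeeping: the class and the `on_*` hooks are hypothetical names, and where the emulator would actually call them is left open.

```python
# Sketch of the profiling counters: EE cycles per thread, thread switches,
# and interrupts. The hook names are hypothetical, not existing PCSX2 code.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class EEProfile:
    cycles_by_thread: Counter = field(default_factory=Counter)
    thread_switches: int = 0
    interrupts: int = 0
    current_thread: int = 0

    def on_cycles(self, n):
        """Charge n executed EE cycles to the thread currently running."""
        self.cycles_by_thread[self.current_thread] += n

    def on_thread_switch(self, new_tid):
        """Count a switch only when the running thread actually changes."""
        if new_tid != self.current_thread:
            self.thread_switches += 1
            self.current_thread = new_tid

    def on_interrupt(self):
        self.interrupts += 1

    def share(self):
        """Fraction of all counted EE cycles spent in each thread."""
        total = sum(self.cycles_by_thread.values())
        if total == 0:
            return {}
        return {tid: c / total for tid, c in self.cycles_by_thread.items()}

if __name__ == "__main__":
    p = EEProfile()
    p.on_cycles(100)       # thread 0 runs
    p.on_thread_switch(1)
    p.on_cycles(300)       # thread 1 runs (maybe the one that polls)
    p.on_interrupt()
    print(p.share(), p.thread_switches, p.interrupts)
```

A thread that takes a large share of the cycles while the external fps stays flat would be a good candidate for the heavy polling.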
Now let's get back to the implementation.
What is the current speed of the EE? A slider with steps like +50/+40/+30/+20/+10/0/-33/-50 could be useful for testing (in the end +50 is only a factor of 1.5, and -33 is the same factor in the other direction).
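To make the arithmetic behind those steps explicit, here is a small sketch that converts a percentage into a clock multiplier. It assumes the slider value is simply a percentage added to the stock clock, which is my reading and not a decided design.

```python
# Convert a slider percentage into an EE clock multiplier.
# Assumes the slider means "stock clock plus N percent".
from fractions import Fraction

def speed_factor(percent):
    """+50 -> 3/2, 0 -> 1, -50 -> 1/2."""
    return Fraction(100 + percent, 100)

if __name__ == "__main__":
    for p in (50, 40, 30, 20, 10, 0, -33, -50):
        f = speed_factor(p)
        print(f"{p:+4d}% -> x{float(f):.2f} (slowdown x{float(1 / f):.2f})")
```

With this reading -33 gives x0.67, so a slowdown of about 1.49, which mirrors the x1.5 speedup of +50, and -50 mirrors a +100 that the list does not have.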
For the backward compatibility question, I think it would be possible with a bit of glue. I don't think it would be complex (maybe an offset will be enough).
Note: about the extra tab with only one option, don't worry. New options will likely be added in the future, in particular developer options.