(04-27-2009, 01:16 AM)PrinceGaz Wrote: When you think about it, PCSX2 is emulating several chips at the same time (presumably splitting time between them otherwise it would be capable of dividing itself between more than two threads). But those chips have access to 32MB of very fast memory which is too much to fit in even the largest L3 cache available today.
Precisely. Especially because it's an emulator, PCSX2 has to do a lot of double-moves and memory mirroring across emulated systems. Most games which "sag" in the performance area move well over 100 megs of data and execute nearly two megabytes of code every frame. The cache ends up being useful for short-term operations only, most of which fit inside a 1 or 2MB cache regardless, and in the long term any cache of any size is just clobbered. The only time the larger cache could possibly be useful is in a situation like the one Gabest describes, where you have 4+ threads all working on different data sets and all competing for the same shared cache space (aka the software GSdx).
But even then there are tricks you can do to reduce the cache requirements of an app. I worked on a project that used a multithreaded 2D software rasterizer (at 1680x1050) that we optimized for CPUs with 1MB cache by breaking the viewport into many smaller regions to ensure the region size being drawn by each thread fit inside the cache (a total of 14 regions, rasterized in parallel, 2 at a time). This yielded a ~30% speedup over a more conventional approach. The drawback was that it didn't automatically scale to the L2 cache size of the CPU it was running on, but experimentation showed that it didn't really matter. The benefit of the larger L2 caches wasn't linear, and optimizing the region sizes for a 4MB C2D only yielded about 3% improvement over the 1MB region sizes on the same CPU.
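To illustrate the region-splitting idea, here is a rough sketch, not the code from that project. It assumes a 32-bit (4 bytes per pixel) framebuffer and horizontal bands, and the names `plan_bands`, `rasterize` and `draw_band` are made up for the example. It picks a band height so that the bands being drawn at the same time fit in the cache together, then hands the bands to a small worker pool.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_bands(width, height, bytes_per_pixel, cache_bytes, threads):
    """Split the viewport into horizontal bands (y0, y1) so that `threads`
    bands drawn at once fit inside `cache_bytes` together.
    Assumption: only the destination pixels are counted against the cache."""
    row_bytes = width * bytes_per_pixel
    budget_per_thread = cache_bytes // threads
    rows_per_band = max(1, budget_per_thread // row_bytes)
    bands = []
    y = 0
    while y < height:
        bands.append((y, min(y + rows_per_band, height)))
        y += rows_per_band
    return bands


def rasterize(framebuffer, width, bands, draw_band, threads=2):
    """Draw every band, at most `threads` of them at a time.
    `draw_band(framebuffer, width, y0, y1)` is a hypothetical callback
    that must only touch rows y0..y1-1."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(lambda b: draw_band(framebuffer, width, b[0], b[1]), bands))


# 1680x1050 at 4 bytes per pixel, 1 MiB cache, 2 bands in flight.
bands = plan_bands(1680, 1050, 4, 1 << 20, 2)
print(len(bands), bands[0], bands[-1])
```

With those assumed numbers the plan comes out to 14 bands, which lines up with the region count mentioned above, though the real project may have split things differently. In Python the worker pool only shows the partitioning, since the interpreter lock prevents any real parallel speedup for pure-Python drawing code.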
(04-27-2009, 01:23 AM)PrinceGaz Wrote: One question, the CPU usage in GSdx-- that is the CPU usage of a SINGLE thread which GSdx creates? A thread seperate from the two threads which the core code of PCSX2 creates? I've seen that reach nearly 90% CPU whilst playing Grandia III (whilst playing FMV sequences), though it still tends to hover around 30-40% on my box in normal play. Does that mean PCSX2 would benefit from a tri-core CPU (two cores for the main emulator, plus another to handle the graphics plug-in?).
In hardware mode it is measuring the CPU utilization of the thread PCSX2 created for it. If it reads 40%, it means the GS plugin is spending 60% of its time waiting for the EE emulation core to feed it data. If it reads over 90%, then chances are the EE is stalling a lot waiting for the GS to catch up.
These rules of thumb apply only to no-limit mode. If you have the frame limiter on, and are running 60+ fps, then both cores will be idling quite a bit.
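The rules of thumb above can be written down as a tiny helper. This is only a sketch: `diagnose` is a made-up function, not something in PCSX2 or GSdx, and the 90% cutoff is simply the figure quoted above rather than a measured threshold.

```python
def diagnose(gs_thread_util, frame_limited):
    """Interpret the GSdx thread CPU reading (0.0 to 1.0) in hardware mode.
    Hypothetical helper; the 0.90 cutoff is a rule of thumb, not a hard limit."""
    if not 0.0 <= gs_thread_util <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    if frame_limited:
        # With the limiter on, both threads idle, so the reading says little.
        return "inconclusive: frame limiter is on"
    if gs_thread_util > 0.90:
        return "GS-bound: the EE is likely stalling while the GS catches up"
    idle = 1.0 - gs_thread_util
    return f"EE-bound: GS thread waits on the EE about {idle:.0%} of the time"


print(diagnose(0.40, frame_limited=False))
print(diagnose(0.95, frame_limited=False))
```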