A feature request: Auto-configured software rendering threads
#21
(12-13-2015, 07:14 PM)willkuer Wrote: I don't think the speed loss from 2 ERT (= 4 threads) on a strong-IPC dual core will be large. If EE and GS usage is well below 100%, the remaining CPU time of 'their' cores can be used by the rendering threads.

Sorry, this really needs some benchmarks for credibility :P

Quote: For Haswell dual cores, 2 RT might even increase the performance (as MTVU can increase performance on dual cores as well)

Not really, MTVU doesn't always provide an increase in speed on dual cores. Usually the speed stays about the same as with MTVU disabled. A speedup was rare for me on my old Conroe, where MTVU only seemed to help in B10: POE, Bakugan: Battle Brawlers and, I think, some wrestling games? Not really sure...

In the worst case you'll see a decrease in speed in a few games. You can try checking GOW 1 & 2, GTA VC and BM: Kart, which IIRC had quite a drop with MTVU enabled, whereas MTVU gave a good boost on my Ivy Bridge quad core in a few of these games.
#22
(12-13-2015, 03:17 PM)gregory Wrote: On the 0/1 cases: they are only useful for devs to measure threading overhead. For users they are the same as off. Are the values 0/1 used by users?

Many users use 0 because it's the default value. I'm not sure about 1, but I guess many users will try it. How about using a select box containing values 0-8 and adding a tooltip text similar to the following?
Quote:1 is for testing and debugging.
2 is recommended for CPU-intensive games.
3 is recommended for other games.
Values above 3 are recommended for quad-core CPUs with HT or 6/8-core CPUs.

As for the default value, you can change it to 2 or 3.
#23
Except for the 1=debug and 0=sync tooltips, you cannot add any others.

The correct usage of 2, 3, ... depends heavily on the CPU and partially on the game and settings. I don't think any of us could provide a general rule for all available scenarios. As we cannot provide a rule, we cannot code it and we cannot write proper tooltips.

The idea is just that 2 ERT is more often correct than 0 ERT and therefore would be a better default value.

The more I think about all of this, the more I believe that there is some kind of plateau. I would guess that as long as ERT and the number of cores don't differ too much, the performance should be more or less stable. Didn't we once have some benchmarks including the performance of SSE/AVX/AVX2?
@Blyss I think it was your thread. Or the one of nobbs?
#24
(12-14-2015, 01:24 AM)willkuer Wrote: @Blyss I think it was your thread. Or the one of nobbs?

http://forums.pcsx2.net/Thread-Comparing...1-AVX-AVX2

Nobbs tested on quad core and I tested on hex core. We only tested 0 and 2 threads there though:

First, with 0 extra threads:

Blyss Sarania:
SSE2: 20 FPS
SSSE3: 21 FPS
SSE4.1: 21.9 FPS
AVX: 23.2 FPS

We can see that the better instruction sets do give a decent FPS gain. AVX is 16% faster than SSE2 for me.

Nobbs66:
SSE2: 22 FPS
SSSE3: 22.5 FPS
SSE4.1: 22 FPS
AVX: 23 FPS
AVX2: 24.5 FPS

Here we can see that the instruction set scaling is a bit different on the Intel CPU, but generally using more advanced instructions gets you more FPS.

Now, with 2 extra threads:

Blyss Sarania:
SSE2: 31.2 FPS
SSSE3: 31.3 FPS
SSE4.1: 31.8 FPS
AVX: 31.8 FPS

Interesting, right? There is almost no difference between the instruction sets when you are using 2 extra rendering threads. My guess is that with extra threads, cross-thread communication becomes the bottleneck, and any gains from the better instruction sets get completely erased on my AMD chip. But for Nobbs66's Intel chip, the story is a bit different:

Nobbs66:
SSE2: 39 FPS
SSSE3: 39 FPS
SSE4.1: 39 FPS
AVX: 41 FPS
AVX2: 46 FPS
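
For anyone who wants to compare these numbers directly, here is a small sketch that computes the relative gains. The FPS values are copied from the lists above; the dictionary layout and the helper name `gain_percent` are made up for illustration and are not part of any PCSX2 tool.

```python
# FPS figures copied from the benchmark lists above.
# Keys 0 and 2 are the number of extra rendering threads.
FPS = {
    "Blyss Sarania": {
        0: {"SSE2": 20.0, "SSSE3": 21.0, "SSE4.1": 21.9, "AVX": 23.2},
        2: {"SSE2": 31.2, "SSSE3": 31.3, "SSE4.1": 31.8, "AVX": 31.8},
    },
    "Nobbs66": {
        0: {"SSE2": 22.0, "SSSE3": 22.5, "SSE4.1": 22.0, "AVX": 23.0, "AVX2": 24.5},
        2: {"SSE2": 39.0, "SSSE3": 39.0, "SSE4.1": 39.0, "AVX": 41.0, "AVX2": 46.0},
    },
}


def gain_percent(new_fps, base_fps):
    """Relative gain of new_fps over base_fps, in percent (hypothetical helper)."""
    return (new_fps / base_fps - 1.0) * 100.0


for tester, runs in FPS.items():
    # Gain of the fastest instruction set over SSE2 at each thread count.
    for threads, results in runs.items():
        fastest = max(results, key=results.get)
        print(f"{tester}, {threads} extra threads: {fastest} vs SSE2 "
              f"{gain_percent(results[fastest], results['SSE2']):+.1f}%")
    # Gain from going from 0 to 2 extra threads for each instruction set.
    for isa in runs[0]:
        print(f"{tester}, {isa}: 2 vs 0 extra threads "
              f"{gain_percent(runs[2][isa], runs[0][isa]):+.1f}%")
```

With these inputs, the AVX vs SSE2 gain at 0 extra threads for Blyss Sarania works out to the 16% quoted above.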
#25
Memory controller limitation. More threads are nice if you can feed them with data. (Plus the fact that you have 1/2 AVX unit per core, so 1 complete unit if only a few cores are used.)
#26
(12-13-2015, 07:14 PM)willkuer Wrote: I don't think the speed loss from 2 ERT (= 4 threads) on a strong-IPC dual core will be large. If EE and GS usage is well below 100%, the remaining CPU time of 'their' cores can be used by the rendering threads. For Haswell dual cores, 2 RT might even increase the performance (as MTVU can increase performance on dual cores as well)

Yeah. Most of my games (probably at least 80% of the 30 games I own) run fastest, or occasionally on par with other settings, on my G3258 (dual-core Haswell) with 2 extra rendering threads.

I don't think an auto detect setting is a good idea, but I also don't think 0 is a good default. It should at least be 1..
#27
(12-14-2015, 07:44 PM)dogen Wrote: It should at least be 1..

1 ERT is pretty much the same as 0 ERT on dual-core systems. 2 ERT should work better than 0/1 ERT on powerful dual-core systems because 2 ERT results in 2 rendering threads.

(12-14-2015, 01:24 AM)willkuer Wrote: Except for the 1=debug and 0=sync tooltips, you cannot add any others.

How about this?
Quote:0/1 is for testing and debugging.
2 is recommended for powerful dual-core CPUs and quad-core CPUs.
3 is recommended for powerful quad-core CPUs.
Values above 3 are recommended for quad-core CPUs with HT or 6/8-core CPUs.
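
As a rough sketch, the proposed tooltip could be expressed as a simple lookup on the selected value. The strings are the ones from the proposal above; the function name `ert_tooltip` is hypothetical and this is not actual PCSX2/GSdx code.

```python
def ert_tooltip(extra_threads: int) -> str:
    """Return the proposed tooltip line for a given extra-rendering-threads value.

    Hypothetical helper: the wording comes from the proposal in this post,
    not from the emulator's source.
    """
    if extra_threads < 0:
        raise ValueError("extra rendering threads cannot be negative")
    if extra_threads <= 1:
        return "0/1 is for testing and debugging."
    if extra_threads == 2:
        return "2 is recommended for powerful dual-core CPUs and quad-core CPUs."
    if extra_threads == 3:
        return "3 is recommended for powerful quad-core CPUs."
    return "Values above 3 are recommended for quad-core CPUs with HT or 6/8-core CPUs."


# Print the tooltip for each value of a 0-8 select box.
for value in range(9):
    print(value, "->", ert_tooltip(value))
```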
#28
(12-14-2015, 08:42 PM)xemnas99 Wrote: 1 ERT is pretty much the same as 0 ERT on dual-core systems. 2 ERT should work better than 0/1 ERT on powerful dual-core systems because 2 ERT results in 2 rendering threads.


How about this?

Doesn't 2 ERT mean 3 rendering threads? The main GS thread has to be doing some of the work, otherwise 0 wouldn't draw anything..
#29
(12-14-2015, 09:13 PM)dogen Wrote: Doesn't 2 ERT mean 3 rendering threads? The main GS thread has to be doing some of the work, otherwise 0 wouldn't draw anything..

With 0 ERT, the GS thread does the rendering. If the number of ERT is more than 0, the GS thread doesn't do the rendering.
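
To make the counting explicit, here is a minimal sketch of that rule. The function names are made up for illustration, and the total assumes just one EE thread and one GS thread plus the extra rendering threads (MTVU is not counted), which matches the "2 ERT (= 4 threads)" figure quoted earlier in the thread.

```python
def rendering_threads(extra_threads: int) -> int:
    """Number of threads that actually rasterize, per the rule described above.

    With 0 ERT the GS thread renders by itself; with 1 or more ERT the GS
    thread hands rendering over to the extra threads. (Hypothetical helper.)
    """
    if extra_threads < 0:
        raise ValueError("extra rendering threads cannot be negative")
    return 1 if extra_threads == 0 else extra_threads


def total_threads(extra_threads: int) -> int:
    """EE thread + GS thread + extra rendering threads (assumption: MTVU off)."""
    if extra_threads < 0:
        raise ValueError("extra rendering threads cannot be negative")
    return 2 + extra_threads


for ert in range(4):
    print(f"{ert} ERT: {rendering_threads(ert)} rendering thread(s), "
          f"{total_threads(ert)} threads in total")
```

This also shows why 1 ERT behaves much like 0 ERT: in both cases only one thread does the rendering.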
#30
(12-14-2015, 09:18 PM)xemnas99 Wrote: With 0 ERT, the GS thread does the rendering. With 1 ERT, the GS thread doesn't do the rendering.

Oh ok. Didn't know that.