A feature request: Auto-configured software rendering threads
#11
Ok I made a PR.
This doesn't add automatic selection, just a better default value.
Additionally, this commit makes extra rendering threads more intuitive. As I described above, the 1 ERT setting is removed: with one extra rendering thread you now start with two rendering threads:

rendering threads = extra rendering threads + 1 ("main thread")
#12
I disagree with changing how the ERTs work overall without a heavy discussion. It's always been that way and those of us that know the program have come to count on it.

Are you saying that 2 ERT will then be the same as now, and only the 1 ERT setting changes? If only the 1 ERT setting changes, that's fine. But if it affects other values, then I'm not sure.
Gaming Rig: Intel i7 6700k @ 4.8Ghz | GTX 1070 TI | 32GB RAM | 960GB(480GB+480GB RAID0) SSD | 2x 1TB HDD
#13
We are in a feature freeze, so feel free to discuss. We have a lot of time.
I thought making it intuitive was more important. Additionally, you should never use 1 ERT. I agree with you that existing features shouldn't be changed, but in this case the setting seems completely wrong to me, so I would make an exception here.

It affects all values above 0.

ERT = extra rendering threads
RT = rendering thread (async)
GS = gs main thread + potentially sync rendering thread (if ERT = 0)

new behavior:
0 ERT -> 2 threads total (1 GS + 1 Core)
1 ERT -> 3 threads total (1 GS + 1 Core + 1 ERT) - more correct: (2 RT + 1 Core)
2 ERT -> 4 threads total (1 GS + 1 Core + 2 ERT) - more correct: (3 RT + 1 Core)

old behavior:
0 ERT -> 2 threads total (1 GS + 1 Core)
1 ERT -> 2 threads total (1 Core + 1 ERT) - more correct: (1 RT + 1 Core)
2 ERT -> 3 threads total (1 Core + 2 ERT) - more correct: (2 RT + 1 Core)
(So 0 ERT and 1 ERT are more or less the same, except for synchronization overhead.)
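
The two tables can be restated as a small sketch. The function names are hypothetical and are not taken from the GSdx source; the counts only mirror the numbers listed above (one Core thread, MTVU not counted).

```cpp
#include <cassert>

// Hypothetical helpers; they only restate the tables above.

// Old behavior: 0 ERT renders synchronously on the GS thread (1 rendering
// thread); N ERT (N >= 1) gives N async rendering threads.
inline int renderingThreadsOld(int ert)
{
    assert(ert >= 0);
    return ert == 0 ? 1 : ert;
}

// New behavior: rendering threads = extra rendering threads + 1 ("main thread").
inline int renderingThreadsNew(int ert)
{
    assert(ert >= 0);
    return ert + 1;
}

// Total threads = rendering threads + 1 Core thread.
inline int totalThreads(int renderingThreads)
{
    return renderingThreads + 1;
}
```

With these definitions, totalThreads(renderingThreadsOld(1)) equals totalThreads(renderingThreadsOld(0)), which is the redundancy the change removes.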
#14
The new behavior definitely makes more sense. It may be worth it to shake things up, IDK. Like I said in the PR I'm gonna have to think about it and see what others say as well.
#15
(12-11-2015, 10:16 PM)gregory Wrote: Besides, counting cores is not cross-platform (the best solution would be to pass it from the core to the plugin through a plugin extension). Otherwise I think 2 is a good general default.

Note value 1 is special. Rendering is done on a separate thread but due to sync, it is as slow as 0.

There isn't one way to do it across platforms, but Windows, Mac, Linux, and BSD all have ways to get a processor count from a terminal emulator/command prompt.
Windows:
Code:
echo %NUMBER_OF_PROCESSORS%

Mac/BSD:
Code:
sysctl -n hw.ncpu

GNU/Linux:
Code:
nproc


You do the fancy #ifdef to obtain a number in a platform-dependent way, and when GSdx asks how many extra rendering threads it can use, the core responds appropriately. Ideally it would take into account whether or not the MTVU speedhack is enabled. So the "Auto" setting would have to be event-driven as well: if the user toggles MTVU in the core, the number of ERTs in the GSdx plugin has to increase or decrease, provided there are enough processors to do so.
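
A rough sketch of what such an "Auto" value could look like. Instead of the per-platform #ifdef, it swaps in the C++11 standard library call std::thread::hardware_concurrency(). The function names, the fallback of 4, and the heuristic itself (reserve one CPU each for the core thread, the GS thread, and MTVU if enabled) are assumptions for illustration, not PCSX2 or GSdx code.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Logical CPU count from the standard library. hardware_concurrency() may
// return 0 when the count cannot be determined, so use an assumed fallback.
inline int logicalCpuCount(int fallback = 4)
{
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? fallback : static_cast<int>(n);
}

// Hypothetical heuristic: reserve one CPU for the core (EE) thread, one for
// the GS thread, and one more when MTVU is enabled; the remainder becomes
// extra rendering threads. Never returns a negative count.
inline int suggestExtraRenderingThreads(int cpus, bool mtvuEnabled)
{
    assert(cpus >= 1);
    const int reserved = 2 + (mtvuEnabled ? 1 : 0);
    return std::max(0, cpus - reserved);
}
```

To make it event-driven, the core would call suggestExtraRenderingThreads(logicalCpuCount(), mtvuEnabled) again whenever the MTVU toggle changes and pass the result to the plugin. How that value would actually reach GSdx is left open here.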
#16
That's the definition of not portable. Honestly, auto will be complicated to code and will never give you the correct setting. Better to spend the time speeding up the EE recompiler. Just set the default for a 4-core CPU (i5). The SW renderer is likely too slow for 2 cores anyway.
#17
On the 0/1 cases: they are only useful for devs to measure threading overhead. For users they are the same as off. Are the values 0/1 actually used by users?
#18
Possibly they use 1, believing that it is the best setting.
Quad core: PCSX2 + MTVU + GS software renderer = 3 threads. Adding one additional thread seems reasonable.

If they benchmarked, they would see that the optimum is 2 or 3 ERTs.

Also, using 0/1 to benchmark sync overhead seems too naive. 0/1 does not necessarily sync the same way as 2 or more. Optimizing 1 vs 0 ERT performance will therefore not necessarily optimize 2, 3, or 4 ERTs the same way.

I don't think 1 ERT serves any real purpose, even for development, except perhaps back when multithreading was first being implemented.
#19
Also, the assumption that the software renderer is too slow for dual cores isn't really reasonable, since dual cores (like Haswell) with higher IPC / single-thread performance can handle software mode fine in quite a few games. Even my older Conroe-architecture CPU managed to run a few games fine at 40+ fps in software mode.

The increment to the thread count in the function looks good to me. It's probably the better behavior for that function, as people always expect 1 to be a speedup compared to 0.
#20
I don't think the speed loss of 2 ERT (= 4 threads) on a dual core with strong IPC will be large. If EE and GS are well below 100% load, the remaining CPU time of 'their cores' can be used by the rendering threads. On Haswell dual cores, 2 RT might even increase performance (just as MTVU can increase performance on dual cores).