Posts: 21.718
Threads: 401
Joined: Sep 2013
Reputation:
476
Location: 私の夢の中
12-12-2015, 04:50 AM
(This post was last modified: 12-12-2015, 04:51 AM by Blyss Sarania.)
I disagree with changing how the ERTs work overall without a heavy discussion. It's always been that way and those of us that know the program have come to count on it.
Are you saying that 2 ERT will then behave the same as now, and only the 1 ERT setting changes? If it only changes the 1 ERT setting, then it's fine. But if it affects other values, then I'm not sure.
Gaming Rig: Intel i7 6700k @ 4.8Ghz | GTX 1070 TI | 32GB RAM | 960GB(480GB+480GB RAID0) SSD | 2x 1TB HDD
Posts: 5.076
Threads: 18
Joined: Oct 2010
Reputation:
154
12-12-2015, 04:53 AM
(This post was last modified: 12-12-2015, 04:59 AM by willkuer.)
We are in feature freeze, so feel free to discuss. We have a lot of time.
I thought making it intuitive was more important. Additionally, you should never use 1 ERT. I agree with you that existing features should normally not be changed. It's just that in this case the setting seems completely wrong to me, so I would make an exception here.
It affects all values above 0.
ERT = extra rendering threads
RT = rendering thread (async)
GS = gs main thread + potentially sync rendering thread (if ERT = 0)
new behavior:
0 ERT -> 2 threads total (1 GS + 1 Core)
1 ERT -> 3 threads total (1 GS + 1 Core + 1 ERT) - more correct: (2 RT + 1 Core)
2 ERT -> 4 threads total (1 GS + 1 Core + 2 ERT) - more correct: (3 RT + 1 Core)
old behavior:
0 ERT -> 2 threads total (1 GS + 1 Core)
1 ERT -> 2 threads total (1 Core + 1 ERT) - more correct: (1 RT + 1 Core)
2 ERT -> 3 threads total (1 Core + 2 ERT) - more correct: (2 RT + 1 Core)
(So under the old behavior, 0 ERT and 1 ERT are more or less the same, except for synchronization overhead.)
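To make the two lists above concrete, here is a minimal Python sketch of the mapping from the ERT setting to the total thread count. The function names are made up for illustration and are not taken from the PCSX2 source; the sketch only reproduces the totals listed above and ignores MTVU.

```python
def total_threads_old(ert: int) -> int:
    """Old behavior as listed above (hypothetical helper, not PCSX2 code).

    With ERT = 0 the GS thread renders synchronously, so there is one
    rendering thread. With ERT >= 1 the setting is the number of
    rendering threads, so 0 and 1 give the same total.
    """
    if ert < 0:
        raise ValueError("ERT must be >= 0")
    core = 1
    rendering = 1 if ert == 0 else ert
    return core + rendering


def total_threads_new(ert: int) -> int:
    """Proposed behavior as listed above (hypothetical helper, not PCSX2 code).

    The GS thread is always counted and every ERT adds one more thread.
    """
    if ert < 0:
        raise ValueError("ERT must be >= 0")
    core = 1
    gs = 1
    return core + gs + ert


if __name__ == "__main__":
    # Print both mappings side by side for the values discussed here.
    for ert in range(4):
        print(f"{ert} ERT: old = {total_threads_old(ert)} threads, "
              f"new = {total_threads_new(ert)} threads")
```

The only setting where the old total does not grow is the step from 0 to 1, which is the case the change is about.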
Posts: 21.718
Threads: 401
Joined: Sep 2013
Reputation:
476
Location: 私の夢の中
The new behavior definitely makes more sense. It may be worth it to shake things up, IDK. Like I said in the PR I'm gonna have to think about it and see what others say as well.
Posts: 6.069
Threads: 68
Joined: May 2010
Reputation:
167
Location: Grenoble, France
That is the definition of not portable. Honestly, an auto setting would be complicated to code and would never give you correct settings. The time is better spent accelerating the EE recompiler. Just set the default for a 4-core CPU (i5). The SW renderer is likely too slow for 2 cores anyway.
Posts: 6.069
Threads: 68
Joined: May 2010
Reputation:
167
Location: Grenoble, France
On the 0/1 cases: they are only useful for devs to measure threading overhead. For users they are effectively the same. Are the values 0/1 actually used by users?
Posts: 5.076
Threads: 18
Joined: Oct 2010
Reputation:
154
12-13-2015, 06:15 PM
(This post was last modified: 12-13-2015, 06:20 PM by willkuer.)
Possibly they use 1, believing that it is the best setting.
Quad-core: PCSX2 + MTVU + GS software renderer = 3 threads. Adding one additional thread seems reasonable.
If they benchmarked, they would see that the optimum is 2 or 3 ERTs.
Also, using 0/1 to benchmark sync overhead seems too naive. 0/1 does not necessarily sync the same way as 2 or more. Optimizing 1 vs. 0 ERT performance therefore does not necessarily optimize 2, 3, or 4 ERTs the same way.
I don't think there is a relevant point for 1 ERT even for development. Maybe except for the time when multithreading was implemented.
Posts: 8.597
Threads: 105
Joined: May 2014
Reputation:
168
Location: 127.0.0.1
Also, the assumption that the software renderer is too slow for dual cores isn't really a reasonable one, since dual cores (like Haswell) with higher IPC / single-thread performance can handle software mode fine in quite a few games. Even my older Conroe-architecture CPU managed to handle a few games fine at 40+ fps in software mode.
The thread increment in the function looks good to me. It's probably the better behavior for this setting, as people always expect 1 to be a speedup compared to 0.
Posts: 5.076
Threads: 18
Joined: Oct 2010
Reputation:
154
I don't think the speed loss of 2 ERT (= 4 threads) on a strong-IPC dual core will be large. If EE & GS << 100%, the remaining CPU time of 'their cores' can be used by the rendering threads. For Haswell dual cores, 2 RT might even increase performance (as MTVU can increase performance on dual cores as well).