Posts: 3.526
Threads: 6
Joined: Dec 2013
05-31-2014, 06:36 PM
(This post was last modified: 05-31-2014, 06:38 PM by dabore.)
TSX. o_O Don't throw me in that deep. I was only speculating about synchronizing data; I didn't want to post it.
The rasterizer implementation is over my head, but there shouldn't be too many conflicts. A plain one-thread-per-pixel split avoids write collisions in the cache. One could extend it so that each thread works at cache-line granularity, which avoids cache-line writes that would need to be synchronized between cores, so nothing has to be done speculatively. There is actually no issue there, though. The texture reads are probably the tricky part, with overlapping cache reads. If the texture is crooked, things get slow because it fetches loads of cache lines from random memory offsets. That thrashes the cache until it becomes unpredictable, and all 8 threads want to do the same. At some point the thrashing load is too much for the cache. Result: slowdown.
You have to think of the CPU in super slow motion to optimize that. I still have no idea.
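To make the cache-line idea above a bit more concrete, here is a minimal sketch of splitting one scanline so that no two threads ever write to the same cache line. It is not how GSdx actually does it; the 64-byte line size, the 4 bytes per pixel, and the names `assign_chunks` and `cache_line_owners` are all assumptions made up for illustration.

```python
CACHE_LINE_BYTES = 64   # assumption: typical x86 cache line size
BYTES_PER_PIXEL = 4     # assumption: 32-bit framebuffer pixels


def assign_chunks(width_px, num_threads):
    """Split one scanline into cache-line-sized pixel runs, one owner each.

    Returns a list of (start_px, end_px, thread_id), end exclusive.
    Assumes the scanline starts on a cache-line boundary.
    """
    px_per_line = CACHE_LINE_BYTES // BYTES_PER_PIXEL
    chunks = []
    for index, start in enumerate(range(0, width_px, px_per_line)):
        end = min(start + px_per_line, width_px)
        # Round-robin: consecutive cache lines go to consecutive threads.
        chunks.append((start, end, index % num_threads))
    return chunks


def cache_line_owners(chunks):
    """Return {cache_line_index: set of thread ids that write to it}."""
    owners = {}
    for start, end, thread_id in chunks:
        for px in range(start, end):
            line = (px * BYTES_PER_PIXEL) // CACHE_LINE_BYTES
            owners.setdefault(line, set()).add(thread_id)
    return owners


chunks = assign_chunks(640, 8)
owners = cache_line_owners(chunks)
```

With these assumed sizes each chunk is 16 pixels wide, and every cache line in `owners` ends up with exactly one writer. A pixel-by-pixel round-robin would instead put several threads on the same line. This only covers the writes; the scattered texture reads are a separate problem.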
Posts: 4.591
Threads: 65
Joined: Sep 2010
Reputation:
145
Location: Isle of Man
05-31-2014, 07:22 PM
(This post was last modified: 05-31-2014, 07:24 PM by Fezzer.)
Well, does PCSX2's parallel processing scale perfectly?
Like, in theory, if you had a 500-core CPU, would it take advantage of them all just as well, or do the benefits tail off after 5 threads or so?
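One generic way to reason about this question is Amdahl's law, which caps the speedup by whatever part of the work stays serial. The sketch below is not a measurement of PCSX2; the 0.8 parallel fraction is a made-up assumption, and the formula ignores threading overhead entirely, so real results can be worse.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Ideal speedup when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


ASSUMED_PARALLEL_FRACTION = 0.8  # hypothetical value, not measured

for n in (1, 2, 5, 8, 500):
    print(n, round(amdahl_speedup(ASSUMED_PARALLEL_FRACTION, n), 2))
```

Under that assumption, 5 threads already give about 2.78x, and 500 threads still stay below the 5x ceiling set by the serial 20%, so the benefits would tail off long before 500 cores.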
<REDACTED>
Posts: 3.526
Threads: 6
Joined: Dec 2013
05-31-2014, 07:26 PM
(This post was last modified: 05-31-2014, 07:28 PM by dabore.)
on a light. did you do that test with mtvu?
I'd guess it's sensitive to some sort of "context switching": it maxes out with 4 free and equally "lonely context" threads plus the GSdx main thread, which does real work but is bound to the system through DX.
By just doing math and forcing the system context onto the EE main thread (without MTVU), with GSdx as the second thread, one could raise the extra threads to 6 and still see an increase.
but that's just a theory.
Posts: 20.325
Threads: 405
Joined: Aug 2005
Reputation:
554
Location: England
It all depends on how busy these threads are. That's how HT works well: by having lots of threads each doing small amounts of work. Without HT the CPU can only run 4 threads at once, and then you get the threading-code delays in between, which seriously slow things down. If you can issue 8 threads at once, you reduce those delays, which is probably why it helps in some games.
Posts: 21.718
Threads: 401
Joined: Sep 2013
Reputation:
476
Location: 私の夢の中
06-01-2014, 05:21 AM
(This post was last modified: 06-01-2014, 05:28 AM by Blyss Sarania.)
I have done a similar test before with real cores, and my results (on an FX6300 with 6 real cores) were that speed increased up to extra threads = 3 (so 4 total); 4 (5) was the same as 3, and 5 (6) was slower than 3. So on my CPU, 3 extra threads is fastest. This was WITHOUT MTVU, though, so it kinda makes sense: at extra threads = 3, each core has a thread. EE, GS, and 4 software rendering threads.
Gaming Rig: Intel i7 6700k @ 4.8Ghz | GTX 1070 TI | 32GB RAM | 960GB(480GB+480GB RAID0) SSD | 2x 1TB HDD
Posts: 5.076
Threads: 18
Joined: Oct 2010
Reputation:
154
isn't it 5? EE, GS + 3 extra threads?
Posts: 21.718
Threads: 401
Joined: Sep 2013
Reputation:
476
Location: 私の夢の中
I believe it's EE + GS + Software thread + Extra threads.
The GS thread (like the one in the window) is not rendering; rather, it's emulating the GS itself. I think.
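As a quick sanity check of that counting, here is a small sketch that adds up the threads the way described above (EE + GS + software thread + extra threads). The function names are hypothetical, and the extra thread for MTVU is an assumption, not something confirmed in this thread.

```python
def total_threads(extra_threads, mtvu=False):
    """Count busy threads as EE + GS + software thread + extra threads."""
    threads = 1 + 1 + 1 + extra_threads
    if mtvu:
        threads += 1  # assumption: MTVU adds one more thread
    return threads


def oversubscribed(extra_threads, cores, mtvu=False):
    """True when there are more busy threads than real cores."""
    return total_threads(extra_threads, mtvu) > cores


for extra in (3, 4, 5):
    print(extra, total_threads(extra), oversubscribed(extra, cores=6))
```

By this count, extra threads = 3 gives 6 threads, exactly one per core on a 6-core CPU, while 4 and 5 extra threads exceed the core count, which would fit the results reported earlier.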