Some interesting test results, on an i7-4770K @ 3.5GHz
#11
tsx? o_O don't throw me in that deep. i was only speculating about synchronizing data. i didn't want to post it.

the rasterizer implementation is above my head, but there shouldn't be too many conflicts. a plain one-thread-per-pixel split already avoids write collisions in the cache. one could extend it so each thread processes at cache line granularity, which avoids cache line writes that would need to be synchronized between cores, so nothing has to be done speculatively. there is actually no issue there, though. the texture reads are probably the tricky part, with overlapping cache reads. if the texture is skewed, things get slow because it fetches loads of cache lines from random memory offsets. that thrashes the cache into unpredictability, and all 8 threads want to do it at once. at some point the thrashing load is too much for the cache. result: slowdown.
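
to make the cache line idea a bit more concrete, here is a minimal python sketch. it is not how the real rasterizer splits its work; it only shows one way to hand out whole cache lines of a framebuffer row so that no two threads ever write into the same line. the 64-byte line size, 4 bytes per pixel, the round-robin assignment, the function name, and the assumption that each row starts on a cache line boundary are all mine.

```python
def cacheline_chunks(width_px, bytes_per_px=4, n_threads=4, line_bytes=64):
    """Split one framebuffer row into cache-line-sized pixel runs.

    Returns a list of (start_px, end_px, thread_id) tuples. Each run covers
    exactly one cache line (the last run may be shorter), so two threads
    never write to the same line. Assumes the row starts line-aligned.
    """
    px_per_line = line_bytes // bytes_per_px
    chunks = []
    for start in range(0, width_px, px_per_line):
        end = min(start + px_per_line, width_px)
        # Hypothetical policy: hand out lines to threads round-robin.
        thread_id = (start // px_per_line) % n_threads
        chunks.append((start, end, thread_id))
    return chunks


if __name__ == "__main__":
    for start, end, thread_id in cacheline_chunks(40):
        print(f"pixels {start}..{end - 1} -> thread {thread_id}")
```

this only deals with the write side. the texture reads described above are scattered by the texture coordinates, so a split like this does nothing for them.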

you have to think like a super slow-mo cpu to optimize that. i still have no idea. Laugh
Reply


#12
(05-31-2014, 03:10 PM)refraction Wrote: yes extra threads means "more over the base thread" so yes, that is 5 threads.

so that means I need to correct a few posts and add 1 to them all.. lol

This still makes little sense when comparing my CPU to it, though, as 4 threads (+3 extra threads) runs best for me (which, note, uses BOTH physical AND HT cores), while the i7 case uses ALL physical cores and an extra HT core.

i3
Quote:[Settings]
ShadeBoost_Contrast=50
ShadeBoost_Brightness=50
ShadeBoost_Saturation=50
Adapter=default
ModeWidth=0
ModeHeight=0
ModeRefreshRate=0
Renderer=4
Interlace=7
AspectRatio=0
upscale_multiplier=1
MaxAnisotropy=0
windowed=1
filter=2
paltex=0
logz=1
fba=1
aa1=0
nativeres=1
resx=1024
resy=1024
extrathreads=3
AnisotropicFiltering=0
ShadeBoost=1
Fxaa=0
shaderfx=0
UserHacks=0
i7
Quote:[Settings]
ShadeBoost_Contrast=50
ShadeBoost_Brightness=50
ShadeBoost_Saturation=50
Adapter=default
ModeWidth=0
ModeHeight=0
ModeRefreshRate=0
Renderer=4
Interlace=7
AspectRatio=0
upscale_multiplier=1
MaxAnisotropy=0
windowed=1
filter=2
paltex=0
logz=1
fba=1
aa1=0
nativeres=1
resx=1024
resy=1024
extrathreads=4
AnisotropicFiltering=0
ShadeBoost=1
Fxaa=0
shaderfx=0
UserHacks=0
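
For anyone comparing dumps like these by hand: the i3 and i7 settings above differ only in extrathreads (3 vs 4). A small Python sketch for checking that automatically is below. It uses the standard configparser module on shortened copies of the two dumps; the helper names are made up for illustration, and note that configparser lowercases key names by default.

```python
import configparser


def load_settings(ini_text):
    """Parse INI text and return the [Settings] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return dict(parser["Settings"])


def diff_settings(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Shortened copies of the two dumps, for illustration only.
I3_INI = "[Settings]\nRenderer=4\nextrathreads=3\nShadeBoost=1\n"
I7_INI = "[Settings]\nRenderer=4\nextrathreads=4\nShadeBoost=1\n"

if __name__ == "__main__":
    print(diff_settings(load_settings(I3_INI), load_settings(I7_INI)))
```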
Reply
#13
Well, does PCSX2's parallel processing scale perfectly?
Like, in theory, if you had a 500-core CPU, would it take advantage of them all just as well, or do the benefits tail off after 5 threads or so?
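
One generic way to frame the question is Amdahl's law: if only a fraction p of the work can run in parallel, the speedup on n threads is at most 1 / ((1 - p) + p / n). The Python sketch below just evaluates that formula. The parallel fraction used in the example is a made-up number, not a measurement of PCSX2, and the formula ignores synchronization overhead entirely.

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Upper bound on speedup per Amdahl's law.

    parallel_fraction: share of the work that can be parallelized (0..1).
    n_threads: number of threads working on the parallel part.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)


if __name__ == "__main__":
    # 0.9 is a hypothetical parallel fraction chosen only for illustration.
    for n in (1, 2, 4, 8, 500):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

With any serial part at all, the curve flattens out: at p = 0.9 even 500 threads stay below a 10x speedup.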
<REDACTED>
Reply
#14
on a light. did you do that test with mtvu?

i'd assume it's sensitive to some sort of "context switching". it maxes out at this with 4 free and equally "lonely context" threads plus the gsdx main thread, which does work but is system-bound through dx.

if the threads were just doing math, with the system context forced into the eemain thread (without mtvu) and gsdx as the second thread, one could raise the extra threads to 6 and still see an increase.

but that's just a theory. Smile
Reply
#15
(05-31-2014, 07:26 PM)dabore Wrote: on a light. did you do that test with mtvu?
yes, tests were done with MTVU.
Reply
#16
(05-31-2014, 07:22 PM)Fezzer Wrote: Well does Pcsx2's parallel processing scale perfectly?

I think this is just the point. It scales more or less perfectly up to extra:3. Some will need to use 2, others will use 4, depending on the CPU design.

I think one should not underestimate HT for CPU-intensive stuff. This is shown in Saiki's tests. I think PCSX2 is not limited by HT but only by parallel processing delays. That's the reason for the magical +3-threads barrier.

Unfortunately one would need something like a real octacore to prove that.
Reply
#17
It all depends on how busy these threads are. HT works well when you have millions of threads each doing small amounts of work. If you don't have HT, it can only run 4 threads at once, and then you have the threading code delays in between, which seriously slow stuff down. But if you can issue 8 threads at once, you reduce delays, which is probably why it helps on some games.
Reply
#18
I have done a similar test before with real cores, and my results (on an FX6300 with 6 real cores) were that speed increased up to extra threads = 3 (so 4 total), 4 (5) was the same as 3, and 5 (6) was slower than 3. So on my CPU, 3 extra threads is fastest. This was WITHOUT MTVU though. So it kinda makes sense. At extra threads = 3, each core has a thread: EE, GS, and 4 software rendering threads.
[Image: vwah44]
Gaming: Intel i7 3770k @ 4.2Ghz | R9 290 | 16GB RAM | 480GB(240GB+240GB RAID0) SSD | 3 TB HDD | 1 TB HDD | 500GB HDD
Server: AMD FX 6300 @ 4.4Ghz | GTX 670 | 16GB RAM | 240GB SSD | 320GB HDD
PCSX2 General Troubleshooting FAQ
Reply
#19
isn't it 5? EE, GS + 3 extra threads?
Reply
#20
I believe it's EE + GS + Software thread + Extra threads.

The GS thread (like the one in the window) is not rendering, but rather emulating the GS itself. I think.
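
To put numbers on that, here is a small Python sketch of the count as I described it: EE + GS + one base software thread + the extra threads. This is just my understanding from this thread, not something checked against the source, and treating MTVU as one additional thread is also an assumption. The function name is made up.

```python
def total_threads(extra_threads, mtvu=False):
    """Count busy threads under the model from this thread (unverified).

    EE thread + GS thread + one base software rendering thread
    + the configured extra rendering threads, plus one more if MTVU is on.
    """
    ee = 1
    gs = 1
    software_base = 1
    vu = 1 if mtvu else 0  # assumption: MTVU adds exactly one thread
    return ee + gs + software_base + extra_threads + vu


if __name__ == "__main__":
    print(total_threads(3))             # the FX 6300 test without MTVU
    print(total_threads(4, mtvu=True))  # extrathreads=4 with MTVU
```

Under this model, extra threads = 3 without MTVU comes out at 6 threads, which lines up with one thread per core on the 6-core FX 6300.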
Reply



