GSdx 1.0
(08-15-2015, 04:23 PM)gregory Wrote: Use perf with the mesa debug symbols installed. Potentially I found another root cause of the slowness: texture upload. I only looked at palette updates: 500k updates for only 170 frames. I didn't check the main texture.

Today Linux won't listen to me... I can't get anything to work... BUT
I installed apitrace (quite a good program) on Windows 7 and did a bit of research...
If you take a look at the attached file, you can see that glDrawElementsBaseVertex is called twice each time: once (the first bug in the screenshot) with z-depth writing enabled (glDepthMask(GL_TRUE)) and shader program 70 bound (glUseProgramStages(1, GL_FRAGMENT_SHADER_BIT, 70)), and once with it disabled (glDepthMask(GL_FALSE)) and shader program 71 bound (glUseProgramStages(1, GL_FRAGMENT_SHADER_BIT, 71)), with no other state changes between the two draws.
The apitrace GUI states that this trick costs a fragment shader recompilation.
I don't know if that's intentional, but it seems strange to me, though I'm a bit OpenGL-agnostic...
I'm now doing further research, looking at per-frame performance profiling.
I can give you the apitrace dump if you're interested.

EDIT: Apitrace's performance profiling has a little bug that makes it impossible for me to run the application, but I reported it and hope it will be fixed soon.


Attached Files Thumbnail(s)
   

I know apitrace quite well. I even submitted a couple of bug fixes/improvements Tongue2 I have enough GB of apitraces on my SSD.
Quote:If you take a look at the attached file, you can see that glDrawElementsBaseVertex is called twice each time: once (the first bug in the screenshot) with z-depth writing enabled (glDepthMask(GL_TRUE)) and shader program 70 bound (glUseProgramStages(1, GL_FRAGMENT_SHADER_BIT, 70)), and once with it disabled (glDepthMask(GL_FALSE)) and shader program 71 bound (glUseProgramStages(1, GL_FRAGMENT_SHADER_BIT, 71)), with no other state changes between the two draws.
This trick is used to emulate the alpha test. Hum, actually maybe some cases could be done faster, but I don't think it is the issue. The fragment recompilation appeared recently but I don't know the real cause. The perf impact likely remains small. It is the GL state changing between first compilation and usage.
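For readers following along, here is a rough, hypothetical sketch (plain C++, no GL; all names are invented for illustration, this is not GSdx code) of why emulating the PS2 alpha test can force two draws: the GS can ask for "fragments that fail the alpha test still write color, but never depth", and a single GL draw with `discard` cannot express that, so the draw is split into a depth-writing pass for passing fragments and a color-only pass for failing ones.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-pixel model of the two-pass trick seen in the trace:
//   pass 1: program discards FAILING fragments, glDepthMask(GL_TRUE)
//   pass 2: program discards PASSING fragments, glDepthMask(GL_FALSE)
struct Frag { uint8_t color; uint8_t alpha; uint8_t depth; };

struct Target {
    std::array<uint8_t, 4> color{};
    std::array<uint8_t, 4> depth{};
};

inline void draw_two_pass(Target& t, const std::array<Frag, 4>& frags, uint8_t aref) {
    for (std::size_t i = 0; i < frags.size(); ++i)   // pass 1: depth writes on
        if (frags[i].alpha >= aref) {                // alpha test passes
            t.color[i] = frags[i].color;
            t.depth[i] = frags[i].depth;
        }
    for (std::size_t i = 0; i < frags.size(); ++i)   // pass 2: depth writes off
        if (frags[i].alpha < aref)                   // alpha test fails
            t.color[i] = frags[i].color;             // framebuffer only
}
```

Every pixel ends up with a color, but only the alpha-passing pixels touch the depth buffer, which mirrors the glDepthMask toggle between the two glDrawElementsBaseVertex calls in the apitrace dump.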
(08-16-2015, 11:11 AM)gregory Wrote: I know apitrace quite well. I even submitted a couple of bug fixes/improvements Tongue2 I have enough GB of apitraces on my SSD.
This trick is used to emulate the alpha test. Hum, actually maybe some cases could be done faster, but I don't think it is the issue. The fragment recompilation appeared recently but I don't know the real cause. The perf impact likely remains small. It is the GL state changing between first compilation and usage.

LoL, you're a PRO! Take a look at https://github.com/apitrace/apitrace/issues/358#issuecomment-131495793 , it's the issue that was blocking my profiling.
But somehow it works now, and I managed to profile the trace.

Attached is an interesting view of the profiling, which maybe you have already seen.

The time between two spikes is constant, and assuming a constant load from your plugin, I think the number of draw calls the driver has to process is the only issue.

Profiling another part of the game, with two robots displayed but without the enemy boss from the slow section, the number of API calls is reduced by a factor of 3 (30k vs 90k), and I can get a stable 50 fps, limited by the framelimiter.
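As a sanity check on that theory, some back-of-the-envelope arithmetic (the per-call driver cost below is an assumed figure for illustration, not a measurement from the trace):

```cpp
#include <cassert>

// If each GL call cost around 0.25 us of driver/validation time (an
// assumption), 90k calls alone would blow the 20 ms budget of a 50 fps
// frame, while 30k calls would fit comfortably.
inline double driver_ms(int api_calls, double us_per_call) {
    return api_calls * us_per_call / 1000.0;
}

inline double frame_budget_ms(double fps) {
    return 1000.0 / fps;
}
```

Under that assumption, 90,000 calls cost 22.5 ms against a 20 ms budget, while 30,000 calls cost 7.5 ms, which is at least consistent with the "API call count is the bottleneck" reading of the spikes.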

That part, though, has very few alpha textures.

Transparent textures in this game are also affected by major ghosting, which in my opinion translates into more and more useless draw calls.


Can you test disabling that alpha handling entirely? I know it will look like garbage, but I ultimately think that alpha/ghosting is the problem here, and hacking it out would speed up the game by a huge factor, while also removing the ghosting -> screenshot attached.

EDIT: The screenshot attachment looks like garbage, so here's a Dropbox link to the same image: https://www.dropbox.com/s/9lojm2kguu1gfi...g.png?dl=0


(08-16-2015, 11:11 AM)gregory Wrote: I know apitrace quite well. I even submitted a couple of bug fixes/improvements Tongue2 I have enough GB of apitraces on my SSD.
This trick is used to emulate the alpha test. Hum, actually maybe some cases could be done faster, but I don't think it is the issue. The fragment recompilation appeared recently but I don't know the real cause. The perf impact likely remains small. It is the GL state changing between first compilation and usage.

Sorry for double posting, but I compiled your plugin with a simple if (!IsOpaque()) return; when drawing primitives... and the API calls dropped from 90k to 30k, achieving a full-speed 50 fps! Though the game looks quite garbage...

https://www.dropbox.com/s/ooa033p6zilyxx...X.dll?dl=0

In another scene, the one without the enemy boss, it went from 34k API calls to 17k...
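A toy model of what that experiment does (the `Draw` struct and the counts here are illustrative, not GSdx internals): skipping every non-opaque primitive batch at the top of the draw routine, roughly mirroring the `if (!IsOpaque()) return;` hack, cuts the number of batches that ever reach the driver.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative model: each Draw is one primitive batch handed to the driver.
struct Draw { bool opaque; };

inline std::size_t submit_all(const std::vector<Draw>& draws) {
    std::size_t submitted = 0;
    for (const Draw& d : draws) {
        (void)d;
        ++submitted;                  // every batch reaches the driver
    }
    return submitted;
}

inline std::size_t submit_opaque_only(const std::vector<Draw>& draws) {
    std::size_t submitted = 0;
    for (const Draw& d : draws) {
        if (!d.opaque)
            continue;                 // the "if (!IsOpaque()) return;" hack
        ++submitted;
    }
    return submitted;
}
```

The rendering is of course wrong (all blending disappears, hence the garbage look), but the experiment isolates how much of the frame time is pure submission overhead.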
For sure: you just removed all blending operations, which is not the same thing as the alpha test.

Anyway, it would be hard to reduce the number of draw calls. Maybe a couple of alpha tests could be implemented in a single pass, but that is unlikely to be the cause of the slowdown. I did a quick test: fps increased from 47 to 50.

I think the real issue is the upload of the palette texture. Palettes are only 1024B or 64B, but the upload takes a strangely long time. If I remove the GL call that uploads the palette, fps increases from 47 to 71! I suspect a synchronization issue.

Quote:* GPU draws with palette 1
* CPU wants to upload a new palette 1, but it needs to wait for the end of the previous draw calls.
* GPU draws with the new palette 1.
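The suspected sequence can be modelled in a few lines (everything here is invented for illustration; it is a model of the hypothesis, not actual driver behavior): with a single palette texture there is only one slot, so almost every upload has to wait for the previous draw that read it.

```cpp
#include <cassert>
#include <cstdint>

// Toy GPU: tracks draws submitted by the CPU vs draws actually finished.
struct Gpu {
    uint64_t last_submitted = 0;
    uint64_t last_completed = 0;
};

// A single palette texture that is re-uploaded before every draw.
struct SingleSlotPalette {
    uint64_t in_use_until = 0;   // id of the last draw that read this palette
    int stalls = 0;

    void upload(Gpu& gpu) {
        if (gpu.last_completed < in_use_until) {
            // CPU blocks here, like glTexSubImage2D on a still-busy texture.
            gpu.last_completed = in_use_until;   // simulate waiting it out
            ++stalls;
        }
    }
    void draw(Gpu& gpu) {
        in_use_until = ++gpu.last_submitted;
    }
};
```

In this model every upload after the first one stalls, which would match a tiny 64B-1024B copy taking a "strangely long time": the time is spent waiting for the GPU, not copying bytes.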
(08-16-2015, 06:33 PM)gregory Wrote: I think the real issue is the upload of the palette texture. Palettes are only 1024B or 64B, but the upload takes a strangely long time. If I remove the GL call that uploads the palette, fps increases from 47 to 71! I suspect a synchronization issue.

Quote:* GPU draws with palette 1
* CPU wants to upload a new palette 1, but it needs to wait for the end of the previous draw calls.
* GPU draws with the new palette 1.

Does GL have that multithreaded device stuff? You could shove the palette upload onto another thread so it doesn't wait in the primary thread for the draw call to finish. Of course, you've got to have an unused resource. I mean... that's what I think the concept of that multithreaded optimization is. Or not?
(06-27-2015, 01:29 AM)tsunami2311 Wrote: That plugin fixes shadows for both OpenGL/DX11 HW in DQ8.
Bonus fix!

I tested this back when it was first introduced, so I'm not even sure which build that was, or whether it was the actual plugin that was tested, and I'm honestly not going to dig through months of builds to figure out which build it was, not with the current random-number naming scheme, even with the dates attached.
 
The shadows DID work in DX11, but apparently they no longer work without the skiphack.


First reported in this thread:
http://forums.pcsx2.net/Thread-Dragon-Qu...#pid475400

I went back and tested this in DX11, and sure enough the shadows are no longer working without the skiphack. Seeing as this is mostly an OpenGL thread:

@gregory, do you want me to post a GSdx dump? Otherwise I won't bother.



Update for Xenosaga III

Opengl 3x
Blend=basic
HW OGL Depth

Cutscenes seem to be full speed. The line through the middle is still there (a glitch present in software too), fixable by Round Sprite = checked.

The ingame scene being too dark is fixed too. It now has the correct darkness in both HW and software.

DX11 still needs Full CRC for full speed in cutscenes, and DX11 is still too dark compared to software. I don't really expect the DX11 issues to be fixed, seeing as this topic is mostly for OpenGL. I will make dumps if gregory wants them for DX11.

VP2 update
It is still slow in OpenGL and DX11 with no CRC; Full is needed. So its slowdown is not related to the one that affected the Xenosaga III cutscenes, at least as far as the OpenGL side of things goes.

All as of revision (260c127)
(08-16-2015, 06:57 PM)dabore Wrote: Does GL have that multithreaded device stuff? You could shove the palette upload onto another thread so it doesn't wait in the primary thread for the draw call to finish. Of course, you've got to have an unused resource. I mean... that's what I think the concept of that multithreaded optimization is. Or not?

Everything is MT, but the upload of texture 1 must be done before the rendering that uses texture 1, and the texture can't be overwritten before the end of that rendering. MT is still limited by the physics of time: the future still comes after the past Wink
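One classic way around such a stall, sketched here purely as a model (names invented; whether this applies to GSdx's palette path is exactly the open question in the thread): keep several palette slots and rotate through them, so the CPU writes slot n+1 while the GPU still reads slot n. The ordering constraint gregory describes is preserved, because each draw keeps referencing the slot it was recorded with.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy GPU that runs a fixed number of draws behind the CPU.
struct Gpu {
    uint64_t submitted = 0;
    uint64_t completed = 0;
    uint64_t lag = 2;                           // GPU is 2 draws behind
    void retire() { if (submitted - completed > lag) completed = submitted - lag; }
};

// N palette slots used round-robin, each with a per-slot fence.
template <std::size_t N>
struct PaletteRing {
    std::array<uint64_t, N> in_use_until{};     // last draw that read the slot
    std::size_t next = 0;
    int stalls = 0;

    std::size_t upload(Gpu& gpu) {
        std::size_t slot = next;
        next = (next + 1) % N;
        if (gpu.completed < in_use_until[slot]) {
            gpu.completed = in_use_until[slot]; // simulated fence wait
            ++stalls;
        }
        return slot;
    }
    void draw(Gpu& gpu, std::size_t slot) {
        in_use_until[slot] = ++gpu.submitted;
        gpu.retire();
    }
};
```

With a single slot (N=1) this degenerates into the stall-every-upload scenario from the quote; with a few slots the CPU never has to wait, because by the time a slot comes around again the GPU has finished the draw that used it.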

Quote:@gregory, do you want me to post a GSdx dump? Otherwise I won't bother.
No. I don't care about DX, it is a dead API Tongue2 I'm already working at 1000% to upgrade the OpenGL renderer.
(08-16-2015, 11:04 PM)gregory Wrote: Everything is MT, but the upload of texture 1 must be done before the rendering that uses texture 1, and the texture can't be overwritten before the end of that rendering. MT is still limited by the physics of time: the future still comes after the past Wink

No. I don't care about DX, it is a dead API Tongue2 I'm already working at 1000% to upgrade the OpenGL renderer.

How long till you throw DX out of GSdx? OpenGL support is amazing now. Maybe if you make the OpenGL support 2x better than it already is compared to DX (IMO), it might draw Gabest back and challenge him to fix up DX11, or even add DX12, haha.
if we throw out DX, how can we call it GSDX? Tongue



