[deleted]
#11
(07-17-2015, 06:32 PM)xemnas99 Wrote: I used to explain this. Anyway, Direct3D11 can send multiple commands at once as well. You can read this: Command List.

There was something similar in OpenGL (called display lists). They removed it because it was too complex. Now NVIDIA is adding back a new extension to send commands as a list.

For sure the driver converts the state into a byte stream and sends that byte stream. Now this will be done by the application instead. In a normal situation your GPU is already busy, so the command is stored in a temporary buffer and won't be processed right away. So it doesn't matter if you split the commands into N transfers (PCIe will split them anyway ;) )

#12
[deleted]
#13
(07-17-2015, 09:01 PM)Alexander Wrote: Thanks for the replies everyone.

I guess the general takeaway/gross simplification of it is "It might speed it up slightly, or sometimes moderately", and that's about it.

Ah well, guess no silver bullet. Still need new hardware. But I wasn't dumb enough to think that it was gonna be one anyway.

That is a good summary :)

I will add that nothing prevents us from using DX10/11-class features to improve speed and accuracy.
#14
display lists? wasn't that for the oldschool primitive assembly from all the gl_things? baking the resource package? a valid optimization: bypassing a lil bit of pcie latency when sending bigger packages of geometry, i.e. the content. command lists are more about rendering setup commands. i'm only guessing logically tho: it basically records how to connect resources and which shaders are used to compute the output. i'd plug that logic in at a sorta material or mesh resource level, or at whatever level decides how much to buffer and when to render efficiently. imo it depends on how you organize the resources needed to compute the output. they all gotta be there. and the resource packing itself and uploading is still something else... i guess?

thinking in program flow: i guess for pcsx2's geometry generator you have to bake both at the same time... right? assemble resources and record the state machine. break the list when a state occurs that's not in the shader combo? or can you switch programs inside of the list? then there are barely any limits. sure, gotta take care of the resources. maybe multithread texture uploads, or new resources generally, before you can send the compiled command list to execute. a very content-dependent chain. the resources gotta be done transferring when the command list is finished compiling or gets executed. this is the possible stall point, i presume. i dunno where the resource lock has to be and where the stall would happen. cause...
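A rough sketch of the "break the list when the state changes" idea, in plain Python rather than real GL or PCSX2 code. The `Draw` record, its `shader` and `render_target` fields, and `split_into_lists` are all made up for illustration; the only point is that consecutive draws sharing the same state can sit in one list, and any state change starts a new one.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Draw:
    shader: str         # hypothetical shader-combo key
    render_target: int  # hypothetical render-target id
    vertex_count: int


def split_into_lists(draws: List[Draw]) -> List[List[Draw]]:
    """Group consecutive draws; start a new list whenever the state changes."""
    lists: List[List[Draw]] = []
    for d in draws:
        last = lists[-1] if lists else None
        same_state = (
            last is not None
            and (last[-1].shader, last[-1].render_target) == (d.shader, d.render_target)
        )
        if same_state:
            last.append(d)
        else:
            lists.append([d])
    return lists


# Hypothetical frame: two draws with shader A, then a shader switch,
# then a render-target switch.
frame = [
    Draw("A", 0, 6),
    Draw("A", 0, 6),
    Draw("B", 0, 6),
    Draw("B", 1, 6),
    Draw("B", 1, 6),
]
batches = split_into_lists(frame)
print([len(b) for b in batches])  # [2, 1, 2]
```

If programs can be switched inside a list, the shader would drop out of the comparison and only the render target would break a list.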

i wonder if one could bake and send the command list with created but potentially incomplete resources. say i have a texture that is created but still in the threaded upload queue. the resource might not have been locked, so the list compilation would not stall there (for speed). then i would render the list like that, incomplete. it's worth a fun test. but... i'm not an ogl guy. :D

i'm sure you (@gregory) know that specific pcsx2 ogl code better than me anyway. :)
#15
(07-17-2015, 06:32 PM)xemnas99 Wrote: I used to explain this. Anyway, Direct3D11 can send multiple commands at once as well. You can read this: Command List.

Yeah, but DX12 does the job better by using ExecuteIndirect. You should take a look at this: http://www.dsogaming.com/news/directx-12...cpu-usage/
We're supposed to be working as a team, if we aren't helping and suggesting things to each other, we aren't working as a team.
- Refraction
#16
(07-18-2015, 06:28 AM)ssakash Wrote: Yeah, but DX12 does the job better by using ExecuteIndirect. You should take a look at this: http://www.dsogaming.com/news/directx-12...cpu-usage/

You know, OpenGL (4.3) has had this feature for 3 years already (or at least 90% of it, since I don't know the details). The feature is called GL_ARB_multi_draw_indirect. I don't think it can be useful for emulation anyway.
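To make the "indirect" part concrete, here is a small sketch of what the application prepares for a multi-draw-indirect call: one contiguous buffer of fixed-size draw records. The record layout (four 32-bit unsigned ints: count, instanceCount, first, baseInstance) is the `DrawArraysIndirectCommand` layout from the OpenGL spec. The helper name `pack_indirect_commands`, the sample draws, and the little-endian packing are assumptions for illustration. No GL context is created here; in a real program the bytes would be uploaded to a buffer bound to `GL_DRAW_INDIRECT_BUFFER` and consumed by `glMultiDrawArraysIndirect`.

```python
import struct

# One DrawArraysIndirectCommand: count, instanceCount, first, baseInstance.
# "<4I" = four little-endian 32-bit unsigned ints (byte order is an assumption
# about the host; the GPU reads the buffer in native layout).
DRAW_ARRAYS_INDIRECT = struct.Struct("<4I")


def pack_indirect_commands(draws):
    """Pack (vertex_count, first_vertex) pairs into one contiguous buffer."""
    buf = bytearray()
    for vertex_count, first_vertex in draws:
        # instanceCount = 1 and baseInstance = 0: plain, non-instanced draws.
        buf += DRAW_ARRAYS_INDIRECT.pack(vertex_count, 1, first_vertex, 0)
    return bytes(buf)


# Hypothetical draws: two quads (6 vertices each) and one triangle.
draws = [(6, 0), (6, 6), (3, 12)]
indirect_buffer = pack_indirect_commands(draws)
print(len(indirect_buffer), len(indirect_buffer) // DRAW_ARRAYS_INDIRECT.size)  # 48 3
```

Each record only holds draw parameters. Shaders, textures and render targets are not part of it, so every draw in the batch runs with the same bound state.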
#17
(07-18-2015, 02:57 PM)gregory Wrote: You know, OpenGL (4.3) has had this feature for 3 years already (or at least 90% of it, since I don't know the details). The feature is called GL_ARB_multi_draw_indirect. I don't think it can be useful for emulation anyway.

you don't think? i don't think so either tho. the thing is that emulators have constantly changing resources. that multi draw looks like a record of a compiled effect with fixed resources, like a fixed mesh that renders the same all the time. you can effectively store that whole rendering chain of commands and the resources on the gpu, then run that computation with a single command. you can also use it for particle systems: just simulate a step with a single call. every resource is on the gpu, and the shaders gotta manage to not overflow the resources if it's a dynamic particle system.
#18
I think (from what little I know) the problem is that we don't really know in advance what the game is going to want to do, so we are a bit of a slave to the game. Plus the drawn stuff gets reused in the following draws, so the earlier draws need to be finished first, whereas in a PC game you know what you're trying to do from the get-go, so multiple calls are easy to throw in one go.
#19
Yes, the issue is the game engine. Take God of War, for example. To compute the field effect, they do 3 shader passes. However, to reduce GS DRAM access they don't do it once for the full screen but once for each 64-pixel-wide column. Therefore the effect uses 24 draw calls. Worse, they change the render target 2 times for the effect (which means a flush of the GPU; what they forget to tell you is that you can send lots of draw calls, but only without changing the target). Even worse, one of the input textures is the previous depth buffer, so you need to convert it to a color format before the draw (1*8 extra draw calls).
Normal games will do the full screen directly (and likely in 1 or 2 steps). Just patch the game to use real fullscreen rendering and you gain 10/20 fps easily. Sadly, lots of games split the rendering into small parts.
Metal Gear Solid splits the rendering into small blocks of 64×32 pixels.
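The God of War numbers above can be checked with a few lines of arithmetic. This is only a sketch: the 512-pixel frame width is an assumption, chosen because it is the width at which 3 passes over 64-pixel columns give the 24 draw calls quoted, and `column_effect_draws` is a made-up helper name.

```python
def column_effect_draws(frame_width, column_width=64, passes=3,
                        depth_conversions_per_column=1):
    """Count draw calls for an effect rendered column by column."""
    columns = frame_width // column_width
    return {
        "columns": columns,
        "effect": passes * columns,
        "depth_conversion": depth_conversions_per_column * columns,
    }


# Assumed frame width of 512 pixels (not stated in the thread).
counts = column_effect_draws(512)
print(counts)  # {'columns': 8, 'effect': 24, 'depth_conversion': 8}
print(counts["effect"] + counts["depth_conversion"])  # 32
```

Under that assumption the effect costs 32 draws in total, where the same 3 passes done fullscreen would need 3.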
#20
Actually, the issue isn't emulating the basic game rendering; that part is often quite fast already. The issue is post-processing effects. The GS isn't programmable like a shader, so games emulate lots of things with hacks. For example, you can't swizzle a vector, so they do some special memory conversion to support it. A swizzle on the GPU is free (often 0 cycles), whereas the special memory conversion is awfully costly (20-40 fps!).