Posts: 3.526
Threads: 6
Joined: Dec 2013
07-17-2015, 10:09 PM
(This post was last modified: 07-17-2015, 10:57 PM by dabore.)
display lists? wasn't that the oldschool primitive assembly, all the gl_* stuff? baking the resource package is a valid optimization: you bypass a bit of PCIe latency by sending bigger packages of geometry, the content. command lists are more like recorded rendering setup commands. i'm only guessing logically, but basically they record how to connect resources and which shaders compute the output. i'd plug that logic in at a sort of material or mesh resource level, deciding how much and when to buffer and render efficiently. imo it depends on how you organize the resources needed to compute the output, because they all have to be there. and the resource packing and uploading itself is still something else... i guess?
thinking in program flow: i guess for pcsx2's geometry generator you'd have to bake both at the same time, right? assemble the resources and record the state machine. break the list when a state occurs that isn't in the current shader combo? or can you switch programs inside a list? then there are barely any limits. you still have to take care of the resources, though: maybe multithread texture uploads, or generally upload new resources before you can send the compiled command list for execution. a very content-dependent chain. the resources have to be done transferring by the time the command list finishes compiling or gets executed. that's the possible stall point, i presume. i don't know where the resource lock would have to sit and where the stall would happen. because...
i wonder if one could bake and send the command list with created but potentially incomplete resources. say i have a texture created but still in the threaded upload queue, and the resource hasn't been locked to stall the list compilation (for speed). then i'd render the list like that, incomplete. it'd be a fun test. but... i'm not an ogl guy.
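the "break the list on a state change" idea above can be sketched as plain CPU-side batching logic. this is a minimal sketch with hypothetical names (`Draw`, `batchByProgram` are mine, not pcsx2 code), just to show the splitting rule:

```cpp
#include <cassert>
#include <vector>

// Hypothetical draw record: which shader program a draw needs.
struct Draw { int programId; int vertexCount; };

// Split a stream of draws into "command lists", breaking whenever the
// required shader program changes (the "state not in the shader combo" case).
std::vector<std::vector<Draw>> batchByProgram(const std::vector<Draw>& draws) {
    std::vector<std::vector<Draw>> lists;
    for (const Draw& d : draws) {
        if (lists.empty() || lists.back().back().programId != d.programId)
            lists.emplace_back();  // break the list on a state change
        lists.back().push_back(d);
    }
    return lists;
}
```

with draws needing programs 1, 1, 2, 1 you get three lists: the runs sharing a program stay together, and every program switch starts a new list. a real recorder would also have to track the resource uploads each list depends on, which is the stall question above.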
i'm sure you (@gregory) know that specific pcsx2 ogl code better than me anyway.
Posts: 8.598
Threads: 105
Joined: May 2014
Reputation: 168
Location: 127.0.0.1
(07-17-2015, 06:32 PM)xemnas99 Wrote: I used to explain this. Anyway, Direct3D11 can send multiple commands at once as well. You can read this: Command List.
Yeah, but DX12 does the job better with ExecuteIndirect. You should take a look at this:
http://www.dsogaming.com/news/directx-12...cpu-usage/
We're supposed to be working as a team, if we aren't helping and suggesting things to each other, we aren't working as a team.
- Refraction
Posts: 3.526
Threads: 6
Joined: Dec 2013
(07-18-2015, 02:57 PM)gregory Wrote: You know, openGL (4.3) has this feature since 3 years already (or at least 90% of it since I don't know the details). The feature is called GL_ARB_multi_draw_indirect. I don't think it can be useful for emulation anyway.
you don't think? i don't think so either, though. the point is that emulators deal with constantly changing resources. multi-draw-indirect looks like a record of a compiled effect with fixed resources, like a fixed mesh that renders the same way every time. you can store that whole chain of rendering commands, and the resources, on the gpu, then run the computation with a single command. you can also use it for particle systems: simulate a step with a single call, every resource stays on the gpu, and the shaders just have to make sure a dynamic particle system doesn't overflow its resources.
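to make the "fixed resources, single call" point concrete: with GL_ARB_multi_draw_indirect, the per-draw parameters live in a GPU buffer as an array of `DrawElementsIndirectCommand` records (the struct layout below matches the GL 4.3 spec), and one `glMultiDrawElementsIndirect` call replays all of them. a minimal CPU-side sketch of building that buffer (the `buildCommands` helper and the back-to-back index-buffer packing are my assumptions for illustration):

```cpp
#include <cstdint>
#include <vector>

// Layout of one record in GL_DRAW_INDIRECT_BUFFER, as consumed by
// glMultiDrawElementsIndirect (GL 4.3 / ARB_multi_draw_indirect).
struct DrawElementsIndirectCommand {
    uint32_t count;          // indices for this draw
    uint32_t instanceCount;  // instances for this draw
    uint32_t firstIndex;     // offset into the bound index buffer
    int32_t  baseVertex;
    uint32_t baseInstance;
};

// Build a command array for fixed meshes packed back-to-back in one
// index buffer; upload it once, then the whole batch is one GL call.
std::vector<DrawElementsIndirectCommand> buildCommands(
        const std::vector<uint32_t>& indexCounts) {
    std::vector<DrawElementsIndirectCommand> cmds;
    uint32_t firstIndex = 0;
    for (uint32_t n : indexCounts) {
        cmds.push_back({n, 1, firstIndex, 0, 0});
        firstIndex += n;  // next mesh starts where this one ends
    }
    return cmds;
}
```

this is exactly why it suits fixed content: the command buffer assumes the meshes and their layout never change, which is the opposite of what an emulator sees frame to frame.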
Posts: 20.326
Threads: 405
Joined: Aug 2005
Reputation:
554
Location: England
I think (from what little I know) the problem is that we don't really know what the game is going to want to do in advance, so we are a bit of a slave to the game. Plus, the drawn stuff gets reused in the subsequent draws, so the earlier draws need to be done first, whereas in a PC game you know what you're trying to do from the get-go, so multiple calls are easy to throw in one go.
Posts: 6.069
Threads: 68
Joined: May 2010
Reputation: 167
Location: Grenoble, France
Yes, the issue is the game engine. For example, let's take God of War. To compute the field effect, they do 3 shader passes. However, to reduce GS DRAM access, they don't do it once for the full screen but once for each 64-pixel-wide column. Therefore the effect uses 24 draw calls. Worse, they change the render target 2 times for the effect (which means a flush of the GPU; people forget to tell you that you can send lots of draw calls, but only without changing the target). Even worse, one of the input textures is the previous depth buffer, so you need to convert it to a color format before the draw (1*8 extra draw calls).
Normal games do the fullscreen pass directly (and likely in 1 or 2 steps). Just patch the game to use real fullscreen rendering and you easily gain 10/20 fps. Sadly, lots of games split the rendering into small parts.
Metal Gear Solid splits the rendering into small blocks of 64×32 pixels.
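The draw-call blowup from this tiling is simple arithmetic. A sketch, where the target dimensions are my assumptions for illustration (e.g. a 512-pixel-wide target split into 64-pixel columns with 3 passes gives the 24 draws mentioned above; a 640×448 frame in 64×32 blocks gives 140 draws per pass):

```cpp
#include <cstdint>

// Draw calls needed when an effect is tiled into fixed-size blocks.
// (Target sizes fed to this are illustrative assumptions, not measurements.)
uint32_t tiledDraws(uint32_t targetW, uint32_t targetH,
                    uint32_t tileW, uint32_t tileH, uint32_t passes) {
    uint32_t cols = (targetW + tileW - 1) / tileW;  // round up
    uint32_t rows = (targetH + tileH - 1) / tileH;
    return cols * rows * passes;
}
```

Every one of those draws is a GS draw the emulator has to translate, which is why tiled effects hurt so much more than a single fullscreen quad doing the same work.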
Posts: 6.069
Threads: 68
Joined: May 2010
Reputation: 167
Location: Grenoble, France
Actually, the issue isn't emulating the basic game rendering; that part is often quite fast already. The issue is post-processing effects. The GS isn't programmable like a shader, so they emulate lots of things with hacks. For example, you can't swizzle vectors on the GS, so games do special memory conversions to support it. A swizzle on the GPU is free (often 0 cycles), whereas the special memory conversion is awfully, ultra costly (20-40 fps!).
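To see why the GPU version is free: a vector swizzle like `color.bgra` in a shader is just a different register read order, baked into the instruction. Emulating the same reorder without shader swizzles means rewriting the data in memory. A sketch of the channel reorder itself (names are mine, for illustration); imagine applying this per pixel over a whole surface to get the costly conversion pass:

```cpp
#include <array>
#include <cstdint>

using Pixel = std::array<uint8_t, 4>;  // channels stored as R, G, B, A

// The same reorder a shader expresses as "color.bgra". On the GPU this is a
// register read pattern (0 cycles); done in memory it becomes a full
// read-permute-write over every pixel of the surface.
Pixel swizzleBGRA(const Pixel& p) {
    return {p[2], p[1], p[0], p[3]};  // B, G, R, A
}
```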