Just curious why ZOE2 is still slow
#11
Perhaps it's a GSdx bottleneck.

You can try out this build. It's not merged into master yet, but it might help you out.
https://ci.appveyor.com/project/gregory3.../artifacts
CPU: i7-4770 @ 3.9 GHz
Motherboard: ASRock B85M-DGS
RAM: HyperX Savage 2x8 GB 1.6 GHz CL9
GPU: GTX 1070 8 GB GDDR5
OS: Windows 10 Pro 64-bit
#12
I forgot that I limited my CPU speed this morning while trying something, so now that I've restored it back to 100% I get a higher speed (47-48 fps on that screen).
With that edited GSdx I get 50-52 fps in the pause screen.
#13
If I'm correct, the main performance issue is the handling of textures.

The game uploads a single 1024x1024 mega-texture, and it switches the palette whenever it wants to use a new sub-part of that texture.

In 32-bit mode, a new palette translates to converting the 1024x1024 texture and re-uploading it. The CPU suffers (format conversion) and so does the PCIe bandwidth.

In 8-bit mode, a new palette translates to uploading a new palette. It is still heavy on the CPU, as uploading a new palette every draw is costly, and we need to compare the new palette with the older ones.

Conclusion: we work on the hypothesis that a palette and a texture form a pair, which allows some optimizations. However, ZoE uses multiple palettes per texture, which is very bad for GSdx.
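To picture the problem, here is a very rough sketch (made-up types, not the actual texture cache code) of that pairing assumption and why a palette swap hurts so much in 32-bit mode:

```cpp
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical, simplified view of the texture-cache assumption: one
// entry per (texture address, palette address) pair. When ZoE swaps
// palettes on the same 1024x1024 texture, every swap misses the cache
// and forces a full CPU convert + PCIe re-upload in 32-bit mode.
struct CacheKey {
    uint32_t tex_addr;  // GS start address of the texture
    uint32_t pal_addr;  // GS start address of the palette
    bool operator<(const CacheKey& o) const {
        return std::tie(tex_addr, pal_addr) < std::tie(o.tex_addr, o.pal_addr);
    }
};

struct GpuTexture { /* converted RGBA data living on the GPU */ };

static std::map<CacheKey, GpuTexture> cache;

GpuTexture& Lookup(uint32_t tex_addr, uint32_t pal_addr) {
    auto it = cache.find(CacheKey{tex_addr, pal_addr});
    if (it == cache.end()) {
        // Miss: convert the whole 1024x1024 texture through the new
        // palette on the CPU and push it over PCIe. This is the slow
        // path ZoE2 hits on almost every draw.
        it = cache.emplace(CacheKey{tex_addr, pal_addr}, GpuTexture{}).first;
    }
    return it->second;
}
```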
#14
(08-21-2017, 11:24 AM)gregory Wrote: extended explanation.

interesting. a mega texture? and it never changes while the level is running? is this megatexture just one format? is it 8 bit? or 4 bit?

(*talks to self* why am i asking those questions tho?)

(i have a zoe2 iso.) if i wanted to start working on this issue, where would/should i start optimizing this one scenario without breaking other games? how can i visually debug the mega-texture and the palettes?
#15
(08-21-2017, 05:40 PM)dabore Wrote: interesting. a mega texture? and it never changes while the level is running? is this megatexture just one format? is it 8 bit? or 4 bit?

(*talks to self* why am i asking those questions tho?)

(i have a zoe2 iso.) if i wanted to start working on this issue, where would/should i start optimizing this one scenario without breaking other games? how can i visually debug the mega-texture and the palettes?

Hum, it is hard to explain. For GSdx, a GS texture is a start address + a size + a format (which maps to a standard GPU texture). However, on the GS side it is quite different: the texture is fully mutable. What you can do (note that I'm not 100% sure the game does that, but it is a valid pattern):
* configure the texture register once
   + Start address 0 (or anywhere actually)
   + Size of 1024x1024 (basically a big chunk of the GS memory)

Then you can access a sub-part of the texture:
* Configure the texture format: likely 8 bits, but it could be 4 bits or RGBA. Each sub-part of the texture can have its own texture format, nothing forbids it.
* Configure the sampler unit boundary (for example, you can sample within [128;256]; after 256 you can wrap back to 128, or clamp).

The EE can write anywhere in GS memory with any format, so you're free to upload a new sub-texture with a new format. (For us that means throwing everything away.)
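To make the pattern concrete, here is a simplified sketch; the fields mimic the real TEX0/CLAMP registers, but the structs and the values are illustrative only:

```cpp
#include <cstdint>

// Simplified stand-ins for the GS TEX0 and CLAMP registers.
struct Tex0 {
    uint32_t tbp0;  // texture base pointer in GS memory
    uint32_t psm;   // pixel storage format: PSMT8, PSMT4, PSMCT32, ...
    uint32_t tw;    // log2 of the width  -> 10 for 1024
    uint32_t th;    // log2 of the height -> 10 for 1024
};

struct Clamp {
    uint32_t wms, wmt;    // REGION_CLAMP or REGION_REPEAT per axis
    uint32_t minu, maxu;  // e.g. only sample within [128;256]
    uint32_t minv, maxv;
};

// The mega-texture pattern: TEX0 is set once for the whole 1024x1024
// block, then only the format and the CLAMP window change per draw to
// select a sub-part.
Tex0  tex0  = { 0, /*PSMT8*/ 19, 10, 10 };
Clamp clamp = { /*REGION_REPEAT*/ 3, 3, 128, 256, 128, 256 };
```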

If you want to debug it, you need to spot a bad place, create a gsdump, and replay it inside a GPU debugger. (You can also dump all "textures" from GSdx/gsdump.)
#16
yeh. i get the idea. basically linear memory. gs is shuffled tiles tho (that bugs me). i get this format thing: it's interpreting the raw bytes as the specified texture format when sampling. something like that... right?!?

i gotta google how sampler boundaries work. dunno yet.

and the "throw everything" means what exactly? you write the whole thing again into another full formated texture plane?!?

yeh. gotta try to figure it out. (i'd need a bigger monitor tho. 1600x900 can't display a 1k texture entirely). it could use a debug screen of the full 1k. with tinted writes maybe.

just built this crap. could this sorta debug work in gsdx?

[Image: RaiRpcP.png]

a manual mega texture. actually everything should be greyscale, and the palettes are missing. but... i guess linear memory doesn't show up like that... does it?!? that's the bummer. it would help. i dunno how i'd do the wrap state on the broken mesh tho. i'd guess they don't do that, for example?!? or can opengl do that sort of texture coordinate "subrect" wrapping? i haven't found it. texture arrays seem to work entirely differently. that's odd. anyhow... that should work in linear space. it should be wrapping the texel lines. but how is the local [0.0, 0.0] texture subrect offset shifted? that'd be cool to know and have. but gsdx coordinates are entirely different. ugh.
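the wrap math itself seems simple enough tho. something like this, i guess (just a sketch following gregory's description; the real gs region modes use mask/fix bits, so this is approximate):

```cpp
// "Subrect" wrapping done by hand, the way a shader would have to
// emulate it: plain GL wrap modes only act on the whole texture, so a
// region repeat has to be computed per sample.
int WrapRegion(int u, int minu, int maxu) {
    int range = maxu - minu;     // e.g. 256 - 128 = 128
    int r = (u - minu) % range;  // offset inside the window
    if (r < 0) r += range;       // keep the modulo positive
    return minu + r;             // back into [minu, maxu)
}
// WrapRegion(260, 128, 256) == 132: past 256, sampling wraps to 128+.
```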

rendering should be just... set pal, set texture sampler subrect. point-sample the neighbouring indices and lerp the palette output colors for bilinear. anisotropy won't do that.
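roughly what i mean, as a sketch (made-up helpers, and full bilinear actually needs a 2x2 footprint):

```cpp
#include <cstdint>
#include <cmath>

struct RGBA { float r, g, b, a; };

static uint8_t tex[1024][1024];  // raw 8-bit indices (the mega texture)
static RGBA    palette[256];     // currently bound palette

uint8_t FetchIndex(int u, int v) { return tex[v & 1023][u & 1023]; }
RGBA    PaletteLookup(uint8_t i) { return palette[i]; }

static RGBA Lerp(RGBA a, RGBA b, float t) {
    return { a.r + (b.r - a.r) * t, a.g + (b.g - a.g) * t,
             a.b + (b.b - a.b) * t, a.a + (b.a - a.a) * t };
}

// Bilinear on a paletted texture: point-sample the indices and blend
// the palette *outputs*. Filtering the indices themselves would blend
// unrelated palette entries.
RGBA SampleBilinear(float u, float v) {
    int   u0 = (int)std::floor(u), v0 = (int)std::floor(v);
    float fu = u - u0, fv = v - v0;
    RGBA c00 = PaletteLookup(FetchIndex(u0,     v0));
    RGBA c10 = PaletteLookup(FetchIndex(u0 + 1, v0));
    RGBA c01 = PaletteLookup(FetchIndex(u0,     v0 + 1));
    RGBA c11 = PaletteLookup(FetchIndex(u0 + 1, v0 + 1));
    return Lerp(Lerp(c00, c10, fu), Lerp(c01, c11, fu), fv);
}
```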

overwriting in linear is just writing into it using the start pointer and size. how is the overlap managed tho? if you overwrite at the end, the next texture could be invalid. if working in this 2d texture space, the upload subrect window could be defined. this is all sorts of nasty.

anyway... now... a lil bit of everything of "i dunno why it does that". gsdx is seriously pegged pretty much all the time. odd.

http://imgur.com/a/7OEkd
#17
GS memory isn't linear, so you need to transform it before the upload to the GPU. That is half the job of the texture cache: avoiding those format conversions as much as possible.
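As a toy illustration only (the real PSMCT32/PSMT8 layouts are much more involved than this made-up 8x8 block scheme):

```cpp
#include <cstdint>

// GS memory is tiled in pages/blocks/columns, not scanlines. This toy
// 8x8 block swizzle shows the kind of address shuffling the texture
// cache has to undo before handing linear data to the GPU.
uint32_t SwizzledOffset(uint32_t x, uint32_t y, uint32_t width) {
    uint32_t block  = (y / 8) * (width / 8) + (x / 8);  // which 8x8 block
    uint32_t inside = (y % 8) * 8 + (x % 8);            // texel in block
    return block * 64 + inside;
}

// Deswizzle = walk the linear target and gather from the tiled source.
// (Assumes width and height are multiples of 8.)
void Deswizzle(const uint8_t* gs, uint8_t* linear, uint32_t w, uint32_t h) {
    for (uint32_t y = 0; y < h; y++)
        for (uint32_t x = 0; x < w; x++)
            linear[y * w + x] = gs[SwizzledOffset(x, y, w)];
}
```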

If a texture B is uploaded (from the EE to the GS) and partially overwrites texture A, you can assume that texture A is useless, and you will only need to upload texture B. The assumption isn't perfect here.
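In code the rule is roughly this (simplified to flat address ranges; the real check also has to reason about blocks and formats, which is why the assumption isn't perfect):

```cpp
#include <cstdint>

struct Range { uint32_t start, end; };  // [start, end) in GS memory

// On an EE->GS upload of B: for each cached texture A, if the ranges
// intersect then invalidate A. Afterwards only B needs converting and
// uploading.
bool Overlaps(Range a, Range b) {
    return a.start < b.end && b.start < a.end;
}
```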

Anyway, what should be done is to build a separate cache for the palettes. Then, instead of uploading a new palette every draw call, we could just bind an existing one (which is 10x-100x faster than a texture transfer).
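A minimal sketch of what I mean (invented types and helpers, not actual GSdx code):

```cpp
#include <array>
#include <cstdint>
#include <unordered_map>

using PalData = std::array<uint32_t, 256>;  // one full 32-bit palette

struct PalHash {  // FNV-1a over the palette entries
    size_t operator()(const PalData& p) const {
        size_t h = 14695981039346656037ull;
        for (uint32_t v : p) { h ^= v; h *= 1099511628211ull; }
        return h;
    }
};

static std::unordered_map<PalData, uint32_t /*GPU texture id*/, PalHash> pal_cache;

uint32_t GetPaletteTexture(const PalData& pal) {
    auto it = pal_cache.find(pal);
    if (it != pal_cache.end())
        return it->second;  // hit: just a bind, no transfer at all
    uint32_t id = 0;
    // id = CreateAndUpload256x1Texture(pal);  // hypothetical helper
    return pal_cache.emplace(pal, id).first->second;
}
```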
#18
i thought about that too yesternight. uploads always lock the resources for modification, hence they can stall the pipeline. the gpu can't use locked resources for processing. that'd need a collision-free memory array: a "cache" of palette "lines".

what's the general layout of palettes in memory? are they continuous "lines" or tiles? cause...

probably thinking the same here in another language...

thinking about caching/uploading: i thought a lil deeper in the metal. you could fully implement the "cache" on the gpu. cram the straight palette uploads (1k buffers) into the command stream as a payload, like a gs image transfer, into a "palette line buffer". this should be a rendertarget or just a surface that's entirely under gpu control, the palette plotted by the gpu memory controller. the upload and writes should be rather fast. it's pci-e, a bunch of GB/s, and just 1k. ofc my idea of doing a software check could slow down execution and the build of the command stream and the setup of the gpu shader for rendering. fast brute force might be faster. that's my pov. that thing's gotta be tested. i will try. :)
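a sketch of the idea (all names invented here, just to show the shape of it):

```cpp
#include <cstdint>

// "Palette line buffer": a 256 x N texture that stays entirely on the
// GPU. Each new 1k palette is streamed into the next free row as part
// of the command stream, and the draw just selects its row. No
// CPU-side compare, brute-force append.
constexpr int kPalWidth = 256;  // entries per palette
constexpr int kPalRows  = 512;  // how many palettes the atlas holds

struct PaletteAtlas {
    uint32_t gpu_texture;  // 256 x kPalRows RGBA texture, GPU resident
    int      next_row = 0;

    // Append a palette; return the row the shader should sample from.
    int Append(const uint32_t* pal /* 256 entries */) {
        int row = next_row;
        next_row = (next_row + 1) % kPalRows;  // wrap, recycle old rows
        // UploadSubImage(gpu_texture, 0, row, kPalWidth, 1, pal);
        //   ^ hypothetical helper doing the actual transfer
        return row;
    }
};
// The draw passes `row` to the shader, which samples the palette at
// v = (row + 0.5) / kPalRows.
```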

in case of synchronisation: the palette should really be done writing fast, before the gpu starts computing primitives that use it. i dunno if this would collide in any way. the driver may detect collisions when it's updating a palette while processing "older" primitives. command stream micro-timing when writing/modifying, that'd need to be synced perhaps. i dunno who should do it. the driver?
#19
None of the GS memory is linear, the palettes included.

You imagine way too complex a thing :) I don't think OpenGL locks resources like that. Uploading some data is far more complex than just taking the resource and writing into it: first you need to ensure format correctness, then set up the DMA, and then convert the texture into the GPU format (internally, GPUs aren't necessarily linear). A bind is a couple of validations + a pointer in the command buffer.

Eventually I think it should be possible to optimize the resource state changes, but I don't think it is the main issue. An idea could be to store all the texture information (texture/palette pointer, GS size, etc.) into a constant buffer, one constant buffer per texture. Speaking of constant buffers, we need to upload the GS texture sampling parameters too. As the game changes the sampler often, it might cost too much in buffer uploads.
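Roughly like this (the layout is invented, just to show the idea):

```cpp
#include <cstdint>

// One small constant buffer per GS texture: a state change becomes a
// buffer bind instead of several individual uniform/sampler updates.
struct alignas(16) GSTextureConstants {
    uint32_t tex_ptr;        // GS texture pointer
    uint32_t pal_ptr;        // GS palette pointer
    uint32_t width, height;  // GS size
    uint32_t wms, wmt;       // sampler wrap modes
    uint32_t minu, maxu;     // region boundaries
    uint32_t minv, maxv;
    uint32_t psm;            // pixel format
    uint32_t _pad;           // pad to a 16-byte multiple
};
// The catch: the game changes the sampler often, so the buffers
// themselves may still need frequent re-uploads.
```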
#20
yeh. maybe it's cause i'm looking at it from a dx9 standpoint. in dx11 and ogl the locks and "stalls" are (maybe just) hidden inside the api functions. i've no idea. i don't and don't wanna debug the api or os functions.

in terms of debugging, my reading of the gs code has gone nowhere. i got no clue, so... well... i think i gotta redo my lil custom dump gui mod. i had one years ago that i could partially read and "use". i gotta have a look at the amount and positions of the data, which i haven't gotten yet.