If PCSX2 Were to Be Rewritten, What Should Be Done Differently?
#1
Are there current roadblocks that could be avoided if things were done in a different way?

-- HLE? Is it feasible for the PS2? I can think of two approaches. The first is HLE of only certain parts, like the BIOS; an HLE BIOS alone would give a very big performance boost. The other is HLE of the whole API, which seems like a prohibitive amount of work, but it would reduce the spec requirement to nothing (even for mobile) and open a lot of room for other improvements.

-- 64-bit only? Sounds like the obvious choice now.

-- Utilize multiple cores (min 4?) for the different PS2 processors? Assuming a whole-system HLE approach is not a practical option, a fully multithreaded one should be. I know some parts of the PS2 are interconnected and can't be separated, but building an emulator from the ground up with multithreading in mind should give better implementation options, especially with an HLE BIOS.

-- A new low-level API (DX12/Vulkan)? All cores can reach the GPU directly, and direct compute is now a common thing. Again, obvious.

Anything else that should be done differently in case of a rewrite?

#2
I'll try to answer your questions as best I can; of course, others might have better insight into some of it (Gregory? :P)
(02-25-2016, 12:45 PM)K.F Wrote: Are there current roadblocks that could be avoided if things were done in a different way?
The current roadblocks are really due to the current implementations of things. The whole recompiler and memory system were written to be optimised for 32-bit performance without any consideration for expansion later on, so a lot of stuff has to be rewritten to support 64-bit registers, memory pointers, etc. I don't think there is anything that can't be implemented on the current setup; it will just take some work to undo some of those optimisations before the system can be improved.
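A typical example of the kind of 32-bit assumption described above is storing a host pointer (say, the address of a recompiled code block) in a 32-bit field. This is a hypothetical illustration, not actual PCSX2 code; `BlockEntry32`, `BlockEntry` and `fitsIn32Bits` are made-up names.

```cpp
#include <cstdint>

// 32-bit-era layout (hypothetical): the host address of recompiled code is
// squeezed into a 32-bit field. On a 64-bit host, any address above 4 GiB
// loses its upper half when stored here.
struct BlockEntry32 {
    std::uint32_t guestPc;
    std::uint32_t hostCode;
};

// 64-bit-clean layout: the field is as wide as a host pointer.
struct BlockEntry {
    std::uint32_t guestPc;
    std::uintptr_t hostCode;
};

// True if a host address survives a round trip through a 32-bit field,
// i.e. whether the 32-bit layout above would still be safe for it.
bool fitsIn32Bits(std::uintptr_t addr) {
    return static_cast<std::uintptr_t>(static_cast<std::uint32_t>(addr)) == addr;
}
```

Code like this compiles fine on a 64-bit host and only breaks once the OS happens to hand out addresses above 4 GiB, which is part of why such assumptions have to be hunted down one by one.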

(02-25-2016, 12:45 PM)K.F Wrote: -- HLE? Is it feasible for the PS2? I can think of two approaches. The first is HLE of only certain parts, like the BIOS; an HLE BIOS alone would give a very big performance boost. The other is HLE of the whole API, which seems like a prohibitive amount of work, but it would reduce the spec requirement to nothing (even for mobile) and open a lot of room for other improvements.

HLE is feasible, but really only for the BIOS. The system behaves nowhere near similarly enough to a PC to do things through high-level emulation, and game programs are not predictable enough to write high-level functions for. The BIOS, however, is pretty much set in stone, so HLE'ing it is possible; in fact, we have done some tests and already have the BIOS partially HLE'd, and it works.
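A partial BIOS HLE can be pictured as a dispatch table: when the guest executes SYSCALL, look up a native C++ handler for that call number, and fall back to running the real BIOS code for anything not yet implemented. This is a minimal sketch under stated assumptions, not PCSX2's code: `GuestCpu`, `HleBios` and the syscall numbers are hypothetical, and passing the call number in `v1` ($3) and the result in `v0` ($2) follows the usual MIPS convention as an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Minimal guest CPU state for the sketch (hypothetical; the real EE has
// 128-bit GPRs, only the low 32 bits are modelled here).
struct GuestCpu {
    std::uint32_t gpr[32] = {};
    std::uint32_t pc = 0;
};

// Hypothetical syscall numbers, NOT the real PS2 kernel syscall table.
enum : std::uint32_t { kSysGetThreadId = 0x01, kSysSignalSema = 0x02 };

using HleHandler = std::function<void(GuestCpu&)>;

class HleBios {
public:
    void registerHandler(std::uint32_t number, HleHandler handler) {
        handlers_[number] = std::move(handler);
    }

    // Called when the guest executes SYSCALL at cpu.pc. Returns true if the
    // call was handled natively; false means "not implemented yet", and the
    // caller falls back to executing the real BIOS code (partial HLE).
    bool onSyscall(GuestCpu& cpu) {
        const std::uint32_t number = cpu.gpr[3];  // assumed: number passed in v1 ($3)
        auto it = handlers_.find(number);
        if (it == handlers_.end()) return false;
        it->second(cpu);  // handler writes its result, e.g. into v0 ($2)
        cpu.pc += 4;      // resume at the instruction after SYSCALL
        return true;
    }

private:
    std::unordered_map<std::uint32_t, HleHandler> handlers_;
};
```

The fallback path is what makes a partial HLE workable: calls can be moved to native code one at a time while everything else keeps running the original BIOS.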

(02-25-2016, 12:45 PM)K.F Wrote: -- 64-bit only? Sounds like the obvious choice now.

Now that 1.4.0 is out, which is the last release that will have true Windows XP support, we feel it's probably time to start moving on. 64-bit will bring us some advantages, such as virtual memory mapping to speed up memory access, and other tricks, so yes, 1.6.0 will most likely be 64-bit only. Do note, however (for the people who think more bits == more speed), that just going to 64-bit gives no direct performance advantage; if anything, it will be slower. It's taking advantage of the features available in 64-bit that will bring better speeds.
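The "virtual memory mapping" trick is often called fastmem. A 32-bit host cannot reserve one contiguous window covering the whole guest address space, so every guest access goes through a lookup table; with a 64-bit address space, the guest memory can sit at a single host base address, and translation becomes one add. The sketch below is a scaled-down toy, not PCSX2's memory code: `PagedMemory`, `FlatMemory` and the 64 KiB guest size are assumptions, and a real implementation would reserve the window with something like `mmap` or `VirtualAlloc` (typically leaving unmapped or I/O regions inaccessible so they fault into a slow path) instead of using a `std::vector`.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Toy guest address space, scaled down so the sketch runs anywhere.
constexpr std::uint32_t kPageBits = 12;               // 4 KiB pages
constexpr std::uint32_t kPageSize = 1u << kPageBits;
constexpr std::uint32_t kGuestSize = 16 * kPageSize;  // 64 KiB

// 32-bit-host style: every access indexes a page table first.
class PagedMemory {
public:
    PagedMemory() : backing_(kGuestSize) {
        for (std::uint32_t p = 0; p < kGuestSize / kPageSize; ++p)
            pageTable_.push_back(backing_.data() + p * kPageSize);
    }
    PagedMemory(const PagedMemory&) = delete;  // page table points into backing_
    PagedMemory& operator=(const PagedMemory&) = delete;

    std::uint32_t read32(std::uint32_t addr) const {
        std::uint32_t v;
        std::memcpy(&v, hostPtr(addr), sizeof v);
        return v;
    }
    void write32(std::uint32_t addr, std::uint32_t v) { std::memcpy(hostPtr(addr), &v, sizeof v); }

private:
    // Extra load + index on every access. Assumes 4-byte-aligned addresses,
    // so an access never crosses a page boundary.
    std::uint8_t* hostPtr(std::uint32_t addr) const {
        return pageTable_[addr >> kPageBits] + (addr & (kPageSize - 1));
    }
    std::vector<std::uint8_t> backing_;
    std::vector<std::uint8_t*> pageTable_;
};

// 64-bit-host style ("fastmem"): one base address, translation is base + offset.
class FlatMemory {
public:
    FlatMemory() : window_(kGuestSize) {}
    std::uint32_t read32(std::uint32_t addr) const {
        std::uint32_t v;
        std::memcpy(&v, window_.data() + addr, sizeof v);
        return v;
    }
    void write32(std::uint32_t addr, std::uint32_t v) { std::memcpy(window_.data() + addr, &v, sizeof v); }

private:
    std::vector<std::uint8_t> window_;
};
```

Both versions return the same data; the difference is the extra table lookup per access, which a recompiler would otherwise have to emit around every guest load and store.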

(02-25-2016, 12:45 PM)K.F Wrote: -- Utilize multiple cores (min 4?) for the different PS2 processors? Assuming a whole-system HLE approach is not a practical option, a fully multithreaded one should be. I know some parts of the PS2 are interconnected and can't be separated, but building an emulator from the ground up with multithreading in mind should give better implementation options, especially with an HLE BIOS.

Multithreading the PS2 is a pig: each part is either easily threadable but doesn't do much, or heavily utilized and can't be threaded without causing problems. An example is VU0 and the EE, which interlock tightly; threading these would cause no end of issues with games, so we can't really do it. VU1 is already threaded, and nothing else really uses a lot of power. The only thing I can "think" of that would be worth threading is the entire IOP side: the CDVD, SPU2, IOP processor and all the other smaller, less significant bits could possibly be moved to a separate thread. But then you get issues with memory access. Currently they all share the same memory management, which would have to be split out, and things like the SIF would have to work across the thread boundary, somewhat similar to how the PS2 does it, but this could cause timing issues.
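Making the SIF "work across the thread boundary" would at minimum mean a thread-safe channel between an EE thread and an IOP thread. The sketch below is a generic mutex-plus-condition-variable queue under stated assumptions; `SifPacket`, `SifChannel` and `runDemo` are hypothetical names, and the real SIF moves DMA blocks through FIFOs rather than single words.

```cpp
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical SIF message; the real SIF moves DMA blocks through FIFOs.
struct SifPacket {
    std::uint32_t channel;
    std::uint32_t data;
};

// One direction of a SIF-like link shared by an EE thread (producer)
// and an IOP thread (consumer).
class SifChannel {
public:
    void send(const SifPacket& p) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(p);
        }
        ready_.notify_one();
    }

    // Blocks the receiving thread until a packet is available.
    SifPacket receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        SifPacket p = queue_.front();
        queue_.pop_front();
        return p;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<SifPacket> queue_;
};

// Usage: an "IOP" thread sums `count` packets sent from the calling ("EE") thread.
std::uint32_t runDemo(int count) {
    SifChannel ch;
    std::uint32_t sum = 0;
    std::thread iop([&] {
        for (int i = 0; i < count; ++i) sum += ch.receive().data;
    });
    for (int i = 0; i < count; ++i) ch.send({0, static_cast<std::uint32_t>(i)});
    iop.join();  // join() makes the IOP thread's writes to `sum` visible here
    return sum;
}
```

Note what the queue does not do: nothing bounds how far the IOP thread runs ahead of or behind the EE in emulated cycles, since each thread advances at whatever speed the host gives it. That gap is exactly the kind of timing issue the post warns about.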

(02-25-2016, 12:45 PM)K.F Wrote: -- A new low-level API (DX12/Vulkan)? All cores can reach the GPU directly, and direct compute is now a common thing. Again, obvious.

Gregory has said that this will make little to no difference: the number of draw calls on the PS2 is very small compared to Wii or PC games, so the actual advantage will probably be minor. There could be some advantages from the "close to the metal" handling, and the lower draw call overhead may give a small boost, but we have bigger problems with the texture cache at the moment, and we would benefit more from dealing with that before picking up a new API.

(02-25-2016, 12:45 PM)K.F Wrote: Anything else that should be done differently in case of a rewrite?

Kind of covered this above; can't think of anything offhand, lol.
#3
HLE BIOS: there are 2 parts, IOP and EE. The IOP part is cheap to emulate. The EE part is a light kernel; in your opinion, how much of a game's time is spent in the kernel? If it helps, the EE kernel is around 50 kilobytes of code.

The biggest advantage of 64 bits is that computation code will be simpler. However, x86 memory access will be a nightmare. And VU emulation will remain slow; maybe AVX2 could improve performance a little, but it's unlikely to be a game changer (do note that I barely know the VU side).
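One concrete reading of "simpler computation": the EE has 64-bit integer operations (e.g. DADDU, a 64-bit add), and a 32-bit host has to split each of them into two 32-bit operations with a manually propagated carry, while a 64-bit host does it in one instruction. This is an illustration of the general point, not PCSX2's recompiler; `Split64` and the function names are made up.

```cpp
#include <cstdint>

// A 64-bit guest value held as two 32-bit halves, as a 32-bit host must do.
struct Split64 {
    std::uint32_t lo;
    std::uint32_t hi;
};

// 32-bit host: one 64-bit add becomes two adds plus a carry.
Split64 add64On32BitHost(Split64 a, Split64 b) {
    Split64 r;
    r.lo = a.lo + b.lo;                                   // wraps modulo 2^32
    const std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;  // wrapped => carry out
    r.hi = a.hi + b.hi + carry;
    return r;
}

// 64-bit host: a single native add.
std::uint64_t add64On64BitHost(std::uint64_t a, std::uint64_t b) { return a + b; }
```

Both produce the same result; the point is that the 32-bit path costs extra instructions and extra host registers for every such guest operation.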

Quote:-- A new low-level API (DX12/Vulkan)? All cores can reach the GPU directly, and direct compute is now a common thing. Again, obvious.
Quote:Gregory has said that this will make little to no difference: the number of draw calls on the PS2 is very small compared to Wii or PC games, so the actual advantage will probably be minor. There could be some advantages from the "close to the metal" handling, and the lower draw call overhead may give a small boost, but we have bigger problems with the texture cache at the moment, and we would benefit more from dealing with that before picking up a new API.
Current PC games don't have that many draw calls either. But yes, a few games have lots of draw calls with various state changes. However, the draw call cost is GSdx emulation plus driver overhead, and driver validation is already done in a separate thread, so I'm not sure the overhead is that big. Sure, reducing it further would help a bit, but I'm not sure it's worth the extra complexity.

Actually, the only obvious thing is that rewriting everything is not a good idea ;)
#4
This has kinda already been mentioned, but my only suggestion would be better optimization for multicore processors, if possible. Relying on single-thread performance as heavily as PCSX2 does isn't really feasible. That's why there are always so many threads on the forums complaining about PCSX2's performance.

I can imagine day in and day out it gets extremely frustrating for you guys particularly when you know how much work you've put into it.

But at the same time, I think you've got to understand it from the perspective of other people running the program, who in most cases don't have bad CPUs at all but may lack some single-thread performance; if PCSX2 were better optimized for multicore processors, that wouldn't be an issue at all.
#5
I wouldn't recommend a rewrite though unless you could find a way to do it without messing anything up along the way.
#6
(02-26-2016, 12:08 AM)[]HP[]Hawkeye Wrote: This has kinda already been mentioned, but my only suggestion would be better optimization for multicore processors, if possible. Relying on single-thread performance as heavily as PCSX2 does isn't really feasible. That's why there are always so many threads on the forums complaining about PCSX2's performance.

I can imagine day in and day out it gets extremely frustrating for you guys particularly when you know how much work you've put into it.

But at the same time, I think you've got to understand it from the perspective of other people running the program, who in most cases don't have bad CPUs at all but may lack some single-thread performance; if PCSX2 were better optimized for multicore processors, that wouldn't be an issue at all.

The main thing, though, as refraction said, is that it's not just a simple matter of splitting things off onto other cores.

Even if we had insane manpower and could do a complete rewrite, we'd still be relying on single-threaded performance. That's just the nature of the game.

Our threads are EE, GS, and VU1 right now. Ref already explained why VU0 can't be split off without breaking the world. All that really leaves is the IOP. I don't understand it well enough to speak to it myself, but from what Ref said it seems like it might be feasible. But it would also be a lot of hard work for not that much gain.

And then you have the accuracy/speed tradeoff. Compare 0.9.6 to 1.4.0 and I bet you'll find the former to be a lot faster. But it's also a lot less compatible.

There is no easy answer to this question, but we've chosen the accuracy route and I think that's correct.
#7
What I wanted with this thread was to lay out the major roads PCSX2 can take, and which one it should take.

So far:

-- Complete HLE rewrite:
Not really an option for PCSX2: starting from zero and going in a completely different direction, it wouldn't even be the same project anymore. Even "Play!" is only partial HLE, so if someone wants to do it, they're better off starting a new project, I guess.

-- Partial rewrites: bios HLE and x64:
I guess this is the next logical step, they are already on the TODO list after all.

Huh... not as many options as I first thought there would be: either very dramatic changes, or not-so-dramatic ones. Not sure if that's a good thing or a bad thing, but it sure makes for a clear choice, at least.

(02-25-2016, 01:57 PM)refraction Wrote: in fact, we have done some tests and already have the BIOS partially HLE'd, and it works.

Is there anything out in the open, or is there legal crap involved? Or are you just referring to fps2bios?

(02-25-2016, 05:49 PM)gregory Wrote: in your opinion, how much of a game's time is spent in the kernel? If it helps, the EE kernel is around 50 kilobytes of code.

According to Air, quite a lot. And I'm not sure what the size has to do with anything; it's more about how often it's called, and that I do not know.

(02-26-2016, 12:32 AM)Blyss Sarania Wrote: Even if we had insane manpower and could do a complete rewrite, we'd still be relying on single-threaded performance. That's just the nature of the game.

Not if the BIOS were HLE'd. If I'm understanding correctly, a lot of the sync comes from the kernel, so if that were emulated at a high level, a lot of the emulated threading could be native too. Also, a dedicated sync core should (at least in theory) help, even in the current state.
#8
50 KB is small, very small. It is a tiny microkernel that handles threads, semaphores, interrupt handlers and a couple of privileged registers. There is only one function that is called often: the idle loop.
Code:
while (true)
    ; // nop: the idle thread just spins until an interrupt switches to another thread
But there is already a hack to reduce that. I don't know how Air arrived at 20-30%; it is surely the time spent in the idle thread (for games that aren't heavy). Otherwise, I'm sure games don't spend 20-30% of their time in the kernel. The kernel is too small and has barely any loops. Sony optimized the kernel to be fast and light. Even a fat Windows kernel is nowhere near 25% overhead.
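The general idea behind an idle-loop hack can be sketched as follows; this is not necessarily how PCSX2's existing hack works. A guest branch that jumps to its own address with a NOP in the delay slot (the `while (true) nop;` above) cannot make progress until an interrupt fires, so instead of emulating every spin, the emulator can jump its cycle counter straight to the next scheduled event. `EmuClock`, `isIdleLoop` and `skipIdle` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical emulator clock state for the sketch.
struct EmuClock {
    std::uint64_t cycle = 0;           // current emulated EE cycle
    std::uint64_t nextEventCycle = 0;  // next scheduled event, e.g. a timer or vsync interrupt
};

// A MIPS branch whose target is its own address, with a NOP (encoded as
// 0x00000000) in the delay slot, can make no progress until an interrupt fires.
bool isIdleLoop(std::uint32_t branchPc, std::uint32_t branchTarget,
                std::uint32_t delaySlotInsn) {
    return branchTarget == branchPc && delaySlotInsn == 0;
}

// Instead of emulating every iteration of the spin, jump the clock straight to
// the next event; the interrupt then fires as if the loop had spun that long.
void skipIdle(EmuClock& clock) {
    clock.cycle = std::max(clock.cycle, clock.nextEventCycle);
}
```

For a game that spends much of each frame idle, this removes most of the emulated kernel time without touching the kernel itself.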

Quote: Not if the BIOS were HLE'd. If I'm understanding correctly, a lot of the sync comes from the kernel, so if that were emulated at a high level, a lot of the emulated threading could be native too. Also, a dedicated sync core should (at least in theory) help, even in the current state.
I'm sorry but you don't know/understand what you're talking about.
#9
(02-26-2016, 11:32 AM)gregory Wrote: But there is already a hack to reduce that. I don't know how Air arrived at 20-30%; it is surely the time spent in the idle thread (for games that aren't heavy). Otherwise, I'm sure games don't spend 20-30% of their time in the kernel. The kernel is too small and has barely any loops. Sony optimized the kernel to be fast and light. Even a fat Windows kernel is nowhere near 25% overhead.

It did sound like an insane amount. I don't know if something has changed or what; his post was from 2009, after all. Although a hack that gave a 20-40% performance increase would not have come out without a fuss.

But then again, Sony optimized the kernel for the PS2, not for a PC emulating it. What's fast on the PS2 and doesn't need optimization won't necessarily be fast in PCSX2. Take the DMACs, for example: on the PS2 they're fast dedicated chips that are almost free, so there's no need to optimize around them, but they're relatively slow when emulated and definitely not free for PCSX2 (unless PCSX2 has a hack for that too :P).

(02-26-2016, 11:32 AM)gregory Wrote: I'm sorry but you don't know/understand what you're talking about.

It just means replacing emulated threading (several emulated processors on one real thread) with real threading (several emulated processors on multiple real threads). Unless you're talking about another part of that sentence?

For example, the HLE BIOS includes the IOP modules, and it also handles the EE-IOP DMA channels (SIF). If you could emulate those at a high level, that is all you would need to run the IOP efficiently on another thread, unless I'm missing something. Sure, the IOP can be threaded without an HLE BIOS, but I doubt the performance increase from that would be worth it, if there were any.

EDIT: Oh, I misread that as "I'm sorry but I don't know/understand what you're talking about.", so that explanation was a waste of time, then :P

Still, I'd LOVE to know what exactly your problem with that is.
#10
Note that a 20% speedup in the EE recompiler doesn't translate into a 20% increase in overall performance.
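This is Amdahl's law: speeding up one part of the work only speeds up the whole in proportion to that part's share of the total time. A minimal sketch; `overallSpeedup` is an illustrative name, and the 50% EE share in the usage note is a made-up figure, not a measurement.

```cpp
// Amdahl's law: if a fraction `part` (0..1) of total run time is made
// `factor` times faster, the whole program speeds up by
// 1 / ((1 - part) + part / factor).
double overallSpeedup(double part, double factor) {
    return 1.0 / ((1.0 - part) + part / factor);
}
```

For instance, if the EE hypothetically accounted for half of the frame time and became 1.2x faster, `overallSpeedup(0.5, 1.2)` is about 1.09, i.e. roughly a 9% overall gain, and less if the EE's share is smaller.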

The kernel is just a piece of EE code. It isn't hard to emulate. There is no DMAC/IPU in the kernel ;)

The PS2 has 2 CPUs that run at the same time. Games are designed so that reads/writes from one CPU to the other give correct results. Sometimes they use dedicated sync primitives; sometimes they're just lucky (i.e. they just wait some time). The sync primitives are also used to execute the game's instructions in the correct order.

The issue isn't emulating the sync primitives. The issue is that both CPUs must run at the "same time". The current behavior is to run a bit of EE, then a bit of IOP, then EE again, and so on (potentially with both VUs in the mix too).
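The "run a bit of EE, then a bit of IOP" approach can be sketched as a cooperative loop on a single host thread. This is a toy model under stated assumptions, not PCSX2's scheduler: `EmulatedCpu` and `runInterleaved` are hypothetical, and the 8:1 EE-to-IOP clock ratio is an assumed constant.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for an emulated processor (hypothetical, not PCSX2's classes).
struct EmulatedCpu {
    std::string name;
    std::uint64_t cycles = 0;                   // cycles executed so far
    void run(std::uint64_t n) { cycles += n; }  // a real core would execute guest code here
};

// Assumed clock ratio: the EE runs at roughly 8x the IOP's clock.
constexpr std::uint64_t kEeCyclesPerIopCycle = 8;

// Everything on ONE host thread: a slice of EE, then the IOP catches up to the
// same point in emulated time, then EE again, and so on. Short slices bound how
// far the two emulated clocks can drift apart, which keeps reads and writes
// between the CPUs in a plausible order. Returns the order the CPUs ran in.
std::vector<std::string> runInterleaved(EmulatedCpu& ee, EmulatedCpu& iop,
                                        std::uint64_t eeSlice, int rounds) {
    std::vector<std::string> trace;
    for (int i = 0; i < rounds; ++i) {
        ee.run(eeSlice);
        trace.push_back(ee.name);
        iop.run(eeSlice / kEeCyclesPerIopCycle);
        trace.push_back(iop.name);
    }
    return trace;
}
```

Moving the IOP onto its own host thread removes this lockstep, which is why the sync problem above, not the sync primitives themselves, is the hard part.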

IOP overhead is rather low, so you won't get a magical speed boost with a threaded IOP. Maybe 5-10%, if you're not VU limited.