If PCSX2 is to be Rewritten, What should be Done Differently?
#11
(02-26-2016, 06:24 PM)gregory Wrote: The kernel is just a piece of EE code.

I know:

(12-20-2015, 10:35 AM)K.F Wrote: The bios is just software. It's the most basic piece of software, but it is still software nonetheless, and it still needs to go through the hardware that you will have, or will emulate anyway.

(02-26-2016, 06:24 PM)gregory Wrote: There is no DMAC/IPU on the kernel ;)

I don't know if this is a language barrier issue or what, but I guess I need to emphasize what I said:

K.F Wrote:bios includes the IOP modules, and it also handles the (EE - IOP) DMA channels (SIF)

I did not say it contains any hardware.

(02-26-2016, 06:24 PM)gregory Wrote: IOP overhead is rather low, so you won't get a magical speed boost with a threaded IOP. Maybe 5-10%, if you're not VU limited.

I just said it will be possible, I did not say it will give a magical boost. Actually, 5-10% is a much bigger boost than what I thought it would be.

And I still don't know what part of what I said makes me "don't know/understand what I'm talking about" :P
#12
Threading the IOP (and with it, SPU2-X) is the last approach we haven't tried yet that could lead to a speedup.
It's also very dangerous to do, as it makes emulation behavior non-deterministic.
That means that in one session, your game could mistime a DMA transfer and crash, and in the next session, the transfer completes correctly.
It makes such problems very hard to debug.
For that reason, some in the developer team have totally ruled out the idea.
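
To illustrate why the current single-threaded approach is reproducible, here is a minimal sketch. It is not PCSX2's actual scheduler: `Cpu`, `run_interleaved`, and the clock ratio parameter are hypothetical names made up for the example. Both emulated processors are stepped in a fixed order on one host thread, so the point at which the IOP sees the EE's progress is the same on every run. With two host threads, that point would depend on how the OS schedules them.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical emulated CPU: it only counts cycles in this sketch.
struct Cpu {
    std::uint64_t cycles = 0;
    void run(std::uint64_t n) { cycles += n; }
};

// Steps the EE and the IOP in fixed-size slices on one host thread and
// records the EE cycle count at each sync point.
std::vector<std::uint64_t> run_interleaved(std::uint64_t ee_slice,
                                           std::uint64_t ratio,
                                           int slices) {
    assert(ratio > 0 && ee_slice % ratio == 0);
    Cpu ee, iop;
    std::vector<std::uint64_t> trace;
    for (int i = 0; i < slices; ++i) {
        ee.run(ee_slice);            // the EE always runs first
        trace.push_back(ee.cycles);  // sync point as seen by the IOP
        iop.run(ee_slice / ratio);   // the IOP then catches up
    }
    return trace;
}
```

Calling `run_interleaved(512, 8, 3)` twice yields the same trace both times, which is the property a threaded IOP would give up.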

We do have to revisit that EE/IOP sync code though, as it's currently overly complex and not very accurate, especially in conjunction with SPU2-X.
The game Klonoa 2 has no sound due to this, for example.

I just wouldn't count on the revised code also being multithreaded. It might just be too dangerous.
#13
What about taking advantage of C++11 threading features, such as futures, for small repeated calculations?
#14
(02-26-2016, 10:45 PM)K.F Wrote: I don't know if this is a language barrier issue or what, but I guess I need to emphasize what I said:


I did not say it contains any hardware.

The barrier is that I know the internals of the EE kernel. Yes, there is DMA-related code in the kernel, but it is only a write to a couple of registers to start the transfer, or a read to get the register status. All the DMA emulation and the synchronization of the transfer are done outside of the kernel.
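
The split described above can be sketched roughly as follows. This is a hypothetical illustration, not PCSX2 or kernel code: `HwRegs`, `kernel_start_dma`, `kernel_dma_busy`, `CHCR_STR`, and the register layout are all assumed names chosen for the example. The guest-side functions only write or read a register, while the transfer itself is handled in the emulator's register-write handler.

```cpp
#include <cassert>
#include <cstdint>

// Assumed "start" flag in a channel control register (illustrative only).
constexpr std::uint32_t CHCR_STR = 1u << 8;

// Hypothetical emulator-side model of one DMA channel.
struct HwRegs {
    std::uint32_t chcr = 0;         // channel control
    std::uint32_t qwc = 0;          // quadwords left to transfer
    std::uint64_t transferred = 0;  // total quadwords moved so far

    // Emulator side: all of the transfer work lives here, outside the kernel.
    void write_chcr(std::uint32_t value) {
        chcr = value;
        if (chcr & CHCR_STR) {
            transferred += qwc;  // stand-in for the real copy and its timing
            qwc = 0;
            chcr &= ~CHCR_STR;   // clear the flag once the transfer is done
        }
    }
    std::uint32_t read_chcr() const { return chcr; }
};

// Guest (kernel) side: one register write to start a transfer...
void kernel_start_dma(HwRegs& hw, std::uint32_t quadwords) {
    assert(quadwords > 0);
    hw.qwc = quadwords;
    hw.write_chcr(CHCR_STR);
}

// ...and one register read to poll its status.
bool kernel_dma_busy(const HwRegs& hw) {
    return (hw.read_chcr() & CHCR_STR) != 0;
}
```

In this sketch the transfer completes instantly inside `write_chcr`. A real emulator would spread it over emulated cycles, which is exactly the part that has to be synchronized outside the kernel.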

Quote: I just said it will be possible, I did not say it will give a magical boost. Actually, 5-10% is a much bigger boost than what I thought it would be.

And I still don't know what part of what I said makes me "don't know/understand what I'm talking about" :P

Well, it is 5-10% only if you aren't VU limited, and without taking into account the synchronization between threads. Actually, it seems to be much lower. Here are some numbers from God of War (speed hacks enabled, with VU cycle stealing).

As you can see, IOP emulation isn't costly. Sound (SPU2-X) has a bigger impact.

I don't think you understand how multithreading works. You seem to be mixing up the internal sync of the PS2 with the sync of x86 threads.

(02-27-2016, 06:01 AM)HTB123 Wrote: What about taking advantage of C++11 threading features such as futures for small repeated calculations
C++11 threads are only a wrapper around pthreads, which PCSX2 already uses.
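
For reference, this is roughly what the suggestion looks like in code. It is only a sketch with made-up names (`sum_range`, `sum_with_future`), not anything taken from PCSX2. With `std::launch::async` the first half of the work runs as if on a new thread, so each call pays the cost of starting and joining a thread, which can easily outweigh a small calculation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// A small calculation of the kind the question mentions.
int sum_range(const std::vector<int>& v, std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= v.size());
    return std::accumulate(v.begin() + begin, v.begin() + end, 0);
}

// Splits the work in two: the low half goes through a future, the high
// half runs on the calling thread, and get() waits for the result.
int sum_with_future(const std::vector<int>& v) {
    const std::size_t mid = v.size() / 2;
    std::future<int> low = std::async(std::launch::async, sum_range,
                                      std::cref(v), std::size_t{0}, mid);
    const int high = sum_range(v, mid, v.size());
    return low.get() + high;
}
```

On toolchains where the thread library is separate from libc, this has to be built with `-pthread`.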

The application already uses several threads. And the GPU driver is allowed to have some internal threads ;)
Code:
  8    Thread 0xeca88b40 (LWP 2200) "MTGS" 0xf7fdcb90 in __kernel_vsyscall ()
* 7    Thread 0xf1bffb40 (LWP 2199) "EE Core" 0x30083f47 in ?? ()
  5    Thread 0xf25ffb40 (LWP 2195) "MTVU" 0xf7fdcb90 in __kernel_vsyscall ()
  4    Thread 0xf2fffb40 (LWP 2189) "Redirect_Stderr" 0xf7fdcb90 in __kernel_vsyscall ()
  3    Thread 0xf39ffb40 (LWP 2187) "Redirect_Stdout" 0xf7fdcb90 in __kernel_vsyscall ()
  2    Thread 0xf4340b40 (LWP 2185) "SysExecutor" 0xf7fdcb90 in __kernel_vsyscall ()
  1    Thread 0xf6016a00 (LWP 2165) "PCSX2" 0xf7fdcb90 in __kernel_vsyscall ()
#15
(02-26-2016, 01:02 PM)K.F Wrote: It just means replacing emulated threading (several emulated processors on one real thread) with real threading (several emulated processors on multiple real threads). Unless you're talking about another part of that sentence?

For example, the HLE bios includes the IOP modules, and it also handles the (EE - IOP) DMA channels (SIF). If you could emulate those in high level, then that is all you need to run the IOP on another thread efficiently, unless I'm missing something. Sure, the IOP can be threaded without HLE bios, but I doubt the performance increase from that would be worth it, if there was any.

It's not that easy. You would have to convert all the code to be thread-safe without using synchronization, which may not be possible.
#16
Ok, great. Thank you everyone for your time and answers : )
#17
To answer the initial question of the thread: what needs to be done in GSdx is to replace float operations with integer operations. However, that means:
1/ You can't use the HW texturing unit (no more anisotropic filtering)
2/ You can't use the HW blending unit
3/ Float operations are faster than integer operations.
Conclusion: GPU rendering will be much slower but more accurate (accurate blending is a preview of that accuracy).
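
As a small worked example of why the two approaches disagree, here is a sketch that compares a fixed-point blend with a normalized float blend. It assumes the GS-style form `((A - B) * C >> 7) + D`, where an alpha of 0x80 stands for 1.0. The function names are made up for illustration and this is not GSdx code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Integer blend: alpha 0x80 (128) stands for 1.0, the result is truncated.
// Assumes a >= b so the shift acts on a non-negative value.
std::uint8_t blend_int(int a, int b, int alpha, int d) {
    assert(a >= b && alpha >= 0 && alpha <= 0x80);
    const int v = (((a - b) * alpha) >> 7) + d;
    return static_cast<std::uint8_t>(v > 255 ? 255 : v);
}

// Float blend in the style of a normalized GPU pipeline: work in [0, 1],
// clamp, then round to the nearest 8-bit value on output.
std::uint8_t blend_float(int a, int b, int alpha, int d) {
    const float fa = a / 255.0f, fb = b / 255.0f, fd = d / 255.0f;
    const float fc = alpha / 128.0f;
    float v = (fa - fb) * fc + fd;
    v = v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
    return static_cast<std::uint8_t>(std::lround(v * 255.0f));
}
```

With A = 255, B = 0, alpha = 0x40 and D = 0, the integer path gives 127 (16320 >> 7, truncated) while the float path gives 128 (127.5 rounded). One step of difference per blend is small, but it can accumulate over many blended layers.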
#18
(02-28-2016, 06:01 PM)gregory Wrote: To answer the initial question of the thread: what needs to be done in GSdx is to replace float operations with integer operations. However, that means:
1/ You can't use the HW texturing unit (no more anisotropic filtering)
2/ You can't use the HW blending unit
3/ Float operations are faster than integer operations.
Conclusion: GPU rendering will be much slower but more accurate (accurate blending is a preview of that accuracy).

Is it possible to implement this as an option?
#19
An option won't be possible. It would be a completely different renderer.
#20
Isn't point sampling basically integer? As for that blend math... I haven't looked at the code in a long time... have you tried signed RGB or HDR render targets? It consumes more memory, but...