Linux patches review
#11
(06-07-2010, 06:56 PM)gregory Wrote: + inline fix: I take the hint of gigaherz. Duplicate the clamp_mix function that cause the failure. With the keyword static gcc is happy and do the inline. However I found an interesting behavior, clamp_mix function (before & after patches) are not inlined in Reverb.cpp. I try to also declare the clamp function (on reverb.cpp) static but it causes some inline error in others places... Opinion are welcome.

This is a limitation of GCC: it does not have global optimizations support (also known as link-time code generation (LTCG)). It will only be able to inline the code within defined modules.

... which means that 90% of the __forceinline specifiers in CSX2 and plugins don't do anything on GCC. I've known this since day 1, but there isn't much I can do about it, because there's no way I'm putting 90% of pcsx2 into one gigantic .cpp/.h buildfile. Wink

I like to keep code out of header files as much as possible because MSVC's debugger gets progressively slower about setting breakpoints and tracing into code, the more times code is included into separate cpp files. Tracing into std::containers can take several seconds even on my good machine (20+ on my lappy), and setting/deleting a breakpoint takes 15+ seconds (over a min on the lappy). Additionally, the more code in header files, the more likely Visual C's intellisense can go buggy and stop working.

Initially I tried to keep a policy of leaving small functions in headers and leaving larger ones in CPP, but PCSX2's so large that it became overwhelming and was getting increasingly difficult for me to remember/decide if I should look for a function's implementation in a cpp or an h, and I frequently ended up with two problem cases:

1. a dozen or more functions of related action, some large and some small, and forced to decide if I should just axe the cpp altogether or axe the .h altogether (since having them split across 2 files were awkward).

2. forward declarations. Places where I have 2 classes inter-dependent on each other, so they'd still need to be fully defined in separate class declarations and method definitions (with implementations typically defined in .inl's, and then specially included after the .h's so that dependencies are properly resolved). This is actually a problem on MSVC, because it causes __forceinline to generate compiler errors. (sigh) So it would fix GCC inlining but break MSVC from even being able to compile.

So finally I decided to bank on a theory:

Linux builds at the time barely worked anyway. It'd prolly be a year before it did. I figured most likely GCC would support LTCG by then; so I favored good C++ design, separating definition and implementation whenever possible. That was more than two years ago.

Unfortunately the 15-25% improvements in speed seen by using LTCG on.. well.. just about any type of program, just isn't enough to motivate the GCC gurus to implement LTCG. They would much rather spend their days implementing things like a 16-byte aligned stack (which is actually slower on x86-32, in addition to being asinine stupid on a few other levels), and then coding tens of thousands of lines of heuristics analysis in order to generate a couple XMM instructions here or there.

Conclusion:

After forcing me to spend a few weeks of my life to backward engineer their undocumented piece of glorified nonsense called "aligned stack on x86-32", I no longer have any will or desire to sacrifice my preferred coding styles for their foolish non-implementations. You're welcome to move clamp_mix to a header file so that GCFail can inline it, but I might just undo it at any time, willingly and without remorse. Wink

Maybe you could do like w32pthreads did for mingw: create a set of "cpp groups" that are basically things like:

#include "mixer.cpp"
#include "adsr.cpp"
...etc.

So long as the cpp files don't create local-scope macros or static variables that are non-unique in name, it should work ok (and generally I try to avoid using non-unique static var names, out of principle -- so there shouldn't be many of them -- if at all).

Oh, and another option!

Ask the GCC gurus for LTCG support! It's a good option if you enjoy being insulted and spat upon anyway.
Jake Stine (Air) - Programmer - PCSX2 Dev Team
Reply

Sponsored links

#12
(06-07-2010, 06:56 PM)gregory Wrote: + gamefixes_like_speedhach: I join a screenshot of the new interface. I move the check box at the top and change heading message. It is now more similar to speedhack panels.

Sounds good. Though I think you forgot to actually attach any patches or screenshots? Smile
Jake Stine (Air) - Programmer - PCSX2 Dev Team
Reply
#13
Ok. Thanks very interesting. I agree it is probably better to keep the code clean. There is plenty of others things to do anyway.

A question regarding LTCG. The new gcc version (4.5) have some new options that seem similar. Well for the moment the options is experimental (in the sense do not offer any speed improvement but ). Its it the same ? 4.6 will probably come next year with good support Smile, yeah I am still dreaming.

Changelog extract

A new link-time optimizer has been added (-flto). When this option is used, GCC generates a bytecode representation of each input file and writes it to special ELF sections in each object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit. This enables interprocedural optimizations to work across different files (and even different languages), potentially improving the performance of the generated code. To use the link-timer optimizer, -flto needs to be specified at compile time and during the final link. If the program does not require any symbols to be exported, it is possible to combine -flto and the experimental -fwhopr with -fwhole-program to allow the interprocedural optimizers to use more aggressive assumptions.
Reply
#14
Quote:Sounds good. Though I think you forgot to actually attach any patches or screenshots?

Sorry Blush Classical case !
Reply
#15
(06-08-2010, 12:54 PM)gregory Wrote: Ok. Thanks very interesting. I agree it is probably better to keep the code clean. There is plenty of others things to do anyway.

A question regarding LTCG. The new gcc version (4.5) have some new options that seem similar. Well for the moment the options is experimental (in the sense do not offer any speed improvement but ). Its it the same ? 4.6 will probably come next year with good support Smile, yeah I am still dreaming.

Yup, that's LTCG / Whole Program optimization. Excellent news, really. Last I heard they wrote it off saying "if you want whole program optimization then paste your whole program into a single cpp file." lol
ok, a note on the Gamefixes patch: Can probably comment out or remove the static line. It's sort of redundant with the Gamefixes StaticBox. Other panels have multiple static boxes and such, so it makes more sense there. Other than that, it looks perfect. Smile
Jake Stine (Air) - Programmer - PCSX2 Dev Team
Reply
#16
Ok I remove it and commit it.

For the inlining, I will just add a comment explaining the situtation, not worth too change it for 2 real inline functions.
Reply
#17
Some patches on gsnull plugins.

+ gsnull_setting.patch: uses setting dir API and handle logging option. It is now working similar to others null plugins (at least usb, fw, dev9). After this patches we can remove the old glade dummy interface.

+ gsnull_log_api.patch: Uses log dir API. Similar to the previous patch gs log api.

Impact on windows are small, but it would not be a bad things to be tested Smile

Feedback as always are welcome Smile
Reply
#18
I don't really think either of those patches will cause issues. And GSnull's windows port is a little minimal, since I wrote GSnull. In this case I'd say to just go ahead and commit it, and I'll take care of it if the windows versions breaks.

I do really want to get all traces of glade out of the trunk, since they require an obsolete, depreciated version of glade to maintain. (And the code the other null plugins use wasn't there at the time; I added that, but forgot to move GSnull over to it...)
Reply
#19
Ok.

I will do more or less the same for SPU2NULL and PADnull. So we can remove old deprecated glade stuff and have some consistency between null plugins.
Reply
#20
Just a minor question regarding attribute of API functions. Some use EXPORT_C(type) function and others use type CALLBACK function ! Which one is better ?
Reply




Users browsing this thread: 1 Guest(s)