I'm trying to build a modded version of PCSX2 with EE being emulated at the full 300 MHz mostly just for giggles and to see just how badly my processor will scale. I think the code is from svn-r4177, although I'm not so sure. Anyways, it was checked out January 5, 2011 around 6AM (GMT+08:00).

(10-26-2010, 09:23 PM)rama Wrote: Gee, guys :P
Current PCSX2 runs, by default, with an EE speed of ~240MHz.
It largely depends on the actual game code that is run though.

Saiki built PCSX2 for himself with an EE speed higher than that, possibly ~400MHz.

Saiki Wrote:
ilovejedd Wrote: You mentioned you compiled custom versions with the EE running at full speed. Would you happen to have a list of what changes to the source code need to be made for this? Thanks!
Look for the speedhacks; 2,2,2 is what you need one of them to be.

The relevant code appears to be the following (iR5900-32.cpp):
static u32 scaleBlockCycles_helper()
{
    // Note: s_nBlockCycles is 3 bit fixed point.  Divide by 8 when done!

    // Let's not scale blocks under 5-ish cycles.  This fixes countless "problems"
    // caused by sync hacks and such, since games seem to care a lot more about
    // these small blocks having accurate cycle counts.

    if( s_nBlockCycles <= (5<<3) || (EmuConfig.Speedhacks.EECycleRate == 0) )
        return s_nBlockCycles >> 3;

    uint scalarLow, scalarMid, scalarHigh;

    // Note: larger blocks get a smaller scalar, to help keep
    // them from becoming "too fat" and delaying branch tests.

    switch( EmuConfig.Speedhacks.EECycleRate )
    {
        case 0:    return s_nBlockCycles >> 3;

        case 1:        // Sync hack x1.5!
            scalarLow = 5;
            scalarMid = 7;
            scalarHigh = 5;
        break;

        case 2:        // Sync hack x2
            scalarLow = 7;
            scalarMid = 9;
            scalarHigh = 7;
        break;
    }

    const u32 temp = s_nBlockCycles * (
        (s_nBlockCycles <= (10<<3)) ? scalarLow :
        ((s_nBlockCycles > (21<<3)) ? scalarHigh : scalarMid )
    );

    return temp >> (3+2);
}

I'm trying to analyze what happens when scalarLow, scalarMid & scalarHigh are all set to 2. From what I see that's basically going to be equivalent to:
const u32 temp = s_nBlockCycles * 2;
return temp >> (3+2);

which is, if I'm doing the calculations correctly, essentially the same as:
return s_nBlockCycles >> 4;

Is that correct? I was wondering, instead of replacing the values for the scalars to 2, can I just go straight to return s_nBlockCycles >> 4;?
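
As a quick sanity check of that arithmetic, here is a minimal stand-alone C++ sketch. It is not PCSX2 code, and the function names (scaledWithScalar, shortcut, countMismatches) are made up for illustration. It compares (x * 2) >> (3+2) against x >> 4 over a range of block-cycle values. Note that in the quoted helper, blocks at or under 5<<3 return early with >> 3, so the shortcut would only apply to the scaled path.

```cpp
#include <cstdint>

// Tail of the quoted helper with all three scalars set to the same value.
// blockCycles is 3-bit fixed point, as in the original code.
static uint32_t scaledWithScalar(uint32_t blockCycles, uint32_t scalar)
{
    const uint32_t temp = blockCycles * scalar;
    return temp >> (3 + 2);
}

// The proposed one-line replacement.
static uint32_t shortcut(uint32_t blockCycles)
{
    return blockCycles >> 4;
}

// Counts how many values in [0, limit) give different results
// for the scalar-2 path and the shortcut.
static uint32_t countMismatches(uint32_t limit)
{
    uint32_t mismatches = 0;
    for (uint32_t c = 0; c < limit; ++c)
    {
        if (scaledWithScalar(c, 2) != shortcut(c))
            ++mismatches;
    }
    return mismatches;
}
```

Multiplying by 2 and then shifting right by 5 is the same as shifting right by 4 as long as the multiplication doesn't overflow, so countMismatches should report zero for any realistic block size.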

Also, instead of simply replacing one of the EE speedhacks, I'm thinking of adding this as an option in the GUI in my custom build. So far, I've found that I would need to change SpeedhacksPanel.cpp
Line 22:
const wxChar* Panels::SpeedHacksPanel::GetEEcycleSliderMsg( int val )        // add a new case for the experimental EE Cycle Rate?

Line 122:
    m_slider_eecycle = new wxSlider( eeSliderPanel, wxID_ANY, 1, 1, 3,        // change to 1, 1, 4?
        wxDefaultPosition, wxDefaultSize, wxHORIZONTAL | wxSL_AUTOTICKS | wxSL_LABELS );

and ini.cpp
Line 230:
    Entry("EECycleRate", Config.Hacks.EECycleRate);
    if (Config.Hacks.EECycleRate > 2)        // change this to 3?
        Config.Hacks.EECycleRate = 2;        // change this to 3?

Are those the only things I would have to change? Did I miss anything else? It's been several years since my Introduction to Computer Science class, and that's pretty much the extent of my training, so my programming skills are practically nonexistent. I apologize for all the questions and thank you very much for any help!

Without going too much into it: Yeah, could work.
Make your scalars 2,3,2 though, or a few games won't boot (yep, I tested this before :P).
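
For anyone following along, here is a rough stand-alone C++ sketch of what a 2,3,2 setting would compute. It is an approximation of the quoted helper, not the actual PCSX2 source: the block cycles and cycle rate are passed in as parameters, the function name scaleBlockCycles is made up, and rate 3 is a hypothetical extra setting that doesn't exist in the trunk. Since the result is shifted right by 3+2, a scalar of 4 would match the unscaled path, so scalars below 4 count fewer cycles per block.

```cpp
#include <cstdint>

// Stand-alone approximation of scaleBlockCycles_helper().
// blockCycles is 3-bit fixed point. cycleRate 3 is HYPOTHETICAL.
static uint32_t scaleBlockCycles(uint32_t blockCycles, int cycleRate)
{
    // Small blocks (and rate 0) are never scaled.
    if (blockCycles <= (5u << 3) || cycleRate == 0)
        return blockCycles >> 3;

    uint32_t scalarLow, scalarMid, scalarHigh;
    switch (cycleRate)
    {
        case 1: scalarLow = 5; scalarMid = 7; scalarHigh = 5; break; // x1.5
        case 2: scalarLow = 7; scalarMid = 9; scalarHigh = 7; break; // x2
        case 3: scalarLow = 2; scalarMid = 3; scalarHigh = 2; break; // hypothetical
        default: return blockCycles >> 3; // unknown rate: leave unscaled
    }

    // Same size thresholds as the quoted code.
    const uint32_t scalar =
        (blockCycles <= (10u << 3)) ? scalarLow :
        ((blockCycles > (21u << 3)) ? scalarHigh : scalarMid);

    return (blockCycles * scalar) >> (3 + 2);
}
```

With these numbers, a 30-cycle block (240 in fixed point) counts as 30 cycles at rate 0, 52 at rate 2, and 15 at the hypothetical rate 3.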

Not sure if the GUI will be fine with just those changes. Trial and error should quickly tell you :P
Okay, thanks! I'll just make a copy of the trunk to compile separately for troubleshooting.
Are there any resources that cover exactly why emulating the EE at ~240 MHz works at full speed, even though it's clocked at 300 or so? It's extremely interesting that that works, and I'd like to know exactly how :) A technical discussion about it would be excellent.
Basically, not all games use the EE's full power, kind of like how CPU usage is different when you're just browsing the internet or watching a video as opposed to when you're playing a game or doing 3D rendering. So instead of spending CPU cycles on emulating the EE, those cycles can go to other tasks such as the VUs, etc.
Our EE clocks aren't totally exact. Some common operations are very fast, others are too slow.
Depending on the game and situation, we have a 500, 300 and 200MHz EE.
It's just that on average our EE is too slow.
The reason for this is that the EE has a "dual pipeline" mode, where it can execute 2 instructions at once, which is extremely difficult to time.
So where, for example, a dual pipeline operation on a PS2 would take, say, 5 cycles for 2 instructions (as they are 5 cycles each, or at least one of them is), we actually count 10, which causes our EE to take longer than it should.
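
To make that arithmetic concrete, here is a toy C++ sketch of the two ways of counting. It is only an illustration under simplifying assumptions: it pretends every adjacent pair of instructions can dual-issue and that a pair costs as much as its slower member, which is not how the real EE pairing rules work. The function names are made up.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Adds up each instruction's cycles one after another.
static uint32_t countSequential(const std::vector<uint32_t>& cycles)
{
    uint32_t total = 0;
    for (uint32_t c : cycles)
        total += c;
    return total;
}

// Toy dual-issue model (assumption): instructions are paired up in order,
// and each pair costs as much as its slower instruction.
static uint32_t countPaired(const std::vector<uint32_t>& cycles)
{
    uint32_t total = 0;
    for (std::size_t i = 0; i < cycles.size(); i += 2)
    {
        const uint32_t first = cycles[i];
        const uint32_t second = (i + 1 < cycles.size()) ? cycles[i + 1] : 0;
        total += std::max(first, second);
    }
    return total;
}
```

For two 5-cycle instructions, countSequential gives 10 while countPaired gives 5, which is the gap described above.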

Cycle timing is a royal pain in the ass, so over time we've used what we consider to be an "average" cycle timing. It's very difficult to get it right otherwise :P
(01-07-2011, 03:33 AM)Urisma Wrote: Are there any resources that cover exactly why emulating the EE at ~240 MHz works at full speed, even though it's clocked at 300 or so? It's extremely interesting that that works, and I'd like to know exactly how :) A technical discussion about it would be excellent.

Our EE cycle-counting implementations are just guesses at how many CPU cycles blocks of code take.

The ee recompilers and interpreters are not cycle accurate so what we instead do is say, "we have a block of x number of instructions, on average an instruction takes y amount of time, so just multiply x * y to get the amount of cycles the block took."

A 'block' in a recompiler usually consists of an arbitrary starting instruction and then continues until a branch is detected (and then one additional instruction which is the branch delay slot; branch delay slots are used in some architectures to reduce pipeline stalls on branches potentially making the system faster).
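
As a rough illustration of the last two paragraphs, here is a toy C++ sketch that finds the length of a block and applies the x * y estimate. It is not how the PCSX2 recompiler is written: instructions are reduced to a single "is this a branch?" flag, the function names are made up, and the average cycles per instruction is whatever number the caller assumes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// isBranch[i] is true when instruction i is a branch (toy representation).
// Returns the number of instructions in the block starting at `start`:
// everything up to and including the first branch, plus its delay slot.
static std::size_t blockLength(const std::vector<bool>& isBranch, std::size_t start)
{
    std::size_t i = start;
    while (i < isBranch.size() && !isBranch[i])
        ++i;
    if (i < isBranch.size())
        ++i; // include the branch itself
    if (i < isBranch.size())
        ++i; // include the branch delay slot
    return i - start;
}

// The "x * y" guess: instruction count times an assumed average cost.
static uint32_t estimateBlockCycles(std::size_t instrCount, uint32_t avgCyclesPerInstr)
{
    return static_cast<uint32_t>(instrCount) * avgCyclesPerInstr;
}
```

For a stream of two ordinary instructions, a branch, and two more instructions, the first block is four instructions long (the branch plus its delay slot end it), and at an assumed average of 2 cycles per instruction it would be counted as 8 cycles.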

The reason we're guessing at the number of CPU cycles the blocks take is that we can't really predict the timing accurately (well, not quickly, and not without emulating more stuff).

The main reason is the CPU cache. An instruction that accesses memory will take a different amount of time depending on whether that memory is in the cache or not. So because PCSX2 doesn't implement the CPU cache, it just guesses at the average time the instructions take.

There are also pipeline stalls that affect cycle counting, and they can interact with cache-access timings as well.

Another problem is that cache timings cannot be computed at the recompilation phase, because they change as the code executes.
That is, when the recompiler is generating its code, it does not know how much time an instruction will take; you only know that when the code is executed. So a recompiler that handles cache timings will have to emit code to handle it at execution time (when the code that the recompiler generated is executed).
The more code that is run at execution time, the slower the emulator; hence one reason why making pcsx2 more accurate in this regard will be a slow-down.
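
To illustrate why the cost of a memory access isn't known until the code runs, here is a toy C++ sketch of a direct-mapped cache. It is not modeled on the real EE cache: the line size, number of lines, and hit/miss latencies are made-up numbers, and ToyCache and costOfSecondLoad are hypothetical names. The point is only that the same load costs a different number of cycles depending on what was accessed before it, which is state the recompiler doesn't have when it generates a block.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// All of these numbers are assumptions for illustration only.
constexpr uint32_t kLineSize = 64;     // bytes per cache line
constexpr std::size_t kNumLines = 128; // lines in the toy cache
constexpr uint32_t kHitCycles = 1;     // cost of a cache hit
constexpr uint32_t kMissCycles = 40;   // cost of a cache miss

class ToyCache
{
public:
    // Returns the cycle cost of a load from addr and updates the cache state.
    uint32_t load(uint32_t addr)
    {
        const uint32_t lineAddr = addr / kLineSize;
        const std::size_t index = lineAddr % kNumLines;
        if (m_valid[index] && m_tags[index] == lineAddr)
            return kHitCycles;
        m_valid[index] = true; // fill the line on a miss
        m_tags[index] = lineAddr;
        return kMissCycles;
    }

private:
    std::array<bool, kNumLines> m_valid{};
    std::array<uint32_t, kNumLines> m_tags{};
};

// Cost of loading `second` on a fresh cache right after loading `first`.
static uint32_t costOfSecondLoad(uint32_t first, uint32_t second)
{
    ToyCache cache;
    cache.load(first);
    return cache.load(second);
}
```

Here a load from 0x1004 costs 1 cycle if 0x1000 was touched just before it (same line), but 40 cycles if the previous access was to an unrelated address. Tracking this in generated code means running something like load() for every memory access at execution time, which is the slowdown described above.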