Optimization Request
#11
So I'm guessing that I've already optimally configured PCSX2? .-.

I was looking for a new processor and found one I liked, but then was informed that it needs to have the same socket as my motherboard - whatever that means. I'm not even going to bother, lol.
#12
As long as you buy an AMD Phenom CPU, you're fine. All Phenoms share the same basic socket design. If your motherboard is Socket AM2 (and not AM2+), then it might not be able to run high-GHz CPUs at their full capability. But if your PC is relatively new, it should be AM2+.

The other problem is that one of the things PCSX2 relies on heavily is poorly supported by AMD. This CPU feature is called Denormals-Are-Zero (DAZ), and it is a mode of operation for the SSE/SIMD instruction sets. AMD and Intel CPUs both support the mode, as it's required by the SSE specification. AMD's implementation, however, is notably slower than Intel's. Many PS2 games lean heavily on math that exercises DAZ, and so tend to run even slower than expected on AMD processors.
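To make the DAZ idea concrete: on real hardware it's a bit (bit 6) in the SSE MXCSR control register, which C code can set with `_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON)` from `<pmmintrin.h>`. Python can't touch that register, so this is only a rough simulation of the arithmetic effect, using doubles (the PS2 itself works in single precision). It shows what the mode changes, not why AMD's version is slower. The function names are made up for illustration.

```python
import sys

SMALLEST_NORMAL = sys.float_info.min  # 2**-1022 for IEEE 754 doubles

def is_denormal(x: float) -> bool:
    """True if x is subnormal: nonzero, but smaller in magnitude than the smallest normal double."""
    return x != 0.0 and abs(x) < SMALLEST_NORMAL

def daz(x: float) -> float:
    """Simulated Denormals-Are-Zero: a subnormal input is read as a signed zero."""
    if is_denormal(x):
        return 0.0 if x > 0 else -0.0
    return x

def mul_with_daz(a: float, b: float) -> float:
    """Multiply as if DAZ were on (inputs only; flushing *results* is the separate FTZ mode)."""
    return daz(a) * daz(b)

tiny = SMALLEST_NORMAL / 4             # exactly 2**-1024, a subnormal
print(is_denormal(tiny))               # True
print(tiny * 4.0 == SMALLEST_NORMAL)   # True: plain IEEE math keeps the tiny value
print(mul_with_daz(tiny, 4.0))         # 0.0: with DAZ the input was treated as zero
```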
Jake Stine (Air) - Programmer - PCSX2 Dev Team
#13
I'm looking to buy the left processor:

http://www.newegg.com/Product/Productcom...103-644-TS

Is there anything I need to worry about? I have a Socket AM2+.
#14
@Air

How about "Bulldozer", the new AMD microarchitecture (codename) due to be released in 2011?
I read on Wikipedia that it will add Intel's modern SSE extensions like SSE4.1 and SSE4.2, but the TDP is definitely high.
http://en.wikipedia.org/wiki/Bulldozer_(processor)
Main PC1:i5-4670,HD7770(Active!)
Main PC2:i5-11600K,GTX1660Ti(Active!)
PCSX2 Discord server IGN:smartstrike
PCSX2 version uses:Custom compiled build 1.7.0 64-bit(to be update regularly)
smartstk's YouTube Channel
#15
(11-11-2010, 02:36 AM) tallbender Wrote: @Air

How about "Bulldozer", the new AMD microarchitecture (codename) due to be released in 2011?
I read on Wikipedia that it will add Intel's modern SSE extensions like SSE4.1 and SSE4.2, but the TDP is definitely high.
http://en.wikipedia.org/wiki/Bulldozer_(processor)

It's a completely new design. It's dangerous to make assumptions until we get to see one running a variety of code, and some obsessed optimizer like Agner Fog does per-instruction and low-level pipeline analysis on it. However, I will make some notes based on the marketing material. First is AMD's Cluster Multi-Threading (CMT):

Quote:
  • Two tightly coupled, "conventional" x86 out-of-order processing engines which AMD internally calls modules.

  • duplicating integer schedulers and execution pipelines offers dedicated hardware to each of two threads which significantly increase performance in multithreaded integer applications

CMT is AMD's version of Intel's HyperThreading. In HyperThreading (Intel), each core tries to maximize its use of its 4 to 6 internal execution units by executing a second thread's instructions in otherwise unused execution units. The second thread gets to run whenever the main thread's instruction chain stalls waiting for one of the following:
  • a memory access
  • an especially slow instruction (a division, for example)
  • a register dependency that prevents the CPU's "out-of-order" execution engine from running a full 4-5 instructions in parallel.

These conditions come up frequently in most apps, so HT typically has plenty of opportunities to fill the CPU's idle execution units with tasks from other threads.
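A toy cycle-by-cycle model of the HT idea above. Everything here is hypothetical: a 4-wide core, a made-up trace of how many units the main thread keeps busy each cycle, and a second thread that always has a few instructions ready. Real SMT also has both threads competing for caches and other shared resources, which this ignores.

```python
EXEC_UNITS = 4  # hypothetical issue width of one core

def utilization(main_trace: list[int], smt: bool, second_thread_demand: int = EXEC_UNITS) -> float:
    """Fraction of execution-unit slots used over the traced cycles.

    main_trace[i] is how many units the main thread keeps busy in cycle i
    (0 while it is stalled on memory, a slow divide, or a dependency chain).
    With SMT on, a second thread that always has up to `second_thread_demand`
    ready instructions fills whatever the main thread left idle.
    """
    used = 0
    for busy in main_trace:
        idle = EXEC_UNITS - busy
        filler = min(idle, second_thread_demand) if smt else 0
        used += busy + filler
    return used / (EXEC_UNITS * len(main_trace))

# Made-up trace: full issue, a memory stall, a slow divide, a dependency chain.
trace = [4, 0, 0, 1, 1, 2, 4, 0]
print(utilization(trace, smt=False))                         # 0.375
print(utilization(trace, smt=True, second_thread_demand=2))  # 0.75
```

The second thread doubles utilization here only because the main thread leaves so many slots empty; a thread that already issues 4 instructions every cycle leaves nothing to fill.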

AMD's Cluster Multi-Threading (CMT) works differently, and breaks down into two parts. The first part is that cores are bound together in pairs as modules, which allows the core count of the CPU to be a lot higher without having to enlarge the die. Each module of a Bulldozer chip is similar to the original dual-core Athlons in a sense: two very tightly coupled CPU cores that share almost everything except the instruction decoder and execution units. I'm unclear on the L1 cache situation, though -- there appear to be two levels of L1 cache on Bulldozer (one for each core and one for the module as a whole), and I'm not sure how those interoperate.

DECEPTION ALERT: This means that an octa-core Bulldozer chip will not have eight true cores!! The chip will in fact have four modules, with the pair of cores in each module competing for a lot of shared resources. Some of you might remember how lousy the original Athlon X2s were at running 2 threads in parallel. This may not be significantly different. I can't predict the exact per-core performance rating, but I am certain that it will be somewhat below the current generation of multi-core CPUs.

The second part of CMT is that there are actually two ALU/AGU pairs per core. The ALU/AGU handles integer and address operations only, which means that this portion of Bulldozer will do little to improve the performance of SIMD-heavy activities (such as encoding videos, image processing, most threaded game logic, etc). The AGU at least is of minimal use to such tasks, but will also be limited by the system's memory bandwidth (which should be pretty well hosed anyway, once you have 8+ threads all trying to access your system's RAM -- a few extra execution units aren't much help when half of the threads are sitting around waiting for their turn to access RAM). Furthermore, if you're running fewer threads than the number of cores on your CPU (which will happen like 98% of the time on an octa-core Bulldozer), the extra ALU/AGU pair of each core will be completely unused.
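The memory-bandwidth point above can be sketched as a simple "whichever limit is slower wins" model: compute time shrinks as threads are added, but all threads share the same RAM, so transfer time doesn't. The function and all the numbers are hypothetical, not measurements of any real chip.

```python
def job_time(work_units: float, bytes_moved: float, threads: int,
             units_per_thread_per_s: float, mem_bytes_per_s: float) -> float:
    """Rough lower bound on runtime for a parallel job.

    Compute time shrinks as threads are added, but every thread pulls from the
    same RAM, so the memory-transfer time does not shrink. The job can't beat
    whichever limit is slower.
    """
    compute_s = work_units / (units_per_thread_per_s * threads)
    memory_s = bytes_moved / mem_bytes_per_s
    return max(compute_s, memory_s)

# Hypothetical numbers, not measurements of any real chip.
for t in (1, 2, 4, 8):
    print(t, job_time(8e9, 16e9, t, units_per_thread_per_s=1e9, mem_bytes_per_s=8e9))
# 1 8.0
# 2 4.0
# 4 2.0
# 8 2.0   <- past 4 threads the job is waiting on RAM, not on execution units
```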

End result: With the exception of good old integer math, expect Bulldozer to provide increasingly diminished returns as you add more actual threads of work to a job. An octa-core Bulldozer should do exceptionally well running quad-core tasks, be markedly less stellar with 8-thread tasks, and probably be sorely disappointing for anything except a few specific integer-heavy tasks when running 12+ threads.
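A rough model of that diminishing-returns curve. It assumes a 4-module/8-core chip, that the OS places one thread per module before doubling up, that threads beyond 8 just time-slice, and a made-up 80% per-core speed when both cores of a module are busy. The 0.8 factor is purely illustrative; nobody has measured real silicon yet.

```python
MODULES = 4                 # an "octa-core" Bulldozer = 4 modules x 2 cores
SHARED_PAIR_FACTOR = 0.8    # hypothetical: each core of a fully busy module runs at 80% of solo speed

def relative_throughput(threads: int) -> float:
    """Total throughput in units of 'one core running alone'.

    Assumes the OS spreads threads one per module before doubling up, and
    that threads beyond the 8 hardware cores just time-slice (no extra work done).
    """
    running = min(threads, 2 * MODULES)
    doubled = max(0, running - MODULES)   # modules with both cores busy
    single = running - 2 * doubled        # modules with only one core busy
    return single + 2 * doubled * SHARED_PAIR_FACTOR

for t in (1, 4, 6, 8, 12):
    total = relative_throughput(t)
    print(t, round(total, 2), round(total / t, 2))  # threads, total, per-thread share
```

Under these assumptions the per-thread share stays at 1.0 up to 4 threads, drops to 0.8 at 8, and keeps falling past 8 because the extra threads add no hardware.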

Next are the new AVX and old SSE units:

Quote:
  • Two symmetrical 128-bit FMAC (fused multiply-add (FMA) capability) Floating Point Pipelines per module that can be unified into one large 256-bit wide unit if one of integer cores dispatch AVX instruction and two symmetrical x87/MMX/SSE capable FPPs for backward compatibility with SSE2 non-optimized software.

OK, what this means is that the new AVX unit is being developed with 128-bit FMAC in mind, an instruction that does not yet exist on any CPU currently on the market. Using that instruction liberally will be highly beneficial. Speed improvements on existing SSE-based apps are likely not very stellar, though that depends on how the FPPs are implemented (there's no indication how the FPPs will compare to AMD's existing ones, but I doubt they'll bother optimizing them much).
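What "fused" buys you, besides doing two operations in one: a multiply-add that rounds once instead of twice. This Python sketch models the fused version with exact `fractions.Fraction` arithmetic and a single final rounding; it illustrates the numeric behavior only, not how AMD's FMAC hardware works. The function names and test values are chosen for illustration.

```python
from fractions import Fraction

def mul_then_add(a: float, b: float, c: float) -> float:
    """Two rounding steps: the product is rounded to a double, then the sum is rounded again."""
    return a * b + c

def fused_mul_add(a: float, b: float, c: float) -> float:
    """Model of FMA: compute a*b + c exactly, then round once at the end."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0
print(mul_then_add(a, b, c))   # 0.0: the -2**-60 term was lost when a*b got rounded to 1.0
print(fused_mul_add(a, b, c) == -2.0**-60)  # True: single rounding keeps it
```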

This also means that using 256-bit instructions will not be notably faster than using 128-bit instructions, since 256-bit instructions can only run one at a time while 128-bit ops can run two in parallel. This means Bulldozer will be a lot like the original Athlon and P3/early P4 chips, which typically ran 64-bit MMX instructions much faster than 128-bit SSE instructions. Hopefully AMD will fix that in a later revision of Bulldozer and give us real 256-bit SIMD support.
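A back-of-the-envelope check of the 128-bit vs 256-bit claim, assuming each of the module's two FMAC pipes accepts one 128-bit op per cycle and a 256-bit op ties up both pipes (that's a reading of the quoted spec, not a measured figure; latency and the two cores sharing the module are ignored):

```python
PIPES_PER_MODULE = 2   # two 128-bit FMAC pipelines per module (from the quoted spec)
PIPE_BITS = 128
FLOAT_BITS = 32        # single-precision lanes

def floats_per_cycle(op_bits: int) -> int:
    """Single-precision results per cycle for one module, assuming each pipe
    accepts one 128-bit op per cycle and a wider op occupies several pipes."""
    pipes_per_op = op_bits // PIPE_BITS
    ops_per_cycle = PIPES_PER_MODULE // pipes_per_op
    return ops_per_cycle * (op_bits // FLOAT_BITS)

print(floats_per_cycle(128))  # 8: two 128-bit ops side by side
print(floats_per_cycle(256))  # 8: one 256-bit op fills both pipes
```

Same 8 floats per cycle either way, which is why 256-bit code wouldn't buy much on this layout.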
Jake Stine (Air) - Programmer - PCSX2 Dev Team
#16
After thinking about it some more, I'll also note that Bulldozer's threading/core model is very complex, and I fully expect it will take a year for AMD to release kernel drivers for Windows that utilize the chip properly. Complex unbalanced multi-core designs like Bulldozer require fancy thread scheduling logic (based on extra fancy thread behavioral analysis!) in order to make sure everything runs smoothly. This was one of the problems with Intel's initial release of HT on the P4, for example. The operating system didn't schedule it properly at first, which caused potentially serious performance issues that made HT look really bad (these were later remedied). Bulldozer is essentially a double dose of HT-style parallel processing, with different strengths and weaknesses.
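One piece of the scheduling logic described above, sketched as a module-aware core picker: put new threads on modules that are still idle before doubling up on a busy one. The core numbering (cores 2m and 2m+1 forming module m) and the function itself are hypothetical; a real OS scheduler also weighs thread behavior, priorities, power, and cache warmth.

```python
from typing import Optional

def pick_core(busy: set[int], modules: int = 4) -> Optional[int]:
    """Hypothetical module-aware placement for a Bulldozer-style chip.

    Cores 2m and 2m+1 are assumed to share module m. Prefer a free core whose
    sibling is idle, so two threads only share a module when they have to.
    """
    free = [c for c in range(2 * modules) if c not in busy]
    if not free:
        return None
    # False (sibling idle) sorts before True (sibling busy); min() keeps the lowest index on ties.
    return min(free, key=lambda c: (c ^ 1) in busy)

busy: set[int] = set()
order = []
for _ in range(8):
    core = pick_core(busy)
    busy.add(core)
    order.append(core)
print(order)  # [0, 2, 4, 6, 1, 3, 5, 7]
```

A naive "lowest free core" scheduler would place the same threads as 0, 1, 2, 3, ..., cramming the first two threads onto one module while three modules sit idle.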


All in all, my synopsis of Bulldozer (based on current pre-release specs) is as follows:

AMD is emphasizing parallel processing over single-thread execution speed, which I'm not a fan of. There are some serious issues with trying to scale software across 8-16 threads efficiently, especially when half to 3/4ths of those threads are getting shafted in the shared CPU and system resources department. It's hard enough for a programmer to schedule parallel tasks to run efficiently, and without needing crap-loads of RAM, to begin with, without having to worry over whether or not the thread's going to get stuck on a slow, half-faked 'core' of the CPU. That sort of thing can muck up the whole chain of dependencies and cause the system to run slower than it would have if it were running fewer threads in the first place. >_<

Meanwhile, Intel has changed their own tune, and is trying to emphasize "turbo-charged" single-thread execution schemes. I firmly believe this is a better strategy, especially for emulators (my own bias), but also for most games.

There's simply no substitute for raw single-task execution speed. Multithreading is great in its own ways, but you cannot keep throwing more cores at a problem. At a certain point, you'll need a single thread to execute a series of dependent logic as fast as it can until it can get to the next batch of parallel tasks -- and that's where Intel's i7 Turbo Boost excels, and where Bulldozer will apparently falter badly (unless AMD plans to have 4-5 GHz varieties of these chips, which I highly doubt).
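This argument lines up with Amdahl's law (the post doesn't use that name): if part of a job is inherently serial, adding cores only shrinks the parallel part. A minimal sketch with a hypothetical 75%-parallel workload:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core when only `parallel_fraction`
    of the work can be spread across `cores`; the rest stays serial."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# A hypothetical workload that is 75% parallel.
for n in (1, 3, 8, 1000):
    print(n, round(amdahl_speedup(0.75, n), 3))
# 1 1.0
# 3 2.0
# 8 2.909
# 1000 3.988
```

Even with 1000 cores the speedup stays under 4x, because the 25% serial part is untouched. The only way past that ceiling is to run the serial part itself faster, which is the single-thread speed argument.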
Jake Stine (Air) - Programmer - PCSX2 Dev Team
#17
My poor thread got hijacked. :<
#18
HEY GUYS! Look what I did all by myself :3
#19
Nice, got great speedups yet?
Core i5 3570k -- Geforce GTX 670  --  Windows 7 x64
#20
(11-13-2010, 03:24 AM) MikeSugs Wrote: HEY GUYS! Look what I did all by myself :3

Did you just overvolt the CPU without overclocking? Not nice, not to mention totally useless...