Thought I'd try and clear things up a bit:
The two main obstacles in getting perfectly accurate FPU/VU emulation are:
A) Actually figuring out the exact behaviour of the FPU instructions on a PS2. I took a stab at this some years back and figured out add/sub/max/min, but mul/div/sqrt/rsqrt eluded me. (All of these produce results that are not always identical to what standard IEEE FPUs produce.)
Maybe someone else can figure these out...
(Note: the VU instructions most likely behave the same as the FPU ones, just operating on a vector)
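To give a feel for the kind of deviation meant in (A), here is a minimal sketch of how a bit pattern might be interpreted differently. It assumes the commonly reported PS2 behaviour that there are no denormals (exponent 0 reads as zero) and no Inf/NaN (exponent 255 is just another normal exponent). The function name ps2_bits_to_double is made up for illustration and is not taken from iFPUd.cpp or anywhere else in the emulator.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not from PCSX2): decode a 32-bit float pattern under
// two ASSUMED PS2 rules, returning the value as a host double.
//   - exponent field 0   -> value is zero (denormals are not representable)
//   - exponent field 255 -> ordinary number, NOT Inf/NaN as in IEEE 754
double ps2_bits_to_double(std::uint32_t bits) {
    const std::uint32_t exponent = (bits >> 23) & 0xFFu;
    const std::uint32_t mantissa = bits & 0x7FFFFFu;
    const double sign = (bits >> 31) ? -1.0 : 1.0;

    if (exponent == 0) {
        return sign * 0.0;  // assumed: denormals collapse to (signed) zero
    }
    // Implicit leading 1 plus the 23-bit fraction (2^23 = 8388608).
    const double significand = 1.0 + mantissa / 8388608.0;
    // Same bias as IEEE single precision, but applied to exponent 255 too.
    return sign * std::ldexp(significand, static_cast<int>(exponent) - 127);
}
```

Under these assumptions the pattern 0x7F800000, which an IEEE FPU reads as +Inf, decodes to the finite value 2^128. This only covers how values are read; it says nothing about how mul/div/sqrt/rsqrt round their results, which is the actual unsolved part.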
B) Once (if) you have the accurate FPU implementation, you have to worry about the performance. For the FPU, the cost is negligible. But an accurate VU implementation would almost definitely need to do a large amount of work on each vector element separately (not in parallel), seriously hurting performance. (Though hardware always marches forward, so don't let this stop you!)
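Roughly what "each vector element separately" means in (B), as a sketch: a VU operation becomes a loop that calls the accurate scalar routine once per lane instead of issuing one SIMD instruction. The names Vec4, ScalarOp and vu_apply_per_lane are hypothetical, and the scalar routine itself is left as a parameter because its exact behaviour is the open question from (A).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Four lanes (x, y, z, w) held as raw 32-bit patterns.
using Vec4 = std::array<std::uint32_t, 4>;

// Hypothetical signature for an accurate scalar FPU operation working on
// raw bit patterns (e.g. a software add or mul).
using ScalarOp = std::uint32_t (*)(std::uint32_t, std::uint32_t);

// Apply the scalar operation lane by lane. This is the serial work that a
// single host SIMD instruction would otherwise do in one go.
Vec4 vu_apply_per_lane(const Vec4& a, const Vec4& b, ScalarOp op) {
    Vec4 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = op(a[i], b[i]);
    }
    return out;
}
```

Four scalar calls per vector instruction, each far more expensive than a native add or mul, is where the slowdown would come from.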
As of several years ago (and it doesn't look like much has changed?), the most accurate (least inaccurate) FPU implementation sits in iFPUd.cpp (notice the "d").
It emits code rather than executing it, though, and is based on IEEE FPU instructions rather than doing things fully in software, so it's considerably harder to understand. But it has comments giving background on the algorithms, so it's not too bad (I think!).
An (interpreted?) software FPU core would be a nice addition, but do realize that the only advantage of it is that it'd be a lot easier to read, study and modify.
Just by writing a software FPU core, issue (A) won't magically disappear.
Still, it could be a promising first step if one's serious about doing (A), and might help prepare you for it.