Don't guess!
Unrelated, I've been experimenting with the iso-gzip stuff lately.
The current code (in git or the exe attached to the first post) may occasionally decompress in 8M chunks, which is not fun. I've already solved this locally by making the access code more efficient, with zero penalty. I'll push it to git soon.
I also managed to decompress in a separate thread, but decompressing 4M chunks can still take longer than I'd like (the time may include the OS HDD seek time as well, though the standard non-gz iso reader may also suffer from this). So even with a thread, games may still end up waiting for the data: threading improves several cases but doesn't solve everything. I'll probably push the threading code to git at some stage.
While on the subject of disk access in a thread, prefetching is also possible: if PCSX2 "sees" from the game's disk access pattern that it will soon need to decompress more data, it can start decompressing that data in a thread before the game actually asks for it. But for this to work, the "guess" that more data will be needed soon has to be pretty good. Otherwise it may extract stuff which the game never actually requests, resulting in more CPU load than required. I haven't experimented with prefetching yet.
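A rough sketch of what such a prefetcher could look like, in Python rather than PCSX2's own code. Everything here is hypothetical: the `Prefetcher` class, the `extract_chunk` callable (chunk index in, decompressed bytes out) and the `min_streak` threshold are made up for illustration. The "guess" is the simplest one possible: after a few consecutive sequential requests, assume the next chunk will be wanted too.

```python
from concurrent.futures import ThreadPoolExecutor


class Prefetcher:
    """Start extracting chunk N+1 in a worker thread once access looks sequential."""

    def __init__(self, extract_chunk, min_streak=2):
        self.extract_chunk = extract_chunk  # hypothetical: chunk index -> bytes
        self.min_streak = min_streak        # sequential requests needed before guessing
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = {}                  # chunk index -> Future of a prefetched chunk
        self._last = None
        self._streak = 0
        self.hits = 0                       # requests served from a prefetch

    def get(self, idx):
        future = self._pending.pop(idx, None)
        if future is not None:
            self.hits += 1
            data = future.result()          # waits only if the worker is not done yet
        else:
            data = self.extract_chunk(idx)  # no prefetch available: extract now

        # Update the sequential-access streak.
        if self._last is not None and idx == self._last + 1:
            self._streak += 1
        else:
            self._streak = 0
        self._last = idx

        # Guess: a long enough streak means the next chunk is likely wanted soon.
        if self._streak >= self.min_streak and idx + 1 not in self._pending:
            self._pending[idx + 1] = self._pool.submit(self.extract_chunk, idx + 1)
        return data

    def close(self):
        self._pool.shutdown(wait=True)
```

A wrong guess leaves an unused entry in `_pending`, which is exactly the wasted CPU work mentioned above; a real implementation would also need to drop stale prefetches.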
I also have an idea: decompress in much smaller chunks (256K or 512K, or even 128K or smaller) such that each decompression is practically instant and the game will not notice the delay.
The basic problem here is that the index is optimized for quick access in specific intervals, and it's optimal to extract data between these intervals (which right now are set to 4M). So every access of even 2k right now requires extraction of 4M (though all the following requests within this 4M chunk will be cached already and therefore will not require further extraction and will happen instantly). And 4M can still be slower than the game is happy to wait for.
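To make the "extract 4M once, then serve from cache" behaviour concrete, here is a minimal sketch in Python. It is not the actual PCSX2 reader: `ChunkCache`, `extract_chunk` (chunk index in, decompressed bytes out) and `max_chunks` are assumed names for illustration only.

```python
from collections import OrderedDict

CHUNK_SIZE = 4 * 1024 * 1024  # the 4M index interval mentioned above


class ChunkCache:
    """Serve small reads from whole extracted chunks, keeping recent chunks around."""

    def __init__(self, extract_chunk, chunk_size=CHUNK_SIZE, max_chunks=8):
        self.extract_chunk = extract_chunk  # hypothetical: chunk index -> bytes
        self.chunk_size = chunk_size
        self.max_chunks = max_chunks
        self._chunks = OrderedDict()        # chunk index -> bytes, oldest first
        self.extractions = 0                # how many times we paid the full cost

    def _get(self, idx):
        if idx in self._chunks:
            self._chunks.move_to_end(idx)   # mark as recently used
            return self._chunks[idx]
        data = self.extract_chunk(idx)      # the slow part: a whole chunk
        self.extractions += 1
        self._chunks[idx] = data
        if len(self._chunks) > self.max_chunks:
            self._chunks.popitem(last=False)  # evict the least recently used chunk
        return data

    def read(self, offset, size):
        out = bytearray()
        while size > 0:
            idx, within = divmod(offset, self.chunk_size)
            piece = self._get(idx)[within:within + size]
            if not piece:                   # past the end of the data
                break
            out += piece
            offset += len(piece)
            size -= len(piece)
        return bytes(out)
```

The first 2K read inside a chunk pays for the whole chunk; every later read inside the same chunk only slices the cached bytes.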
Making these intervals smaller would allow access at a finer resolution, and each extraction would be faster. But an index optimized for smaller access intervals is also bigger. So if I set the intervals to 256K, the index would be 16x bigger (compared to 4M intervals), but each chunk would also extract 16x faster, so the game would probably not notice any delay at all. But a 16x bigger index is very likely too big for anyone to like (and would negate most of the space saved by compression).
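The 16x figure is just the ratio of access points. As a back-of-the-envelope check, the snippet below assumes (this is not stated in the post) that each access point costs one uncompressed 32 KiB deflate window, and uses a made-up 4 GiB image size; the real index format may store less per point.

```python
WINDOW_BYTES = 32 * 1024  # assumption: one 32 KiB deflate window per access point


def index_bytes(iso_bytes, interval_bytes, per_point=WINDOW_BYTES):
    """Estimated index size: one access point per interval, rounded up."""
    points = -(-iso_bytes // interval_bytes)  # ceiling division
    return points * per_point


ISO = 4 * 1024**3                           # hypothetical 4 GiB image
coarse = index_bytes(ISO, 4 * 1024**2)      # 4M intervals
fine = index_bytes(ISO, 256 * 1024)         # 256K intervals
print(coarse // 1024**2, fine // 1024**2, fine // coarse)  # prints "32 512 16"
```

Whatever the per-point cost really is, the ratio between the two index sizes stays 16.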
So my idea here is to keep the intervals big (and the index size small), but once a game requests a chunk (e.g. of 2.5K, which is roughly a DVD sector), instead of extracting 4M, it'll extract e.g. 256K BUT it'll also remember its internal state, such that IF the next access request is for the chunk right after the extracted 256K, it would extract as fast as if the index had 256K intervals. It's like storing another tiny index which is only useful for direct access to the next 256K chunk. But the good thing is that most of the time, this is exactly the extraction which we'll need.
So, I haven't tried it yet, but I think it's possible.
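The resume-from-saved-state idea can at least be tried on a toy stream with Python's `zlib`. This is only a sketch under simplifying assumptions, not the PCSX2 implementation: the `ResumableReader` class and its `step` parameter are hypothetical, and the only "access point" is the start of the stream, standing in for an index entry. What it shows is that keeping the inflate object alive between calls makes a read of the next block continue where the last one stopped, while a backwards seek has to restart from the access point.

```python
import zlib


class ResumableReader:
    """Inflate a zlib stream in small steps, keeping decoder state between reads."""

    def __init__(self, compressed, step=256 * 1024):
        self.compressed = compressed
        self.step = step
        self.restarts = 0            # times we had to go back to the access point
        self._reset()

    def _reset(self):
        # Stand-in for "jump to the nearest index access point" (here: offset 0).
        self._inflater = zlib.decompressobj()
        self._pending = self.compressed  # compressed bytes not yet consumed
        self._pos = 0                    # uncompressed offset reached so far

    def _inflate(self, n):
        """Produce up to n more uncompressed bytes from the saved state."""
        out = bytearray()
        while len(out) < n and not self._inflater.eof:
            chunk = self._inflater.decompress(self._pending, n - len(out))
            self._pending = self._inflater.unconsumed_tail
            if not chunk and not self._pending:
                break                    # input exhausted (truncated stream)
            out += chunk
        self._pos += len(out)
        return bytes(out)

    def read_step(self, offset):
        """Return the step-sized block starting at uncompressed `offset`."""
        if offset < self._pos:           # cannot run the decoder backwards
            self._reset()
            self.restarts += 1
        while self._pos < offset:        # skip ahead, discarding the output
            if not self._inflate(min(self.step, offset - self._pos)):
                break
        return self._inflate(self.step)  # sequential reads land here directly
```

When the requested offset equals the saved position, `read_step` does no skipping at all, which is the "as fast as if the index had 256K intervals" case.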
Also, if anyone wonders if it's worth the time, then the answer is that it's fun to solve these problems, so yes for me.