A speed hack
The current situation is:
CPU sends command N => wait => GPU processes the previous commands => wait ... => finally GPU processes command N => CPU reads back the framebuffer.
Because of the read back, access has to be serialized: the CPU must wait until the GPU has finished command N. Normally you don't need to wait for a command to be processed.
Faster hardware (PCIe 4 + a faster GPU) will help by reducing the latency of each operation: less waiting means more FPS.
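To make the cost of the serialization concrete, here is a rough timing model. It is only a sketch: the function names and all millisecond figures are made up for illustration and are not measurements from any real game or GPU.

```python
def fps_serialized(cpu_ms, gpu_ms, readback_ms):
    """Read back every frame: CPU submit, GPU work and transfer happen one after another."""
    return 1000.0 / (cpu_ms + gpu_ms + readback_ms)


def fps_pipelined(cpu_ms, gpu_ms):
    """No read back: CPU and GPU overlap, so the slower of the two sets the pace."""
    return 1000.0 / max(cpu_ms, gpu_ms)


# Hypothetical numbers: 4 ms of CPU work, 4 ms of GPU work, 2 ms of transfer.
print(fps_serialized(4, 4, 2))  # 100.0
print(fps_pipelined(4, 4))      # 250.0
```

In this simplified model, faster hardware shrinks every term of the serialized sum, but the sum itself only goes away when the CPU no longer has to wait for the GPU.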
Another solution is possible but trickier to implement. Baldur's Gate (same game engine) re-uploads the texture after the read back. So instead of
FrameBuffer => CPU mem => GPU texture
We could do
FrameBuffer => GPU texture (very fast)
=> asynchronously read the texture into CPU memory (the speed impact depends on whether you really need the data in CPU memory)
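The scheduling side of that idea can be sketched with a toy model. This is pure Python with no real GPU API behind it; the class `DeferredReadback`, its methods, and the two-frame latency are all hypothetical. In an actual renderer the same pattern would probably map onto something like pixel buffer objects plus fence syncs in OpenGL, but that is an assumption, not something stated above.

```python
from collections import deque


class DeferredReadback:
    """Toy model: a copy requested on frame N becomes readable on frame N + latency."""

    def __init__(self, latency_frames=2):
        self.latency = latency_frames
        self.pending = deque()  # entries are (ready_frame, data)

    def request(self, frame, framebuffer):
        # Stands in for the fast GPU-side FrameBuffer => texture copy and the
        # start of an asynchronous transfer. It never blocks the caller.
        self.pending.append((frame + self.latency, bytes(framebuffer)))

    def poll(self, frame):
        # Hand back every transfer that has completed by this frame,
        # without waiting for the ones still in flight.
        done = []
        while self.pending and self.pending[0][0] <= frame:
            done.append(self.pending.popleft()[1])
        return done


rb = DeferredReadback(latency_frames=2)
rb.request(0, b"frame0")
print(rb.poll(1))  # [] : nothing is ready yet, but the CPU did not stall
print(rb.poll(2))  # [b'frame0']
```

The trade-off is that the CPU sees the data a couple of frames late, which is only acceptable if the game does not need the pixels immediately.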
As a side note, we potentially download the full framebuffer, whereas it should be possible (on reasonably recent hardware) to extract only the interesting part. But synchronous GPU => CPU transfers are slow, even on the best machine.
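For a sense of scale, the arithmetic below compares a full download with a partial one. The sizes (a 640x448 framebuffer, a 64x64 region, 4 bytes per pixel) are assumptions picked for illustration, and `readback_bytes` is a hypothetical helper.

```python
def readback_bytes(width, height, bytes_per_pixel=4):
    """Amount of data moved from GPU to CPU for one rectangular read back."""
    return width * height * bytes_per_pixel


full = readback_bytes(640, 448)   # whole framebuffer
region = readback_bytes(64, 64)   # only the part the game looks at

print(full, region, full // region)  # 1146880 16384 70
```

A smaller transfer reduces the copy time, but it does not remove the synchronization wait, which is why the transfer stays slow as long as it is synchronous.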