Oops, sorry, I meant perspective-correct.
It's not much different from affine texture mapping, where you interpolate u/v linearly along the edges and across each scanline. The problem is that u and v are linear in 3D, but not in screen space. If you interpolate them that way, you get the PlayStation 1 wobble effect in the textures.
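To make the "linear in 3D, but not in screen space" point concrete, here is a small sketch in a made-up 2D setup (eye at the origin looking down +z, screen plane at z = D; all names are hypothetical). It compares affine interpolation of u across the screen with the true u, found by casting the view ray through each screen position and intersecting it with the 3D segment.

```python
# Hypothetical 2D setup for illustration: eye at the origin looking down +z,
# screen plane at z = D, so a point (x, z) projects to screen x = D * x / z.
D = 1.0

def project(x, z):
    return D * x / z

def affine_u(u0, u1, s):
    """Affine mapping: u interpolated linearly in *screen* space, s in [0, 1]."""
    return u0 + s * (u1 - u0)

def true_u(p0, p1, u0, u1, sx):
    """Reference value: intersect the view ray through screen x = sx with the
    3D segment p0-p1 and read u there (u IS linear along the 3D segment)."""
    (x0, z0), (x1, z1) = p0, p1
    dx, dz = x1 - x0, z1 - z0
    # Solve D * (x0 + t*dx) == sx * (z0 + t*dz) for the segment parameter t.
    t = (sx * z0 - D * x0) / (D * dx - sx * dz)
    return u0 + t * (u1 - u0)

# A segment receding from z = 1 to z = 3, textured with u going from 0 to 1.
p0, p1 = (-1.0, 1.0), (3.0, 3.0)
sx0, sx1 = project(*p0), project(*p1)   # -1.0 and 1.0 on screen
for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    sx = sx0 + s * (sx1 - sx0)
    print(f"s={s:.2f}  affine u={affine_u(0.0, 1.0, s):.3f}  "
          f"true u={true_u(p0, p1, 0.0, 1.0, sx):.3f}")
```

At the screen midpoint the affine value is 0.5 while the correct value is 0.25: the far half of the segment is squeezed on screen, and affine mapping ignores that. That mismatch, changing as the polygon moves, is what you see as the wobble.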
However, what is linear in screen space is "anything" divided by z (any attribute that is linear across the polygon in 3D, divided by its view-space depth z). So instead of interpolating u and v, you interpolate u/z and v/z. To get the proper texture coordinates back, you need the current z. z itself is again not linear in screen space, but 1/z is, so you interpolate three values: u/z, v/z and 1/z. To get u and v for each pixel, you divide u/z and v/z by 1/z (or compute z = 1/(1/z) once and multiply both by it), which is rather costly to do per pixel. A simple optimization is to do this exact divide only every 8/16/32/whatever pixels and interpolate linearly between those correct values.
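A minimal sketch of one scanline done this way, assuming the span's two ends come with (u, v, z) already known (the function name and parameters are made up for illustration). u/z, v/z and 1/z get constant per-pixel gradients; the divide back to (u, v) happens only at every `step`-th pixel, with affine filling in between. `step=1` gives exact per-pixel perspective correction.

```python
def span_uv(u0, v0, z0, u1, v1, z1, n, step):
    """Texture coords for pixels 0..n-1 of a scanline whose left end (pixel 0)
    has (u0, v0, z0) and whose right end (pixel n) has (u1, v1, z1)."""
    # u/z, v/z and 1/z at both ends: these ARE linear in screen space.
    a = (u0 / z0, v0 / z0, 1.0 / z0)
    b = (u1 / z1, v1 / z1, 1.0 / z1)
    grad = [(bb - aa) / n for aa, bb in zip(a, b)]   # per-pixel steps

    def exact_uv(x):
        uz, vz, iz = [aa + g * x for aa, g in zip(a, grad)]
        z = 1.0 / iz              # the one expensive divide per subdivision point
        return uz * z, vz * z

    out = []
    for x in range(0, n, step):
        length = min(step, n - x)             # last subspan may be shorter
        su, sv = exact_uv(x)                  # correct values at both ends
        eu, ev = exact_uv(x + length)         # of this subspan...
        # ...and plain affine stepping in between. Real rasterizers usually pick
        # a power-of-two step so this division becomes a shift.
        du, dv = (eu - su) / length, (ev - sv) / length
        for i in range(length):
            out.append((su + du * i, sv + dv * i))
    return out

# Same receding span as before (z from 1 to 3), 16 pixels wide.
exact = span_uv(0.0, 0.0, 1.0, 1.0, 1.0, 3.0, n=16, step=1)
fast = span_uv(0.0, 0.0, 1.0, 1.0, 1.0, 3.0, n=16, step=8)
print(exact[4], fast[4])   # inside a subspan: small affine error
print(exact[8], fast[8])   # at a subdivision point: identical
```

At pixel 4, the exact u is 0.1 and the subdivided version gives 0.125, while affine over the whole span would give 0.25. So even one extra divide in the middle removes most of the distortion; smaller steps trade speed for accuracy.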
If done well, it's pretty fast: http://www.jpct.net/quapplet/