But SIMD on CPUs is limited: SSE2/3 only handle 4 data elements at a time, while modern GPUs handle 32 (the G80 supports 64, doesn't it?).
Well, but the G80 is not a CPU, so ... it's quite different.
I think the whole reordering, pipelining, SIMD stuff in today's processors is a big waste.
What percentage of the time can all 4 integer units of a Core2 be fed from a single instruction stream? I'd guess all 4 units are busy for less than 1% of total execution time. That's wasted silicon.
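To make the point about feeding the units concrete, here is a rough sketch in C (the function names are made up for illustration). The first loop has one accumulator, so every add has to wait for the previous one. The second splits the work into four independent accumulators, which is the kind of code that could in principle keep several integer units busy. A compiler may well transform either loop, so take this as an illustration of the dependency structure, not a benchmark.

```c
#include <stddef.h>
#include <stdint.h>

/* Single accumulator: each add depends on the result of the previous
   one, so the adds form one serial dependency chain. */
uint64_t sum_chain(const uint32_t *a, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four accumulators: the four adds in one iteration do not depend on
   each other, so an out-of-order core is free to issue them together. */
uint64_t sum_split(const uint32_t *a, size_t n)
{
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Leftover elements when n is not a multiple of 4. */
    for (; i < n; i++)
        s0 += a[i];

    return s0 + s1 + s2 + s3;
}
```

Both functions return the same sum; only the amount of independent work per iteration differs. Most ordinary code looks more like the first loop than the second.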
I know some things are hard to do in parallel programming, but I think a design with, e.g., one Core2 core and one Sun T1 design side by side would be optimal.