re crystalsquid
Pairing? Ignoring the latest version of the CPU? Oh dear, just those ugly micro-optimizations for old CPUs again. It was a reason why Intel didn't improve it fast enough.
I like assembler & optimising at this level.

If you know what you're doing, you can usually double the speed of any decently large function (such as vector transform & perspectivise, software renderers, etc.).
The insights from learning how to optimise can help you write C/Java code that runs at least 10% faster, just by knowing a few tricks and how code is really translated. Once it becomes second nature it doesn't cost any more development time, and done properly it rarely produces code that would appear in this thread.
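To give a rough idea of the kind of trick meant here (this is only a sketch, not anything taken from a real codebase): in a perspective projection, a divide is typically much slower than a multiply, so computing the scale factor once and multiplying twice saves a divide per vertex. The names `Vec3`, `Vec2`, `project_naive`, `project_recip` and the `focal` parameter are all made up for illustration.

```c
/* Hypothetical types and function names, for illustration only. */
typedef struct { float x, y, z; } Vec3;
typedef struct { float x, y; } Vec2;

/* Straightforward version: two divides per vertex. */
Vec2 project_naive(Vec3 v, float focal)
{
    Vec2 p;
    p.x = v.x * focal / v.z;
    p.y = v.y * focal / v.z;
    return p;
}

/* Same projection with one divide: compute focal / z once,
 * then scale x and y by it with multiplies. */
Vec2 project_recip(Vec3 v, float focal)
{
    float s = focal / v.z;   /* the only divide */
    Vec2 p;
    p.x = v.x * s;
    p.y = v.y * s;
    return p;
}
```

The two versions can differ in the last bit of the result because the rounding happens in a different order, which is also why a compiler generally won't make this change for you unless you relax its floating-point rules. Whether it is a measurable win depends on the CPU and the surrounding code, so it is worth timing rather than assuming.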

One clipping routine I came across had the comments:
[code block with the comments not preserved]
It's not big, & it's not clever
