barfy
Junior Member  
The evidence of things not seen
|
 |
«
Posted
2004-06-23 10:00:59 » |
|
| int newpix = (pixel[index] & 0xFF) + (currentTextureMap[tIndex] & 0xFF);
| if (newpix > 255) newpix = 255;
| pixel[index] = (byte) newpix;
|
Can this section of code be made any faster? This is pretty crucial, since this portion gets executed many times per frame. I was wondering if there's a way to limit the variable newpix (using bit operations?) so that it always stays in the range 0-255 (if the value is greater than 255, then it should clamp to 255). Then I could simply discard the if statement that checks for overflow.
|
|
|
|
|
Herkules
|
 |
«
Reply #1 - Posted
2004-06-23 10:41:28 » |
|
| int newpix = (pixel[index] + currentTextureMap[tIndex]) & 0xFF; |
How's that?
|
|
|
|
erikd
|
 |
«
Reply #2 - Posted
2004-06-23 10:43:41 » |
|
As far as I know, it can't be replaced by a simple bit operation, but the if statement is very cheap anyway. The only possible speed optimization I can see would be an unsigned byte type in Java (am I right? You would be able to skip the masking when converting to an int), but that doesn't help you much, does it?
|
|
|
|
|
|
erikd
|
 |
«
Reply #3 - Posted
2004-06-23 10:45:09 » |
|
Quote:
| int newpix = (pixel[index] + currentTextureMap[tIndex]) & 0xFF; |
How's that?

I don't think the results will be the same, or will they?
|
|
|
|
Herkules
|
 |
«
Reply #4 - Posted
2004-06-23 10:52:41 » |
|
| pixel[index] = (byte)(pixel[index] + currentTextureMap[tIndex]); |
This should do it in one step.
|
|
|
|
barfy
Junior Member  
The evidence of things not seen
|
 |
«
Reply #5 - Posted
2004-06-23 10:53:14 » |
|
Quote:
| int newpix = (pixel[index] + currentTextureMap[tIndex]) & 0xFF; |
How's that?

Nope, that gives a different result. For example, if pixel[index] = -1 and currentTextureMap[tIndex] = -1:
1. The result in my original code would be 510, which is then clamped by the if statement to 255.
2. Your version would give 254.
Sigh, what I would really need is an unsigned byte type. But thanks anyway.
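To make the difference concrete, here is a small sketch that puts the clamped add from the first post next to the masked add. The class and method names are made up for illustration and are not from the original code.

```java
class WrapVsClamp {
    // Approach from the first post: widen both bytes to 0..255, add, clamp to 255.
    static byte addClamped(byte a, byte b) {
        int sum = (a & 0xFF) + (b & 0xFF);
        if (sum > 255) sum = 255;
        return (byte) sum;
    }

    // Masked approach: & 0xFF keeps only the low 8 bits, so the sum wraps around.
    static byte addWrapped(byte a, byte b) {
        return (byte) ((a + b) & 0xFF);
    }

    public static void main(String[] args) {
        byte a = -1, b = -1;                          // 0xFF each, i.e. 255 unsigned
        System.out.println(addClamped(a, b) & 0xFF);  // 255
        System.out.println(addWrapped(a, b) & 0xFF);  // 254
    }
}
```

The two only agree while the unsigned sum stays at or below 255.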
|
|
|
|
|
Herkules
|
 |
«
Reply #6 - Posted
2004-06-23 10:54:05 » |
|
This doesn't do anything, because pixel is already a byte! Just a waste of cycles.
|
|
|
|
barfy
Junior Member  
The evidence of things not seen
|
 |
«
Reply #7 - Posted
2004-06-23 10:57:14 » |
|
Quote:
This doesn't do anything, because pixel is already a byte! Just a waste of cycles.

Actually pixel[index] is a SIGNED byte, so pixel[index] & 0xFF converts it to the equivalent of an UNSIGNED byte (held in an int, since the & promotes it). However, merely casting pixel[index] to an int with (int)pixel[index] would sign-extend it and keep the negative value.
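A minimal sketch of that distinction, with a hypothetical helper name (toUnsigned) that does not appear in the original code:

```java
class SignedByteDemo {
    // Masking with 0xFF promotes the byte to int and clears the sign-extended upper bits.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 200;                 // stored as -56 in a signed byte
        System.out.println((int) b);         // -56: a plain cast sign-extends
        System.out.println(toUnsigned(b));   // 200: the mask recovers the unsigned value
    }
}
```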
|
|
|
|
|
Herkules
|
 |
«
Reply #8 - Posted
2004-06-23 11:01:34 » |
|
I see, you're right... & 0xFF implicitly casts to int, though.
|
|
|
|
erikd
|
 |
«
Reply #9 - Posted
2004-06-23 11:03:00 » |
|
|
|
|
|
|
|
barfy
Junior Member  
The evidence of things not seen
|
 |
«
Reply #10 - Posted
2004-06-23 11:09:42 » |
|
Quote:
As far as I know, it can't be replaced by a simple bit operation but the if statement is very cheap anyway.

Not so cheap when it's executed more than 10000x per frame. I ran a profiler and this section is taking up 70% of my processor time. Even just 1 or 2 fewer instructions in the code make a substantial difference.
|
|
|
|
|
crystalsquid
Junior Member  
... Boing ...
|
 |
«
Reply #11 - Posted
2004-06-23 11:14:36 » |
|
You are right that you want to get rid of the branch if possible (conditional branches are nasty for most CPUs), so to get around it you do:
| newpix |= ((255-newpix)>>31); |
If newpix <= 255, then 255-newpix is non-negative, so the shift right by 31 propagates a zero sign bit through the whole int, giving 0. ORing this in gives no change. If newpix > 255, then 255-newpix is negative, the shift gives 0xffffffff, and ORing it in makes newpix 0xffffffff as well, so when you cast back to a byte you get 0xff, the clamped value you were after.

This takes 3 logical ops (3 cycles), compared to a branch mispredict (~30 cycles) whenever the clamping is used. So if you clamp ~10% of pixels this way, the chances are that it will be around the same speed.

Your original looks a little odd though: are your arrays really byte arrays? Are you not dealing with multiple colour channels at least? If you are trying to do additive or alpha blending, there are slightly more efficient ways to do this, as you can usually get away with handling R and B combined in one int, saving you 1/3 of the work for a blend.

Hope this helps,
- Dom
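Dropped into the original pixel add, the trick might look like the sketch below. The wrapper class and method name are assumptions for illustration; only the OR/shift line comes from the post.

```java
class BranchlessClamp {
    // Saturating add of two unsigned 8-bit values stored in signed Java bytes.
    static byte addSaturated(byte a, byte b) {
        int sum = (a & 0xFF) + (b & 0xFF);  // 0..510
        // (255 - sum) is negative only when sum > 255; the arithmetic shift then
        // yields all ones, and the OR forces the low byte to 0xFF.
        sum |= (255 - sum) >> 31;
        return (byte) sum;
    }

    public static void main(String[] args) {
        System.out.println(addSaturated((byte) 200, (byte) 100) & 0xFF); // 255
        System.out.println(addSaturated((byte) 100, (byte) 50) & 0xFF);  // 150
    }
}
```

Whether this beats the plain if statement depends on the CPU and the JIT, so it is worth measuring rather than assuming.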
|
|
|
|
|
Mark Thornton
|
 |
«
Reply #12 - Posted
2004-06-23 11:39:30 » |
|
Quote:
This takes 3 logical ops (3 cycles)

Could be a lot more on some CPUs that don't have a barrel shifter (Pentium 4?).
|
|
|
|
|
barfy
Junior Member  
The evidence of things not seen
|
 |
«
Reply #13 - Posted
2004-06-23 11:57:52 » |
|
Quote:
You are right that you want to get rid of the branch if possible (conditional branches are nasty for most CPUs), so to get around it you do:
| newpix |= ((255-newpix)>>31); |
If newpix <= 255, then 255-newpix is non-negative, so the shift right by 31 propagates a zero sign bit through the whole int, giving 0. ORing this in gives no change. If newpix > 255, then 255-newpix is negative, the shift gives 0xffffffff, and ORing it in makes newpix 0xffffffff as well, so when you cast back to a byte you get 0xff, the clamped value you were after.

Thanks. Unfortunately, it's slower now. Although that's really quite an elegant way of skirting the conditional statement. I think the issue is not so much a branch prediction error/cache miss, but the sheer number of instructions that get executed per frame.

EDIT: I'm using a P4, so part of the slowdown could probably be due to the issue described by Mark in his post above.

Quote:
Your original looks a little odd though: are your arrays really byte arrays? Are you not dealing with multiple colour channels at least? If you are trying to do additive or alpha blending, there are slightly more efficient ways to do this, as you can usually get away with handling R and B combined in one int, saving you 1/3 of the work for a blend.
Hope this helps,
- Dom

I'm actually working with 8-bit IndexColorModels and a DataBuffer.Byte pixel array. The addition that you see is just adding corresponding pixel values from a pre-defined 8-bit texture map to the DataBuffer.Byte pixel array. Hmm. What you suggested got me thinking though. I wonder if I could use a DataBuffer.Int with the IndexColorModel so that four 8-bit pixel values can be combined in an int, and then perform the adding on the int instead... Thanks
|
|
|
|
|
Herkules
|
 |
«
Reply #14 - Posted
2004-06-23 12:33:06 » |
|
DataBuffer.MMX would be helpful 
|
|
|
|
phazer
Junior Member  
Come get some
|
 |
«
Reply #15 - Posted
2004-06-23 12:42:58 » |
|
You could also try this:
| pixel[index] = a[newpix];
|
Don't know if there will be a speed increase, but it's worth a shot. The array is so small it will probably fit inside the L1 cache.
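Only the lookup line of the snippet survives here, so the following is a guess at the overall shape rather than the code as posted: a 511-entry table indexed by the raw sum, holding the clamped value. The table name CLAMP and the wrapper method are hypothetical.

```java
class ClampTable {
    // Hypothetical table: index is the raw sum (0..510), value is the sum clamped to 255.
    static final byte[] CLAMP = new byte[511];

    static {
        for (int i = 0; i < CLAMP.length; i++) {
            CLAMP[i] = (byte) Math.min(i, 255);
        }
    }

    // Saturating add via table lookup instead of a branch.
    static byte addSaturated(byte a, byte b) {
        return CLAMP[(a & 0xFF) + (b & 0xFF)];
    }
}
```

The table is 511 bytes, which is consistent with the remark about it fitting in the L1 cache.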
|
|
|
|
erikd
|
 |
«
Reply #16 - Posted
2004-06-23 14:41:04 » |
|
Quote:
What you suggested got me thinking though. I wonder if I could use a DataBuffer.Int with the IndexColorModel so that 4 8-bit pixel values can be combined in an int, and then perform the adding on the int instead...

I'm guessing you would have to do an awful lot of masking instead to prevent overflows from 'bleeding' into the wrong bits, or am I missing something?
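For a sense of how much masking that involves, here is one possible sketch of a saturating add over four unsigned 8-bit values packed into an int. It is not from the thread; the class and method names are made up, and whether it would actually be faster than four byte adds is untested.

```java
class PackedBytes {
    // Saturating add of four unsigned 8-bit lanes packed into one int.
    static int addSaturated4(int x, int y) {
        // Add the low 7 bits of each lane; each lane's sum is at most 0xFE,
        // so no carry crosses into the neighbouring lane.
        int low = (x & 0x7f7f7f7f) + (y & 0x7f7f7f7f);
        // Fold the top bits back in with XOR to get the wrapped per-lane sum.
        int sum = low ^ ((x ^ y) & 0x80808080);
        // A lane overflowed if both top bits were set, or if one was set
        // and the top bit of the wrapped sum came out clear.
        int carry = ((x & y) | ((x | y) & ~sum)) & 0x80808080;
        // Expand each overflow flag to 0xFF in its lane and OR it in to saturate.
        return sum | ((carry >>> 7) * 0xff);
    }
}
```

That is roughly a dozen integer operations for four pixels, with no branches.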
|
|
|
|
tom
|
 |
«
Reply #17 - Posted
2004-06-23 15:09:19 » |
|
Here is some code that adds the RGB components of an int using the same method as crystalsquid. There is some loss in precision, and the 4th component needs to be handled separately.
| public final static int addSaturated(int a, int b) {
|     a &= 0xfefefe;                        // drop the low bit of each channel (the precision loss)
|     b &= 0xfefefe;
|     int ab = a + b;                       // per-channel sums; an overflow lands in bit 8, 16 or 24
|     int sign = ab & 0x01010100;           // the overflow bits
|     int sum = (sign - (sign >> 8)) | ab;  // 0xff in every channel that overflowed
|     return sum;
| }
|
|
|
|
barfy
Junior Member  
The evidence of things not seen
|
 |
«
Reply #18 - Posted
2004-06-23 17:19:21 » |
|
Quote:
You could also try this:
| pixel[index] = a[newpix];
Don't know if there will be a speed increase, but it's worth a shot. The array is so small it will probably fit inside the L1 cache.

That gives roughly the same, maybe slightly slower, speed as the "if" statement. Probably because there's an array bounds check with each random access.
|
|
|
|
|
erikd
|
 |
«
Reply #19 - Posted
2004-06-23 17:21:10 » |
|
Overflows are now going to the wrong color and even to the 4th byte (i.e. addSaturated(0xff00ff, 0xff00ff) results in 0x1ff01ff) so the result can be slightly wrong. Maybe this isn't a problem though, but then again maybe it is...
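One way to avoid the stray carry bits, sketched here under the assumption of a packed 0xRRGGBB layout, is to add red and blue in one masked int and green in another, then mask the carries off after saturating. This is a variation on the posted method rather than the posted code itself, and the names are made up for illustration.

```java
class PackedRgb {
    // Saturating add of two 0xRRGGBB colours; the top byte is ignored and returned as zero.
    static int addSaturatedRgb(int a, int b) {
        // Red and blue share one int; their carries land in bits 24 and 8.
        int rb = (a & 0xff00ff) + (b & 0xff00ff);
        // Green is added separately; its carry lands in bit 16.
        int g = (a & 0x00ff00) + (b & 0x00ff00);
        int rbOver = rb & 0x01000100;
        int gOver = g & 0x00010000;
        // (over - (over >>> 8)) turns each carry bit into 0xff in the channel below it;
        // the final mask removes the carry bits themselves.
        rb = (rb | (rbOver - (rbOver >>> 8))) & 0xff00ff;
        g = (g | (gOver - (gOver >>> 8))) & 0x00ff00;
        return rb | g;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(addSaturatedRgb(0xff00ff, 0xff00ff))); // ff00ff
    }
}
```

This keeps full 8-bit precision per channel at the cost of a second add.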
|
|
|
|
barfy
Junior Member  
The evidence of things not seen
|
 |
«
Reply #20 - Posted
2004-06-23 17:33:14 » |
|
Quote:
I'm guessing you would have to do an awful lot of masking instead to prevent overflows to 'bleed' into the wrong bits, or am I missing something?

Anyway, it seems that you can't get the "multiple pixels packed into an int" idea to work with an IndexColorModel, which is unfortunately what I am using.
|
|
|
|
|
Abuse
|
 |
«
Reply #21 - Posted
2004-06-24 14:49:24 » |
|
Silly suggestion, but are you running these tests on the server VM?
|
|
|
|
barfy
Junior Member  
The evidence of things not seen
|
 |
«
Reply #22 - Posted
2004-06-24 19:51:48 » |
|
Quote:
Silly suggestion, but are you running these tests on the server VM?

I'm testing the performance on the client VM because, as far as I know, there doesn't seem to be a way to run the app with the server VM via webstart... or is there?
|
|
|
|
|
erikd
|
 |
«
Reply #23 - Posted
2004-06-24 22:08:40 » |
|
Even if there is (I don't think there is a non-hackish one), you can be 99.99% sure the user is running your game on the client VM.
|
|
|
|
princec
|
 |
«
Reply #24 - Posted
2004-06-25 09:38:35 » |
|
Unless you, ahh, ship a JRE embedded in the game with the server VM  Cas 
|
|
|
|
Mark Thornton
|
 |
«
Reply #25 - Posted
2004-06-25 10:05:28 » |
|
Quote:
the app with the server VM via webstart... or is there?

From 1.5, webstart supports arbitrary VM arguments, which ought to include -server. No doubt it will only work if the selected JRE has the server VM installed, but perhaps you could have an installable extension which contained the server VM DLL and copied it to the right place in the chosen JRE. Obviously the .jar file would have to be signed, but it looks feasible. -server is listed as supported here: http://java.sun.com/j2se/1.5.0/docs/guide/javaws/developersguide/syntax.html

To assist in adding the server JVM, the ExtensionInstallerService stuff looks ideal. It would be really helpful if Sun would wrap up the server JVM as an extension JNLP and host it somewhere.
|
|
|
|
|
|