Java-Gaming.org Hi !
 Featured games (83) games approved by the League of Dukes Games in Showcase (581) Games in Android Showcase (163) games submitted by our members Games in WIP (632) games currently in development
 News: Read the Java Gaming Resources, or peek at the official Java tutorials
Pages: 1 [2]
 ignore  |  Print
 Finding index of single 1 bit in a long  (Read 5241 times) 0 Members and 1 Guest are viewing this topic.
crystalsquid

Junior Devvie

... Boing ...

 « Reply #30 - Posted 2003-12-17 14:31:43 »

Quote
I think you can do this without any branching.. theoretically it could be faster.

Thats what I said, back before this thread was so succesfully jacked

I had it down to (for just one test block):
Quote
b = a&0x0000ffff;
b = ~((-b)>>31);  // 0 if b was non-zero
r |= 16 & b;

which is 6 logical ops per bit of the output:
AND, NEGATE,SHIFT,INVERT,AND,OR
and you have:

Quote
m2 = ((v & 0xFFFF0000)-1+v);
m3 = (m1-m2)>>31;
i += m3 & 16;

which I make as 7 ops:

Although Im not entirely sure how your version works - I love to see bit tricks I don't understand. Gives me a puzzle to solve

- Dom
swpalmer

JGO Coder

Exp: 12 years

Where's the Kaboom?

 « Reply #31 - Posted 2003-12-17 15:55:07 »

Ah..  sorry i must have skimmed through the posts too fast.. you've beaten me by 1 op.. our last lines are equivalent as I could use OR instead.. however looking at your code gave me an idea to simplify mine..

 1  2 `int m1 = ((x & 0x0000FFFF) - 1) >> 31;i |= 16 & m1;`

I think that should work.  AND, SUB, SHIFT, AND, OR... down to 5 ops.

The AND and the -1 basically create a negative number if none of the bits we are testing were set, and it makes sure that if the high bit was the one that was set that it will not be set anymore. The shift is used to create a mask relying on the high bit being sign extended after the arithmetic shift.  After that it is the simple ORing of the bit in the result..

*edit*  fixed mask - needs to be the same as yours now otherwise the result needs to be subtracted from 31

crystalsquid

Junior Devvie

... Boing ...

 « Reply #32 - Posted 2003-12-17 17:16:23 »

Sweet!
Woz - any chance of a timing test for this version?

- Dom
Woz

Senior Newbie

A Troll who lives in a hole

 « Reply #33 - Posted 2003-12-18 10:17:18 »

Here you go Dom,

It takes anywhere between 1-2 hours to gather the results for each test, so you'll have to forgive the slight delay.

Some slight code modifications on the each bitIndex() were required, as Dom wrote bitIndex6() off the top of his head, and Pamler's initial version was only accepted integers as parameters.

So here are the modifed versions:
 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43 `  public static int bitIndex6(long x)  {    int r = 0;    int a = (int)x;    int b = (int)(x>>32);//    if( a == 0 )//    {//        a = (int)(x >> 32);//        r += 32;//    }    int c = ( a >> 31) | (( -a ) >> 31);  // 0 if a is 0    a = (a & c) | (b & (~c));  // a = b if a was 0    r = 32 & (~c);    //if ((a & 0xFFFF0000) != 0) r += 16;    b = a & 0x0000ffff;    b = ~(( -b ) >> 31);  // 0 if b was non-zero    r |= 16 & b;    //if ((a & 0xFF00FF00) != 0) r += 8;    b = a & 0x00ff00ff;    b = ~(( -b ) >> 31);  // 0 if b was non-zero    r |= 8 & b;    //if ((a & 0xF0F0F0F0) != 0) r += 4;    b = a & 0x0f0f0f0f;    b = ~(( -b ) >> 31);  // 0 if b was non-zero    r |= 4 & b;    //if ((a & 0xCCCCCCCC) != 0) r += 2;    b = a & 0x33333333;    b = ~(( -b ) >> 31);  // 0 if b was non-zero    r |= 2 & b;    //if ((a & 0xAAAAAAAA) != 0) r++;    b = a & 0x55555555;    b = ~(( -b ) >> 31);  // 0 if b was non-zero    r |= 1 & b;    return r;  }`

 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27 `  public static int bitIndex8(long x)  {    int r = 0;    int a = (int)x;    int b = (int)(x>>32);    int c = ( a >> 31) | (( -a ) >> 31);  // 0 if a is 0    a = (a & c) | (b & (~c));  // a = b if a was 0    r = 32 & (~c);    b = ((a & 0x0000FFFF) - 1) >> 31;    r |= 16 & b;    b = ((a & 0x00FF00FF) - 1) >> 31;    r |= 8 & b;    b = ((a & 0x0F0F0F0F) - 1) >> 31;    r |= 4 & b;    b = ((a & 0x33333333) - 1) >> 31;    r |= 2 & b;    b = ((a & 0x55555555) - 1) >> 31;    r |= 1 & b;    return r;  }`

And a most evil looking version, in an attempt to (some what doubtfully) get the method to inline.
 1  2  3  4  5  6  7  8 `  public static int bitIndex9(long x)  {    int a = (int)x;    int c = (a | -a) >> 31;    a = (a & c) | (((int)(x >> 32)) & ~c);    return ((32 & (~c)) | (16 & (((a & 0x0000FFFF) - 1) >> 31)) | (8 & (((a & 0x00FF00FF) - 1) >> 31)) | (4 & (((a & 0x0F0F0F0F) - 1) >> 31)) | (2 & (((a & 0x33333333) - 1) >> 31)) | (1 & (((a & 0x55555555) - 1) >> 31)) );  }`

Results in next post.

------
Woz.
Woz

Senior Newbie

A Troll who lives in a hole

 « Reply #34 - Posted 2003-12-18 10:56:48 »

bitIndex6()

 456.719457ibm13index6e.csvThread priority = 5Classic VMIBM Corporation1.3.01.3.0IBM Corporation 486.462486ms114index6e.csvThread priority = 5nullnullnull1.1.4Microsoft Corp. 361.481361sun12index6e.csvThread priority = 5Classic VMSun Microsystems Inc.1.21.2Sun Microsystems Inc. 591.169591sun131index6e.csvThread priority = 5Java HotSpot(TM) Client VMSun Microsystems Inc.1.3.1-b241.3.1Sun Microsystems Inc. 728.77729sun141index6e.csvThread priority = 5Java HotSpot(TM) Client VMSun Microsystems Inc.1.4.1-b211.4.1Sun Microsystems Inc. 606.939607sun142index6e.csvThread priority = 5Java HotSpot(TM) Client VMSun Microsystems Inc.1.4.2_01-b061.4.2_01Sun Microsystems Inc.

bitIndex8()

 418.503418ibm13index8e.csvThread priority = 5Classic VMIBM Corporation1.3.01.3.0IBM Corporation 455.988456ms114index8e.csvThread priority = 5nullnullnull1.1.4Microsoft Corp. 337.237337sun12index8e.csvThread priority = 5Classic VMSun Microsystems Inc.1.21.2Sun Microsystems Inc. 567.88568sun131index8e.csvThread priority = 5Java HotSpot(TM) Client VMSun Microsystems Inc.1.3.1-b241.3.1Sun Microsystems Inc. 627.207627sun141index8e.csvThread priority = 5Java HotSpot(TM) Client VMSun Microsystems Inc.1.4.1-b211.4.1Sun Microsystems Inc. 574.385574sun142index8e.csvThread priority = 5Java HotSpot(TM) Client VMSun Microsystems Inc.1.4.2_01-b061.4.2_01Sun Microsystems Inc.

Evil looking bitIndex9()

 22.03322ibm13index9e.csvThread priority = 5Classic VMIBM Corporation1.3.01.3.0IBM Corporation 463.16463ms114index9e.csvThread priority = 5nullnullnull1.1.4Microsoft Corp. 322.418322sun12index9e.csvThread priority = 5Classic VMSun Microsystems Inc.1.21.2Sun Microsystems Inc. 523.457523sun131index9e.csvThread priority = 5Java HotSpot(TM) Client VMSun Microsystems Inc.1.3.1-b241.3.1Sun Microsystems Inc. 523.457523sun131index9e.csvThread priority = 5Java HotSpot(TM) Client VMSun Microsystems Inc.1.3.1-b241.3.1Sun Microsystems Inc. 513.895514sun142index9e.csvThread priority = 5Java HotSpot(TM) Client VMSun Microsystems Inc.1.4.2_01-b061.4.2_01Sun Microsystems Inc.

It would appear that IBM's JVM optimized bitIndex9() out!

As a whole they appear to be slower (than bitIndex5() ) under microbench mark conditions. Personnally I was expecting that the JIT compilers would be considerably slower due to most JIT's being nothing more than a Interpreter without the overheads of interpreting (!). Interpreters dont reorder code they execute each byte code as it comes.

All but one JVM is able to beat the original bitIndex2().

Note that the 1.4.2 -server -Xcompile scores are not present, as I have not finished playing with it yet.
I have tested it on bitTest2, 4 & 5.
Using -Xcompile made no difference to the client JVM as neither did changing the amount of "warm up" time.

The server version when using just the -server switch manages to optimize out the the benchmark completely giving a score of zero. When combined with -Xcompile a score is presented, it has the greatest gain (percentage increase over bitIndex2() ) in the order of  190% for bitIndex4(), which is faster that bitIndex5(). But is using 308 clocks to do it.

Obviously the microbenchmark is going to have to be modified in order to stop the code being optimized out.

It would be interesting to try them on a P4 which has a very long pipeline, so surely bitIndex9() would be faster than bitIndex5().

Cold fish all round then!
-------
Woz.
swpalmer

JGO Coder

Exp: 12 years

Where's the Kaboom?

 « Reply #35 - Posted 2003-12-18 11:27:35 »

My code is a bit tighter than your bitIndex9...

 1  2  3  4  5  6  7  8 `      public static int getIndex(int v)      {            return ((((v & 0x55555555)-1)>>31) & 1)                 | ((((v & 0x33333333)-1)>>31) & 2)                 | ((((v & 0x0F0F0F0F)-1)>>31) & 4)                 | ((((v & 0x00FF00FF)-1)>>31) & 8)                 | ((((v & 0x0000FFFF)-1)>>31) & 16);      }`

Which you can trivially extend to work on a long intead of an int.

** EDIT ** Actually... hmmm.. nevermind...I see what you are doing...

Note: if you run the 'long' version on a G5 Mac it will use 64bit instructions and registers.  Without 64 bit native operations your 'a' and 'c' calculations might make a difference.  Though I think that can be written a little bit more efficiently too.
Once you know to OR in 32 you can do something like this and follow through with the 'int' version.
 1  2  3  4  5  6  7 `int bitIndex(long v){...int i = (int)(v | (v >>> 32));...do int version on i}`

princec

« JGO Spiffy Duke »

Medals: 506
Projects: 3
Exp: 16 years

Eh? Who? What? ... Me?

 « Reply #36 - Posted 2003-12-18 12:30:05 »

Quick tip for microbenchmarkers:
Just after you calculate your total time, System.out.println() the sum of the results of calling the function (for example) which prevents the methods being optimized away.

Cas

crystalsquid

Junior Devvie

... Boing ...

 « Reply #37 - Posted 2003-12-18 16:35:07 »

Woz: Cheers for the timings (& the fixes)
Shame they're not any faster  :-/ Interesting though. I wonder if its not managing to interleave the dependancy chains for each output bit.

- Dom
swpalmer

JGO Coder

Exp: 12 years

Where's the Kaboom?

 « Reply #38 - Posted 2003-12-18 22:23:08 »

What does the test harness look like?
You would think that the branch prediction of modern processors would be totally defeated by this stuff since each branch has exactly a 50:50 chance of being taken.
I would like to try this on the Mac to see if the PowerPC handles things differently.

AndersDahlberg

Junior Devvie

 « Reply #39 - Posted 2003-12-19 12:27:22 »

Would be interesting to see results from a Crusoe processor running a similar test
Pages: 1 [2]
 ignore  |  Print

You cannot reply to this message, because it is very, very old.

 theagentd (17 views) 2015-05-27 22:21:17 theagentd (22 views) 2015-05-27 22:21:08 CopyableCougar4 (17 views) 2015-05-27 19:24:50 MrMapcom (24 views) 2015-05-23 20:26:16 MrMapcom (32 views) 2015-05-23 20:23:34 Waterwolf (37 views) 2015-05-20 15:01:45 chrislo27 (44 views) 2015-05-20 03:42:21 BurntPizza (79 views) 2015-05-10 15:53:18 FrozenShade (63 views) 2015-05-07 09:11:21 TheLopais (227 views) 2015-05-06 13:36:48
 Spasi 28x Rayvolution 23x Riven 16x theagentd 15x Drenius 15x BurntPizza 15x ra4king 13x opiop65 12x EgonOlsen 11x princec 11x DavidBVal 11x Husk 11x KevinWorkman 10x scanevaro 8x orangepascal 8x kevglass 8x
 List of Learning Resources2015-05-05 10:20:32How to: JGO Wikiby Mac702015-02-17 20:56:162D Dynamic Lighting2015-01-01 20:25:42How do I start Java Game Development?by gouessej2014-12-27 19:41:21Resources for WIP gamesby kpars2014-12-18 10:26:14Understanding relations between setOrigin, setScale and setPosition in libGdx2014-10-09 22:35:00Definite guide to supporting multiple device resolutions on Android (2014)2014-10-02 22:36:02List of Learning Resources2014-08-16 10:40:00
 java-gaming.org is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. inquiries and complaints can be sent via email to the info‑account of the company managing the website of java‑gaming.org