Java-Gaming.org Hi !
Featured games (90)
games approved by the League of Dukes
Games in Showcase (767)
Games in Android Showcase (229)
games submitted by our members
Games in WIP (854)
games currently in development
News: Read the Java Gaming Resources, or peek at the official Java tutorials
 
    Home     Help   Search   Login   Register   
Pages: [1]
  ignore  |  Print  
  Fastest binary search on 4GB worth of long[]  (Read 27559 times)
0 Members and 1 Guest are viewing this topic.
Offline Riven
Administrator

« JGO Overlord »


Medals: 1356
Projects: 4
Exp: 16 years


Hand over your head.


« Posted 2016-04-07 18:49:06 »

I had an hour to spare, and decided to write and optimize something that I did not need.
I decided to try to make Arrays.binarySearch(...) less cache-trashy.

I also changed the specification of my binarySearch algorithm a bit, to make it behave more like a traversable database-index, by demanding it must return the lowest index of a specific value in the array, in case the value occurs multiple times.

My initial (hierarchial) approach shows (on average) performance 3.1x as fast as a naive implementation:

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
Creating data-set...
Generating random values...
Sorting data-set...
Generating lookup indices...
Generating lookup tables...
Printing table lengths...
67108864
2097152
65536
2048
64
Performing sanity-checks...
Starting benchmark...

...

norm took 955ms (100%, 1097K ops/s)
rec1 took 610ms (156%, 1718K ops/s)
rec2 took 523ms (182%, 2004K ops/s)
rec3 took 505ms (189%, 2076K ops/s)
rec4 took 388ms (246%, 2702K ops/s)

...


4GB: (die hard mode)
1  
2  
3  
4  
5  
6  
7  
8  
9  
...

norm took 12564ms (100%, 667K ops/s)
rec1 took 8137ms (154%, 1030K ops/s)
rec2 took 5306ms (236%, 1580K ops/s)
rec3 took 4962ms (253%, 1690K ops/s)
rec4 took 4137ms (303%, 2027K ops/s)

...


Die-hards are supposed to set
isDiehard
to
true
.
Who feels like wasting some time squeezing even more single threaded performance out of this puppy?
Low hanging fruits are frowned upon, but graciously incorporated!

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
43  
44  
45  
46  
47  
48  
49  
50  
51  
52  
53  
54  
55  
56  
57  
58  
59  
60  
61  
62  
63  
64  
65  
66  
67  
68  
69  
70  
71  
72  
73  
74  
75  
76  
77  
78  
79  
80  
81  
82  
83  
84  
85  
86  
87  
88  
89  
90  
91  
92  
93  
94  
95  
96  
97  
98  
99  
100  
101  
102  
103  
104  
105  
106  
107  
108  
109  
110  
111  
112  
113  
114  
115  
116  
117  
118  
119  
120  
121  
122  
123  
124  
125  
126  
127  
128  
129  
130  
131  
132  
133  
134  
135  
136  
137  
138  
139  
140  
141  
142  
143  
144  
145  
146  
147  
148  
149  
150  
151  
152  
153  
154  
155  
156  
157  
158  
159  
160  
161  
162  
163  
164  
165  
166  
167  
168  
169  
170  
171  
172  
173  
174  
175  
176  
177  
178  
179  
180  
181  
182  
183  
184  
185  
186  
187  
188  
189  
190  
191  
192  
193  
194  
195  
196  
import java.util.Arrays;
import java.util.Random;

public class QuickBinarySearch {
   public static void main(String[] args) {
      Random rndm = new Random(1235L);

      boolean isDiehard = false; // 4 GB (512 MB for whips)

      System.out.println("Creating data-set...");
      long[] somedata = new long[(isDiehard ? 512 : 64) * 1024 * 1024];

      System.out.println("Generating random values...");
      for (int i = 0; i < somedata.length; i++) {
         somedata[i] = Math.abs(rndm.nextLong()) % somedata.length * 8;
      }

      System.out.println("Sorting data-set...");
      Arrays.sort(somedata);

      System.out.println("Generating lookup indices...");
      long[] lookups = new long[somedata.length / 64];
      for (int i = 0; i < lookups.length; i++) {
         lookups[i] = somedata[rndm.nextInt(somedata.length)] + rndm.nextInt(10) - 5;
      }

      System.out.println("Generating lookup tables...");

      final int shift = 5;
      long[] reduced1 = new long[somedata.length >> (shift * 1)];
      long[] reduced2 = new long[somedata.length >> (shift * 2)];
      long[] reduced3 = new long[somedata.length >> (shift * 3)];
      long[] reduced4 = new long[somedata.length >> (shift * 4)];

      reduce(somedata, reduced1, shift);
      reduce(reduced1, reduced2, shift);
      reduce(reduced2, reduced3, shift);
      reduce(reduced3, reduced4, shift);

      System.out.println("Printing table lengths...");
      System.out.println(somedata.length);
      System.out.println(reduced1.length);
      System.out.println(reduced2.length);
      System.out.println(reduced3.length);
      System.out.println(reduced4.length);

      System.out.println("Performing sanity-checks...");
      {
         for (int i = -1_000_000; i <= 1_000_000; i++) {
            int i1 = binarySearchLo(somedata, i);
            int i2 = binarySearchLo(somedata, reduced1, shift, i);
            int i3 = binarySearchLo(somedata, reduced1, reduced2, shift, i);
            int i4 = binarySearchLo(somedata, reduced1, reduced2, reduced3, shift, i);
            int i5 = binarySearchLo(somedata, reduced1, reduced2, reduced3, reduced4, shift, i);
            if (i1 != i2)
               throw new IllegalStateException();
            if (i1 != i3)
               throw new IllegalStateException();
            if (i1 != i4)
               throw new IllegalStateException();
            if (i1 != i5)
               throw new IllegalStateException();
         }
      }

      int dummy = 0;
      System.out.println("Starting benchmark...");
      for (int k = 0; k < 8; k++) {
         System.out.println();
         long normTook;
         {
            long t0 = System.nanoTime();
            for (long lookup : lookups) {
               dummy += binarySearchLo(somedata, lookup);
            }
            long t1 = System.nanoTime();
            long took = (t1 - t0) / 1_000_000L;
            System.out.println("norm took " + took + "ms (100%, " + (lookups.length / took) + "K ops/s)");
            normTook = took;
         }
         {
            long t0 = System.nanoTime();
            for (long lookup : lookups) {
               dummy += binarySearchLo(somedata, reduced1, shift, lookup);
            }
            long t1 = System.nanoTime();
            long took = (t1 - t0) / 1_000_000L;
            System.out.println("rec1 took " + took + "ms (" + (int) (normTook * 100.0f / took) + "%, "
                  + (lookups.length / took) + "K ops/s)");
         }
         {
            long t0 = System.nanoTime();
            for (long lookup : lookups) {
               dummy += binarySearchLo(somedata, reduced1, reduced2, shift, lookup);
            }
            long t1 = System.nanoTime();
            long took = (t1 - t0) / 1_000_000L;
            System.out.println("rec2 took " + took + "ms (" + (int) (normTook * 100.0f / took) + "%, "
                  + (lookups.length / took) + "K ops/s)");
         }
         {
            long t0 = System.nanoTime();
            for (long lookup : lookups) {
               dummy += binarySearchLo(somedata, reduced1, reduced2, reduced3, shift, lookup);
            }
            long t1 = System.nanoTime();
            long took = (t1 - t0) / 1_000_000L;
            System.out.println("rec3 took " + took + "ms (" + (int) (normTook * 100.0f / took) + "%, "
                  + (lookups.length / took) + "K ops/s)");
         }
         {
            long t0 = System.nanoTime();
            for (long lookup : lookups) {
               dummy += binarySearchLo(somedata, reduced1, reduced2, reduced3, reduced4, shift, lookup);
            }
            long t1 = System.nanoTime();
            long took = (t1 - t0) / 1_000_000L;
            System.out.println("rec4 took " + took + "ms (" + (int) (normTook * 100.0f / took) + "%, "
                  + (lookups.length / took) + "K ops/s)");
         }
      }
      System.out.println("dummy:" + dummy);
   }

   public static int binarySearchLo(long[] data1, long[] data2, long[] data3, long[] data4, long[] data5, int shift,
         long lookup) {
      int idx = binarySearchLo(data2, data3, data4, data5, shift, lookup);
      return binarySearchLoNextLevel(data1, idx, shift, lookup);
   }

   public static int binarySearchLo(long[] data1, long[] data2, long[] data3, long[] data4, int shift, long lookup) {
      int idx = binarySearchLo(data2, data3, data4, shift, lookup);
      return binarySearchLoNextLevel(data1, idx, shift, lookup);
   }

   public static int binarySearchLo(long[] data1, long[] data2, long[] data3, int shift, long lookup) {
      int idx = binarySearchLo(data2, data3, shift, lookup);
      return binarySearchLoNextLevel(data1, idx, shift, lookup);
   }

   public static int binarySearchLo(long[] data1, long[] data2, int shift, long lookup) {
      int idx = binarySearchLo(data2, lookup);
      return binarySearchLoNextLevel(data1, idx, shift, lookup);
   }

   public static long[] reduce(long[] data, int shift) {
      long[] lookup = new long[data.length >> shift];
      reduce(data, lookup, shift);
      return lookup;
   }

   public static void reduce(long[] src, long[] dst, int shift) {
      if (src.length != (dst.length << shift)) {
         throw new IllegalStateException();
      }
      for (int i = 0; i < dst.length; i++) {
         dst[i] = src[i << shift];
      }
   }

   private static int binarySearchLoNextLevel(long[] data, int idx, int shift, long lookup) {
      idx = (idx >> 31) ^ idx;

      // int min = Math.max(0, (idx - 1) << shift);
      int min = (idx - 1) << shift;
      min &= ~(min >> 31);

      int max = Math.min(data.length, (idx << shift) + 1);

      return binarySearchLo(data, min, max, lookup);
   }

   private static int binarySearchLo(long[] a, long key) {
      return binarySearchLo(a, 0, a.length, key);
   }

   private static int binarySearchLo(long[] a, int fromIndex, int toIndex, long key) {
      int low = fromIndex;
      int high = toIndex - 1;

      while (low <= high) {
         int idx = (low + high) >>> 1;
         long val = a[idx];

         if (val < key)
            low = idx + 1;
         else if (val > key)
            high = idx - 1;
         else if (idx != fromIndex && val == a[idx - 1])
            high = idx - 1; // find lower index
         else
            return idx; // lowest index with value
      }
      return -(low + 1);
   }
}

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings!
Offline jonjava
« Reply #1 - Posted 2016-04-07 19:50:04 »

<a href="http://www.youtube.com/v/Ccoj5lhLmSQ?version=3&amp;hl=en_US&amp;start=" target="_blank">http://www.youtube.com/v/Ccoj5lhLmSQ?version=3&amp;hl=en_US&amp;start=</a>

Offline Riven
Administrator

« JGO Overlord »


Medals: 1356
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #2 - Posted 2016-04-07 19:53:43 »

Ah, it's not that complex Smiley

Let's say you do a binary search, and find your cache to be trashed.
Now shrink the ordered data-set, by taking every n-th value.
Perform the binary search on that smaller data-set.
You now have an index at which (after scaling up) you can expect the value to be in the larger data-set.
Now do a binary search on the larger data-set within a very small range.

Rinse and repeat, until the smallest data-sets fits into L1-cache, and the second in L2-cache.

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings!
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline Riven
Administrator

« JGO Overlord »


Medals: 1356
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #3 - Posted 2016-04-08 08:27:31 »

I don't want no stinkin' medals, I want competition! Kiss

Nobody's up for a challenge? Nobody? Surely your fingers must be itching.

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings!
Offline princec

« JGO Spiffy Duke »


Medals: 1053
Projects: 3
Exp: 20 years


Eh? Who? What? ... Me?


« Reply #4 - Posted 2016-04-08 08:53:17 »

I do this sort of thing all day long at work, and they pay me for it!

But as you've got a bit of spare time, how about you give us the raw speed of the naive search algorithm (that is, scan 4gb of sorted longs linearly in memory).

Cas Smiley

Offline Riven
Administrator

« JGO Overlord »


Medals: 1356
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #5 - Posted 2016-04-08 09:08:36 »

I do this sort of thing all day long at work, and they pay me for it!
What I wouldn't give for such a job! I'm integrating stuff into other stuff, and mapping stuff to stuff.
Meetings, presentations, politics, getting everybody to agree. Pays big bucks, drains the soul.


But as you've got a bit of spare time, how about you give us the raw speed of the naive search algorithm (that is, scan 4gb of sorted longs linearly in memory).

1  
2  
3  
4  
5  
norm took 12564ms (100%, 667K ops/s) // <--- naive binarySearch, finding *lowest* index of value
rec1 took 8137ms (154%, 1030K ops/s)
rec2 took 5306ms (236%, 1580K ops/s)
rec3 took 4962ms (253%, 1690K ops/s)
rec4 took 4137ms (303%, 2027K ops/s) // <--- 4 layers of indirection

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings!
Offline princec

« JGO Spiffy Duke »


Medals: 1053
Projects: 3
Exp: 20 years


Eh? Who? What? ... Me?


« Reply #6 - Posted 2016-04-08 09:13:55 »

That's still a binary search time in norm isn't it? I mean literally this:
1  
for (int i = 0; i < in.length; i ++) if (in[i] == findThis) return i;


Cas Smiley

Offline princec

« JGO Spiffy Duke »


Medals: 1053
Projects: 3
Exp: 20 years


Eh? Who? What? ... Me?


« Reply #7 - Posted 2016-04-08 09:20:25 »

Bah, only got 5GB physical RAM here at work. Can't figure out a combination of commandline voodoo that will allow me to allocate 4GB.

Cas Smiley

Offline Riven
Administrator

« JGO Overlord »


Medals: 1356
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #8 - Posted 2016-04-08 09:21:22 »

That's still a binary search time in norm isn't it? I mean literally this:
1  
for (int i = 0; i < in.length; i ++) if (in[i] == findThis) return i;
Arg... I should learn to read. But then again, it's a mad suggestion! Pointing

Are you suspecting that the reduced cache-trashing of a linear search may be in the same ball-park as a cache-trashy binary search? Madness, I tell you! Tongue

I just added it, and it struggles with the 'sanity-check' already - not with correctness, but with performance. I'll report back when my poor CPU finished the benchmark on the 512MB data-set, 4GB may take a few days.

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings!
Offline princec

« JGO Spiffy Duke »


Medals: 1053
Projects: 3
Exp: 20 years


Eh? Who? What? ... Me?


« Reply #9 - Posted 2016-04-08 09:35:47 »

Yes, sheer madness Smiley I wonder at what size of data-set it becomes realistically sensible to do brute-force.

Cas Smiley

Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline Riven
Administrator

« JGO Overlord »


Medals: 1356
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #10 - Posted 2016-04-08 09:37:30 »

Data-set
Linear ops/s
Naive binary search ops/s
Hierarchical binary search ops/s
512 MB
50 ops/sec
1,097,000 ops/sec
2,702,000 ops/sec
4096 MB
6 ops/sec
667,000 ops/sec
2,027,000 ops/sec

I killed the process... would have taken eons.

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings!
Offline princec

« JGO Spiffy Duke »


Medals: 1053
Projects: 3
Exp: 20 years


Eh? Who? What? ... Me?


« Reply #11 - Posted 2016-04-08 09:39:23 »

Science!  Shocked

Cas Smiley

Offline Riven
Administrator

« JGO Overlord »


Medals: 1356
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #12 - Posted 2016-04-08 10:05:59 »

The break even point for linear vs binary-search is:

*drum roll*

1  
2  
3  
4  
long[12]

norm took 3.757029ms (100%, 49906K op/s)
linr took 3.698955ms (101%, 50689K op/s)


1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
   private static int linearSearchLo(long[] data, long key) {
      for (int i = 0; i < data.length; i++) {
         if (data[i] == key) {
            return i;
         }
         if (data[i] > key) {
            return -(i + 1);
         }
      }
      return -(data.length + 1);
   }

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings!
Offline blobloblob

Senior Newbie


Medals: 1
Exp: 3 years



« Reply #13 - Posted 2016-04-08 10:07:17 »

I would hash it at sorting and I would get O(1) performance XD
Offline Riven
Administrator

« JGO Overlord »


Medals: 1356
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #14 - Posted 2016-04-08 10:09:13 »

Big-O notation is about performance degradation over a growing data-set. O(1) can be dreadful.

Hashing is not so usable for 1:N indexes. You'd need a fairly large data-structure behind every hash-key, and how do you optimize that?

Also, you can't traverse (in-order) a hashed data-structure.


Anyway, I'll give it a shot, for direct lookups (when time permits).

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings!
Offline blobloblob

Senior Newbie


Medals: 1
Exp: 3 years



« Reply #15 - Posted 2016-04-08 10:20:18 »

That wasn't actually for serious ;P but maybe there is some way in crazyness. Wink
Offline Roquen

JGO Kernel


Medals: 517



« Reply #16 - Posted 2016-04-08 10:22:45 »

Depending on what the "important" operations are, there's also bloom-filters.
Offline Riven
Administrator

« JGO Overlord »


Medals: 1356
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #17 - Posted 2016-04-08 10:49:30 »

Depending on what the "important" operations are, there's also bloom-filters.
Interesting concept.

There are no important operations though, as I don't have a use-case yet where a factor 3 performance increase in memory access is actually meaningful. Whatever accesses this read-only 'index' is going to be the bottleneck.

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings!
Offline BurntPizza

« JGO Bitwise Duke »


Medals: 485
Exp: 7 years



« Reply #18 - Posted 2016-04-08 13:05:08 »

I'd be curious as to where the break even point between hierarchical indices and something like binary search on level-ordered storage. Might not need as many levels in the hierarchy.
Offline Cero
« Reply #19 - Posted 2016-04-08 14:29:36 »

Surely your fingers must be itching.
You know, believe or not, no. Absolutely not :D
You got this!

Offline delt0r

JGO Wizard


Medals: 144
Exp: 18 years


Computers can do that?


« Reply #20 - Posted 2016-04-08 21:21:39 »

Have to admit that i can't see where i would be using 4GB in this side of ram that is ordered so. I can see it with a mapped file or some old fashion spinning disks backing things however.

But i do find such microbenchmarks interesting. It keeps underlying one thing. Cache coherency is almost everything if aiming for performance.

I have no special talents. I am only passionately curious.--Albert Einstein
Pages: [1]
  ignore  |  Print  
 
 

 
EgonOlsen (1232 views)
2018-06-10 19:43:48

EgonOlsen (1101 views)
2018-06-10 19:43:44

EgonOlsen (842 views)
2018-06-10 19:43:20

DesertCoockie (1251 views)
2018-05-13 18:23:11

nelsongames (1100 views)
2018-04-24 18:15:36

nelsongames (1329 views)
2018-04-24 18:14:32

ivj94 (2068 views)
2018-03-24 14:47:39

ivj94 (1222 views)
2018-03-24 14:46:31

ivj94 (2147 views)
2018-03-24 14:43:53

Solater (787 views)
2018-03-17 05:04:08
Deployment and Packaging
by mudlee
2018-08-22 18:09:50

Java Gaming Resources
by gouessej
2018-08-22 08:19:41

Deployment and Packaging
by gouessej
2018-08-22 08:04:08

Deployment and Packaging
by gouessej
2018-08-22 08:03:45

Deployment and Packaging
by philfrei
2018-08-20 02:33:38

Deployment and Packaging
by philfrei
2018-08-20 02:29:55

Deployment and Packaging
by philfrei
2018-08-19 23:56:20

Deployment and Packaging
by philfrei
2018-08-19 23:54:46
java-gaming.org is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. inquiries and complaints can be sent via email to the info‑account of the company managing the website of java‑gaming.org
Powered by MySQL Powered by PHP Powered by SMF 1.1.18 | SMF © 2013, Simple Machines | Managed by Enhanced Four Valid XHTML 1.0! Valid CSS!