Java-Gaming.org    
Featured games (81)
games approved by the League of Dukes
Games in Showcase (499)
Games in Android Showcase (118)
games submitted by our members
Games in WIP (568)
games currently in development
News: Read the Java Gaming Resources, or peek at the official Java tutorials
 
    Home     Help   Search   Login   Register   
Pages: [1] 2
  ignore  |  Print  
  Cuckoo hashing code[important update**2]  (Read 11160 times)
0 Members and 1 Guest are viewing this topic.
Offline delt0r

JGO Knight


Medals: 27
Exp: 18 years


Computers can do that?


« Posted 2010-10-10 10:01:48 »

Cuckoo hashing is not so new method of implementing hash tables. It has some very nice properties providing the load factor is not too high. Since objects are by reference, the load factor of the array is not a memory issue directly. As such its well suited to java.

In particular is has worst case O(1) performance as opposed to O(n) for chaining and other hash table implementations. Anyway i have been asked to make my updated versions available (I have already made previous versions available). This has been used in production code for some time now. So it should be pretty stable.

Its also always faster than a HashMap for all my test cases.

However comments are for wimps and communists  Wink

Warning
I do break the collections contract also. All methods that return a collection, are not backed by the original collection. That is they are copies.

This is no longer true. They now conform completely to the Map contract.

the attachment has all the source for the package....and perhaps a little extra just to make sure that I didn't miss anything. This is used in this project: http://www.mabs.at/ewing/msms/index.shtml

Note that for these classes you can choose between BSD, Apache or LGPL with classpath exception for a license.

Comments and suggestions are always welcome. Note i lurk in #lwjgl on IRC a lot too.

Update
Ok so I found out that some of the hashing code was pretty brain dead. This is now fixed. Also added is a junit test for almost everything in cuckoo.*. Not that i found any bugs other than poor performance from bad hashing.

Update2
For some reason the attachment is not working. I a, re-attaching. If that does not work, try this:
www.mabs.at/ewing/cuckoo.jar

Update3
Well i fixed the brain dead hashing with brain dead bugs. BIG bugs that totally ruined all performance. Its now fixed. Sorry. Attachment now also fixed properly.

Just synced the comments with the code.  persecutioncomplex

I have no special talents. I am only passionately curious.--Albert Einstein
Offline gouessej
« Reply #1 - Posted 2010-10-10 10:25:33 »

Hi!

However comments are for wimps and communists  Wink
Ok therefore I can comment.

Offline delt0r

JGO Knight


Medals: 27
Exp: 18 years


Computers can do that?


« Reply #2 - Posted 2010-10-10 13:36:22 »

Just to be clear -- i meant that the code is not well commented. Probably good enough however.

I have no special talents. I am only passionately curious.--Albert Einstein
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline Nate

JGO Kernel


Medals: 149
Projects: 4
Exp: 14 years


Esoteric Software


« Reply #3 - Posted 2010-10-11 01:02:49 »

Micro benchmarks updated:
http://www.java-gaming.org/topics/map-performance/21395/msg/191504/view.html#msg191504
Are the benchmarks reasonable? The source is there, can just import the project into Eclipse and run.

Offline CommanderKeith
« Reply #4 - Posted 2010-10-11 01:45:38 »

I use delt0r's cuckoo hashing code in my A* path-finding code and found that it is the fastest map implementation AND produces no garbage unlike the other map implementations which make an HashMap.Entry for every object they store.

Thanks for the code  Smiley

Offline Nate

JGO Kernel


Medals: 149
Projects: 4
Exp: 14 years


Esoteric Software


« Reply #5 - Posted 2010-10-11 05:43:03 »

CommanderKeith, good point on the garbage creation. How did you measure the maps? I guess you just measured your A* implementation? Any ideas why the benchmarks don't mimic real usage?

Offline delt0r

JGO Knight


Medals: 27
Exp: 18 years


Computers can do that?


« Reply #6 - Posted 2010-10-11 06:47:01 »

No benchmarks are reasonable, but some are interesting.  Roll Eyes Seriously i had problems with my initial tests because of quirky benchmarks.

I am not really surprised with its rank in the benchmarks however.  First most of my objects cache 3 fields for the objects stored in the map, this makes it quite a bit faster (the CuckooHashFields interface). And two: usage patterns really really matter. For example if you end up having to rehash a lot then i would expect it to degrade to the default hash map implementation. I would also expect a good performance gain by having a oversize initial capacity.

In my applications you get a lot of add/put and remove but mostly contains. The size stays approximately the same throught the maps life time in my app.

As awlays.. YMMV.

I should also point out that i use java.util.* implementations until i see a performance problem.

Edit: PS my hashing is not very optimized. There is perhaps a good performance gain there too.

I have no special talents. I am only passionately curious.--Albert Einstein
Offline Roquen
« Reply #7 - Posted 2010-10-11 09:06:10 »

A very interesting variant from the original version is to add 'b' bins.  So for each hash index there are 'b' linear slots associated with it.

(edit: since I'm sure that was as clear as mud, a survey-like paper is here: http://www.ru.is/faculty/ulfar/CuckooHash.pdf
Offline jezek2
« Reply #8 - Posted 2010-10-11 10:25:43 »

I've read about cuckoo before and it's indeed a nice algorithm. How well (or bad) it handles a degenerative case of all keys having the same hash code (as returned from the hashCode() method)? Does it have maximum of "number_of_hashes * bin_size" entries in such case? Expanding the array size and rehashing would end up with the same maximum amount of possible entries, right? Or have I overlooked something? Smiley
Offline delt0r

JGO Knight


Medals: 27
Exp: 18 years


Computers can do that?


« Reply #9 - Posted 2010-10-11 10:41:47 »

cuckoo hashing will fail with many identical hash codes as i have implemented it. However this is a pretty easy situation to avoid generally*. It can also be avoided internally. But is not a usage case i need to consider.

For the security aware this does have implications when you don't control the type of objects inserted into the map. Someone could mount a DNS attack by sending objects that have the same hash code.

I have no special talents. I am only passionately curious.--Albert Einstein
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline CommanderKeith
« Reply #10 - Posted 2010-10-11 10:45:16 »

CommanderKeith, good point on the garbage creation. How did you measure the maps? I guess you just measured your A* implementation? Any ideas why the benchmarks don't mimic real usage?

One thing I just remembered, when I changed these lines:

1  
int hash =System.identityHashCode(e);


to:

1  
int hash =e.hashCode();


It went much faster. I only discovered this after profiling. Delt0r's map was way faster than the java.util maps in my case.

Offline Nate

JGO Kernel


Medals: 149
Projects: 4
Exp: 14 years


Esoteric Software


« Reply #11 - Posted 2010-10-11 11:36:39 »

Re-did the charts with server VM and way more iterations. Wish we had a good way of generically comparing hashmaps in a few different ways, but I guess people will just have to experiment in their actual apps.

Offline delt0r

JGO Knight


Medals: 27
Exp: 18 years


Computers can do that?


« Reply #12 - Posted 2010-10-11 13:33:55 »

Quote
..but I guess people will just have to experiment in their actual apps.
It really is the only way IMO.

I have no special talents. I am only passionately curious.--Albert Einstein
Offline delt0r

JGO Knight


Medals: 27
Exp: 18 years


Computers can do that?


« Reply #13 - Posted 2010-10-12 19:27:18 »

I have updating the release. See the first post. This time i include the class files as well and a junit test. Please update if you downloaded the old version.

I have no special talents. I am only passionately curious.--Albert Einstein
Offline CommanderKeith
« Reply #14 - Posted 2010-10-13 01:39:50 »

Thanks for the update delt0r. But the jar seems like it's missing the actual cuchoo map code.

Cheers,
keith

Offline Roquen
« Reply #15 - Posted 2010-10-13 08:37:44 »

WRT: hashCode values from wrapped primitives, I perform a decorrelation step directly after calling hashCode.  Mine is:

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
   // odd scaled approximation of: (3-Sqrt[5])/2
  private static final int weyl = 0x61c88647;

    ...

    int hash  = key.hashCode();
    int w     = 0;
    int h0;

    hash ^= hash<<10; hash ^= hash>>>15;
    hash ^= hash<<4;  hash ^= hash>>>13;
    w     += weyl;
    h0    = (hash + w) & mask;  // actual hash-0 used
Offline delt0r

JGO Knight


Medals: 27
Exp: 18 years


Computers can do that?


« Reply #16 - Posted 2010-10-13 09:00:28 »

Sorry. Now the jar is correct.

I did try similar things Roquen, but found that a simple hash*PRIME+PRIME worked as good as any and was a little faster. The randomness properties where also just as good.

I have no special talents. I am only passionately curious.--Albert Einstein
Offline CommanderKeith
« Reply #17 - Posted 2010-10-13 09:56:36 »

Thanks.

Btw another thing that made the code a bit quicker was to use the == operator instead of equals.

Offline Riven
« League of Dukes »

JGO Overlord


Medals: 803
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #18 - Posted 2010-10-13 10:00:21 »

Thanks.

Btw another thing that made the code a bit quicker was to use the == operator instead of equals.

for primitives you can use ==, but in a Map, you really must use equals or you wouldn't be able to get a value if you didn't have that exact reference lying around, which you rarely have.

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings
Offline lhkbob

JGO Knight


Medals: 32



« Reply #19 - Posted 2010-10-13 14:50:08 »

for primitives you can use ==, but in a Map, you really must use equals or you wouldn't be able to get a value if you didn't have that exact reference lying around, which you rarely have.

Unless you want an identity map, which I find very useful.

Offline delt0r

JGO Knight


Medals: 27
Exp: 18 years


Computers can do that?


« Reply #20 - Posted 2010-10-13 14:52:26 »

Included in the jar is a set and map implementations both with identity versions as well.

I have no special talents. I am only passionately curious.--Albert Einstein
Offline delt0r

JGO Knight


Medals: 27
Exp: 18 years


Computers can do that?


« Reply #21 - Posted 2010-12-28 20:43:37 »

Bumping so that folks realize that there has been an important bug fix. In particular performance is now restored to what it should be. Also i have added a int keyed map as well. Its quite a bit faster than using Integers.

I have no special talents. I am only passionately curious.--Albert Einstein
Offline delt0r

JGO Knight


Medals: 27
Exp: 18 years


Computers can do that?


« Reply #22 - Posted 2010-12-29 10:01:17 »

Perhaps the last big fix? I have now changed the implementation to return proper backed versions of collections. This results in much faster iteration, and has conformance to the Map interface contract. Many of the Collections.* methods use these methods and this results in expected behavior and better performance. 

I have no special talents. I am only passionately curious.--Albert Einstein
Offline Nate

JGO Kernel


Medals: 149
Projects: 4
Exp: 14 years


Esoteric Software


« Reply #23 - Posted 2010-12-30 00:45:29 »

I'm messing around with cuckoo hashing. I think I might have found an issue with your implementation. Does the following correctly describe what happens with your implementation?

Quote
put key1 in a, b, or c
     key1 goes in a
put key2 in a, x, or z
     key2 goes in x
remove key1
put key2 in a, x, or z
     key2 goes in a
key2 is in the table twice!

Edit:

It seems you increment size in some places without checking >= threshold. Is this ok?

I think the capacity specified in the constructor should be used as "initialCapacity * (1 / loadFactor)". This avoids rehashing when the user knows how many items they want to insert (especially for small capacities).

Offline CommanderKeith
« Reply #24 - Posted 2010-12-30 05:18:40 »


Hi Greg,
Thanks for the updates. I tried the new CIdentityHashSet you posted in the new jar file attached to the first post.

I found it's a bit slow because it uses System.identityHashCode(e) rather than e.hashCode().

After fixing that, I found that it still ran slower than the old version and I looked into the difference and found that you didn't do any maths on the hashCode in the first version:

1  
2  
3  
int hash = e.hashCode();
int hash2 = 1 + ((hash * PRIME) + (hash >>> 16));
int hash3 = 2 + (hash ^ (hash >>> 16)) * PRIME;


But in the new version:

1  
2  
3  
4  
int code=e.hashCode();
int hash =hash(code);
int hash2 = hash2(code);
int hash3 =hash3(code);


So I replaced the 'int hash =hash(code);' with 'int hash = e.hashCode();' and it was back to normal again. Are these acceptable performance adjustments?

Cheers  Smiley
keith

Offline Roquen
« Reply #25 - Posted 2010-12-30 07:43:18 »

I've had better luck using multiple bins instead of single bin implementations.  The "sweet" point seeming to be either 2 or 4 bins per masked hash. My goodness metric was density and not performance. YMMV.

There are advantages and disadvatages of using identityHashCode.  Advantage: It is (at least used to be) the raw address (maybe salted?) of the object. So you'll get tended to get objects that are allocated together nearby in the hash table.  This may or may not match a given use-case.  Disadvantage: It has to go across the native boundary and perform memory management fix-up, which can lead to requiring more memory. This will be the case for objects that just fit within the allocators bucket and the extra storage (orginal memory address) pushes the size into a larger chunk.  Additonally, is doesn't have the notion of equivalent objects returning the same hash value.  Finally, as noted, it tends to require more cycles.

Rehashing the inital hash returned by the object.  This is another use-case dependent decision.

identityHashCode: rehashing the intial in effect negates any reason to use this call. So the first used hash should be something like:  hash0 = (hash >>> 4)|(hash<< 28), if used.

hashCode: Notiably the integer wrapper types return their represented value as the hash.  In the general case this is of little use.  However it seems rather common than sequences are used.  In this case not rehashing the intial is reasonable.  The map implementations shipped with Java do bit mix the results of hashCode, this is more important there than for cuckoo since only a single hash value is used and must be 'good' for the general case.  The additonal hash code(s) cover this case for cuckoo.
Offline delt0r

JGO Knight


Medals: 27
Exp: 18 years


Computers can do that?


« Reply #26 - Posted 2010-12-30 09:42:32 »

CommanderKeith you *must* use System.ideneitiyHashCode() because objects can and should override hashCode() to be consistent with .equals(). ie Integer. Also (new Integer(1)).hashCode()==1 and creates pathological cases very easily. So you really must hash the returned Object.hashCode().

Yes its a little slower. But it should be correct in all cases.  Getting to the wrong place faster is not really faster.

Nate: I will double check. This should be caught in the unit tests and I did have it correct at one point. Perhaps doing the "lazy hashing" borked it. Shows the importance of code review Smiley

There are lots of different ways to implement this. I found that the cost of hashing is small compared to cache issues and Object.equals() and Object.hashCode() for the general case. So i went with 3 hash codes so load factors can go higher than 0.9 . The default is 0.75 and setting it to .9 will mean that puts that result in a rehash will be slower, but once re sized iterator performance will be faster. In my tests on 64bit and 32bit JVM on linux, having a single table was a little faster, I decided due to cache issues. But "fast" on JVM is a moving target with its own optimizations.

In practice I have custom CHashMap with custom Hash Fields. This is much faster than any of the above. But fitting the right Map to the job has always been part of optimization.

I have no special talents. I am only passionately curious.--Albert Einstein
Offline delt0r

JGO Knight


Medals: 27
Exp: 18 years


Computers can do that?


« Reply #27 - Posted 2010-12-30 10:09:14 »

Nate was right. Added something to the unit tests and fixed the code. This is getting silly. A good reason to use java.util.* unless you really need something faster.

This also illustrates why units tests are not the panacea of good code. They only test for what you think of.

I should note that for my IntHashMap the first hash is highly biased so that linear usage of ints results in fast access. This will hit the load factor only very slightly since the next 2 hash codes are close to random.

I have no special talents. I am only passionately curious.--Albert Einstein
Offline Roquen
« Reply #28 - Posted 2010-12-30 11:47:50 »

It's true that I'm ignoring "black box" situations such as placing mixed types in the map.  I'm making the assumption of "this is not a general purpose map that can used as a black box". For instance in:

1  
2  
    Byte    b1 = new Byte((byte)1);
    Integer i1 = new Integer(1);


Both b1 & i1 return the same hash and performing equals will return false and therefore will cause the map to explode if you don't use identityHashCode.  Likewise will be true for any pair of objects which return identical hash values and return false for equals.

However, this isn't much of a limitation for an embedded method for a wide class of problems.
Offline pjt33
« Reply #29 - Posted 2010-12-30 12:04:34 »

This also illustrates why units tests are not the panacea of good code. They only test for what you think of.
I like to write test cases which cover anything I can think of and then have one or two which use a seeded random number generator to create a couple of thousand reproducible tests cases which might cover things I didn't think of. For example,
1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
@Test
public void randomTests() {
    Random rnd = new Random(0x12345678);
    for (int i = 0; i < 1000; i++) {
        HashMap<Integer, Integer> expected = new HashMap<Integer, Integer>();
        CHashMap<Integer, Integer> observed = new CHashMap<Integer, Integer>();

        int numOps = 20 + rnd.nextInt(100);
        for (int j = 0; j < numOps; j++) {
            int key = rnd.nextInt(32);
            int op = rnd.nextInt(3);
            switch (op) {
                case 0:
                case 1: // Insert
                   int val = rnd.nextInt();
                    expected.put(key, val);
                    observed.put(key, val);
                    break;

                case 2: // Remove
                   expected.remove(key);
                    observed.remove(key);
                    break;

                default: // Bodged the code somewhere
                   Fail("Switch and nextInt mismatch");
                    break;
            }

            AssertEqual(expected, observed);
        }
    }
}
Pages: [1] 2
  ignore  |  Print  
 
 
You cannot reply to this message, because it is very, very old.

 

Add your game by posting it in the WIP section,
or publish it in Showcase.

The first screenshot will be displayed as a thumbnail.

Pippogeek (40 views)
2014-09-24 16:13:29

Pippogeek (31 views)
2014-09-24 16:12:22

Pippogeek (21 views)
2014-09-24 16:12:06

Grunnt (47 views)
2014-09-23 14:38:19

radar3301 (29 views)
2014-09-21 23:33:17

BurntPizza (65 views)
2014-09-21 02:42:18

BurntPizza (37 views)
2014-09-21 01:30:30

moogie (44 views)
2014-09-21 00:26:15

UprightPath (53 views)
2014-09-20 20:14:06

BurntPizza (55 views)
2014-09-19 03:14:18
List of Learning Resources
by Longor1996
2014-08-16 10:40:00

List of Learning Resources
by SilverTiger
2014-08-05 19:33:27

Resources for WIP games
by CogWheelz
2014-08-01 16:20:17

Resources for WIP games
by CogWheelz
2014-08-01 16:19:50

List of Learning Resources
by SilverTiger
2014-07-31 16:29:50

List of Learning Resources
by SilverTiger
2014-07-31 16:26:06

List of Learning Resources
by SilverTiger
2014-07-31 11:54:12

HotSpot Options
by dleskov
2014-07-08 01:59:08
java-gaming.org is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. inquiries and complaints can be sent via email to the info‑account of the company managing the website of java‑gaming.org
Powered by MySQL Powered by PHP Powered by SMF 1.1.18 | SMF © 2013, Simple Machines | Managed by Enhanced Four Valid XHTML 1.0! Valid CSS!