ClickerMonkey
|
 |
«
Posted
2013-07-29 20:25:34 » |
|
I'm not sure if someone will find a Trie implementation useful, but I just finished one here: https://github.com/ClickerMonkey/TrieHardIf you can think of another use case other than determining exclusion/inclusion or auto-complete... please let me know! I imagine this would only be useful for game libraries, not so much games.
|
|
|
|
relminator
|
 |
«
Reply #1 - Posted
2013-07-30 10:07:58 » |
|
Nice name. I would assume the next build would be: "trie hard 2 - trie harder".
|
|
|
|
actual
|
 |
«
Reply #2 - Posted
2013-07-30 11:11:26 » |
|
So that's you on reddit  . Why are you using Boolean.TRUE and Boolean.FALSE rather than just true amd false? I don't think I've ever seen a Boolean object used in production code before
|
|
|
|
Games published by our own members! Check 'em out!
|
|
ClickerMonkey
|
 |
«
Reply #3 - Posted
2013-07-30 12:09:13 » |
|
When dealing with generic classes in Java (like Trie<String, Boolean>) you ARE dealing with the Boolean object. Sure you can pass true and false in there, but it's auto-boxed into the Boolean.TRUE and BOOLEAN.FALSE. The same applies to using Integer, you can pass in ints (i.e. x), but it auto-boxes it to new Integer(x).
I should change the example so it looks a little nicer, even though what I have is perfectly correct.
|
|
|
|
ClickerMonkey
|
 |
«
Reply #4 - Posted
2013-08-03 02:54:20 » |
|
TrieHard has been updated so a Trie can be used as a Map now. Insertion, removal, getting, and traversal have all had speed improvements.
TrieHard now outperforms all other Trie implementations I've found! (put+get)
|
|
|
|
Several Kilo-Bytes
|
 |
«
Reply #5 - Posted
2013-08-20 19:48:24 » |
|
Some Strings cause an exception. I obtained a Trie through Tries.forString() and am using the Moby word list names.txt file for keys/values. I put each trim()ed String as the key and value in order. The exception is thrown for the 771st String "Ameli".  Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 5 at java.lang.String.charAt(String.java:658) at org.magnos.trie.TrieSequencerCharSequence.hashOf(TrieSequencerCharSequence.java:54) at org.magnos.trie.TrieSequencerCharSequence.hashOf(TrieSequencerCharSequence.java:28) at org.magnos.trie.Trie.put(Trie.java:165)
|
|
|
|
Nate
|
 |
«
Reply #6 - Posted
2013-08-21 12:52:48 » |
|
I wrote a fancy trie once. It is readonly though, the encoder is separate. It only supports 75bit text but is ultra efficient disk- and memory-wise. Eg, it stores 293,365 words in 889,614 bytes. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97
| public class Trie { static public final int START = 0xC0000000; static public final int INVALID = 0x800000; static public final int LEAF = 0x80000000;
static private final int END_OF_WORD = 0x80; static private final int HAS_MORE = 0x40; static private final int IS_LEAF = 0x20;
byte[] data;
public Trie () { }
public Trie (FileHandle file) { DataInputStream input = new DataInputStream(file.read()); try { data = new byte[input.readInt()]; int bytes = 0; while (true) { int count = input.read(data, bytes, data.length - bytes); if (count == 0) break; bytes += count; } } catch (IOException ex) { throw new RuntimeException("Error reading file: " + file, ex); } }
public int get (int c, int offset) { if (offset == LEAF) return INVALID;
byte[] data = this.data; c -= 97;
if (offset == START) { offset = c * 3; int index = (data[offset++] & 0xFF) << 16; index |= (data[offset++] & 0xFF) << 8; index |= data[offset] & 0xFF; return index; } if (offset < 0) offset = -offset;
int indexOffset = 0, charIndex = 0; byte b; while (true) { b = data[offset++]; if ((b & 31) == c) break; if ((b & HAS_MORE) == 0) return INVALID; if ((b & IS_LEAF) == 0) indexOffset++; charIndex++; } if ((b & IS_LEAF) != 0) return LEAF;
int nextIndex; if (charIndex == 0 && (b & HAS_MORE) == 0) { nextIndex = offset; } else { while ((b & HAS_MORE) != 0) b = data[offset++]; int indexesStart = offset; for (int i = 0; i < indexOffset; i++) { if ((data[offset++] & 0x80) == 0) continue; if ((data[offset++] & 0x80) == 0) continue; offset++; } b = data[offset++]; nextIndex = b & 0x7F; if ((b & 0x80) != 0) { b = data[offset++]; nextIndex |= (b & 0x7F) << 7; if ((b & 0x80) != 0) nextIndex |= (data[offset] & 0x7F) << 14; } nextIndex += indexesStart; }
if ((data[nextIndex] & END_OF_WORD) != 0) return -nextIndex; return nextIndex; }
public boolean contains (String text) { int index = START; for (int i = 0, n = text.length(); i < n; i++) { index = get(text.charAt(i), index); if (index == INVALID) return false; } return index < 0; } } |
|
|
|
|
Several Kilo-Bytes
|
 |
«
Reply #7 - Posted
2013-08-22 00:23:15 » |
|
Optimized read only structures are neat. Let me ask/confirm a few things: Isn't your implementation 5 bit? (And uses lowercase letters) What do you use to encode the trie? Are there multiple encodings? I wonder how if you did have a 7-bit tree if it would be relatively simple to encode UTF-8. I don't know if it qualifies as efficient for disk storage. It might be close to zip compression. Can the structure be compressed further?
|
|
|
|
Nate
|
 |
«
Reply #8 - Posted
2013-08-22 00:54:23 » |
|
Oops, is it 5 bit? Was so long ago that I wrote it and I just pasted.  I have a tool somewhere that encodes the trie. IIRC it loads it up into inefficient data structures, does some processing, and then writes out the magic format the code above reads. Just one encoding. You're right, zipping my 293,365 word trie file goes from 889,614 to 512,862 bytes. The raw data is 2.77MB, down to 797,139 bytes zipped. 
|
|
|
|
Several Kilo-Bytes
|
 |
«
Reply #9 - Posted
2013-08-23 22:04:59 » |
|
@ClickerMonkey: I loaded the text file with the wrong encoding. (It was not UTF.) However loading the entire list changed the String for which the exception was thrown.
@Nate: I think you could use 7 bits per node if you used a terminator character. The structure would be bigger, but if the Strings were Huffman encoded you could fit more characters into fewer nodes. Not that I am suggesting that. Just pointing it out.
|
|
|
|
Games published by our own members! Check 'em out!
|
|
ClickerMonkey
|
 |
«
Reply #10 - Posted
2013-08-26 11:42:56 » |
|
@Several Kilo-Bytes
So you are still having an exception? Can you post the full stack trace if so? Also, can you send me that list of words... the words inserted before the culprit are the ones to blame mostly. Also, make sure you are using the most up-to-date JAR, I fixed a NPE about two weeks ago. Thanks!
@Nate
That's a pretty awesome Trie! Is there a way to efficiently query all words that start with "moo" or something? I skimmed through the code, but since you didn't post the encoder it's harder to just stare at and understand how it works =P
|
|
|
|
Several Kilo-Bytes
|
 |
«
Reply #11 - Posted
2013-08-26 16:23:34 » |
|
I use the Moby word lists because it's public domain and already in plain text format. My JRE is up to date. I downloaded the source as a zip file. The stack trace is: Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 3 at java.lang.String.charAt at org.magnos.trie.TrieSequencerCharSequence.hashOf(TrieSequencerCharSequence.java:54) at org.magnos.trie.TrieSequencerCharSequence.hashOf(TrieSequencerCharSequence.java:28) at org.magnos.trie.Trie.put(Trie.java:165) ...
|
|
|
|
ClickerMonkey
|
 |
«
Reply #12 - Posted
2013-08-26 17:50:12 » |
|
Awesome, found the bug and fixed it! It only happened in very specific scenarios... but it's all better now. The builds and code has been updated on GitHub.
|
|
|
|
Nate
|
 |
«
Reply #13 - Posted
2013-08-26 20:43:41 » |
|
That's a pretty awesome Trie! Is there a way to efficiently query all words that start with "moo" or something? I skimmed through the code, but since you didn't post the encoder it's harder to just stare at and understand how it works =P
You could walk the trie and get out words that start with, but only a contains method is implemented. The encoder is junk, super inefficient in pretty much all ways, but since it runs offline it isn't a huge deal. It'd be better to use something like your trie then write out the readonly encoding. I'd post the encoder but I'm not yet sure if I'll do something with the trie.
|
|
|
|
ClickerMonkey
|
 |
«
Reply #14 - Posted
2013-08-27 13:49:39 » |
|
I also just added TrieSet, it implements java.util.Set and is pretty easy to create: 1 2 3 4 5
| TreeSet<String> set = new TreeSet<String>(Tries.forStrings());
set.add( "meow" );
assert ( set.contains( "meow" ) ); |
|
|
|
|
omadawn
Junior Newbie
|
 |
«
Reply #15 - Posted
2013-11-07 08:29:35 » |
|
I can't think of anything I'd need this for in a game but I have a non-game use for it. I was actually looking at trie a few weeks ago wondering if there was a java lib out there that would make it easy to pull off. * omadawn heads over to github. TY
|
|
|
|
ClickerMonkey
|
 |
«
Reply #16 - Posted
2013-11-15 18:40:22 » |
|
No problem!
|
|
|
|
|