Java-Gaming.org Hi !
Featured games (83)
games approved by the League of Dukes
Games in Showcase (525)
Games in Android Showcase (127)
games submitted by our members
Games in WIP (592)
games currently in development
News: Read the Java Gaming Resources, or peek at the official Java tutorials
 
    Home     Help   Search   Login   Register   
Pages: [1] 2
  ignore  |  Print  
  Trie-based dictionary  (Read 3666 times)
0 Members and 1 Guest are viewing this topic.
Offline ags1

JGO Wizard


Medals: 72
Projects: 3
Exp: 5 years


Make code not war!


« Posted 2014-07-02 21:50:40 »

I am working on a spelling game, so I need fast dictionary lookup and also to find words from random sets. I'm using the official Scrabble SOWPODS word list.

EDIT: Here is the code incorporating most of the suggestions in the thread, many thanks to everyone. If you want to read the backstory of the code, just go to ENDEDIT and read on :-)

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
43  
44  
45  
46  
47  
48  
49  
50  
51  
52  
53  
54  
55  
56  
57  
58  
59  
60  
61  
62  
63  
64  
65  
66  
67  
68  
69  
70  
71  
72  
73  
74  
75  
76  
77  
78  
79  
80  
81  
82  
83  
84  
85  
86  
87  
88  
89  
90  
91  
92  
93  
94  
95  
96  
97  
98  
99  
100  
101  
102  
103  
104  
105  
106  
107  
108  
109  
110  
111  
112  
113  
114  
115  
116  
117  
118  
119  
120  
121  
122  
123  
124  
125  
126  
127  
128  
129  
130  
131  
132  
133  
134  
135  
136  
137  
138  
139  
140  
141  
142  
143  
144  
145  
146  
147  
148  
149  
150  
151  
152  
153  
154  
155  
156  
157  
158  
159  
160  
161  
162  
163  
164  
165  
166  
167  
168  
169  
170  
171  
172  
173  
174  
175  
176  
177  
178  
179  
180  
181  
182  
183  
184  
185  
186  
187  
188  
189  
190  
191  
192  
193  
194  
195  
196  
197  
198  
199  
200  
201  
202  
203  
204  
205  
206  
207  
208  
209  
210  
211  
212  
213  
package com.clarke.agnes.verbal;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;

/**
 * Parses a dictionary into a trie.
 * Created by agslinda on 7/2/14.
 */

public class TrieDictionary {

    private static final int MAX_WORD_LENGTH = 9;
    ArrayTrie arrayTrie = new ArrayTrie();
    private ArrayList<String> longestWords = new ArrayList<String>(45000);
    private ArrayList<String> words = new ArrayList<String>();

    public TrieDictionary() throws IOException {
        StringBuilder sb = new StringBuilder();
        FileReader reader = new FileReader(new File("...sowpods.txt"));
        int character = reader.read();
        while (character >= 0) {
            //noinspection StatementWithEmptyBody
            if (character == '\r') {
                //do nothing
            } else if (character == '\n') {
                String word = sb.toString();
                if (sb.length() <= MAX_WORD_LENGTH) {
                    arrayTrie.addWord(word);
                    words.add(word);
                }
                if (sb.length() == MAX_WORD_LENGTH) {
                    longestWords.add(word);
                }
                sb.delete(0, sb.length());
            } else {
                sb.append((char)character);
            }
            character = reader.read();
        }
        reader.close();
    }

    private final Random random = new Random();
    public String scrambleWord() {
        TreeMap<Double, Character> map = new TreeMap<Double, Character>();
        String word = longestWords.get(random.nextInt(longestWords.size()));
        while (map.size() != word.length()) {
            for (char c : word.toCharArray()) {
                map.put(random.nextDouble(), c);
            }
            if (map.size() != word.length()) {
                map.clear();
            }
        }
        StringBuilder sb = new StringBuilder();
        for (Character character : map.values()) {
            sb.append(character);
        }
        return sb.toString();
    }

    public boolean exists(String candidateWord) {
        return arrayTrie.wordExists(candidateWord);
    }

    public List<String> unscramble(String scrambledWord) {
        TreeMap<String, Object> results = new TreeMap<String, Object>();
        ArrayList<Character> scrambled = new ArrayList<Character>();
        for (char c : scrambledWord.toCharArray()) {
            scrambled.add(c);
        }
        for (char character : scrambledWord.toCharArray()) {
            CharacterNode node = arrayTrie.get(character);
            if (node != null) {
                int index = scrambled.indexOf(character);
                scrambled.remove(Character.valueOf(character));
                node.unscramble(results, scrambled);
                scrambled.add(index, character);
            }
        }
        return new ArrayList<String>(results.keySet());
    }


    public static void main(String[] args) throws IOException {
        TrieDictionary dict = new TrieDictionary();

        int x = 0, spell, misspell;
        long start = System.currentTimeMillis();
        spell = 0; misspell = 0;
        for (int i = 0; i < dict.words.size(); i += 1) {
            String word = dict.words.get(i);
            x += dict.exists(word) ? spell++ : misspell++;
            x += dict.exists('n' + word) ? spell++ : misspell++;
            x += dict.exists(word + 'n') ? spell++ : misspell++;
        }
        System.out.println("Trie took " + (System.currentTimeMillis() - start) + " right:wrong " + spell + ":" + misspell);
        System.out.println("Trie nodes: " + Node.instanceCount + " Arrays: " + Node.arrayCount);
        System.out.println("Words: " + dict.words.size() + " LongestWords: " + dict.longestWords.size());
        String s = dict.scrambleWord();
        start = System.currentTimeMillis();
        List<String> unscramble = dict.unscramble(s);
        System.out.println("Unscramble took " + (System.currentTimeMillis() - start));
        System.out.println(x + "\r" + "Word of the day: " + s + " has " + unscramble.size() + " solutions");
        int counter = 0;
        for (String s1 : unscramble) {
            if (s1.length() == MAX_WORD_LENGTH) {
                System.out.print("###");
            }
            if (s1.length() >= MAX_WORD_LENGTH / 2) {
                System.out.print(s1 + "   ");
                if (counter++ == 5) {
                    counter = 0;
                    System.out.println();
                }
            }
        }
    }

    private static class ArrayTrie {
        Node root = new Node();
        private void addWord(String word) {
            Node n = root;
            for (char c : word.toCharArray()) {
                n = n.put(c);
            }
            ((CharacterNode)n).completeWord = true;
        }

        private boolean wordExists(String candidateWord) {
            Node n = root;
            char[] chars = candidateWord.toCharArray();
            for (int i = 0; i < chars.length; i++) {
                n = n.get(chars[i]);
                if (n == null) {
                    break;
                }
            }
            return (n != null) && ((CharacterNode) n).completeWord;
        }

        public CharacterNode get(char c) {
            return root.get(c);
        }
    }

    private static class Node {
        protected static int instanceCount = 0;
        protected static int arrayCount = 0;
        protected Node parent = null;
        protected char c;
        private static final int ALPHABET_LENGTH = 26;
        protected CharacterNode[] children = null;

        public CharacterNode put(char c) {
            CharacterNode n = get(c);
            if (children == null) {
                arrayCount++;
                children = new CharacterNode[ALPHABET_LENGTH];
            }
            if (n == null) {
                n = new CharacterNode(this, c);
                children[c % ALPHABET_LENGTH] = n;
            }
            return n;
        }

        public CharacterNode get(char c) {
            return children == null ? null : children[c % ALPHABET_LENGTH];
        }

    }

    private static class CharacterNode extends Node {
        private boolean completeWord = false;
        private static StringBuilder sb = new StringBuilder();
        public CharacterNode(Node parent, char c) {
            this.parent = parent;
            this.c = c;
            instanceCount++;
        }
        public void unscramble(Map<String, Object> results, ArrayList<Character> scrambled) {
            if (completeWord) {
                results.put(outputWord(), null);
            }
            for (int i = 0; i < scrambled.size(); i++) {
                Character character = scrambled.get(i);
                CharacterNode node = get(character);
                if (node != null) {
                    int index = scrambled.indexOf(character);
                    scrambled.remove(character);
                    node.unscramble(results, scrambled);
                    scrambled.add(index, character);
                }
            }
        }

        private String outputWord() {
            sb.append(c);
            Node p = parent;
            while (p != null && p.parent != null) {
                sb.append(p.c);
                p = p.parent;
            }
            sb.reverse();
            String result = sb.toString();
            sb.delete(0, sb.length());
            return result;
        }
    }
}


ENDEDIT

I have three spell check methods implemented for comparison:

1. Naive list iteration (at least I stop iterating if we go past where the misspelled word would have been in the dictionary)
2. Binary search (quick and dirty, i just added it to show how fast Trie is)
3. Trie search (prefix tree search). This is supposed to be the fastest.

Anyway, binary won by a couple of millis... :-(

Trie took 46 right:wrong 7263:14448
Binary took 33 right:wrong 7263:14448
List took 98224 right:wrong 7263:14448

I can get Trie to win by a a margin of 2-3 millis if I use HashMaps and not ArrayLists for the Trie child node lists. But I think that will take too much memory on Android.

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
32  
33  
34  
35  
36  
37  
38  
39  
40  
41  
42  
43  
44  
45  
46  
47  
48  
49  
50  
51  
52  
53  
54  
55  
56  
57  
58  
59  
60  
61  
62  
63  
64  
65  
66  
67  
68  
69  
70  
71  
72  
73  
74  
75  
76  
77  
78  
79  
80  
81  
82  
83  
84  
85  
86  
87  
88  
89  
90  
91  
92  
93  
94  
95  
96  
97  
98  
99  
100  
101  
102  
103  
104  
105  
106  
107  
108  
109  
110  
111  
112  
113  
114  
115  
116  
117  
118  
119  
120  
121  
122  
123  
124  
125  
126  
127  
128  
129  
130  
131  
132  
133  
134  
135  
136  
137  
138  
139  
140  
141  
142  
143  
144  
145  
146  
147  
148  
149  
150  
151  
152  
153  
154  
155  
156  
157  
158  
159  
160  
161  
162  
163  
164  
165  
166  
167  
168  
169  
170  
171  
172  
173  
174  
175  
package com.clarke.agnes.verbal;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Parses a dictionary into a trie.
 * Created by agslinda on 7/2/14.
 */

public class TrieDictionary {

    Trie trie = new Trie();
    private ArrayList<String> words = new ArrayList<String>(280000);

    public TrieDictionary() throws IOException {
        StringBuilder sb = new StringBuilder();
        FileReader reader = new FileReader(new File("...sowpods.txt"));
        int character = reader.read();
        while (character >= 0) {
            //noinspection StatementWithEmptyBody
            if (character == '\r') {
                //do nothing
            } else if (character == '\n') {
                trie.addWord(sb.toString());
                words.add(sb.toString());
                sb.delete(0, sb.length());
            } else {
                sb.append((char)character);
            }
            character = reader.read();
        }
        reader.close();
    }

    public boolean exists(String candidateWord) {
        return trie.wordExists(candidateWord);
    }

    public boolean existsOld(String candidateWord) {
        for (int i = 0; i < words.size(); i++) {
            if (candidateWord.equals(words.get(i))) {
                return true;
            } else if (candidateWord.compareTo(words.get(i)) < 0) {
                return false;
            }
        }
        return false;
    }

    public boolean existsBinary(String candidateWord) {
        int index = words.size() / 2;
        int upper = words.size();
        int lower = 0;
        int breaker = 0;
        String wordFromDict = words.get(index);
        while (!wordFromDict.equals(candidateWord) && breaker++ < 20) {
            if (candidateWord.compareTo(wordFromDict) > 0) {
                lower = index;
                index = (index + upper) / 2;
            } else {
                upper = index;
                index = (index + lower) / 2;
            }
            wordFromDict = words.get(index);
        }

        return wordFromDict.equals(candidateWord);
    }


    public static void main(String[] args) throws IOException {
        TrieDictionary dict = new TrieDictionary();
        System.out.println("able " + dict.exists("able") + dict.existsBinary("able") + dict.existsOld("able"));
        System.out.println("zzz " + dict.exists("zzz") + dict.existsBinary("zzz") + dict.existsOld("zzz"));
        System.out.println("hjkglfy " + dict.exists("hjkglfy") + dict.existsBinary("hjkglfy") + dict.existsOld("hjkglfy"));
        System.out.println("true " + dict.exists("true") + dict.existsBinary("true") + dict.existsOld("true"));
        System.out.println("abstemious " + dict.exists("abstemious") + dict.existsBinary("abstemious") + dict.existsOld("abstemious"));

        int spell = 0, misspell = 0;
        long start = System.currentTimeMillis();
        for (int i = 0; i < dict.words.size(); i += 37) {
            String word = dict.words.get(i);
            int x = dict.exists(word) ? spell++ : misspell++;
            x = dict.exists('n' + word) ? spell++ : misspell++;
            x = dict.exists(word + 'n') ? spell++ : misspell++;
        }
        System.out.println("Trie took " + (System.currentTimeMillis() - start) + " right:wrong " + spell + ":" + misspell);

        start = System.currentTimeMillis();
        spell = 0; misspell = 0;
        for (int i = 0; i < dict.words.size(); i += 37) {
            String word = dict.words.get(i);
            int x = dict.existsBinary(word) ? spell++ : misspell++;
            x = dict.existsBinary('n' + word) ? spell++ : misspell++;
            x = dict.existsBinary(word + 'n') ? spell++ : misspell++;
        }
        System.out.println("Binary took " + (System.currentTimeMillis() - start) + " right:wrong " + spell + ":" + misspell);

        start = System.currentTimeMillis();
        spell = 0; misspell = 0;
        for (int i = 0; i < dict.words.size(); i += 37) {
            String word = dict.words.get(i);
            int x = dict.existsOld(word) ? spell++ : misspell++;
            x = dict.existsOld('n' + word) ? spell++ : misspell++;
            x = dict.existsOld(word + 'n') ? spell++ : misspell++;
        }
        System.out.println("List took " + (System.currentTimeMillis() - start) + " right:wrong " + spell + ":" + misspell);

    }

    private static class Trie {
        Node root = new Node();
        private void addWord(String word) {
            Node n = root;
            for (char c : word.toCharArray()) {
                n = n.put(c);
            }
            ((CharacterNode)n).completeWord = true;
        }

        private boolean wordExists(String candidateWord) {
            Node n = root;
            for (char c : candidateWord.toCharArray()) {
                n = n.get(c);
                if (n == null) {
                    break;
                }
            }
            return (n != null) && ((CharacterNode) n).completeWord;
        }
    }

    private static class Node {
        protected static final List<CharacterNode> empty = Collections.emptyList();
        protected List<CharacterNode> children = empty;

        public Node put(char c) {
            for (CharacterNode n : children) {
                if (n.value == c) {
                    return n;
                }
            }
            CharacterNode n = new CharacterNode(c);
            if (children.equals(empty)) {
                children = new ArrayList<CharacterNode>();
            }
            children.add(n);
            return n;
        }

        public CharacterNode get(char c) {
            for (CharacterNode n : children) {
                if (n.value == c) {
                    return n;
                }
            }
            return null;
        }
    }

    private static class CharacterNode extends Node {
        private boolean completeWord = false;
        private final char value;

        private CharacterNode(char value) {
            this.value = value;
        }

    }

}

Offline ags1

JGO Wizard


Medals: 72
Projects: 3
Exp: 5 years


Make code not war!


« Reply #1 - Posted 2014-07-02 22:18:49 »

Here is the faster Node class, with it the trie completes the test in 30ms on my system, instead of approx ~44ms, and consistently beat binary search!

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
    private static class Node {
        protected static final Map<Character, CharacterNode> emptyMap = Collections.<Character,CharacterNode>emptyMap();
        protected Map<Character, CharacterNode> childrenMap = emptyMap;

        public Node put(char c) {
            CharacterNode n = childrenMap.get(c);
            if (childrenMap.equals(emptyMap)) {
                childrenMap = new HashMap<Character,CharacterNode>();
            }
            if (n == null) {
                n = new CharacterNode(c);
                childrenMap.put(c, n);
            }
            return n;
        }

        public CharacterNode get(char c) {
            return childrenMap.get(c);
        }
    }

Offline richierich
« Reply #2 - Posted 2014-07-02 23:17:56 »

Could try maybe something like this..?

String scan is pretty fast - I imagine good compared to autoboxing the char, making the hash, and whatever hoops HashMap jumps through. O(1) sounds great but if you ever step through library implementations for simple collections it's amazing the complex nonsense they get up to sometimes! String operations on the other hand are always highly tuned.

I'd try it but don't have a copy of SOWPODS laying around Cheesy

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
    private static class Node {
       protected String childrenChars = "";
       protected ArrayList<CharacterNode> childrenNodes;
       
        public Node put(char c) {
            CharacterNode n = get(c);
            if (n == null) {
               childrenChars += c;
                n = new CharacterNode(c);
               childrenNodes.add(n);
            }
            return n;
        }

        public CharacterNode get(char c) {
           int index = childrenChars.indexOf(c);
           return (index == -1) ? null : childrenNodes.get(index);
        }
    }
Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline Roquen
« Reply #3 - Posted 2014-07-03 07:35:16 »

Quote
This is supposed to be the fastest.
Uggh.  Something can only be "supposedly fastest" for a very narrowly defined set of operations/assumptions.

If you care about speed and size here's some thoughts.

The dictionary is dynamic because obviously it changes at runtime rather than being fixed.  Ya know, like: foobaz suddenly becomes a legal word.  The real cost of dynamic data-structures is visiting randomize memory.  You mention android so it would be sane to reduce memory size.  Speed?  How many queries do you expect to perform per game "frame"?  Do you need an AI to figure out legal words it can create?  If SOWPODS is the real target then the dataset has a fair set of limitations like words are only on 2-15 characters (assuming wikipedia is correct).  If you really wanted to be able to check lots of potential words as being legal per frame then bloom filter could be a reasonable idea.
Offline pjt33
« Reply #4 - Posted 2014-07-03 09:30:37 »

3. Trie search (prefix tree search). This is supposed to be the fastest.
It probably would be if you implemented it properly. Linear search through a list at each level is not the classic trie implementation. It should be a single array lookup. The problem then is memory usage; I've found that if you have an expensive offline building process which generates the trie and then combines isomorphic nodes, it takes about as much space as the original word list but performs like a trie.
Offline ags1

JGO Wizard


Medals: 72
Projects: 3
Exp: 5 years


Make code not war!


« Reply #5 - Posted 2014-07-03 10:40:28 »

OK, here is the (I think) correct implementation - I look up the instances from an array, with the index calculated as char % ALPHABET_LENGTH.

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
    private static class ArrayNode2 {
        protected static int instanceCount = 0;
        private static final int ALPHABET_LENGTH = 52;
        protected ArrayCharacterNode2[] children = new ArrayCharacterNode2[ALPHABET_LENGTH];

        public ArrayNode2 put(char c) {
            ArrayCharacterNode2 n = get(c);
            if (n == null) {
                n = new ArrayCharacterNode2(c);
                children[c % ALPHABET_LENGTH] = n;
            }
            return n;
        }

        public ArrayCharacterNode2 get(char c) {
            return children[c % ALPHABET_LENGTH];
        }
    }


This gives these results (I increased the size of the test, btw):

EDIT: I updated the results after increasing the memory for the process.

ListTrie took 428 right:wrong 172880:345760
MapTrie took 74 right:wrong 172880:345760
ArrayTrie took 70 right:wrong 172880:345760 //richierich method
ArrayTrie2 took 61 right:wrong 172880:345760 //modulus method
Binary took 134 right:wrong 172880:345760

Twice as fast is good enough for me :-)

Offline richierich
« Reply #6 - Posted 2014-07-03 18:32:31 »

Very clever - nice to see incremental improvements occurring!

But why is the alphabet size 52 there and not 26? For mixed-case letters? In which case is it assuming that lowercase and uppercase are contiguous unicode ranges?

Very interesting thread - I'd never looked at tries before although I'd heard of them! With sensible hat on I agree with Roquen that going with the low-tech/low-memory solution (i.e. binary search the basic list) might be best given how similar the speed is for the data volume expected...
Offline ags1

JGO Wizard


Medals: 72
Projects: 3
Exp: 5 years


Make code not war!


« Reply #7 - Posted 2014-07-03 18:52:52 »

52 because I am one dumb-assed programmer. I was working from a different dictionary this AM where all the words are uppercase, but my test misspelled words are word + 'n' (lowercase) - works fine for the other methods which test on the actual character at each node, but for the mod method, an invalid character causes you to dive down the wrong branch of the tree. I thought it was an error in the dictionary and only later realized it was my test code. I should check and throw an InvalidCharacterException in such cases :-)

Anyways, now that the arrays are 26 long the trie goes even faster (not sure why, though) - the new method is about 40% faster than it was.

If it was just spell checking binary would be enough, but i need to work out all possible words for random sets - a trie is much more suited (faster and it also lets you prune out many possible candidate words without having to test them).

Offline Riven
« League of Dukes »

« JGO Overlord »


Medals: 833
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #8 - Posted 2014-07-03 18:57:14 »

Anyways, now that the arrays are 26 long the trie goes even faster (not sure why, though) - the new method is about 40% faster than it was.
You can fit more Nodes in your CPU cache, leading to less cache misses. Memory latency is your bottleneck, so reducing your memory footprint boosts performance.

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings
Offline ags1

JGO Wizard


Medals: 72
Projects: 3
Exp: 5 years


Make code not war!


« Reply #9 - Posted 2014-07-03 19:06:37 »

My next improvement is to not generate meaningless new arrays on leaf nodes, should cut down the cache misses a bit more.

Games published by our own members! Check 'em out!
Legends of Yore - The Casual Retro Roguelike
Offline ags1

JGO Wizard


Medals: 72
Projects: 3
Exp: 5 years


Make code not war!


« Reply #10 - Posted 2014-07-03 19:18:29 »

Yes that is what I meant in my previous post, I will only create the array if I have to write into it.

Offline Riven
« League of Dukes »

« JGO Overlord »


Medals: 833
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #11 - Posted 2014-07-03 19:27:36 »

At the cost of complicating your code, it's probably way faster if you traverse the tree 2 chars at a time. Your lookup tables would be 26*26=676 bytes - they easily fit in (4K) pages.

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings
Offline ags1

JGO Wizard


Medals: 72
Projects: 3
Exp: 5 years


Make code not war!


« Reply #12 - Posted 2014-07-03 21:13:23 »

Why would the lookup table entries be one byte in length?

Offline Riven
« League of Dukes »

« JGO Overlord »


Medals: 833
Projects: 4
Exp: 16 years


Hand over your head.


« Reply #13 - Posted 2014-07-03 21:18:27 »

Excuse me, multiply that by 4, as it's a table of references.

(to my best knowledge, even in 64bit JVMs, references are 4 byte pointers, multiplied by 8 to calculate the memory address)

Hi, appreciate more people! Σ ♥ = ¾
Learn how to award medals... and work your way up the social rankings
Offline ags1

JGO Wizard


Medals: 72
Projects: 3
Exp: 5 years


Make code not war!


« Reply #14 - Posted 2014-07-03 21:52:17 »

Thought so... The lookup table would be 26 x 27 x 4 = 2808 bytes. It needs an extra row for odd-lengthed words. I might give it a try this weekend.

Offline Roquen
« Reply #15 - Posted 2014-07-04 05:45:09 »

(to my best knowledge, even in 64bit JVMs, references are 4 byte pointers, multiplied by 8 to calculate the memory address)
64-bit HotSpot defaults to compressed-oops, so yeah 4 bytes/ref (the headers per object are bigger).
Offline ags1

JGO Wizard


Medals: 72
Projects: 3
Exp: 5 years


Make code not war!


« Reply #16 - Posted 2014-07-04 23:56:39 »

Well, I've built it up into my own little scrabble cheat program... If you write the cheat program yourself, is it still cheating?

Compared to spell checking, finding words from random letters is really slow. I can spell check the whole dictionary plus double that in spelling errors in 80 milliseconds, but to find thge words for one set of nine letters takes 10 to 15 milliseconds.

Offline Roquen
« Reply #17 - Posted 2014-07-05 06:24:16 »

Sigh.  That's why I asked about AI.  True or false for a word is a different problem then a specialized reg-exp search.
Offline ags1

JGO Wizard


Medals: 72
Projects: 3
Exp: 5 years


Make code not war!


« Reply #18 - Posted 2014-07-05 09:17:57 »

In my original post I did say " I need fast dictionary lookup and also to find words from random sets [of letters]".

Offline Roquen
« Reply #19 - Posted 2014-07-05 10:49:00 »

Indeed you did but it was lost in the noise.  In this case it doesn't matter but if you were writing a game that required this functionality then matching is the hard problem and the one to address and not the true/false question.
Offline richierich
« Reply #20 - Posted 2014-07-05 11:06:01 »

What kind of game is this?! Is there such a thing as an arcade action twich spelling game Cheesy Is 15ms really too long to do the permutations search?

Wikipedia says SOWPODS has 267,751 words. Permutations of 9 means 362,880 lookups(*), most of which are quick failures a couple of letters down the trie. So is 15ms just a straight lookup for each permutation, or have you added more intelligence to the process?

E.g. maybe you could recode exists() to return a number which was the level of the trie at which the search failed, or 0 if the search was successful. Then e.g. with letters ACFFEKNSW you would test the first permutation "ACFFEKNSW" and get 3 back so you could skip testing all other permutations beginning with ACF. Just thinking aloud - maybe you did something like that already?

Another thought, did you try just whacking the SOWPODS words into a HashSet<String> and forget the trie altogether?

{(*) Edit: OK before you say it I see that 9 letters yield more than 362,880 possible words, that would be if all words were 9 letters long. Obv. it's more than that! Smiley}
Offline Roquen
« Reply #21 - Posted 2014-07-05 11:29:48 »

Yeah an AI for a scrabble like game isn't straight forward.
Offline ags1

JGO Wizard


Medals: 72
Projects: 3
Exp: 5 years


Make code not war!


« Reply #22 - Posted 2014-07-05 17:23:18 »

I use the trie to discard branches of possible words, I will post the code later. I'm over the moon with 15 ms, although it would be longer than that in android. If i wanted to be a real hard case, I could multithread the search, it would be very easy, but possibly insane.

Offline Roquen
« Reply #23 - Posted 2014-07-05 17:46:09 »

Brute force isn't the answer.  I don't know the problem space but a ternary search trie (which can be flattened) will be exponentially faster.  You have a set of letters and a known set of reg-exps which can be optimized.  A single traversal can produce all possible results in very little time.
Offline ags1

JGO Wizard


Medals: 72
Projects: 3
Exp: 5 years


Make code not war!


« Reply #24 - Posted 2014-07-05 21:36:45 »

What kind of game is this?! Is there such a thing as an arcade action twich spelling game Cheesy Is 15ms really too long to do the permutations search?

Well, there will be screen shake if you hit a triple combo on your words. And there are power-ups :-)

Quote
Wikipedia says SOWPODS has 267,751 words. Permutations of 9 means 362,880 lookups(*), most of which are quick failures a couple of letters down the trie. So is 15ms just a straight lookup for each permutation, or have you added more intelligence to the process?

E.g. maybe you could recode exists() to return a number which was the level of the trie at which the search failed, or 0 if the search was successful. Then e.g. with letters ACFFEKNSW you would test the first permutation "ACFFEKNSW" and get 3 back so you could skip testing all other permutations beginning with ACF. Just thinking aloud - maybe you did something like that already?

Here is the new node class:

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
17  
18  
19  
20  
21  
22  
23  
24  
25  
26  
27  
28  
29  
30  
31  
    private static class ArrayCharacterNode extends ArrayNode2 {
        private boolean completeWord = false;
        private static StringBuilder sb = new StringBuilder();
        public ArrayCharacterNode() {
            instanceCount++;
        }
        public void unscramble(Map<String, Object> results, ArrayList<Character> scrambled, ArrayList<Character> unscrambled) {
            if (completeWord) {
                outputWord(results, unscrambled);
            }
            for (int i = 0; i < scrambled.size(); i++) {
                Character c = scrambled.get(i);
                ArrayCharacterNode node = get(c);
                if (node != null) {
                    ArrayList<Character> newScrambled = new ArrayList<Character>(scrambled);
                    newScrambled.remove(c);
                    ArrayList<Character> newUnscrambled = new ArrayList<Character>(unscrambled);
                    newUnscrambled.add(c);
                    node.unscramble(results, newScrambled, newUnscrambled);
                }
            }
        }

        private static void outputWord(Map<String, Object> results, ArrayList<Character> unscrambled) {
            for (int i = 0; i < unscrambled.size(); i++) {
                sb.append(unscrambled.get(i));
            }
            results.put(sb.toString(), null);
            sb.delete(0, sb.length());
        }
    }


I'm not entirely happy with it, I feel the method is creating too many objects as it walks the tree.

Quote
Another thought, did you try just whacking the SOWPODS words into a HashSet<String> and forget the trie altogether?

HashSet is usually as  fast as Trie (for simple word lookups), but Trie can pull ahead if I do a vast number of checks. But remember, if I did not have the trie, for word combinations I would have to check every combination of the 9 letters. With the trie, i can exclude branches in one go.

Offline richierich
« Reply #25 - Posted 2014-07-06 22:50:00 »

Well, there will be screen shake if you hit a triple combo on your words. And there are power-ups :-)

Awesome! "Achievement unlocked .. Sh-sh-sh-sh-Shakespeare!"

The recursive walk down the trie looks really elegant.

I'm not entirely happy with it, I feel the method is creating too many objects as it walks the tree.

Would a String be simpler than ArrayList<Character> for unscrambled? It has to be turned into a string for output anyway.

For the scrambled pool of unused letters I wonder if you could pass the same object all the way down, replacing characters back into the pool after each recursive call to unscramble(), rather than building new ones at each level. I'm not an expert in Java linked lists so there's probably a better way to code it but would something like this work...?

1  
2  
3  
4  
5  
6  
7  
8  
9  
10  
11  
12  
13  
14  
15  
16  
   public void unscramble(Map<String, Object> results, LinkedList<Character> scrambled, String unscrambled) {
        if (completeWord) {
            results.put(unscrambled, null);
        }
        for (int i = 0; i < scrambled.size(); i++) {
            Character c = scrambled.get(i);
            ArrayCharacterNode node = get(c);
            if (node != null) {
               //Take character from unused pool and add to possible word
               scrambled.remove(i);
               String newUnscrambled = unscrambled + c;
                node.unscramble(results, scrambled, newUnscrambled);
                //Replace character where it was in the unused pool
               scrambled.add(i, c);
            }
        }
Offline ags1

JGO Wizard


Medals: 72
Projects: 3
Exp: 5 years


Make code not war!


« Reply #26 - Posted 2014-07-06 22:57:12 »

That's a neat idea to cut down on the object creation. I can also cut out the unscrambled variable by giving each node a reference to its parent... Then every complete word can write itself out by recursing to the root of the trie.

Offline pjt33
« Reply #27 - Posted 2014-07-07 09:19:54 »

Would a String be simpler than ArrayList<Character> for unscrambled? It has to be turned into a string for output anyway.
Not really, in the sense that because strings are immutable it would increase the amount of object creation. The way to reduce that is to use a single char[] (tracking the length of the subarray which is relevant) and then use the corresponding String constructor when a word is found.
Offline richierich
« Reply #28 - Posted 2014-07-07 10:24:51 »

Yes, fair point. I guess I was thinking that creating strings was at least simpler than creating ArrayList<Character>s. On reflection what's happening with the unscrambled string over time is that it's constantly having extra letters added and removed from its tail end. So a single Stack<Character> used throughout the whole trie walk might be the thing, although what you said about just having a char[] and a tail pointer is effectively a low-tech stack, and surely much more efficient...

Perhaps it's irrelevant anyway if ags1's parent pointer thing works out. I guess that will suffer from increasing the memory footprint, which was pointed out earlier may be the bottleneck.
Offline ags1

JGO Wizard


Medals: 72
Projects: 3
Exp: 5 years


Make code not war!


« Reply #29 - Posted 2014-07-07 18:59:31 »

Memory won't be an issue as this is now only going to run on my build machine. The game will just have a couple of megabytes of precalculated 9-letter puzzles, which I will load into HashSets for the in-game spell checks.

Pages: [1] 2
  ignore  |  Print  
 
 

 

Add your game by posting it in the WIP section,
or publish it in Showcase.

The first screenshot will be displayed as a thumbnail.

toopeicgaming1999 (66 views)
2014-11-26 15:22:04

toopeicgaming1999 (58 views)
2014-11-26 15:20:36

toopeicgaming1999 (12 views)
2014-11-26 15:20:08

SHC (24 views)
2014-11-25 12:00:59

SHC (24 views)
2014-11-25 11:53:45

Norakomi (29 views)
2014-11-25 11:26:43

Gibbo3771 (24 views)
2014-11-24 19:59:16

trollwarrior1 (37 views)
2014-11-22 12:13:56

xFryIx (76 views)
2014-11-13 12:34:49

digdugdiggy (53 views)
2014-11-12 21:11:50
Understanding relations between setOrigin, setScale and setPosition in libGdx
by mbabuskov
2014-10-09 22:35:00

Definite guide to supporting multiple device resolutions on Android (2014)
by mbabuskov
2014-10-02 22:36:02

List of Learning Resources
by Longor1996
2014-08-16 10:40:00

List of Learning Resources
by SilverTiger
2014-08-05 19:33:27

Resources for WIP games
by CogWheelz
2014-08-01 16:20:17

Resources for WIP games
by CogWheelz
2014-08-01 16:19:50

List of Learning Resources
by SilverTiger
2014-07-31 16:29:50

List of Learning Resources
by SilverTiger
2014-07-31 16:26:06
java-gaming.org is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. inquiries and complaints can be sent via email to the info‑account of the company managing the website of java‑gaming.org
Powered by MySQL Powered by PHP Powered by SMF 1.1.18 | SMF © 2013, Simple Machines | Managed by Enhanced Four Valid XHTML 1.0! Valid CSS!