I like it, very polished!
I've still not done anything with the 2nd revision of my word scraper code.
The 1st revision used reflection like yours, but after losing the code, I rewrote it.
This time it reads the binary class files directly from rt.jar, extracts all the UTF-8 strings (the CONSTANT_Utf8 entries) from the constant pool, and then does some filtering.
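A minimal sketch of that constant-pool scan, assuming the standard class-file layout from chapter 4 of the JVM specification. The class `ConstantPoolStrings` and its methods are hypothetical names for illustration, not the original scraper code. It opens a jar given on the command line, walks each `.class` entry's constant pool, and keeps only the CONSTANT_Utf8 entries. rt.jar only ships with Java 8 and earlier, so point it at a Java 8 JRE's `lib/rt.jar`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class for illustration; not the original scraper code.
final class ConstantPoolStrings {

    /** Returns the text of every CONSTANT_Utf8 entry in one class file's constant pool. */
    static List<String> utf8Strings(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
            in.readUnsignedShort();                 // minor_version
            in.readUnsignedShort();                 // major_version
            int count = in.readUnsignedShort();     // constant_pool_count; valid indices are 1..count-1
            List<String> strings = new ArrayList<>();
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1:                         // Utf8: u2 length + modified UTF-8, the format readUTF expects
                        strings.add(in.readUTF());
                        break;
                    case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                        in.skipBytes(2);
                        break;
                    case 15:                        // MethodHandle
                        in.skipBytes(3);
                        break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.skipBytes(4);            // Integer, Float, *ref, NameAndType, Dynamic, InvokeDynamic
                        break;
                    case 5: case 6:                 // Long, Double: 8 bytes, and they occupy two pool slots
                        in.skipBytes(8);
                        i++;
                        break;
                    default:
                        throw new IOException("unknown constant pool tag " + tag);
                }
            }
            return strings;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a whole stream into memory (works on Java 8, unlike InputStream.readAllBytes).
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ConstantPoolStrings path/to/rt.jar");
            return;
        }
        try (ZipFile jar = new ZipFile(args[0])) {
            Enumeration<? extends ZipEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.getName().endsWith(".class")) continue;
                try (InputStream in = jar.getInputStream(entry)) {
                    for (String s : utf8Strings(readAll(in))) System.out.println(s);
                }
            }
        }
    }
}
```

Only the constant pool is parsed; everything after it (fields, methods, attributes) is ignored, since the UTF-8 text all lives in the pool anyway.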
I found this gets a much bigger dictionary (~20000 unique words), a lot of it coming from text embedded in the classes, such as string literals.
It also makes the code slightly smaller, since it references fewer API methods.
However, I'm using regex for all of the word filtering, and it's incredibly slow at the moment.
The culprit is mostly this monstrosity:
    s.replaceAll("([a-z])([A-Z])", "$1 $2").split("\\[L|(.)\\1{2,}|[\\W\\d_]+")
It takes about 2 minutes to fully parse the contents of rt.jar!
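One cheap thing to try, sketched under the assumption that the one-liner runs once per constant-pool string: `String.replaceAll` and `String.split` both compile their regex on every call (`split` only skips that for single-character literal patterns), so hoisting the two patterns into static `Pattern` constants avoids recompiling them thousands of times. How much of the two minutes this saves has not been measured. The class name `WordSplitter` is hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the original scraper code.
final class WordSplitter {
    // The same two regexes as the one-liner, compiled once and reused for every string.
    private static final Pattern CAMEL_BOUNDARY = Pattern.compile("([a-z])([A-Z])");
    private static final Pattern SEPARATORS = Pattern.compile("\\[L|(.)\\1{2,}|[\\W\\d_]+");

    // Same result as s.replaceAll(...).split(...), minus the per-call Pattern.compile.
    static String[] split(String s) {
        String spaced = CAMEL_BOUNDARY.matcher(s).replaceAll("$1 $2");
        return SEPARATORS.split(spaced);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("getClassLoader"))); // [get, Class, Loader]
    }
}
```

`Pattern` objects are immutable and safe to share as constants; `Matcher` objects are not, which is why a fresh matcher is created inside each call.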

That said, because so many words are duplicated, the returns diminish the further you get into the process: the first few thousand words take only a second or two, so the approach isn't quite as fundamentally flawed as it sounds.

P.S.
I haven't looked at that regex since I wrote it back in September, and I can honestly say that without a regex manual I have no idea what that split(...) is doing. Isn't regex brilliant!
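For the record, here is one reading of the pieces. `replaceAll("([a-z])([A-Z])", "$1 $2")` puts a space at every lowercase-to-uppercase boundary, so camelCase becomes separate words. The `split` then cuts on any of three alternatives, tried in order: `\[L`, the prefix of an object-array type descriptor such as `[Ljava/lang/String;`; `(.)\1{2,}`, any character followed by two or more copies of itself (a run of three or more); and `[\W\d_]+`, any run of characters that are not ASCII letters. The demo below wraps the unchanged one-liner in a hypothetical `RegexDemo.tokens` method and runs it on a few made-up inputs.

```java
import java.util.Arrays;

// Hypothetical wrapper for illustration; the regexes are copied unchanged from the post.
final class RegexDemo {
    static String[] tokens(String s) {
        return s.replaceAll("([a-z])([A-Z])", "$1 $2")   // "fooBar" -> "foo Bar"
                .split("\\[L|(.)\\1{2,}|[\\W\\d_]+");    // cut on "[L", 3+ repeated chars, or non-letters
    }

    public static void main(String[] args) {
        // A separator at the very start yields a leading empty string.
        System.out.println(Arrays.toString(tokens("[Ljava/lang/String;"))); // [, java, lang, String]
        // '_' and digits are separators; all-caps text has no lower-to-upper boundary to split on.
        System.out.println(Arrays.toString(tokens("HTTP_OK2")));            // [HTTP, OK]
        // "oooo" is a run of 3+ identical chars, so it is consumed and cuts the word apart.
        System.out.println(Arrays.toString(tokens("loooongName")));         // [l, ng, Name]
    }
}
```

The first and last cases show two side effects worth filtering for: empty tokens when a string starts with a separator, and word fragments left behind when a real word contains a tripled letter.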
