Java-Gaming.org
Ok, memory management problem
Offline i30817

Junior Devvie





« Posted 2009-07-06 00:55:03 »

So I have an ebook reader, and naturally I want to create a Gutenberg book downloader so first-time users can read something out of the box.

Fine. So I envisioned a library panel over GlazedLists, like the one I already have for local files, except that instead of showing every possibility it only begins to show results after 3-4 characters have been typed. That should be enough for filtering, right? And it works almost the same way as the local panel. I like consistency.
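
A minimal sketch of the "show nothing until 3-4 characters are typed" rule, in plain Java. It does not use the GlazedLists API; the class name `MinLengthFilter` and its methods are hypothetical, and substring matching is an assumption about how the panel would match titles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: returns matches only once the query is long enough. */
final class MinLengthFilter {
    private final int minChars;

    MinLengthFilter(int minChars) {
        this.minChars = minChars;
    }

    /** Case-insensitive substring match; empty result while the query is too short. */
    List<String> filter(List<String> titles, String query) {
        String q = query.trim().toLowerCase(Locale.ROOT);
        if (q.length() < minChars) {
            return Collections.emptyList();
        }
        List<String> hits = new ArrayList<String>();
        for (String title : titles) {
            if (title.toLowerCase(Locale.ROOT).contains(q)) {
                hits.add(title);
            }
        }
        return hits;
    }
}
```

With a threshold of 3, typing "mo" returns nothing and typing "mob" returns the titles containing "mob". In the real panel the same length check would presumably sit in front of the index query rather than an in-memory list.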

So I need a way to search. Looking at the Gutenberg site, there is an RDF file, 5 MB zipped. Wunderbar, I think. Actually it is 100 MB unzipped, so don't unzip.

So then I need an indexing method for rapid searching of the RDF. Lucene comes to mind. Google finds LuceneSail easily.

First Problem : LuceneSail indexes and duplicates the text, so those 100 MB become 210 MB somewhere on disk. But searches are fast, at least.

Second Problem : Indexing takes forever (5-8 minutes) and needs too much memory to run inside the main program. The memory part can be alleviated if you have a monster machine with lots of memory, by running the indexing in a separate JVM. Then I can do this:
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    /**
     * Creates a new process that runs a new JVM on the main method of the
     * given class, with the given JVM arguments, and waits for it to exit.
     * The output of the forked JVM (stdout and stderr, merged) is copied
     * to the standard output of the current JVM.
     * The forked JVM uses the same java executable and classpath as the
     * current one.
     * @param klass class with a main method
     * @param args JVM options (for example -Xmx or -D properties)
     */
    public static void forkJavaAndWait(Class<?> klass, String... args) throws IOException, InterruptedException {
        // "java" rather than "java.exe" so this also works outside Windows
        String javaExe = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        String classpath = System.getProperty("java.class.path");
        List<String> l = new ArrayList<String>(4 + args.length);
        l.add(javaExe);
        l.add("-cp");
        l.add(classpath);
        l.addAll(Arrays.asList(args));
        // getName() rather than getCanonicalName(): the launcher needs the binary class name
        l.add(klass.getName());
        ProcessBuilder pb = new ProcessBuilder(l);
        // merge stderr into stdout so a single consumer thread is enough
        pb.redirectErrorStream(true);
        final Process p = pb.start();
        Thread consumer = new Thread(new Runnable() {
            @Override
            public void run() {
                String line;
                BufferedReader output = new BufferedReader(new InputStreamReader(p.getInputStream()));
                try {
                    while ((line = output.readLine()) != null) {
                        System.out.println(line);
                    }
                } catch (IOException ex) {
                    Logger.getLogger(IoUtils.class.getName()).log(Level.SEVERE, null, ex);
                }
            }
        }, "ProcessBuilderInputStreamConsumer");
        consumer.start();
        int e = p.waitFor();
        // let the consumer print whatever output is still buffered
        consumer.join();
        if (e != 0) {
            throw new IllegalStateException("forked java process failed, exit code " + e);
        }
    }


Third Problem : The files are not deleted, for some stupid reason, if the java process is killed. The cleanup sits in a finally block in the main of the given class. Do I have to use a shutdown hook or the SignalHandler?
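
A finally block is skipped when the JVM is terminated from outside, whereas a shutdown hook registered with `Runtime.addShutdownHook` still runs on normal exit, `System.exit`, Ctrl-C and SIGTERM. It does not run on a hard kill (SIGKILL on Unix, TerminateProcess on Windows), so it is worth also clearing stale files at the next startup. The following is only a sketch of the hook approach; `TempFileCleaner` and its methods are hypothetical names, not part of the code above.

```java
import java.io.File;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical helper: deletes registered files when the JVM shuts down. */
final class TempFileCleaner {
    private final List<File> files = new CopyOnWriteArrayList<File>();

    /** Call once at startup; the hook runs on orderly shutdown, not on a hard kill. */
    void install() {
        Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
            @Override
            public void run() {
                deleteAll();
            }
        }, "TempFileCleaner"));
    }

    void register(File f) {
        files.add(f);
    }

    /**
     * Deletes in reverse registration order, so a directory registered
     * before its files is removed after them.
     */
    void deleteAll() {
        for (int i = files.size() - 1; i >= 0; i--) {
            File f = files.get(i);
            if (f.exists() && !f.delete()) {
                System.err.println("could not delete " + f);
            }
        }
        files.clear();
    }
}
```

The forked main would call `install()` first and `register(...)` for each file or directory it creates; the existing finally block can simply call `deleteAll()` as well.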


What would you prefer:
1) A stupid search that scrapes Project Gutenberg web pages and doesn't show possibilities as you type.

2) A smart search that shows possibilities after some typing and eats 210 MB,
2a) and that needs a beastly machine and takes 5 minutes to create (once) or update (more than once).
2b) and that needs you to download a zipped index (24 MB) and unzip it (once).
2c) and that is created by the installer (which I don't have now) and otherwise works as 2a.
Offline Json

Junior Devvie


Exp: 7 years



« Reply #1 - Posted 2009-07-06 09:53:55 »

I'm using Lucene for a project at work and I managed to get an index a third of the size of the original data. I also did some indexing speed tests.

Test facts
Number of files: 95.000
Total amount of data: 1.13 GB


Test 1 (single thread indexer)
In this first test I just wanted to index all the data.

Time consumed: 12 minutes 12 seconds


Test 2 (multithreaded indexer)
In this second test I used multiple threads to index all the data to see if I could speed things up.
 
Time consumed: 6 minutes 17 seconds
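
The post does not say how the threads were organised. One common arrangement, sketched here under that caveat, is a fixed thread pool whose workers all feed one shared writer (Lucene documents its `IndexWriter` as thread-safe). `DocumentSink` is a hypothetical stand-in for that writer so the example stays free of library calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a thread-safe index writer. */
interface DocumentSink {
    void add(String document);
}

final class ParallelIndexer {
    /** Feeds every document to the sink from a fixed pool and returns how many were added. */
    static int indexAll(List<String> documents, final DocumentSink sink, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> pending = new ArrayList<Future<Void>>();
            for (final String doc : documents) {
                pending.add(pool.submit(new Callable<Void>() {
                    @Override
                    public Void call() {
                        sink.add(doc);
                        return null;
                    }
                }));
            }
            for (Future<Void> f : pending) {
                f.get(); // rethrows any failure from a worker
            }
            return pending.size();
        } finally {
            pool.shutdown();
        }
    }
}
```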


Test 3 (searching)
My final test was a search test. In this test I changed one of the 95.000 files to contain my name and ran a search for it.

Time consumed: 93 milliseconds


Those are my very simple test results for Lucene. In the end I went for single-threaded indexing because I usually don't have that many things to index; I index things as they are added. My test files were basically text files of about 12 KB each, the typical thing I wanted to index.

// Json
Offline i30817

Junior Devvie





« Reply #2 - Posted 2009-07-12 19:16:11 »

:D

Managed to reduce the indexing time to 41 seconds and the space to 35 MB by filtering out the parts of the RDF that I care about and optimizing them (removing redundant tags etc).

The LuceneSail guys should do this themselves (they are, after all, normally indexing for static queries).
Sent the appropriate bitching mail with suggestions.
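
The post gives no details of the filtering step, so the following is only a sketch of one way to do it: stream the zipped RDF without unzipping it to disk and keep just the text of a few elements. The class name `RdfFieldExtractor` is hypothetical, the element names passed in are up to the caller, and the code assumes the wanted elements contain plain text only (`getElementText` throws on nested elements).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: pulls selected text elements out of a zipped XML/RDF file. */
final class RdfFieldExtractor {
    /** Streams the first zip entry and returns "name=text" for each wanted element. */
    static List<String> extract(InputStream zipped, Set<String> wanted)
            throws IOException, XMLStreamException {
        ZipInputStream zin = new ZipInputStream(zipped);
        try {
            List<String> out = new ArrayList<String>();
            if (zin.getNextEntry() == null) {
                return out; // empty archive
            }
            XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(zin);
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT
                        && wanted.contains(xml.getLocalName())) {
                    String name = xml.getLocalName();
                    // assumes text-only content for the wanted elements
                    out.add(name + "=" + xml.getElementText().trim());
                }
            }
            return out;
        } finally {
            zin.close();
        }
    }
}
```

Because the parser works on the stream, memory use stays small regardless of the unzipped size, and only the extracted fields need to be handed to the indexer.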
