  Very slow ascii text parser  (Read 3748 times)
Offline K.I.L.E.R

Senior Devvie




Java games rock!


« Posted 2006-04-21 18:20:37 »

A 2.89MB text file takes around a minute to parse on my A64 3000+ PC with 1GB RAM.
Yes, I'm using regex to parse the file, reading it line by line with a BufferedReader.
Please don't tell me I have to parse the file byte by byte?
Any tips or tricks?

Here is my list already:
Get rid of regex and parse byte for byte.

Are there any alternatives?

Vorax:
Is there a name for a "redneck" programmer?

Jeff:
Unemployed. Wink
Offline kevglass

« JGO Spiffy Duke »


Medals: 195
Projects: 24
Exp: 18 years


Coder, Trainee Pixel Artist, Game Reviewer


« Reply #1 - Posted 2006-04-21 18:26:15 »

What's the size of the buffer on your BufferedReader? I'd think disk access would be the bottleneck initially.

You could always just read it all into a buffer yourself and then stream from memory instead.

Kev
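
A minimal sketch of the "read it all into memory, then stream from memory" idea, assuming Java 7 or later, a UTF-8 file, and a file small enough to fit on the heap. The class and method names (InMemoryLines, countLines) are made up for illustration. If you would rather keep reading from disk, BufferedReader also has a two-argument constructor that takes the buffer size in characters; the default is 8192.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class InMemoryLines {
    // Reads the whole file into memory once, then walks the lines from a
    // StringReader so the parsing loop never touches the disk.
    // Assumption: the file is UTF-8 and fits in memory.
    static int countLines(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        int lines = 0;
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            while (in.readLine() != null) {
                lines++; // a real parser would handle the line here
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(Paths.get(args[0])));
    }
}
```

If the run time barely changes with the file held in memory, disk access was not the bottleneck and the parsing itself is the place to look.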

Offline whome

Junior Devvie




Carte Noir Java


« Reply #2 - Posted 2006-04-21 22:42:15 »

Do you use an explicit regexp Pattern object, or do you recreate it (implicitly) on each line?
Offline swpalmer

JGO Coder


Exp: 12 years


Where's the Kaboom?


« Reply #3 - Posted 2006-04-21 23:02:22 »

Sample code would help.

As whome says, make sure you compile the pattern once and reuse it, if possible.


I do something like this:

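import java.io.File;
import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

   public static void main(String[] args) throws Exception {
      File f = new File(args[0]);
      // Read the whole file into one buffer (assumes the length fits in an int).
      ByteBuffer bb = ByteBuffer.allocate((int)f.length());
      try (FileInputStream fis = new FileInputStream(f)) {
         // A single read() is not guaranteed to fill the buffer, so loop.
         while (bb.hasRemaining() && fis.getChannel().read(bb) != -1) { }
      }
      bb.flip();
      CharBuffer cb = Charset.forName("UTF-8").decode(bb);

      // Compile the pattern once and run a single Matcher over the whole buffer.
      Pattern myPat = Pattern.compile(your_pattern_here);
      Matcher mat = myPat.matcher(cb);
      while (mat.find()) {
         do_something_with( mat.group() );
      }
   }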

Offline Jeff

JGO Coder




Got any cats?


« Reply #4 - Posted 2006-04-21 23:18:02 »

Depending on what you are trying to do, it may also be a lot faster to parse with a real recursive-descent parser than with regexp...
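
For readers who have not written one, here is a minimal sketch of a recursive-descent parser. The grammar (arithmetic with +, -, *, / and parentheses) and the class name ExprParser are chosen purely for illustration and have nothing to do with the file format discussed in this thread. The point is the shape: one method per grammar rule, each consuming characters from a shared position.

```java
class ExprParser {
    private final String src;
    private int pos;

    ExprParser(String src) { this.src = src; }

    // Parses and evaluates the whole input, rejecting trailing garbage.
    static double evaluate(String src) {
        ExprParser p = new ExprParser(src);
        double v = p.expr();
        p.skipSpaces();
        if (p.pos != src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return v;
    }

    // expr := term (('+' | '-') term)*
    private double expr() {
        double v = term();
        while (true) {
            if (accept('+')) v += term();
            else if (accept('-')) v -= term();
            else return v;
        }
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double v = factor();
        while (true) {
            if (accept('*')) v *= factor();
            else if (accept('/')) v /= factor();
            else return v;
        }
    }

    // factor := number | '(' expr ')'
    private double factor() {
        if (accept('(')) {
            double v = expr();
            if (!accept(')')) throw new IllegalArgumentException("expected ')' at " + pos);
            return v;
        }
        skipSpaces();
        int start = pos;
        while (pos < src.length()
                && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) throw new IllegalArgumentException("expected number at " + pos);
        return Double.parseDouble(src.substring(start, pos));
    }

    // Consumes c (after optional whitespace) if it is the next character.
    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }
}
```

Calling ExprParser.evaluate("2 * (3 + 4) - 1") walks the input once, left to right, with no regex involved.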

Got a question about Java and game programming?  Just new to the Java Game Development Community?  Try my FAQ.  It's likely you'll learn something!

http://wiki.java.net/bin/view/Games/JeffFAQ
Offline Mithrandir

Senior Devvie




Cut from being on the bleeding edge too long


« Reply #5 - Posted 2006-04-22 05:34:09 »

Regexp parsing is a huge performance killer. As Jeff said, make use of a proper recursive descent parser; that will make things much easier and faster, particularly if you can find one that works without creating a string for each token. JavaCC is normally the standard choice, but for big files that string creation can be quite a performance issue, as we've found in Xj3D's VRML parsing.
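
To illustrate what "without creating a string for each token" can look like, here is a rough sketch that reads whitespace-separated integers straight out of a char array. It is not taken from Xj3D or JavaCC; the names (IntScanner, parseInts) are hypothetical, and it assumes well-formed decimal input that fits in an int and an output array that is large enough.

```java
class IntScanner {
    // Parses whitespace-separated decimal integers from buf[0..len) into out
    // and returns how many were found. No String is allocated per token.
    // Assumptions: values fit in an int and out has room for all of them.
    static int parseInts(char[] buf, int len, int[] out) {
        int count = 0;
        int i = 0;
        while (i < len) {
            // Skip the whitespace between tokens.
            while (i < len && Character.isWhitespace(buf[i])) i++;
            if (i == len) break;

            boolean negative = false;
            if (buf[i] == '-') { negative = true; i++; }

            // Accumulate the digits directly into an int.
            int digitsStart = i;
            int value = 0;
            while (i < len && buf[i] >= '0' && buf[i] <= '9') {
                value = value * 10 + (buf[i] - '0');
                i++;
            }
            if (i == digitsStart) {
                throw new NumberFormatException("digit expected at offset " + i);
            }
            out[count++] = negative ? -value : value;
        }
        return count;
    }

    public static void main(String[] args) {
        char[] data = "12 -5  7\n100".toCharArray();
        int[] values = new int[8];
        int n = parseInts(data, data.length, values);
        for (int k = 0; k < n; k++) System.out.println(values[k]);
    }
}
```

Doing the same for floating point values is more work, because correct rounding is harder than accumulating digits.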

The site for 3D Graphics information http://www.j3d.org/
Aviatrix3D JOGL Scenegraph http://aviatrix3d.j3d.org/
Programming is essentially a markup language surrounding mathematical formulae and thus, should not be patentable.
Offline K.I.L.E.R

Senior Devvie




Java games rock!


« Reply #6 - Posted 2006-04-23 16:44:21 »

Thanks guys.
Fixed the problem.

I've reverted to a tree parsing algorithm which is recursive.
Very fast.

On every pass of the loop I load 2048 characters of data, then go through it, sort and parse.
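
The code for this was not posted, so the following is only a guess at the general shape of chunked reading: pull a fixed-size block of characters per loop pass and carry over any token that is cut off by the block boundary. The class name ChunkedTokenizer and the whitespace-separated token rule are assumptions made for the sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ChunkedTokenizer {
    // Splits the input into whitespace-separated tokens, reading chunkSize
    // characters per loop pass (2048 in the post above). chunkSize must be > 0.
    static List<String> tokens(Reader in, int chunkSize) throws IOException {
        List<String> out = new ArrayList<>();
        char[] chunk = new char[chunkSize];
        // Holds a token that started in one chunk and continues in the next.
        StringBuilder pending = new StringBuilder();
        int n;
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = chunk[i];
                if (Character.isWhitespace(c)) {
                    if (pending.length() > 0) {
                        out.add(pending.toString());
                        pending.setLength(0);
                    }
                } else {
                    pending.append(c);
                }
            }
        }
        // The last token may end at end-of-input rather than at whitespace.
        if (pending.length() > 0) out.add(pending.toString());
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokens(new StringReader("ply format ascii 1.0"), 2048));
    }
}
```

The carry-over buffer is the part that is easy to forget: without it, any token that straddles two chunks is silently split in two.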

Offline princec

« JGO Spiffy Duke »


Medals: 421
Projects: 3
Exp: 16 years


Eh? Who? What? ... Me?


« Reply #7 - Posted 2006-04-23 17:38:31 »

So how come TextPad does it so bloody fast then? 2MB files in the blink of an eye.

Cas Smiley

Offline K.I.L.E.R

Senior Devvie




Java games rock!


« Reply #8 - Posted 2006-04-24 07:14:52 »

Because it uses C? Tongue
j/k


Offline cylab

JGO Ninja


Medals: 55



« Reply #9 - Posted 2006-04-24 08:20:18 »

Quote
Because it uses C?

I doubt that using C leads to huge performance gains in regex processing. The Sun regex implementation performs fairly well, so I suspect you weren't using precompiled patterns, but something like line.matches("<yourRegex>");...
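
To make the distinction concrete, here is a small sketch of the two styles side by side. String.matches compiles its regex argument on every call, whereas a Pattern held in a static field is compiled once. The regex and the names (PrecompiledVsImplicit, NUMBER) are placeholders for illustration, not anything from the original poster's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrecompiledVsImplicit {
    // Compiled once when the class is initialised and reused for every line.
    private static final Pattern NUMBER = Pattern.compile("-?\\d+");

    // Implicit style: String.matches recompiles the regex on each call.
    static int countImplicit(String[] lines) {
        int hits = 0;
        for (String line : lines) {
            if (line.matches("-?\\d+")) hits++;
        }
        return hits;
    }

    // Precompiled style: one Pattern, one Matcher, reset() for each new line.
    static int countPrecompiled(String[] lines) {
        int hits = 0;
        Matcher m = NUMBER.matcher("");
        for (String line : lines) {
            if (m.reset(line).matches()) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] lines = { "12", "-7", "x1", "", "003" };
        System.out.println(countImplicit(lines) + " " + countPrecompiled(lines));
    }
}
```

Both methods return the same count; the difference is only in how much work is repeated per line, which matters when the loop runs over every line of a multi-megabyte file.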

Mathias - I Know What [you] Did Last Summer!
Offline K.I.L.E.R

Senior Devvie




Java games rock!


« Reply #10 - Posted 2006-04-24 08:38:22 »

I was using my own pre-compiled regex stuff.


Offline Jeff

JGO Coder




Got any cats?


« Reply #11 - Posted 2006-04-25 02:44:14 »

Quote
So how come TextPad does it so bloody fast then? 2MB files in the blink of an eye.

I assume this is a one-regexp pass across the file?

That's a big difference from trying to recognize EVERY token in the file by way of a group of regexps.

Offline erikd

JGO Ninja


Medals: 16
Projects: 4
Exp: 14 years


Maximumisness


« Reply #12 - Posted 2006-04-25 22:58:14 »

Do the same simple regex stuff in Java that you do in TextPad and you'll see it isn't any slower in Java.
I know KILER's remark about C was probably tongue in cheek, but to give an example: I wrote some translation software in Java for a client which depended quite heavily on regex to parse the input files (and had to translate huge numbers of large text files to *massive* XML files), and it performed way, way better than their proof of concept, which was written in C and also used regex for parsing. The finished product also took me, working alone, less than half the time that the half-functional C-based proof of concept had taken. The regexp part of my program didn't even rank high in the profiler output.
I guess KILER's must have been a special case that required a specialized parser for good performance, because in my experience there's nothing wrong with the performance of Java's regexp implementation.

Offline K.I.L.E.R

Senior Devvie




Java games rock!


« Reply #13 - Posted 2006-05-02 12:11:02 »

I'm very sure you guys would be interested in this. Smiley
http://members.optusnet.com.au/ksaho/show/loader.JPG

AbstractModelLoader:
http://members.optusnet.com.au/ksaho/show/utils/AbstractModelLoader.java

PlyLoader:
http://members.optusnet.com.au/ksaho/show/utils/PlyLoader.java

Some of you will probably wonder, "Why now?". Well, my design was completely different before I scrapped it. Now I've come back to the issue I was having and done further analysis of the situation. Previously I really didn't have much time to put into it due to other work.
I tested this same code against two different patterns: the floating point number pattern currently in the code, and "(\\d+?)".

With the floating point pattern it is dead slow, but with the basic digit pattern it was instant.

Originally I had timed each method in that class, both single runs and averages, and they all came down to ne-6 and ne-4 performance respectively.
I have come to the conclusion that my pattern is the problem here; the question is, can it be improved?
I've looked into the compile parameters and I'm going to test them further, but it takes 30 seconds to run this code.
Try it. Smiley
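
The floating point pattern itself is only in the linked PlyLoader source, so what follows is not a fix for it. It is a sketch of one way to write a float pattern that cannot backtrack into the digits it has already consumed, using java.util.regex possessive quantifiers (++ and *+). The pattern and the names (FloatPattern, FLOAT, findAll) are assumptions made for illustration; whether this helps in the case above would need measuring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FloatPattern {
    // Optional sign, digits, optional fraction, optional exponent.
    // The possessive quantifiers never give characters back once matched.
    static final Pattern FLOAT =
            Pattern.compile("[-+]?\\d++(?:\\.\\d*+)?(?:[eE][-+]?\\d++)?");

    // Collects every match in the text using a single Matcher.
    static List<String> findAll(CharSequence text) {
        List<String> found = new ArrayList<>();
        Matcher m = FLOAT.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("1.5 -2 3e4 0.25E-3"));
    }
}
```

Note that this pattern requires at least one digit before the decimal point, so a value written as ".5" would need an extra alternative.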

Offline abies

Senior Devvie





« Reply #14 - Posted 2006-05-02 18:19:40 »

Using regexp for any kind of parsing is asking for major issues, performance-wise.

For ply files, even a string tokenizer is enough to parse simple cases. Below is some code I wrote a long time ago. (It works with only one specific format; I think there are at least 2 or 3 variations, and doing a full loader is a bit more complicated.)

For something with acceptable performance that is fast to develop, use JavaCC. For really high performance, you will need to write everything yourself from scratch. (I used JavaCC to load nwn ascii files, then rewrote the parser by hand and got a 3 times improvement in speed.)

Here you can check out my handcrafted parser for nwn files
http://cvs.sourceforge.net/viewcvs.py/nwn-j3d/nwn/src/net/sf/nwn/loader/ManualParser.java?rev=1.16&view=markup

and below is some crappy code for ply parsing I have found in my archives


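        // Needs: import java.io.BufferedReader, java.io.FileReader,
        //        java.io.IOException, java.io.StreamTokenizer;
        BufferedReader br = new BufferedReader(new FileReader(filename));
       
        // Treat everything except whitespace as a word character, so every
        // token (numbers included) arrives as a string in st.sval.
        StreamTokenizer st = new StreamTokenizer(br);
        st.resetSyntax();
        st.eolIsSignificant(false);
        st.wordChars(0,255);
        st.whitespaceChars(' ', ' ');
        st.whitespaceChars('\n','\n');
        st.whitespaceChars('\r','\r');
        st.whitespaceChars('\t','\t');
        float[] vertices = null;
        int[] faces = null;
       
        // Header: pick up the vertex and face counts, stop at end_header.
        while ( true ) {
            int token = st.nextToken();
            if ( token == StreamTokenizer.TT_EOF )
                break;
            if ( token != StreamTokenizer.TT_WORD)
                continue;
           
            if ( st.sval.equalsIgnoreCase("element")) {
                st.nextToken();
                if ( st.sval.equalsIgnoreCase("vertex") ) {
                    st.nextToken();
                    vertices = new float[3*Integer.parseInt(st.sval)];
                } else if (st.sval.equalsIgnoreCase("face")) {
                    st.nextToken();
                    faces = new int[3*Integer.parseInt(st.sval)];
                }
            } else if (st.sval.equalsIgnoreCase("end_header") ){
                break;
            }  
        }
       
        // Fail clearly instead of with a NullPointerException in the loops below.
        if ( vertices == null || faces == null )
            throw new IOException("PLY header is missing the vertex or face element");
       
        // Vertices: x y z, followed by two extra per-vertex values that are skipped
        // (this is the "specific format" the code was written for).
        for ( int i =0; i < vertices.length; i+=3) {
            st.nextToken();
            vertices[i] = Float.parseFloat(st.sval);
            st.nextToken();
            vertices[i+1] = Float.parseFloat(st.sval);
            st.nextToken();
            vertices[i+2] = Float.parseFloat(st.sval);
            st.nextToken();
            st.nextToken();
           
        }
       
        // Faces: skip the per-face vertex count (assumed to be 3), then read three indices.
        for ( int i =0; i < faces.length; i+=3 ) {
            st.nextToken();
            st.nextToken();
            faces[i] = Integer.parseInt(st.sval);
            st.nextToken();
            faces[i+1] = Integer.parseInt(st.sval);
            st.nextToken();
            faces[i+2] = Integer.parseInt(st.sval);
        }
        br.close();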

Artur Biesiadowski
Offline K.I.L.E.R

Senior Devvie




Java games rock!


« Reply #15 - Posted 2006-05-03 13:55:30 »

I've rewritten my code once again.
It parses the 3MB file instantly as far as I can tell. Smiley
I haven't even optimised it yet, not that I need to at this point.
I am still using regex, but I have limited its usage to purely gathering header data.

Once I have refined my design (again), I will write a parser that parses a 508MB model.
Performance is imperative, but correct values come first, followed by a unit test. Smiley

Offline cknoll

Junior Devvie




Flame On!


« Reply #16 - Posted 2006-05-03 18:17:41 »

For anyone interested, this topic was discussed at great length over at this thread:

http://www.javalobby.org/java/forums/m91813535.html#91813535