Java-Gaming.org
XML Parsing - The new way
Offline Marvin Fröhlich

Senior Devvie




May the 4th, be with you...


« Posted 2010-10-20 19:10:30 »

Dear Community,

Many of you will certainly have to deal with XML parsing here and there. Basically, there are three ways of parsing XML in Java: the DOM approach, a SAX parser, and byte code manipulation approaches like JIBX and such.

I definitely don't like the JIBX way, so it's out for me. JDOM is nice for smaller XML files, but it remains a quick'n'dirty solution for me: it is extremely memory-consuming, since it loads the whole document into memory and puts it into lists, etc., even the parts that I don't need. And accessing a child element by name is not even done in O(1), but in O(n), since not only the names but also the namespaces have to be compared. XML namespaces are the most useless thing in the XML world anyway, though there will be opposing opinions.

I like the SAX parser approach. But there are two disadvantages.

1. Initializing the parser takes a lot of lines of code.
2. In startElement() and the other callback methods, I have to know where I am in the XML hierarchy to decide what to do with a certain element.

I have written some code that drastically simplifies the whole process. Have a look at the code here.

How does it work? Let's take a look.

Disadvantage 1 is addressed by providing a SimpleXMLParser class that selects a certain SAXParser (part of the JRE) and initializes it. Of course, this restricts you to a single parser implementation. But hey, why would we need more if one works just fine?

Now for disadvantage 2.

Let's say, your XML looks like this (omitting the header).
###################################
<root>
    <pats>
        <dogs>
            <dog name="Paul" />
            <dog name="Justine" />
            <dog name="Jack" />
            <dog name="Terry" />
        </dogs>
       
        <cats>
            <cat name="Muschi" />
            <cat name="Pussy" />
        </cats>
    </pats>
</root>
###################################

So to parse only the dogs out of this data, you have to write an XML handler that checks in the startElement() method whether the current element is a "dog" element AND is parented by a "dogs" element AND that one is in a "pats" element AND that one is in a "root" element, which actually IS the root element. OK, these checks have to be done in any case. But we can reduce the number and cost of these checks, and we can reduce how much the handler that only wants to get the dogs from the XML needs to know.
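For comparison, here is a minimal sketch of how those checks might look with nothing but the SAX classes from the JRE. The class name PlainSaxDogs, the findDogs() method and the path stack are illustrative assumptions and are not part of the code described in this post; the sketch only shows the position bookkeeping that plain SAX leaves to you.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of the library discussed above.
final class PlainSaxDogs
{
    /** Collects the name attribute of every dog found at root/pats/dogs/dog. */
    static List<String> findDogs( String xml )
    {
        final List<String> names = new ArrayList<String>();
        
        DefaultHandler handler = new DefaultHandler()
        {
            // Names of the currently open elements, root first.
            private final Deque<String> path = new ArrayDeque<String>();
            
            @Override
            public void startElement( String uri, String localName, String qName, Attributes attributes )
            {
                // The path holds the parents only; the current element is pushed afterwards.
                if ( qName.equals( "dog" ) && pathIs( "root", "pats", "dogs" ) )
                {
                    names.add( attributes.getValue( "name" ) );
                }
                
                path.addLast( qName );
            }
            
            @Override
            public void endElement( String uri, String localName, String qName )
            {
                path.removeLast();
            }
            
            private boolean pathIs( String... expected )
            {
                if ( path.size() != expected.length )
                    return ( false );
                
                Iterator<String> it = path.iterator();
                for ( String element : expected )
                {
                    if ( !element.equals( it.next() ) )
                        return ( false );
                }
                
                return ( true );
            }
        };
        
        try
        {
            // The default factory is not namespace aware, so qName is the plain element name.
            SAXParserFactory.newInstance().newSAXParser().parse( new InputSource( new StringReader( xml ) ), handler );
        }
        catch ( Exception e )
        {
            throw new IllegalStateException( "Parsing failed", e );
        }
        
        return ( names );
    }
    
    public static void main( String[] args )
    {
        String xml = "<root><pats><dogs><dog name=\"Paul\" /></dogs><cats><cat name=\"Muschi\" /></cats></pats></root>";
        System.out.println( findDogs( xml ) );
    }
}
```

Every handler written this way has to carry its own path stack and compare the full parent chain, which is exactly the part the approach below tries to take off your hands.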

So you would implement a SimpleXMLHandlerDelegate. The onElementStarted() method would look like this:
###################################
@Override
protected void onElementStarted( XMLPath path, String name, Object object, Attributes attributes ) throws SAXException
{
    // Notice, that we're querying for level 0 here!
    if ( ( path.getLevel() == 0 ) && name.equals( "dog" ) ) // This could even be skipped, if you have designed the XML yourself and know for sure, that only dog elements are in here.
    {
        System.out.println( "Found a dog called \"" + attributes.getValue( "name" ) + "\"." );
    }
}
###################################

This is everything the dogs parser needs to do and know.

Now we need a parent handler that navigates to the dogs and then delegates to our dogs handler. This would be a SimpleXMLHandler implementation with the onElementStarted() method as follows.
###################################
@Override
protected void onElementStarted( XMLPath path, String name, Object object, Attributes attributes ) throws SAXException
{
    if ( path.isAt( false, "root", "pats" ) && name.equals( "dogs" ) )
    {
        delegate( dogsHandler );
    }
}
###################################

Isn't this simple? We could also tune the code a little to get rid of some String comparisons. This needs a little more code, but it's worth it. All you have to do is override the getPathObject() method in our root handler as follows.
###################################
private static enum RootElements
{
    root;
}

private static enum Level1Elements
{
    pats;
}

private static enum Level2Elements
{
    dogs,
    cats,
    ;
}

@Override
protected Object getPathObject( XMLPath path, String element )
{
    if ( path.getLevel() == 0 )
    {
        try
        {
            return ( RootElements.valueOf( element ) );
        }
        catch ( Throwable t )
        {
            return ( new Object() );
        }
    }
    else if ( path.isAtByObjects( false, RootElements.root ) )
    {
        try
        {
            return ( Level1Elements.valueOf( element ) );
        }
        catch ( Throwable t )
        {
            return ( new Object() );
        }
    }
    else if ( path.isAtByObjects( false, RootElements.root, Level1Elements.pats ) )
    {
        try
        {
            return ( Level2Elements.valueOf( element ) );
        }
        catch ( Throwable t )
        {
            return ( new Object() );
        }
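    }
    
    // Any other position in the tree: no enum applies, so fall back to a plain object.
    // Without this return the method would not compile.
    return ( new Object() );
}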

@Override
protected void onElementStarted( XMLPath path, String name, Object object, Attributes attributes ) throws SAXException
{
    if ( object == Level2Elements.dogs ) // Simplified and cheaper test
    {
        delegate( dogsHandler );
    }
}
###################################
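As a side note, Enum.valueOf() throws an IllegalArgumentException for every unknown element name, which can get expensive if the XML contains many elements you don't care about. A possible variation, sketched here under the assumption that a shared placeholder object is acceptable as the "unknown" result, is a name-to-constant map built once. The class ElementLookup and its lookup() method are made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the library discussed above.
final class ElementLookup
{
    enum Level2Elements
    {
        dogs,
        cats,
        ;
    }
    
    // Shared placeholder for element names that have no enum constant.
    private static final Object UNKNOWN = new Object();
    
    // Built once, so no exception is thrown for unknown names later on.
    private static final Map<String, Object> LEVEL2 = new HashMap<String, Object>();
    static
    {
        for ( Level2Elements element : Level2Elements.values() )
        {
            LEVEL2.put( element.name(), element );
        }
    }
    
    /** Returns the enum constant for the element name, or the placeholder if there is none. */
    static Object lookup( String element )
    {
        Object object = LEVEL2.get( element );
        
        return ( ( object != null ) ? object : UNKNOWN );
    }
    
    public static void main( String[] args )
    {
        // The identity comparison works just like in onElementStarted() above.
        System.out.println( lookup( "dogs" ) == Level2Elements.dogs );
        System.out.println( lookup( "birds" ) == Level2Elements.dogs );
    }
}
```

The result of lookup() could be returned from getPathObject() in place of the try/catch blocks; whether that is measurably faster would need to be profiled.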


There's also a SimpleXMLWriter that encapsulates an inverse SAX parser and lets you add elements and data very easily by simply calling the writeElement() method.
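To illustrate what "inverse SAX" can mean, here is a rough sketch that uses only the JRE: a TransformerHandler receives SAX events and serializes them to text. This is not the SimpleXMLWriter implementation, whose internals are not shown in this post; the class SaxWriterSketch and the method writeDogs() are assumptions made for the example.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical example class, not part of the library discussed above.
final class SaxWriterSketch
{
    /** Writes a dogs element with one dog child per name by firing SAX events. */
    static String writeDogs( String... names )
    {
        StringWriter out = new StringWriter();
        
        try
        {
            // The JRE's default TransformerFactory is expected to be a SAXTransformerFactory.
            SAXTransformerFactory factory = (SAXTransformerFactory)TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer().setOutputProperty( OutputKeys.OMIT_XML_DECLARATION, "yes" );
            handler.setResult( new StreamResult( out ) );
            
            // The same events a parser would report, only this time we are the source.
            handler.startDocument();
            handler.startElement( "", "dogs", "dogs", new AttributesImpl() );
            
            for ( String name : names )
            {
                AttributesImpl attributes = new AttributesImpl();
                attributes.addAttribute( "", "name", "name", "CDATA", name );
                
                handler.startElement( "", "dog", "dog", attributes );
                handler.endElement( "", "dog", "dog" );
            }
            
            handler.endElement( "", "dogs", "dogs" );
            handler.endDocument();
        }
        catch ( Exception e )
        {
            throw new IllegalStateException( "Writing failed", e );
        }
        
        return ( out.toString() );
    }
    
    public static void main( String[] args )
    {
        System.out.println( writeDogs( "Paul", "Justine" ) );
    }
}
```

A writer class with a writeElement() method would presumably wrap the startElement()/endElement() pairs so that the caller never has to build AttributesImpl objects by hand.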

On a side note there's also a very powerful ini file parser and writer in JAGaToo. If you're interested, have a look here.


What do you think? Comments and criticism are welcome.

Marvin
Offline Orangy Tang

JGO Kernel


Medals: 56
Projects: 11


Monkey for a head


« Reply #1 - Posted 2010-10-20 21:22:05 »

Unless I'm reading you wrong, you have to hard-code the depth of the elements at which you expect them?

Personally I think part of the power of xml is having xml fragments with common handling appear at various points (and depths) within an xml tree. How does your api deal with this?

It also seems that this hardcoding would add an extra maintenance burden and make things more fragile. It's a neat idea though, for certain kinds of xml it would probably simplify things quite a bit.

[ TriangularPixels.com - Play Growth Spurt, Rescue Squad and Snowman Village ] [ Rebirth - game resource library ]
Offline Marvin Fröhlich

Senior Devvie




May the 4th, be with you...


« Reply #2 - Posted 2010-10-20 21:33:53 »

Quote from: Orangy Tang
Unless I'm reading you wrong, you have to hard-code the depth of the elements at which you expect them?

Yes of course. It's the same with a DOM approach and JIBX should be even more hardcoded.

Of course, with JDOM you can navigate to a certain subtree, get the element and then delegate further processing to some code that doesn't know about the element's parents.
Now with my solution you navigate to the known subtree and delegate to the next handler, which doesn't need to know and cannot know anything about where the parent handler was. That's the overall point here ;).

Quote from: Orangy Tang
Personally I think part of the power of xml is having xml fragments with common handling appear at various points (and depths) within an xml tree. How does your api deal with this?

Through the delegate handlers as described in the initial posting.

Quote from: Orangy Tang
It also seems that this hardcoding would add an extra maintenance burden and make things more fragile. It's a neat idea though, for certain kinds of xml it would probably simplify things quite a bit.

Well, you can always scan the whole XML data, even through my API, and identify an element only by its name or maybe by one parent. It's up to you. The point of my API is that it provides the current XML path out of the box, which you would have to code yourself when using plain SAX. And it provides mechanisms to tune the performance (element objects, see above) and especially the delegate handlers.

Marvin
Offline i30817

Junior Devvie





« Reply #3 - Posted 2010-10-20 23:53:47 »

You do know about StAX, don't you?

Instead of a callback handler, you control the iteration, result:
much simpler code when the data you want to combine is spread over many subtags.

It's not in memory either.
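A small sketch of the pull style, using the StAX classes that ship with Java 6 and later. The class name StaxDogs and the insideDogs flag are illustrative choices; the element names are taken from the example XML in the first post.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class for illustration only.
final class StaxDogs
{
    /** Collects the name attribute of every dog element found inside a dogs element. */
    static List<String> findDogs( String xml )
    {
        List<String> names = new ArrayList<String>();
        
        try
        {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader( new StringReader( xml ) );
            boolean insideDogs = false;
            
            // We pull the events ourselves instead of being called back.
            while ( reader.hasNext() )
            {
                int event = reader.next();
                
                if ( event == XMLStreamConstants.START_ELEMENT )
                {
                    String name = reader.getLocalName();
                    
                    if ( name.equals( "dogs" ) )
                        insideDogs = true;
                    else if ( insideDogs && name.equals( "dog" ) )
                        names.add( reader.getAttributeValue( null, "name" ) );
                }
                else if ( ( event == XMLStreamConstants.END_ELEMENT ) && reader.getLocalName().equals( "dogs" ) )
                {
                    insideDogs = false;
                }
            }
            
            reader.close();
        }
        catch ( Exception e )
        {
            throw new IllegalStateException( "Parsing failed", e );
        }
        
        return ( names );
    }
    
    public static void main( String[] args )
    {
        String xml = "<root><pats><dogs><dog name=\"Paul\" /><dog name=\"Jack\" /></dogs></pats></root>";
        System.out.println( findDogs( xml ) );
    }
}
```

Because the loop is ordinary code, state such as insideDogs lives in local variables rather than in handler fields, and only the current event is held in memory.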
Offline JL235

JGO Coder


Medals: 10



« Reply #4 - Posted 2010-10-25 07:52:07 »

Maybe I'm missing something, but to be honest I see your solution needing pages of code for navigating a 14 line XML file.

For getting all 'dog' nodes within 'dogs' I'd much rather write something like:
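###################################
// XML and XMLMatcher are hypothetical classes; they only illustrate the wished-for API.
final List<String> myDogs = new ArrayList<String>();
XML parser = new XML( myXMLFile );

parser.map( "* dogs dog", new XMLMatcher() {
    public void onMatch( XML node ) {
        myDogs.add( node.getAttribute("name") );
    }
} );
###################################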

Note that is just pseudo code. The string describes what node I am after (the "* dogs dog"), the XMLMatcher holds the code for what I want to do and your library is left to parse the XML in any way it wants.

Offline Mr. Gol

Senior Devvie


Medals: 1



« Reply #5 - Posted 2010-10-25 08:56:13 »

There must be at least 20 ways of parsing XML :)

Has anyone ever used XPath? I know it's supposed to be a query language for XML, but I've never seen it used anywhere in production, despite it being supported by nearly all programming languages' standard libraries.
Offline markus.borbely

Junior Devvie





« Reply #6 - Posted 2010-10-25 11:36:55 »

Quote from: Mr. Gol
There must be at least 20 ways of parsing XML :)

Has anyone ever used XPath? I know it's supposed to be a query language for XML, but I've never seen it used anywhere in production, despite it being supported by nearly all programming languages' standard libraries.

Yes. If you have to find that one node, list, attribute, etc., it's for you. Instead of parsing the XML and storing all the data in your own structure, you can just query the DOM for that tiny bit of information you need.
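As a rough illustration, here is how such a query might look with the javax.xml.xpath API from the JRE, applied to the pets XML from the first post. The class name XPathDogs is made up for the example; note that the evaluation builds a DOM behind the scenes, so the whole document ends up in memory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class for illustration only.
final class XPathDogs
{
    /** Returns the name attributes of all dog elements at /root/pats/dogs. */
    static List<String> findDogs( String xml )
    {
        List<String> names = new ArrayList<String>();
        
        try
        {
            XPath xpath = XPathFactory.newInstance().newXPath();
            
            // One expression replaces all of the manual hierarchy checks.
            NodeList nodes = (NodeList)xpath.evaluate( "/root/pats/dogs/dog/@name",
                                                       new InputSource( new StringReader( xml ) ),
                                                       XPathConstants.NODESET
                                                     );
            
            for ( int i = 0; i < nodes.getLength(); i++ )
            {
                names.add( nodes.item( i ).getNodeValue() );
            }
        }
        catch ( Exception e )
        {
            throw new IllegalStateException( "XPath evaluation failed", e );
        }
        
        return ( names );
    }
    
    public static void main( String[] args )
    {
        String xml = "<root><pats><dogs><dog name=\"Paul\" /><dog name=\"Jack\" /></dogs></pats></root>";
        System.out.println( findDogs( xml ) );
    }
}
```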
Offline deepthought
« Reply #7 - Posted 2010-11-04 22:41:07 »

Personally, I would recommend XOM. It works great!

jocks rule the highschools. GEEKS RULE THE WORLD MWAHAHAHA!!
captain failure test game
Offline Nate

« JGO Bitwise Duke »


Medals: 158
Projects: 4
Exp: 14 years


Esoteric Software


« Reply #8 - Posted 2010-11-05 03:45:13 »

+1 for XOM (if forced to use XML).

Offline cylab

JGO Ninja


Medals: 55



« Reply #9 - Posted 2010-11-05 11:04:50 »

Quote from: Mr. Gol
Has anyone ever used XPath? I know it's supposed to be a query language for XML, but I've never seen it used anywhere in production, despite it being supported by nearly all programming languages' standard libraries.

You don't see many XML-processing applications, do you? ;-P

I use XPath a lot, and since XPath is the main query language in XSL, so does everyone else.

Mathias - I Know What [you] Did Last Summer!
Offline i30817

Junior Devvie





« Reply #10 - Posted 2010-11-06 13:09:40 »

I'm using Apache Digester now.

It is much nicer than doing it manually, though there are some gotchas:
1) If one tag path is a prefix of another and you register a listener for each, say
book/author
and
book/author/pseudonym
then the first callback will be called with an empty string even if you're reading the second type at the time.
2) Something trippy involving call order in a specific type of callback (relating to the stack design).
3) Stupid function names. I mean bad, though I've found this to be a fault of Apache libraries in general, for some reason.


The library allows javabeans binding.