Default XML parser grabs DTD from w3.org every time
CommanderKeith
« on: 2012-01-05 14:25:04 »

Hey, have you guys come across this problem? The default Java XML parser fetches the XML file's DTD from w3.org every single time it runs. I've spent the last 3 days trying to figure out why my app takes so long to load, and this is the reason. Unbeknownst to me, my app was downloading the DTD from the W3C's site on every run, which caused a delay of about 30 seconds... Geez, that's frustrating!

And smart people are having this problem too:
http://weblogs.java.net/blog/cayhorstmann/archive/2011/12/12/sordid-tale-xml-catalogs

So if I leave out this line from the XML file:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">


then the javax.xml.parsers.SAXParser parses the file straight away.

But if I do that, the proper DTD is not used, so the proper solution is to set up the SAX parser like this (http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/#comment-376):
// Xerces-specific feature: do not load the external DTD when not validating
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
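
For anyone who wants to try this, here is a minimal, self-contained sketch of that setting in use. The class name NoDtdFetchSax and the helper countElements are made up for illustration; only the feature URI comes from the snippet above. It assumes the JDK's bundled parser or another Xerces-based one, since a different parser may not recognise the feature.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of any library.
final class NoDtdFetchSax {
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses the XML text and returns how many elements it contains. */
    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Ask the parser not to load the DTD named in the DOCTYPE.
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        SAXParser parser = factory.newSAXParser();

        final int[] count = {0};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                count[0]++;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xhtml =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
              + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
              + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
              + "<head><title>Test</title></head>"
              + "<body><p>Hello</p></body></html>";
        System.out.println(countElements(xhtml));
    }
}
```

Because the DTD is skipped entirely, named entities that it would have declared (such as &nbsp;) are not defined for the parser, so this fits documents that do not rely on them.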

As far as I can tell, that feature is not documented or mentioned anywhere on oracle.com. The only place I found it was comment #376 on that w3.org blog post... gah!

Apparently the W3C serves up 100 million DTD downloads a day, and the W3C guy says in the comments that a quarter of these come from Java apps:
http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/#comment-359

I couldn't believe how silly this problem is, so I felt the need to air my frustration :P

Riven
« Reply #1 on: 2012-01-05 14:34:01 »

It's indeed one of the biggest design flaws ever. You can bring down tens of thousands of applications by attacking this single point of failure.

CommanderKeith
« Reply #2 on: 2012-01-05 14:41:02 »

It's bizarre, I would never have guessed that a simple XML parse would hook my app up to some random website.

How did you learn about the flaw?

It doesn't seem like a well-known problem. I googled 'SAX pause', 'xml delay', 'xml SAX stall', and many variations but couldn't find anything which indicated that this was my problem.

That w3.org blog post was sometimes in the hits, but of course I never read far enough down the comments to see the solution.

Riven
« Reply #3 on: 2012-01-05 14:46:39 »

Quote from CommanderKeith: "How did you learn about the flaw?"
Coincidence, I just stumbled upon a webpage about it a few years ago.

pjt33
« Reply #4 on: 2012-01-05 17:16:25 »

I haven't come across this in Java, because I use an XML parser which doesn't check against the DTD, but I have come across it in .NET. I solved it there by downloading the DTDs and then hacking the XML files before putting them through the parser.
aazimon
« Reply #5 on: 2012-01-05 18:34:09 »

Try using dom4j instead.
CommanderKeith
« Reply #6 on: 2012-01-06 20:33:00 »

Quote from pjt33: "I haven't come across this in Java, because I use an XML parser which doesn't check against the DTD, but I have come across it in .NET. I solved it there by downloading the DTDs and then hacking the XML files before putting them through the parser."
I tried using an XML file without the DOCTYPE declaration, but then named entities like the non-breaking space &nbsp; would throw errors.

CommanderKeith
« Reply #7 on: 2012-01-07 03:20:15 »

So I switched from SAX to DOM and ran into similar trouble. I found this project, which has worked well:

http://code.google.com/p/java-xhtml-cache-dtds-entityresolver/
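
As the project's name suggests, the general technique is an org.xml.sax.EntityResolver that answers DTD requests locally. The sketch below is not that project's code, only a minimal illustration of the idea. LocalDtdResolver and its in-memory map are hypothetical, and a one-line DTD that declares nbsp stands in for a real local copy of xhtml1-strict.dtd.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: serves DTD text from memory instead of the network.
final class LocalDtdResolver implements EntityResolver {
    private final Map<String, String> dtdTextByPublicId;

    LocalDtdResolver(Map<String, String> dtdTextByPublicId) {
        this.dtdTextByPublicId = dtdTextByPublicId;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String dtdText = (publicId == null) ? null : dtdTextByPublicId.get(publicId);
        if (dtdText == null) {
            // Unknown external entity: supply an empty one rather than fetching it.
            dtdText = "";
        }
        InputSource source = new InputSource(new StringReader(dtdText));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    /** Parses XML text into a DOM tree using the given resolver. */
    static Document parse(String xml, EntityResolver resolver) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(resolver);
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Stand-in DTD text; a real setup would read the downloaded DTD file here.
        Map<String, String> dtds = Collections.singletonMap(
                "-//W3C//DTD XHTML 1.0 Strict//EN", "<!ENTITY nbsp \"&#160;\">");
        String xhtml =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
              + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
              + "<html><body><p>a&nbsp;b</p></body></html>";
        Document doc = parse(xhtml, new LocalDtdResolver(dtds));
        System.out.println(doc.getDocumentElement().getTextContent().length());
    }
}
```

Unlike switching off DTD loading altogether, this keeps the entity declarations available, so &nbsp; still parses. The same resolver can be given to a SAX parser through XMLReader.setEntityResolver.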

pjt33
« Reply #8 on: 2012-01-07 04:20:06 »

Quote from pjt33: "I haven't come across this in Java, because I use an XML parser which doesn't check against the DTD, but I have come across it in .NET. I solved it there by downloading the DTDs and then hacking the XML files before putting them through the parser."
Quote from CommanderKeith: "I tried using an xml file without the dtd doctype declaration but then the special entities like non breaking space &nbsp; would throw errors."
Yes, that's why I mentioned downloading the DTDs. The hack was to remove the PUBLIC DTD references and replace them with SYSTEM ones that point at the downloaded copies.
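
The fix described here was done in .NET, but the same rewrite can be sketched in Java. This is only an illustration under assumptions: DoctypeRewriter and the path dtd/xhtml1-strict.dtd are hypothetical, and the regular expression only handles a plain PUBLIC DOCTYPE with no internal subset.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: swaps a PUBLIC DOCTYPE for a SYSTEM one pointing at a local DTD.
final class DoctypeRewriter {
    // Matches <!DOCTYPE root PUBLIC "publicId" "systemId"> with single or double quotes.
    private static final Pattern PUBLIC_DOCTYPE = Pattern.compile(
            "<!DOCTYPE\\s+(\\S+)\\s+PUBLIC\\s+(?:\"[^\"]*\"|'[^']*')\\s+(?:\"[^\"]*\"|'[^']*')\\s*>");

    /** Returns the XML text with its first PUBLIC DOCTYPE replaced, or unchanged if none is found. */
    static String toSystemDoctype(String xml, String localDtdPath) {
        Matcher matcher = PUBLIC_DOCTYPE.matcher(xml);
        if (!matcher.find()) {
            return xml;
        }
        String rootName = matcher.group(1);
        String replacement = "<!DOCTYPE " + rootName + " SYSTEM \"" + localDtdPath + "\">";
        // quoteReplacement keeps any '$' or '\' in the path from being read as group references.
        return matcher.replaceFirst(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String xhtml =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
              + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n<html/>";
        System.out.println(toSystemDoctype(xhtml, "dtd/xhtml1-strict.dtd"));
    }
}
```

A relative path in the SYSTEM identifier is resolved against the document's own location, so the parser needs to be given a system ID (for example by parsing from a File) for it to find the local copy.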
CommanderKeith
« Reply #9 on: 2012-01-07 11:08:46 »

Ah, I see. It's so bizarre that this is not done by default in the Java XML libraries, and that the tutorials do not show how to do it.

