I need two things for JGF at the moment, both to support online editing of HTML source code.
The easy one is "something to edit the HTML source embedded in a webpage". I can just search for HTML applet editors, although I'd appreciate suggestions for particularly good ones if you know of any (there are an awful lot out there).
The hard one is "something to post-process the HTML source, strip everything bad, and convert manually formatted plain text into HTML". The main problem is that I have no idea what Google terms to search for; everything I've tried has drawn a blank.
The main things I can think of that it needs to do are:
- strip all nasty stuff, basically all scripting etc (even "cunningly disguised scripting" that is non-obvious)
- detect "blank lines" and reformat the preceding and following text into two paragraphs by wrapping them in P tags
- detect "probable website URLs" and replace them with A HREF links
All this stuff is commonly done by forum software and many other things, but I've never seen or heard of a particular open-source Java library for it (surely there must be one, somewhere??)
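For what it's worth, the three steps above can be roughly sketched in plain Java using only the standard library. This is a minimal sketch, not a vetted sanitizer: it assumes the bluntest safe policy, escaping all markup instead of whitelisting "good" tags, and the class name PlainTextToHtml, its method names, and the URL regex are all made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: escapes all markup, wraps blank-line-separated
// blocks in <p> tags, and turns http(s) URLs into links.
class PlainTextToHtml {
    // Assumed, deliberately narrow URL pattern: only http:// and https://,
    // so "javascript:" and similar schemes are never turned into links.
    // It does not trim trailing punctuation such as a sentence-ending dot.
    private static final Pattern URL =
            Pattern.compile("https?://[^\\s<>\"']+");

    // Two or more line breaks, each optionally followed by spaces or tabs.
    private static final Pattern BLANK_LINES =
            Pattern.compile("(?:\\r?\\n[ \\t]*){2,}");

    // Neutralise every character that could start a tag, entity or attribute.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    // Escape the text between URLs and wrap each URL in an anchor.
    static String linkify(String raw) {
        Matcher m = URL.matcher(raw);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(escape(raw.substring(last, m.start())));
            String url = escape(m.group());
            out.append("<a href=\"").append(url).append("\">")
               .append(url).append("</a>");
            last = m.end();
        }
        out.append(escape(raw.substring(last)));
        return out.toString();
    }

    // Split on blank lines and emit one <p> element per non-empty block.
    static String format(String raw) {
        StringBuilder out = new StringBuilder();
        for (String para : BLANK_LINES.split(raw.trim())) {
            if (!para.isEmpty()) {
                out.append("<p>").append(linkify(para)).append("</p>\n");
            }
        }
        return out.toString();
    }
}
```

Calling PlainTextToHtml.format(text) on a submitted post returns the HTML to display. Because everything is escaped, any script tag shows up as visible text instead of running; allowing a safe subset of tags through would need a real whitelist-based parser, which this sketch does not attempt.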