  Java4K Sourcecode Compressor Community Effort (JSCCE)  (Read 14468 times)
Offline Riven
Administrator

« JGO Overlord »


Medals: 1371
Projects: 4
Exp: 16 years


Hand over your head.


« Posted 2015-02-26 14:49:26 »

Here is a first lousy attempt at it, kicking off the effort to create a community source code compressor for Java.

LousySourcecodeCompressor.java

import java.io.*;
import java.util.regex.*;

public class LousySourcecodeCompressor {
   public static void main(String[] args) throws Exception {
      File file = new File(args[0]);

      int origLength = 0;
      StringBuilder trimmed = new StringBuilder();
      BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
      while (true) {
         String line = br.readLine();
         if (line == null)
            break; // reached end of file

         origLength += line.length() + 1; // +1 for line-break
         line = line.trim(); // strip optional whitespace
         if (line.isEmpty())
            continue; // strip empty lines

         trimmed.append(line).append('\n');
      }
      String code = trimmed.toString();

      // make a lousy attempt at stripping comments
      code = Pattern.compile("//[a-zA-Z\\s]+$", Pattern.MULTILINE).matcher(code).replaceAll("");

      // make a lousy attempt at stripping annotations
      code = Pattern.compile("^@[a-zA-Z]+$", Pattern.MULTILINE).matcher(code).replaceAll("");

      // make a lousy attempt at stripping optional whitespace
      code = code.replaceAll("\\s*([\\+\\-\\*/%,\\(\\)\\{\\}\\[\\]=;:<>!])\\s*", "$1");

      System.out.println(code.length() + "/" + origLength);
      System.out.println();
      System.out.println(code);
   }
}

which takes OddEntry.java and produces the following (N.B.: newlines inserted by me)
853/1189

package net.indiespot.java4k.entries;import java.awt.*;import net.indiespot.java4k.Java4kRev1;public
class OddEntry extends Java4kRev1{public OddEntry(){name="Odd Entry";}public void render(Graphics2D
g){g.setColor(new Color(128,64,128));g.drawString("Drag the mouse a little...",8,20);g.setColor(new
Color(0,64,128));int b,r,a,c,q,t;for(r=150;r>=30;r-=15){t=((elapsed()+130_000)/((200-r)/25));b=r/3;
a=(w-r)/2;c=(h-r)/2;q=(int)(t%(r+b));g.drawLine(a+Math.max(0,q-b),c,a+Math.min(r,q),c);
q=(r+b)-(int)((t+r)%(r+b));g.drawLine(a,c+Math.max(0,q-b),a,c+Math.min(r,q));
q=(r+b)-(int)((t+r-b)%(r+b));g.drawLine(a+Math.max(0,q-b),c+r,a+Math.min(r,q),c+r);
q=(int)((t-r)%(r+b));g.drawLine(a+r,c+Math.max(0,q-b),a+r,c+Math.min(r,q));}if(mouse.dragArea
!=null){g.setColor(new Color(128,64,128));Rectangle w=mouse.dragArea;g.drawRect(w.x,w.y,w.width,w.height);}}}

Note that it does not yet respect string literals, which makes it borderline useless :)

Offline princec

« JGO Spiffy Duke »


Medals: 1146
Projects: 3
Exp: 20 years


Eh? Who? What? ... Me?


« Reply #1 - Posted 2015-02-26 17:15:23 »

Another neat feature might be to automatically compress identifiers to their minimal representation as well (e.g. A, B, C, etc.), but that would require a somewhat more sophisticated parser...

Cas :)
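
A minimal sketch of the naming half of that idea, assuming the hard part (working out which identifiers are safe to rename) has already been solved elsewhere. It hands out the shortest names first (a..z, then aa, ab, ...) and skips short Java keywords and names that are already taken. ShortNames, name and names are hypothetical helpers, not part of any minifier posted in this thread, and lowercase is used here instead of A, B, C.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ShortNames {
    // Java keywords of two or three letters; longer keywords are ignored
    // on the assumption that generated names stay this short.
    static final Set<String> RESERVED =
            new HashSet<>(Arrays.asList("do", "if", "for", "int", "new", "try"));

    /** Maps 0 -> a, 25 -> z, 26 -> aa, 27 -> ab, ... */
    static String name(int n) {
        StringBuilder sb = new StringBuilder();
        do {
            sb.append((char) ('a' + n % 26));
            n = n / 26 - 1;
        } while (n >= 0);
        return sb.reverse().toString();
    }

    /** Returns the first 'count' short names that are neither reserved nor already taken. */
    static List<String> names(int count, Set<String> taken) {
        List<String> out = new ArrayList<>();
        for (int n = 0; out.size() < count; n++) {
            String candidate = name(n);
            if (!RESERVED.contains(candidate) && !taken.contains(candidate)) {
                out.add(candidate);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 'b' is assumed to be in use already, so it is skipped.
        System.out.println(names(5, new HashSet<>(Arrays.asList("b"))));
    }
}
```

Assigning the shortest names to the most frequently used identifiers would save the most characters, but that needs usage counts, which this sketch does not collect.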

Offline Riven


« Reply #2 - Posted 2015-02-26 17:30:51 »

Cas: 'auto refactoring' is left as an exercise for the reader.



src → [javac → proguard → decompiler] → src → strip whitespace → persecutioncomplex → win!

Offline BurntPizza

« JGO Bitwise Duke »


Medals: 486
Exp: 7 years



« Reply #3 - Posted 2015-02-26 21:13:32 »

I'm working on improving the minifier; the most significant new feature so far is string literal preservation.
The only wrench in the gears currently is that anything that looks like a string literal inside a comment screws up other literals in the file, but I know how to fix it.

It also still uses Riven's crazy whitespace eliminator expression; I haven't toyed with that at all yet.
Test results: http://pastebin.java-gaming.org/54ebe1a232e1f

Thought about limited identifier compression, it'll be tough I think, at least for anything other than primitive types.
Any other features it should have?
Offline Riven


« Reply #4 - Posted 2015-02-26 21:37:37 »

It's not that hard, actually. You only need a parser capable of finding two kinds of comments, string literals and char literals, using a simple state machine. You replace these ranges with placeholders, apply whatever transformation would otherwise corrupt what you replaced, and then inject the literals back in. I just didn't feel like actually doing it...

So much to do, so little time.
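
The placeholder approach described above can be sketched roughly as follows. This is an illustration under assumptions, not the code either of us ended up posting: LiteralGuard, protect and restore are hypothetical names, the NUL character is assumed never to occur in the input, comments are dropped rather than preserved, and Java text blocks and unicode escapes are ignored.

```java
import java.util.ArrayList;
import java.util.List;

class LiteralGuard {
    // Assumption: the NUL character never appears in real source files.
    static final char MARK = '\0';

    /** Drops comments and swaps string/char literals for numbered placeholders. */
    static String protect(String src, List<String> saved) {
        StringBuilder out = new StringBuilder();
        int i = 0, n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // line comment: skip up to (not including) the line break
                while (i < n && src.charAt(i) != '\n') i++;
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                // block comment: skip past the closing */
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
                out.append(' '); // keep the neighbouring tokens apart
            } else if (c == '"' || c == '\'') {
                int start = i++;
                while (i < n && src.charAt(i) != c) {
                    if (src.charAt(i) == '\\') i++; // skip the escaped character
                    i++;
                }
                i = Math.min(i + 1, n); // include the closing quote
                out.append(MARK).append(saved.size()).append(MARK);
                saved.add(src.substring(start, i));
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Puts the saved literals back in place of their placeholders. */
    static String restore(String text, List<String> saved) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int open = text.indexOf(MARK, i);
            if (open < 0) {
                out.append(text, i, text.length());
                break;
            }
            int close = text.indexOf(MARK, open + 1);
            out.append(text, i, open);
            out.append(saved.get(Integer.parseInt(text.substring(open + 1, close))));
            i = close + 1;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> saved = new ArrayList<>();
        String guarded = protect("String s = \"a  +  b\" ; // \"not a literal\"", saved);
        String squeezed = guarded.replaceAll("\\s*([=;])\\s*", "$1");
        System.out.println(restore(squeezed, saved));
    }
}
```

Any whitespace or operator transformation can run between protect and restore, as long as it leaves the digits between two MARK characters alone.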

Offline BurntPizza




« Reply #5 - Posted 2015-02-26 21:40:05 »

Placeholders are exactly what I used, but I got greedy and used the same placeholder for every literal (so that I only have to confirm the file doesn't contain one sequence). They'll have to be numbered.
Offline Riven


« Reply #6 - Posted 2015-02-26 21:42:13 »

Numbered or cryptographically hashed, whichever utility code is around.
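
For the numbered variant, a rough sketch of how the "only one sequence to check" property can be kept: if a prefix does not occur anywhere in the source, then no numbered key built on that prefix can occur either. PlaceholderKeys, freePrefix and key are hypothetical names, and the "_L" seed and "_" terminator are arbitrary choices for illustration.

```java
class PlaceholderKeys {
    /** Grows a candidate prefix until it does not occur anywhere in the source. */
    static String freePrefix(String src) {
        String prefix = "_L";
        while (src.contains(prefix)) {
            prefix += "_"; // terminates: the prefix eventually outgrows the source
        }
        return prefix;
    }

    /** Builds the key for the index-th literal; the trailing '_' keeps 1 and 11 distinct. */
    static String key(String prefix, int index) {
        return prefix + index + "_";
    }

    public static void main(String[] args) {
        String src = "int _L = 1; String s = \"_L_\";";
        String prefix = freePrefix(src);
        System.out.println(key(prefix, 0) + " " + key(prefix, 1));
    }
}
```

Because the keys consist only of identifier characters, an operator-oriented whitespace stripper leaves them untouched.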

Offline Riven


« Reply #7 - Posted 2015-02-26 23:37:42 »

After about an hour, I reached 890 too, without placeholders, just tokens :)

Here is my horrific state-machine / tokenizer:
http://pastebin.java-gaming.org/ebea33e2f2e10
(ironic how my PHP parser b0rks)

Offline BurntPizza




« Reply #8 - Posted 2015-02-27 00:52:35 »



Nice. Currently both minifiers are tied at 927 for my current test case:
http://pastebin.java-gaming.org/bea3e4f2e2017

Code: http://pastebin.java-gaming.org/ea3ef5e20271d

I'd like to see if anyone can manage to break either of them!
I suspect mine would be flakier, maybe around some annotation edge cases...
Offline Riven


« Reply #9 - Posted 2015-02-27 00:55:22 »

It's relatively easy to put them through a stress-test. You simply minify the minifier, and see whether it produces a working version of itself again... The version I posted can't do it, fixing it now. (actually, going to bed...)

Offline BurntPizza




« Reply #10 - Posted 2015-02-27 01:03:39 »

(I don't know if I'm/we're hijacking the thread yet, but...)

There was one problem (maybe you already found it?):

-key="#&"+++seed // error: invalid operation ++/--
+key="#&"+ ++seed


The lexer matches greedily, so "+++" is read as "++ +": post-increment wins over pre-increment (in parsing, even!). The whitespace eliminator should probably take that into account.
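
One conservative way to take it into account, sketched here under the assumption that comments and literals containing whitespace have already been taken out of the text: drop every whitespace run unless removing it would glue two identifier characters together or put two identical sign characters next to each other. SignAwareStripper, mustKeepSpace and strip are hypothetical names. The rule is safe but not optimal, since it keeps some spaces that could legally go.

```java
class SignAwareStripper {
    /** True if deleting the whitespace between these two characters could change the tokens. */
    static boolean mustKeepSpace(char left, char right) {
        boolean words = Character.isJavaIdentifierPart(left)
                && Character.isJavaIdentifierPart(right);     // e.g. "int x"
        boolean signs = (left == '+' || left == '-') && left == right; // e.g. "+ ++x"
        return words || signs;
    }

    /** Removes whitespace runs, keeping a single space only where it is needed. */
    static String strip(String code) {
        StringBuilder out = new StringBuilder();
        int i = 0, n = code.length();
        while (i < n) {
            char c = code.charAt(i);
            if (!Character.isWhitespace(c)) {
                out.append(c);
                i++;
                continue;
            }
            while (i < n && Character.isWhitespace(code.charAt(i))) i++; // skip the whole run
            if (out.length() > 0 && i < n
                    && mustKeepSpace(out.charAt(out.length() - 1), code.charAt(i))) {
                out.append(' ');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("key = \"#&\" + ++seed ;"));
    }
}
```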
Offline Riven


« Reply #11 - Posted 2015-02-27 01:05:11 »

No worries, I'll just split it off tomorrow :)

Offline Riven


« Reply #12 - Posted 2015-02-27 01:43:55 »

s = s.replaceAll("(\\G|([\\+|\\-]))\\s+(\\1)", "[$2,$3]");
s = s.replaceAll("\\s*([\\+\\-\\*/%,\\.\\(\\)\\{\\}\\[\\]=;:<>!&\\|\\^])\\s*", "$1");
s = s.replaceAll("\\[\\,\\]", "").replaceAll("\\[([\\+|\\-]?)\\,(\\1?)\\]", "$1 $2");

Input:
abc ++ + 6 + ++ xyz -- - 7 - -- pqr -- + x ++ - 4

Output:
abc++ +6+ ++xyz-- -7- --pqr--+x++-4


Minifier: http://pastebin.java-gaming.org/a3efe60272d19
Minified minifier: http://pastebin.java-gaming.org/3efe0772d2910 (with free linefeeds @ column ~80)



Update 1
Input:
abc ++ + ++ xyz -- - 7 - -- pqr -- + x ++ - 4

Output:
abc++ +++xyz-- -7- --pqr--+x++-4 :(

Required:
abc+++ ++xyz-- -7- --pqr--+x++-4

Fix 1
s = new StringBuilder(s).reverse().toString();
s = s.replaceAll("(\\G|([\\+|\\-]))\\s+(\\1)", "[$2,$3]");
s = s.replaceAll("\\s*([\\+\\-\\*/%,\\.\\(\\)\\{\\}\\[\\]=;:<>!&\\|\\^])\\s*", "$1");
s = s.replaceAll("\\[\\,\\]", "").replaceAll("\\[([\\+|\\-]?)\\,(\\1?)\\]", "$1 $2");
s = new StringBuilder(s).reverse().toString();

Input:
abc ++ + ++ xyz -- - 7 - -- pqr -- + x ++ - 4

Output:
abc+++ ++xyz-- -7- --pqr--+x++-4 :D

http://pastebin.java-gaming.org/efe078d292016 (hey it's 3:30AM, gimme a break!)



Update 2
Input:
def -- - 5 - -- xyz

Output:
def-- -5- --xyz :(

Optimal:
def---5- --xyz

Fix 2
s = new StringBuilder(s).reverse().toString();
s = s.replaceAll("(\\G|([\\+|\\-]))\\s+(\\1)", "[$2,$3]");
s = s.replaceAll("\\s*([\\+\\-\\*/%,\\.\\(\\)\\{\\}\\[\\]=;:<>!&\\|\\^])\\s*", "$1");
s = s.replaceAll("\\[\\,\\]", "").replaceAll("\\[([\\+|\\-]?)\\,(\\1?)\\]", "$1 $2");
s = new StringBuilder(s).reverse().toString();
s = s.replaceAll("(\\w([\\+\\-])\\2) (\\2\\w)", "$1$3");


Somebody will one day be extremely happy with this optimization.

Offline BurntPizza




« Reply #13 - Posted 2015-02-27 03:08:13 »

Nerd-sniping yourself, huh? :D
Happens.



Also, I don't see
a ? b : c -> a?b:c
in there; I added the ? to my version.
Also &+ and |+. EDIT: although I guess that is handled by the double-sided replacement.

EDIT2: Eclipse says
abc ++ + 6 + ++ xyz -- - 7 - -- pqr -- + x ++ - 4
is bad:
abc ++ + 6 + ++ [xyz --] - 7 - -- [pqr --] + x ++ - 4 // invalid in []

So I don't think that is valid input.
Offline BurntPizza




« Reply #14 - Posted 2015-02-27 06:04:40 »

I'm done for the night, but at least I'm leaving off at a good place:
http://pastebin.java-gaming.org/fe07d99202616

The current test case is both of our classes lumped into one file; mine runs both to ensure the correctness of each.
The file compresses itself to 7879 chars (EDIT: originally reported as 7804; I forgot the extra stress tests) and runs again with no differences, either while still compressed or after expansion by the Eclipse formatter, so at least for everything tested here, it's sound.
http://pastebin.java-gaming.org/07d901636261a

Unfortunately, yours doesn't seem to be working: it compresses ~100 chars more, but they all look like invalid deletions...
Offline Drenius
« Reply #15 - Posted 2015-02-27 20:02:17 »

Just here to point out the obvious and say that this is finally a thread that about 98% of the community has literally nothing to contribute to, while 90% of the remaining 2% seem not to have the time needed to contribute to it.
Go on guys, continue writing your own legend.
Offline BurntPizza




« Reply #16 - Posted 2015-02-27 20:50:30 »

Cleaned it up, added some things. Moving to Gist for easy revisions:
https://gist.github.com/BurntPizza/800b6b4322aaa2da1960

It's got a basic CLI now:

$ java CoffeeGrinder
CoffeeGrinder: Java source code minifier
        by BurntPizza

Usage: [options] file

        -i      Print compression info
        -c      Only strip comments and annotations
        -w:n    Attempt line wrapping at n columns
                Use 0 for no wrapping. Default: 80


Bug reports welcome, although I expect it would break on input other than valid Java source, and I'm not sure I care.
Offline Riven


« Reply #17 - Posted 2015-02-27 20:53:28 »

I was swamped today, will be swamped the entire weekend and then will encounter some light swamping. I will join the party immediately afterwards.

Offline BurntPizza




« Reply #18 - Posted 2015-02-28 06:12:41 »

I'm gonna try it: identifier compression.

Main effort at the moment is declaration extraction, and preliminary results are promising:

Current pipeline is minification -> large scary 'broad phase' regex -> split by semicolons/newlines -> series of several filters
Result is a dump of things which have declarations in them:

public class CoffeeGrinder
public static void main(String[]args)throws IOException
String path
int lineWrapping=80;
boolean aggressive=true,printInfo=false;
for(String s
catch(NumberFormatException e
StringBuilder preprocessed=new StringBuilder();
for(String line
String code
int originalLength
private static void printUsage
private static String minify(String src,int lineWrap,boolean aggressive
PreservationResult pr=preserveStringLiterals(src);
String code
private static String lineWrap(String text,int width
StringBuilder lineWrapped=new StringBuilder();
StringBuilder sb=new StringBuilder
for(int i
String line
private static PreservationResult preserveStringLiterals(String in
Deque<Interval>intervals=new ArrayDeque<>();
Map<String,String>mapping=new HashMap<>();
int seed=0;
String key;
String prefix
boolean strmode=false,charmode=false,linecomment=false,blockcomment=false,escaped=false;
for(int i
char c
boolean inComment
StringBuilder sb=new StringBuilder
Interval i
PreservationResult pr=new PreservationResult
private static void compressIdentifiers(String text
for(Interval i
private static Set<String>identifiers(String text,Interval scope
Set<String>idens=new HashSet<>();
Matcher m
StringBuilder sb=new StringBuilder
String s
List<String>decs=new ArrayList<>();
for(String s
for(String s
private static void filter(List<String>list,Pattern p,boolean allMatch
for(int i
Matcher m
private static List<Interval>matchNestedIntervals(String text,char begin,char end
List<Interval>topLevels=new ArrayList<>();
int idx
int start=idx;
int nestLevel
char c
private static class PreservationResult
String output,key;
String revert(String text
Matcher m
Deque<Interval>intervals=new ArrayDeque<>();
Deque<String>matches=new ArrayDeque
StringBuilder sb=new StringBuilder
Interval i
private static class Interval
int start,end;
Interval(int x,int y
String subString(String in
public String toString


It's barely tested, but I do believe that is every declaration of an identifier in the file, and no false entries.
Of course I'll need to process much more source to see how it holds up. (It won't)