Markus_Persson
JGO Kernel      Posts: 2092 Medals: 10
Mojang Specifications
|
 |
«
on:
2006-09-30 13:27:30 » |
|
So I started writing my own java assembler for this years competition. Sure, there already are java assemblers out there, but where's the fun in that? Here's my first working hello world application: 1 2 3 4 5 6 7 8 9
| CLASS Test java/lang/Object METHOD main ([Ljava/lang/String;)V getstatic FIELD java/lang/System out Ljava/io/PrintStream; ldc "Hello" invokevirtual METHOD java/io/PrintStream println (Ljava/lang/String;)V bipush 10 invokestatic METHOD java/lang/System exit (I)V return END |
When assembling that, the constant pool gets filled with 25(!) entries: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
| ### Compiling with Chompiler ### Constant pool contains 25 entries 1 Utf8Info -> "Test" 2 ClassInfo -> 1 3 Utf8Info -> "java/lang/Object" 4 ClassInfo -> 3 5 Utf8Info -> "java/lang/System" 6 ClassInfo -> 5 7 Utf8Info -> "out" 8 Utf8Info -> "Ljava/io/PrintStream;" 9 NameAndTypeInfo -> 7, 8 10 FieldrefInfo -> 6, 9 11 Utf8Info -> "Hello" 12 StringInfo -> 11 13 Utf8Info -> "java/io/PrintStream" 14 ClassInfo -> 13 15 Utf8Info -> "println" 16 Utf8Info -> "(Ljava/lang/String;)V" 17 NameAndTypeInfo -> 15, 16 18 MethodrefInfo -> 14, 17 19 Utf8Info -> "exit" 20 Utf8Info -> "(I)V" 21 NameAndTypeInfo -> 19, 20 22 MethodrefInfo -> 6, 21 23 Utf8Info -> "main" 24 Utf8Info -> "([Ljava/lang/String;)V" 25 Utf8Info -> "Code" |
The resulting class file is 301 bytes, of which only 14 bytes is actual bytecode. This means that the largest part (by far) of the class file is constant pool data. The "Code" Utf8 entry is a mandatory field for the bytecode in a method_field entry. This means that naming your class "Code" instead of "A" will actually take up LESS space, as you avoid an extra entry to the constant pool. It's probably also a good idea to name one of your fields (method or variable) "Code" as well. Of course, since most 4k games have a main method, you could use that name as well. Another annoying thing is the duplication of "java/io/Printstream". I wish the name of a class was in the same format as for field descriptors, as that would save having to duplicate the text in the constant pool. This happens every time you call a method on a member of any class.. first it uses the field descriptor to look up the field, then the class name to invoke the method.. the difference between the two being an L in the start and an ; in the end.
|
|
|
|
Markus_Persson
JGO Kernel      Posts: 2092 Medals: 10
Mojang Specifications
|
 |
«
Reply #1 on:
2006-09-30 14:45:43 » |
|
Ahh, it's paying off! Java code: 1 2 3 4 5 6 7
| public static void main(String args[]) { Code code = new Code(); code.setSize(100, 100); code.setVisible(true); code.setTitle("Code"); } |
Code generated by javac: (28 bytes) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
| new CLASS Code dup invokespecial METHOD Code <init> ()V astore_1 aload_1 bipush 100 bipush 100 invokevirtual METHOD Code setSize (II)V aload_1 iconst_1 invokevirtual METHOD Code setVisible (Z)V aload_1 ldc "Code" invokevirtual METHOD Code setTitle (Ljava/lang/String;)V return |
Hand written bytecode that does the same: (24 bytes) 1 2 3 4 5 6 7 8 9 10 11 12
| new CLASS Code dup dup2 invokespecial METHOD Code <init> ()V bipush 100 dup invokespecial METHOD Code setSize (II)V iconst_1 invokespecial METHOD Code setVisible (Z)V ldc "Code" invokespecial METHOD Code setTitle (Ljava/lang/String;)V return |
Four bytes saved! That's 14% less! (If you only count the bytecode. The actual class file only shunk 1.3%., from 311 to 307 bytes, but that's from the massive constant pool overhead)
|
|
|
|
Markus_Persson
JGO Kernel      Posts: 2092 Medals: 10
Mojang Specifications
|
 |
«
Reply #2 on:
2006-09-30 18:19:21 » |
|
Argh, the stack size must be the same at every point in the bytecode, meaning you can't inline recursion.  There goes that nifty binary data inlining optimisation.  (Was planning on just pushing all the data onto the stack, the iterating over it until the stack is empty)
|
|
|
|
Games published by our own members! Go get 'em!
|
|
Abuse
JGO Kernel      Posts: 1866 Medals: 5
falling into the abyss of reality
|
 |
«
Reply #3 on:
2006-09-30 18:42:52 » |
|
The optimisations you are performing in the previous example do not need to be done by hand; you can write a generic bytecode optimiser that would be able to automatically perform such an optimisation. I believe reverting to writing your code in assembler is the wrong direction to go. Keep writing in Java, and simply write additional bytecode optimisation tools to automatically perform all the optimisations you would otherwise perform by hand. The solutions are then infinitely re-usable - and also allows you to maintain readability of your source code. Argh, the stack size must be the same at every point in the bytecode, meaning you can't inline recursion.  There goes that nifty binary data inlining optimisation.  (Was planning on just pushing all the data onto the stack, the iterating over it until the stack is empty) " the stack size must be the same at every point in the bytecode," Could you explain what exactly you mean by this? Do you intend to inline methods by using the JSR (jump to sub rountine) instruction? (usually used for try/catch/finally blocks - but can be [ab]used for other purposes) Or are you referring to the JLS requirement, that the stack pointer be at zero at the point a method returns? (either normally, or via Exception) If it is the latter; you maybe interested to know that this JLS requirement is not strictly enforced; I havn't yet found a VM that barfs if you return from a method without correctly emptying the stack. Ergo, it is ok to push lots of data onto the stack, and then progressively pop it off via repeated method invocations. The note on the "Code" utf8 constant pool entry being mandatory is interesting, but ultimately not particularly useful in the optimal case.
|
|
|
|
|
Markus_Persson
JGO Kernel      Posts: 2092 Medals: 10
Mojang Specifications
|
 |
«
Reply #4 on:
2006-09-30 19:14:37 » |
|
Could you explain what exactly you mean by this? 1 2 3 4 5 6 7
| 1 iconst_0 2 iconst_1 3 iconst_2 4 iconst_3 LOOP: 5 nop 6 ifne LOOP |
The first time the jvm hits the nop, the stack is size 4. The second time it hits it, it's size 3. This is illegal, and causes the following exception at runtime: Exception in thread "main" java.lang.VerifyError: (class: Code, method: main signature: ([Ljava/lang/String;)V) Inconsistent stack height 4 != 3 This seems to only be mentioned briefly towards the end of 4.9.2 in the JVM specification The note on the "Code" utf8 constant pool entry being mandatory is interesting, but ultimately not particularly useful in the optimal case. I disagree. In the optimal case, saving those few bytes is crucial. A LOT of time is spent trying to trim a jar down from 4100 to 4096 bytes in this competiton. 
|
|
|
|
Markus_Persson
JGO Kernel      Posts: 2092 Medals: 10
Mojang Specifications
|
 |
«
Reply #5 on:
2006-09-30 19:20:51 » |
|
As for making a tool out of it;
It's impossible to make a tool that optimises java code into bytecode as tight as a human can write it, as it's possible to do things in bytecode that you can't do in java.
|
|
|
|
appel
JGO Wizard     Posts: 1477 Medals: 23
I always win!
|
 |
«
Reply #6 on:
2006-09-30 21:34:28 » |
|
Markus, you're nuts  each year the 4k competition goes a little bit further... into madness 
|
|
|
|
Abuse
JGO Kernel      Posts: 1866 Medals: 5
falling into the abyss of reality
|
 |
«
Reply #7 on:
2006-09-30 22:22:47 » |
|
Could you explain what exactly you mean by this? 1 2 3 4 5 6 7
| 1 iconst_0 2 iconst_1 3 iconst_2 4 iconst_3 LOOP: 5 nop 6 ifne LOOP |
The first time the jvm hits the nop, the stack is size 4. The second time it hits it, it's size 3. This is illegal, and causes the following exception at runtime: Exception in thread "main" java.lang.VerifyError: (class: Code, method: main signature: ([Ljava/lang/String;)V) Inconsistent stack height 4 != 3 Ah, I see what you mean  Branching before the condition, but after the stack push breaks the golden guarantee that there won't ever be more stack pops than pushes. While its a purely academic question, as we have no way of turning off the byte code verifier without special user actions (which I presume is disallowed for the competition). Have you tried running the VM with -noverify? The note on the "Code" utf8 constant pool entry being mandatory is interesting, but ultimately not particularly useful in the optimal case. I disagree. In the optimal case, saving those few bytes is crucial. A LOT of time is spent trying to trim a jar down from 4100 to 4096 bytes in this competiton.  What would you intend on using the name "Code" for? Using it for the class name is a bad idea - as a jar's list of contents contains all the files names *uncompressed*, so while you might remove 2~4 bytes from the constants pool - you will be re-adding them in the jar's file contents list. I find the best strategy with regard to keeping the constants pool as small as possible, is to choose your api usage wisely. Methods that take object parameters should be avoided like the plague - as the parameter types are all stored as fully qualified class names. Another exceedingly effective trick, is to re-order the constants pool so it compresses better in the jar. Huge savings can be obtained doing this.
|
|
|
|
|
Markus_Persson
JGO Kernel      Posts: 2092 Medals: 10
Mojang Specifications
|
 |
«
Reply #8 on:
2006-10-01 06:46:08 » |
|
Good point about the name of the file being uncompressed in the jar! By using an existing UTF8_info instead of creating a new one, you save 1+2+(length of string in bytes) bytes of class file data. Meaning using "Code" instead of "A" saves you four bytes in the bytecode, but adds three bytes to the jar file. This will probably not give you a net gain, and at most you'll only gain a single byte.
As for methods, avoiding methods that take objects is far more important than avoiding methods that take primitives. the method "void setColor(Color)" becomes two utf8_infos; "setColor" and "(Ljava/awt/Color;)V" "void setColor(byte r, byte g, byte b)" becomes "setColor" and "(BBB)V" 13 bytes less, and it saves you from having to create a new Color object (which requires an entry for "java/awt/Color" in the constant pool, which is another 17 bytes saved)
|
|
|
|
Markus_Persson
JGO Kernel      Posts: 2092 Medals: 10
Mojang Specifications
|
 |
«
Reply #9 on:
2006-10-01 06:50:06 » |
|
Another exceedingly effective trick, is to re-order the constants pool so it compresses better in the jar. Huge savings can be obtained doing this. Yes. :-) Also, if you make sure all constants are in the first 256 entries, you can avoid using "wide *load", for a minor save of three bytes of bytecode every time you load that constant. Of course, doing this requires a recompilation of the bytecode.
|
|
|
|
Games published by our own members! Go get 'em!
|
|
Son Of Cain
Jr. Member   Posts: 72
|
 |
«
Reply #10 on:
2006-10-01 10:30:37 » |
|
Jesus Christ, man you're mad  I can only get this deep into byte-madness when I'm completely drunk  That's when the mathematician in me awakens (for a very short period of time =)
|
|
|
|
|
nonnus29
JGO Ninja    Posts: 687
Giving Java a second chance after ludumdare fiasco
|
 |
«
Reply #11 on:
2006-10-02 22:21:45 » |
|
Jesus Christ, man you're mad  I can only get this deep into byte-madness when I'm completely drunk  That's when the mathematician in me awakens (for a very short period of time =) Markus is not of this world. I figured that out some time ago....... 
|
|
|
|
|
Markus_Persson
JGO Kernel      Posts: 2092 Medals: 10
Mojang Specifications
|
 |
«
Reply #12 on:
2006-10-03 03:49:59 » |
|
I'm just nerdy. :-D I found out how much work it is to write the entire game in assembler, and I'm kinda starting to change my mind.  Just making a method calls requires FOUR entries in the constant pool, the right parameters individually pushed onto the stack, with a reference to the object before them all. Sure, you can dup the object reference a few times to save space, but then you end up with really complex assembler, so every time you change something there, you might end up breaking the entire thing. .. but it's fun! 
|
|
|
|
Abuse
JGO Kernel      Posts: 1866 Medals: 5
falling into the abyss of reality
|
 |
«
Reply #13 on:
2006-10-03 05:51:30 » |
|
I'm just nerdy. :-D I found out how much work it is to write the entire game in assembler, and I'm kinda starting to change my mind.  Just making a method calls requires FOUR entries in the constant pool, the right parameters individually pushed onto the stack, with a reference to the object before them all. Sure, you can dup the object reference a few times to save space, but then you end up with really complex assembler, so every time you change something there, you might end up breaking the entire thing. .. but it's fun!  That's the same conclusion I came to when I looked into using jasmin 2 years ago =) Concocting optimisation patches in BCEL however, is alot more feasible - and once written, can be reused by anyone, on any application - a much more efficient application of time! They also have real-world applications ( & value!) in the field of J2ME development. (the only restriction being the absence of the JSR instruction in CLDC - which is a *huge* frustration)
|
|
|
|
|
Markus_Persson
JGO Kernel      Posts: 2092 Medals: 10
Mojang Specifications
|
 |
«
Reply #14 on:
2006-10-03 07:14:12 » |
|
I found out how much work it is to write the entire game in assembler, and I'm kinda starting to change my mind.  In case it wasn't clear, the main purpose of this exercise was to 1) Have fun 2) Learn exactly how java bytecode works to be able to optimise it In that order. 
|
|
|
|
nonnus29
JGO Ninja    Posts: 687
Giving Java a second chance after ludumdare fiasco
|
 |
«
Reply #15 on:
2006-10-03 21:27:51 » |
|
I'm just nerdy. :-D Okay, but that hexagon-grid-logic-cell thingy was inspired to say the least.  Do you have a link to that? edit; never mind, I found it. 
|
|
|
|
|
|
|
Abuse
JGO Kernel      Posts: 1866 Medals: 5
falling into the abyss of reality
|
 |
«
Reply #17 on:
2006-10-13 06:05:29 » |
|
Have you had a look @ ASM ? ( http://asm.objectweb.org/index.html ) Its ability to take a class file, and generate the ASM javacode that would build said class file looks like an interesting starting point for hand-optimisation. (once your game is feature complete). It also appears to have a funky eclipse plugin  The bytecode outline plugin ( http://andrei.gmxhome.de/bytecode/index.html ) also appears to be quite useful when you want to see exactly where all your bytecodes are going. (though it only operates on the eclipse compilers output - so won't be very useful when optimisers are brought into the process)
|
|
|
|
|
Markus_Persson
JGO Kernel      Posts: 2092 Medals: 10
Mojang Specifications
|
 |
«
Reply #18 on:
2006-10-14 06:26:58 » |
|
I haven't seen it, no. =) I did, however, know about Jasmin, but still decided to write my own assembler to really understand what's going on in the jvm. =)
|
|
|
|
blahblahblahh
JGO Kernel      Posts: 4575
http://t-machine.org
|
 |
«
Reply #19 on:
2006-10-15 18:05:54 » |
|
Why does this sodding topic keep showing up as permanently unread on the main index for me?
[/offtopic]
|
malloc will be first against the wall when the revolution comes...
|
|
|
kevglass
« League of Dukes » JGO Kernel      Posts: 5214 Medals: 49
Mentally unstable, best avoided.
|
 |
«
Reply #20 on:
2006-10-15 18:51:45 » |
|
Because the forum pixies are out to get you  Kev [/even-more-off-topic-how-far-can-we-go]
|
|
|
|
Markus_Persson
JGO Kernel      Posts: 2092 Medals: 10
Mojang Specifications
|
 |
«
Reply #21 on:
2006-10-16 05:03:27 » |
|
I like you, blahblahblah. You're random.
|
|
|
|
|