  Best way of embedding binary data in source?  (Read 9284 times)
Offline pjt33
« Reply #30 - Posted 2009-12-23 11:35:28 »

Quote
I don't think the "encoding" outside byte packing changes the bytes. So UTF-8 does the 1, 2 or 3 byte thing with restrictions on what can be encoded in each set as stated above. The encoding is more about which char gets decoded to which "letter", as I thought. For one, I have never heard of MacRoman. That sounds like a char->font thing.

Also, I have done this over the network to other machines and had no problems. But some machines may default to UTF-16 or something, so I should use the versions that specify the encoding.

You're getting encoding and charset the wrong way round. (It doesn't help that the Java class names are wrong!)

The charset (mapping between values of the char datatype and "letters") used by Java is Unicode. The String methods which convert to and from bytes and don't take a Charset (that is, encoding) argument use the default encoding, which varies by platform and locale.

import java.nio.charset.Charset;

public class EncDemo {
        public static void main(String[] args) {
                System.out.println(Charset.defaultCharset());
                // No Charset argument, so the platform default encoding is used.
                byte[] bytes = "\u00fe".getBytes();
                for (byte b : bytes) {
                        System.out.print(Integer.toHexString((b >> 4) & 0xf));
                        System.out.print(Integer.toHexString(b & 0xf));
                        System.out.print(" ");
                }
                System.out.println();
        }
}


A quick test on an OS X (10.5.8) box gives:
$ java EncDemo
MacRoman
3f


whereas my Kubuntu box gives:
$ java EncDemo
UTF-8
c3 be


but if I change the locale:
$ LANG=en_US java EncDemo
US-ASCII
3f
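
To make the point about the default encoding concrete, here is a minimal sketch that passes the encoding explicitly, so the bytes no longer depend on platform or locale. It assumes Java 7 or later for `StandardCharsets` (on older JVMs `getBytes("UTF-8")` does the same job); the class name and the `hex` helper are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    /** Formats bytes as space-separated two-digit hex, like the output above. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00fe";
        // An explicit encoding gives the same bytes on every platform.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // c3 be
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // fe
        // U+00FE cannot be represented in US-ASCII, so it is replaced by '?'.
        System.out.println(hex(s.getBytes(StandardCharsets.US_ASCII)));   // 3f
    }
}
```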
Offline delt0r

JGO Knight


Medals: 26
Exp: 18 years


Computers can do that?


« Reply #31 - Posted 2009-12-23 11:53:05 »

Mmm, thanks. But I do hate it when I learn something that says some deployed code is wrong (but it's working!). Guess I got lucky and the encoding is set somewhere in the app.

So that leaves us with strings and some unpacking logic, I guess. I did some tests last night, and strings do seem to pack rather tightly into the archives. Random data does not expand the pack200.gz by much more than the size of the data (24 extra bytes for 1024). However, there are some illegal values, so that is not really quite 1024 random bytes.

I have no special talents. I am only passionately curious.--Albert Einstein
Offline pjt33
« Reply #32 - Posted 2009-12-23 12:45:10 »

Quote
Mmm, thanks. But I do hate it when I learn something that says some deployed code is wrong (but it's working!). Guess I got lucky and the encoding is set somewhere in the app.

Maybe you're just using values in ASCII, which are the same in most encodings which aren't designed for locales which use a non-Latin alphabet. I've been badly bitten before, to the extent that I've added a sanity check string to my main data file which will break if it gets mis-transformed (e.g. saved as UTF-8 and loaded as ISO-8859-1) at any step.
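
A sanity check of this kind could look roughly like the sketch below. The sentinel value, the class name and both helper methods are hypothetical, not taken from the post; the only idea borrowed from it is that a known non-ASCII string at the start of the data will not survive a mismatched save/load encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSanityCheck {
    // Hypothetical sentinel: characters that need 2 and 3 bytes in UTF-8.
    static final String SENTINEL = "\u00fe\u20ac";

    /** Returns true if the decoded text still starts with the sentinel. */
    static boolean looksIntact(String decoded) {
        return decoded.startsWith(SENTINEL);
    }

    /** Simulates saving with one encoding and loading with another. */
    static String roundTrip(String text, Charset saveAs, Charset loadAs) {
        return new String(text.getBytes(saveAs), loadAs);
    }

    public static void main(String[] args) {
        String data = SENTINEL + "level data follows";
        // Same encoding both ways: the sentinel survives.
        System.out.println(looksIntact(
                roundTrip(data, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));
        // Saved as UTF-8, loaded as ISO-8859-1: the sentinel is mangled.
        System.out.println(looksIntact(
                roundTrip(data, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)));
    }
}
```

The first line printed is `true` and the second is `false`, which is the "break" the check is meant to trigger.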
Offline delt0r

« Reply #33 - Posted 2010-01-03 12:11:52 »

Well after reading up on Pack200 and constant pools and everything else that isn't real work (Molten salt reactors for the win!) I have found a pretty easy way to get the data into the jar/pack200 file.

Just use a resource file!

Really, I get a 2-byte overhead for including a zero-length file named "run" (i.e. the name is already in the global constant pool). I also get pretty good compression with non-zero-length files. The best gain, though, is that the decoding logic is minimal compared to strings.

Turns out this would be the same overhead in a pack200 file as an attribute that is just stored, i.e. it's very close to just appending raw bytes to the class. I can't see any other method getting close, really.

Offline moogie

JGO Knight


Medals: 12
Projects: 6
Exp: 10 years


Java games rock!


« Reply #34 - Posted 2010-01-03 12:49:43 »

Quote
Just use a resource file!

Yup, that was my conclusion when I investigated converting the "embed binary data as a class attribute" trick for use in pack200... don't bother, for the reasons you stated above.
Offline pjt33
« Reply #35 - Posted 2010-01-03 15:27:34 »

Quote
Well after reading up on Pack200 and constant pools and everything else that isn't real work (Molten salt reactors for the win!) I have found a pretty easy way to get the data into the jar/pack200 file.

Just use a resource file!

Really, I get a 2-byte overhead for including a zero-length file named "run" (i.e. the name is already in the global constant pool). I also get pretty good compression with non-zero-length files. The best gain, though, is that the decoding logic is minimal compared to strings.

Are you going to show us what it is? Sounds to me like a getClass() call, a Class.getResourceAsStream(String) call, and an InputStream.read() call at minimum, which doesn't seem to compare particularly favourably with String.charAt(int).
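
For comparison, the string-based approach referred to here can be sketched as follows, assuming each char of the literal carries one byte of payload in its low 8 bits. That packing scheme and the names are illustrative only; nobody in the thread posted their actual decoder.

```java
public class StringUnpackDemo {
    /** Treats the low 8 bits of each char as one byte of payload. */
    static byte[] unpack(String packed) {
        byte[] out = new byte[packed.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) packed.charAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = unpack("\u0001\u007f\u00ff");
        for (byte b : data) {
            // Mask to print the unsigned value of each byte.
            System.out.print((b & 0xff) + " ");
        }
        System.out.println();
    }
}
```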
Offline delt0r

« Reply #36 - Posted 2010-01-03 15:45:33 »

It's an input stream and ClassLoader.getSys... I don't remember what the test came up with. But since to get any char from a string you must encode it, most bytes expand to 2 bytes with the high bit(s) set (some to 3 bytes). This seems to be bad news for gzip, and pack200 does not deal with these well (it is optimised for strings that are class names, for clear reasons).

Overall it's a pretty big difference in my case. 10 sprites (line art) are expanding the archive by just 30 bytes now rather than 100, and the addition of ClassLoader and InputStream is worth it (about 30 bytes IIRC). The loops to put data into data structures are still the larger part of it.
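
A rough sketch of the resource-file approach, under some assumptions: the truncated "ClassLoader.getSys..." is taken here to mean `ClassLoader.getSystemResourceAsStream`, which is only a guess, and the `readAll` helper is made up for illustration. The resource name "run" is the one mentioned earlier in the thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ResourceLoadDemo {
    /** Reads every remaining byte from the stream. */
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                buf.write(b);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed call; the resource "run" must be on the classpath (in the jar).
        InputStream in = ClassLoader.getSystemResourceAsStream("run");
        if (in == null) {
            System.out.println("resource not found");
            return;
        }
        try {
            System.out.println(readAll(in).length + " bytes");
        } finally {
            in.close();
        }
    }
}
```

A size-constrained entry would drop the null check and the helper class, but the calls involved are the same.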

Offline Eli Delventhal

JGO Kernel


Medals: 42
Projects: 11


Game Engineer


« Reply #37 - Posted 2010-01-04 16:27:48 »

A related but also separate question:

Currently I have my level data stored something like this:
wwwwwwwwww
w s  w  ew
w    w   w
w    w   w
w        w
wwwwwwwwww


That goes into an external txt file, sans extension and with one character in the file name. Now, the question is: would it perhaps be a better idea to put the string directly into the Java source like this:

String levelData = "wwwwwwwwww\nw s  w  ew\nw    w   w\nw    w   w\nw        w\nwwwwwwwwww";


And is there an even better way to do it?

See my work:
OTC Software
Offline delt0r

« Reply #38 - Posted 2010-01-04 16:33:18 »

Well, in this case all the characters you use do in fact take only 1 byte in class file UTF-8. Also, you could use the statistics of the English language to pick characters, since more common ones use fewer bits with Huffman coding (i.e. use 'e' rather than the rare 'w'). So I would think a string could work pretty well in this case, and you save having the ClassLoader.get... method and InputStream classes in the constant pool as well.

Offline Eli Delventhal

« Reply #39 - Posted 2010-01-04 18:48:21 »

Quote
Well, in this case all the characters you use do in fact take only 1 byte in class file UTF-8. Also, you could use the statistics of the English language to pick characters, since more common ones use fewer bits with Huffman coding (i.e. use 'e' rather than the rare 'w'). So I would think a string could work pretty well in this case, and you save having the ClassLoader.get... method and InputStream classes in the constant pool as well.

Cool, I'll have to look up Huffman coding and then just choose letters that work better. I only need like 5 characters, after all.

Offline delt0r

« Reply #40 - Posted 2010-01-04 19:06:19 »

Just use the most common letters first. Since a class file seems to be somewhat dominated by Strings in the constant pool, this should give you about as good a result as you will get. You don't need to understand Huffman encoding, other than that more frequent characters will take fewer bits.

Offline Eli Delventhal

« Reply #41 - Posted 2010-01-04 22:27:12 »

Quote
Just use the most common letters first. Since a class file seems to be somewhat dominated by Strings in the constant pool, this should give you about as good a result as you will get. You don't need to understand Huffman encoding, other than that more frequent characters will take fewer bits.

All right, thanks. That oughta do 'er. I'll just use vowels and common consonants.
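
As a one-off conversion step, swapping the tile symbols for more common letters might be sketched like this. The particular mapping ('w', ' ', 's', 'e' to 'e', 't', 'a', 'o') is an assumption picked for illustration; which letters actually compress best depends on the rest of the class file and would need measuring.

```java
public class TileAlphabetDemo {
    // Hypothetical mapping: original tile symbols and the letters replacing them.
    static final String ORIGINAL = "w se";
    static final String FREQUENT = "etao";

    /** Replaces each known tile symbol with its assigned frequent letter. */
    static String remap(String level) {
        StringBuilder sb = new StringBuilder(level.length());
        for (int i = 0; i < level.length(); i++) {
            char c = level.charAt(i);
            int idx = ORIGINAL.indexOf(c);
            // Characters without a mapping are left untouched.
            sb.append(idx >= 0 ? FREQUENT.charAt(idx) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(remap("w s  w  ew")); // etattettoe
    }
}
```

The remapped string is what would be pasted into the source; the game then tests for 'e', 't', 'a' and 'o' instead of the original symbols.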

Offline pjt33
« Reply #42 - Posted 2010-01-04 23:11:22 »

Quote
It's an input stream and ClassLoader.getSys... I don't remember what the test came up with. But since to get any char from a string you must encode it, most bytes expand to 2 bytes with the high bit(s) set (some to 3 bytes). This seems to be bad news for gzip, and pack200 does not deal with these well (it is optimised for strings that are class names, for clear reasons).

I don't think you're passing the right options to pack200. Tell it to take its time and it should try out various different string encoding mechanisms and pick the one which works best for the statistics of your long string.

@Demonpants, ditch the \n and either hard-code the level width or encode it as the first character. You might find that using characters which are close to each other allows pack200 to do well with delta-encoding. And I would endorse using \u0000 for a common character because that's a very common byte in pack200 files, and should give you good entropy even if it isn't delta-encoded.
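
The "encode the width as the first character" suggestion could be sketched as below, using the level from the earlier post with the \n separators removed. The class and method names are made up, and the width is written as `(char) 10` rather than a Unicode escape, because an escaped line feed in Java source is translated before the string literal is parsed.

```java
public class LevelDecodeDemo {
    // The first char carries the row width (10); the rows follow with no separators.
    static final String LEVEL = (char) 10
            + "wwwwwwwwww"
            + "w s  w  ew"
            + "w    w   w"
            + "w    w   w"
            + "w        w"
            + "wwwwwwwwww";

    /** Splits the flat string into rows using the width stored in char 0. */
    static char[][] decode(String data) {
        int width = data.charAt(0);
        int height = (data.length() - 1) / width;
        char[][] grid = new char[height][width];
        for (int row = 0; row < height; row++) {
            for (int col = 0; col < width; col++) {
                grid[row][col] = data.charAt(1 + row * width + col);
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        for (char[] row : decode(LEVEL)) {
            System.out.println(new String(row));
        }
    }
}
```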
Offline Abuse

JGO Coder


Medals: 11


falling into the abyss of reality


« Reply #43 - Posted 2010-01-04 23:35:37 »

Quote
And I would endorse using \u0000 for a common character because that's a very common byte in pack200 files, and should give you good entropy even if it isn't delta-encoded.

But in modified UTF8 \u0000 is encoded as 0xC080, not 0x00?

Unless 0xC080 is the byte sequence that you are saying is very common in pack200 files?
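
The encoding in question is easy to check from Java, since `DataOutputStream.writeUTF` writes the same modified UTF-8 that class files use for string constants. A small sketch (class and helper names are made up; the first two bytes written by `writeUTF` are a length prefix and are skipped):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class ModifiedUtf8Demo {
    /** Encodes a string as modified UTF-8, without writeUTF's 2-byte length prefix. */
    static byte[] encode(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            new DataOutputStream(buf).writeUTF(s);
            byte[] all = buf.toByteArray();
            return Arrays.copyOfRange(all, 2, all.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Formats bytes as space-separated two-digit hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode("\u0000"))); // c0 80
        System.out.println(hex(encode("w")));      // 77
        System.out.println(hex(encode("\u00fe"))); // c3 be
        System.out.println(hex(encode("\u20ac"))); // e2 82 ac
    }
}
```

So chars 0x0001 to 0x007F cost one byte each, \u0000 and 0x0080 to 0x07FF cost two, and everything above that costs three.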

Make Elite IV:Dangerous happen! Pledge your backing at KICKSTARTER here! https://dl.dropbox.com/u/54785909/EliteIVsmaller.png
Offline Eli Delventhal

« Reply #44 - Posted 2010-01-05 00:07:46 »

All right, thanks guys. I'll pre-program the width and height and take out the \n.

Can someone perhaps list the 10 most common characters to use in Pack200, or should I just use \u0000 and the most common letters of the alphabet?

Offline delt0r

« Reply #45 - Posted 2010-01-05 10:14:23 »

@pjt33
I have tried many options with both pack200 and 7zip. I even get slightly better performance than Riven's tool with these 2 tools (but kzip/bjflate beat it by a bit). Also, in strings \u0000 is not a common item, at least in the class files (after a pack200) I have checked. And all strings are encoded with modified UTF-8, which is not altered much by pack200 except to globalize the constant pool and to make some effort with common prefixes.

At any rate, I am using fewer bytes now and can forget about encoding/decoding tricks and don't have to unpack bytes from chars.

Offline pjt33
« Reply #46 - Posted 2010-01-05 10:42:44 »

Quote
But in modified UTF8 \u0000 is encoded as 0xC080, not 0x00?

That's true. I'd forgotten that. I'll have to look at the pack200 spec again to see whether it mentions handling of NUL characters.
Offline Abuse

« Reply #47 - Posted 2010-01-05 13:41:02 »

Pack200 reduces the size of a JAR file by:

3. Storing internal data structures.

Any idea what this is talking about?
Does Pack200 do some magic on arrays defined in class files?

It would make sense that it did - as this is one of the most inefficient structures in a Java class file.

If that's the case, then simply leaving your data as arrays in the class file may turn out to be the most efficient solution!

Offline delt0r

« Reply #48 - Posted 2010-01-05 13:50:43 »

Pack200 stores the internal data in a way that makes life easy for GZip (deflate) and reduces redundancy where it can (works really well with lots of classes). Unfortunately quite a few things still end up producing byte code. It can store byte code well so that deflate will compress it well, but you still end up with a lot of constants in the constant pool.

I did try this with arrays and it does not work well at all.

Remember that the notation
int[] data={1,2,3,4};

is really syntax sugar, not something that exists in byte code, i.e. it's translated as stated in the posts above.
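
In source terms, the translation amounts to roughly the following: the array is allocated and each element is then stored by its own sequence of instructions, which is why initializer data costs far more than its raw size. The class and method names here are illustrative only.

```java
import java.util.Arrays;

public class ArrayInitDemo {
    /** The compact form as written in source. */
    static int[] withInitializer() {
        int[] data = {1, 2, 3, 4};
        return data;
    }

    /** Roughly what the compiler emits: allocate, then store each element in turn. */
    static int[] spelledOut() {
        int[] data = new int[4];
        data[0] = 1;
        data[1] = 2;
        data[2] = 3;
        data[3] = 4;
        return data;
    }

    public static void main(String[] args) {
        // Both methods produce the same array contents.
        System.out.println(Arrays.equals(withInitializer(), spelledOut())); // true
    }
}
```

Running `javap -c` on the compiled class shows near-identical instruction sequences for the two methods.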

Offline pjt33
« Reply #49 - Posted 2010-01-05 16:51:28 »

Quote
But in modified UTF8 \u0000 is encoded as 0xC080, not 0x00?

Unless 0xC080 is the byte sequence that you are saying is very common in pack200 files?

Ok. An excerpt from the pack200 spec:

Quote
Each value in the band cp_Utf8_chars is a 16-bit number expressing a Java character. This band contains the characters of all small suffixes, in order. For each successive string, cp_Utf8_chars contains an additional run of values encoding the characters of its small suffix, if any. Therefore, the total length of this band is the sum of all values in the cp_Utf8_suffix band.

Whenever a small suffix length for a constant pool entry is zero, the string has no small suffix, but a big suffix instead. The length of each big suffix is given by an element of the cp_Utf8_big_suffix band. (Therefore, the length of this band is precisely the count of zero values in the cp_Utf8_suffix band.) Each big suffix is transmitted as a separate band of 16-bit character values, one band element per character. There is one such band per big suffix. These bands immediately follow the cp_Utf8_big_suffix band, and are collectively called the cp_Utf8_big_chars bands. Although normally data of the same type are collected into a single band, these strings are placed in separate bands so that they may be independently encoded. These strings typically encode arrays of binary data, rather than true Java characters.
So it doesn't use the modified UTF-8 at all. Instead it uses the true char values and an appropriate encoding. With suitable options (the effort flag) it should try a lot of different encodings (there are about 100 supported) to find the best one. I don't know whether this requires -E100.