... so never use UTF-8 encoding for binary stuff
Riven
« on: 2009-04-20 12:56:38 »

I rather often abuse UTF-8 to encode binary to pass it into a text-based API.

Today, after years (!!), was the first time I got caught out by UTF-8 encoding not being reversible.

         byte[] original = ....;
         // Malformed byte sequences are decoded to U+FFFD, so the original bytes are lost here.
         String encoded = new String(original, "UTF-8");
         byte[] decoded = encoded.getBytes("UTF-8");

         Arrays.equals(original, decoded); // false!


Gotta rewrite some stuff... shame on me!
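
To make the failure concrete, here is a minimal sketch with explicit bytes. The class and method names are made up for illustration, and it assumes Java 7 or later for StandardCharsets. The byte 0xFF never occurs in well-formed UTF-8 and a lone 0x80 is a continuation byte without a lead byte, so the decoder substitutes U+FFFD for both and the round trip comes back different.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original post.
class Utf8RoundTrip {
    /** Returns true if the bytes survive bytes -> String -> bytes via UTF-8. */
    static boolean survivesUtf8(byte[] original) {
        String encoded = new String(original, StandardCharsets.UTF_8);
        byte[] decoded = encoded.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(original, decoded);
    }

    public static void main(String[] args) {
        // Well-formed UTF-8 text round-trips without loss.
        byte[] text = "hello".getBytes(StandardCharsets.UTF_8);
        // Arbitrary binary: 0xFF is never valid, 0x80 is a stray continuation byte.
        byte[] binary = { (byte) 0xFF, 0x00, (byte) 0x80 };

        System.out.println(survivesUtf8(text));   // true
        System.out.println(survivesUtf8(binary)); // false
    }
}
```

So the trick only works for as long as the data happens to be well-formed UTF-8.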

ryanm
« Reply #1 on: 2009-04-20 14:03:12 »

Don't know why this stuff isn't already in the JRE, but Base64 encoding works for me when I'm ramming binary data into java.util.prefs.
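
For readers on a newer JDK: java.util.Base64 has been part of the standard library since Java 8. The following is only a sketch of the round trip, assuming Java 8 or later; the class and helper names are invented for illustration and it does not touch java.util.prefs itself.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; assumes Java 8+ for java.util.Base64.
class Base64RoundTrip {
    /** Encodes arbitrary bytes as ASCII-only text. */
    static String toText(byte[] binary) {
        return Base64.getEncoder().encodeToString(binary);
    }

    /** Recovers the exact original bytes from the Base64 text. */
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] original = { (byte) 0xFF, 0x00, (byte) 0x80, 0x7F };
        String encoded = toText(original);
        byte[] decoded = fromText(encoded);

        System.out.println(encoded);
        System.out.println(Arrays.equals(original, decoded)); // true
    }
}
```

The cost is size: Base64 text is roughly a third larger than the binary it carries.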
Riven
« Reply #2 on: 2009-04-20 15:44:35 »

It is there, in rt.jar, but not supported:

sun.misc.BASE64Encoder
sun.misc.BASE64Decoder

Abuse
« Reply #3 on: 2009-04-20 17:24:01 »

Presumably the cause of your problem is that 'byte[] original' contains a string encoded using modified UTF-8 rather than standard UTF-8 (caused by improper use of dos.writeUTF elsewhere in your app)?

Though if that's the case, I'm surprised you hadn't encountered a problem sooner; it's unusual for binary data to contain no zeros, and modified UTF-8 writes U+0000 as the two bytes 0xC0 0x80!
Perhaps the UTF-8 decoder used by the String constructor has been silently accepting that overlong encoding of zero, and you've only been caught out now because your data contains a character outside the Basic Multilingual Plane. (Modified UTF-8 writes each half of the UTF-16 surrogate pair as its own three-byte sequence, which standard UTF-8 does not allow either.)

If that's the case, the UTF-8 decoder used by Java is being very naughty, as accepting overlong encodings would mean it fails to meet the current Unicode conformance requirements!
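
The difference between the two encodings can be seen with a small sketch. The class and helper names here are hypothetical, and the code strips the two-byte length prefix that writeUTF puts in front of the data. On a current JDK the standard decoder rejects the 0xC0 0x80 form; whether an older runtime accepted it is not something this sketch can show.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class comparing standard and modified UTF-8.
class ModifiedUtf8Demo {
    /** Bytes produced by DataOutputStream.writeUTF, minus its 2-byte length prefix. */
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(buffer)) {
            dos.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buffer.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    /** Formats bytes as space-separated upper-case hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String nul = "\0";
        String supplementary = new String(Character.toChars(0x1F600)); // outside the BMP

        System.out.println(hex(nul.getBytes(StandardCharsets.UTF_8)));           // 00
        System.out.println(hex(modifiedUtf8(nul)));                              // C0 80
        System.out.println(hex(supplementary.getBytes(StandardCharsets.UTF_8))); // F0 9F 98 80
        System.out.println(hex(modifiedUtf8(supplementary)));                    // ED A0 BD ED B8 80

        // Decoding the modified form as standard UTF-8 does not give back the same bytes.
        byte[] modified = modifiedUtf8(nul);
        byte[] roundTripped = new String(modified, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(modified, roundTripped));               // false
    }
}
```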
Riven
« Reply #4 on: 2009-04-20 18:00:36 »

Until now I was always 'serializing' data that was more or less textual, though binary in the end, like what you get from DataOutputStream when your protocol is mainly string-based.

Today it simply went berserk, because I needed to write binary into a text SQL column: ObjectOutputStream -> UTF-8 -> ObjectInputStream.
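
A sketch of why that particular pipeline breaks immediately: every ObjectOutputStream stream starts with the magic bytes 0xAC 0xED, which are not well-formed UTF-8, so the header is already mangled before any payload is read. The class below is hypothetical, leaves out the SQL part entirely, and uses Base64 (Java 8+) as one possible reversible text form for comparison.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; the text column is simulated by a plain String.
class SerializeToTextColumn {
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            oos.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Lossy: bytes -> UTF-8 String -> bytes replaces malformed sequences with U+FFFD. */
    static byte[] viaUtf8(byte[] bytes) {
        String column = new String(bytes, StandardCharsets.UTF_8);
        return column.getBytes(StandardCharsets.UTF_8);
    }

    /** Reversible: bytes -> Base64 String -> bytes. */
    static byte[] viaBase64(byte[] bytes) {
        String column = Base64.getEncoder().encodeToString(bytes);
        return Base64.getDecoder().decode(column);
    }

    public static void main(String[] args) {
        byte[] bytes = serialize("hello");

        System.out.println(Arrays.equals(bytes, viaUtf8(bytes)));   // false
        System.out.println(deserialize(viaBase64(bytes)));          // hello
        try {
            deserialize(viaUtf8(bytes));
        } catch (UncheckedIOException e) {
            // The corrupted header is rejected before any object is read.
            System.out.println("corrupted: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```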

pjt33
« Reply #5 on: 2009-04-21 04:02:32 »

> I rather often abuse UTF-8 to encode binary to pass it into a text-based API.

Why not use ISO-8859-1? It maps each of the 256 byte values to exactly one character, so it's a lot more suitable.
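
A quick way to check this claim is to push all 256 byte values through both charsets. This is a sketch with made-up class and method names, assuming Java 7 or later for StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class comparing ISO-8859-1 and UTF-8 for raw bytes.
class Latin1RoundTrip {
    /** Returns true if bytes -> String -> bytes with the given charset is lossless. */
    static boolean survives(byte[] original, Charset charset) {
        String encoded = new String(original, charset);
        return Arrays.equals(original, encoded.getBytes(charset));
    }

    /** Builds an array holding every possible byte value once. */
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println(survives(all, StandardCharsets.ISO_8859_1)); // true
        System.out.println(survives(all, StandardCharsets.UTF_8));      // false
    }
}
```

The resulting String still contains control characters such as U+0000, so whether the text-based API on the other end accepts it unchanged is a separate question that this check does not answer.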