Java-Gaming.org
... so never use UTF-8 encoding for binary stuff
Riven
« Posted 2009-04-20 18:56:38 »

I quite often abuse UTF-8 to encode binary data so that I can pass it into a text-based API.

Today, after years (!!), was the first time I got caught by a non-reversible UTF-8 encoding.

         byte[] original = ....;
         // decode the raw bytes as if they were UTF-8 text
         String encoded = new String(original, "UTF-8");
         // encode that text back into bytes
         byte[] decoded = encoded.getBytes("UTF-8");

         Arrays.equals(original, decoded); // false!


Gotta rewrite some stuff... shame on me!
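
A minimal sketch of why the round trip above can fail, assuming Java 7+ for `StandardCharsets`. Arbitrary bytes are not necessarily well-formed UTF-8, and the `String` constructor substitutes U+FFFD for anything it cannot decode, so the original bytes are lost. The class name, the helper names and the sample bytes are hypothetical, chosen only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8RoundTrip {

    /** Decodes the bytes as UTF-8, then encodes the resulting String back to UTF-8. */
    static byte[] roundTrip(byte[] original) {
        String encoded = new String(original, StandardCharsets.UTF_8);
        return encoded.getBytes(StandardCharsets.UTF_8);
    }

    /** Strict check: true only if the bytes are well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] text = "hello".getBytes(StandardCharsets.UTF_8);
        // Hypothetical binary sample: 0xFF and 0xFE never occur in valid UTF-8.
        byte[] binary = { (byte) 0xFF, (byte) 0xFE, 0x41 };

        System.out.println(Arrays.equals(text, roundTrip(text)));     // true
        System.out.println(Arrays.equals(binary, roundTrip(binary))); // false
        System.out.println(isValidUtf8(binary));                      // false
    }
}
```

The strict decoder is handy as a guard: it reports malformed input instead of silently replacing it.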

ryanm
« Reply #1 - Posted 2009-04-20 20:03:12 »

Don't know why this stuff isn't already in the JRE, but Base64 encoding works for me when I'm ramming binary data into java.util.prefs.
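
For reference, a small sketch of the Base64 approach. It assumes Java 8 or later, where `java.util.Base64` is part of the standard library (that class postdates this thread). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

final class Base64RoundTrip {

    /** Encodes arbitrary bytes as ASCII-only text. */
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Recovers the original bytes from the Base64 text. */
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Every possible byte value, including zero and the high range.
        byte[] original = new byte[256];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) i;
        }

        String encoded = toText(original);
        byte[] decoded = fromText(encoded);

        System.out.println(Arrays.equals(original, decoded)); // true
    }
}
```

The trade-off is size: Base64 text is roughly a third larger than the binary it carries.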
Riven
« Reply #2 - Posted 2009-04-20 21:44:35 »

It is there, in rt.jar, but not supported:

sun.misc.BASE64Encoder
sun.misc.BASE64Decoder

Abuse
« Reply #3 - Posted 2009-04-20 23:24:01 »

Presumably the cause of your problem is that 'byte[] original' contains a string encoded using modified UTF-8 rather than UTF-8 (caused by improper use of dos.writeUTF elsewhere in your app)?

Though if that's the case I'm surprised you hadn't encountered a problem sooner; it's unusual for binary data to contain no zeros!
Though perhaps the UTF-8 decoder used by the String constructor is silently accepting an overlong encoding for zero, and you've only been caught out now because your data contains one of the UTF-16 surrogate pair byte values (which modified UTF-8 also encodes in a non-standard way, as two three-byte sequences).

If that's the case, the UTF-8 decoder used by Java is being very naughty, as accepting overlong encodings would mean it fails to meet the current Unicode conformance requirements!
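
A quick way to check the modified UTF-8 theory is to look at what `DataOutputStream.writeUTF` actually emits. The sketch below assumes a current JDK, where the standard UTF-8 decoder is expected to reject the overlong form rather than accept it. The class and helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ModifiedUtf8Demo {

    /** Bytes produced by writeUTF: a 2-byte length prefix, then modified UTF-8. */
    static byte[] writeUtf(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** The matching reader, which understands modified UTF-8. */
    static String readUtf(byte[] raw) {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(raw))) {
            return dis.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = writeUtf("a\u0000b");

        StringBuilder hex = new StringBuilder();
        for (byte b : raw) {
            hex.append(String.format("%02X ", b));
        }
        // U+0000 is written as the two-byte overlong form C0 80.
        System.out.println(hex.toString().trim()); // 00 04 61 C0 80 62

        // readUTF reverses writeUTF exactly.
        System.out.println(readUtf(raw).equals("a\u0000b")); // true

        // The standard decoder does not accept C0 80 and substitutes U+FFFD.
        String viaUtf8 = new String(raw, StandardCharsets.UTF_8);
        System.out.println(viaUtf8.indexOf('\uFFFD') >= 0); // true
    }
}
```

So data written with writeUTF should be read back with readUTF, not decoded as standard UTF-8.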

Riven
« Reply #4 - Posted 2009-04-21 00:00:36 »

I was always 'serializing' more or less textual data, but binary in the end - like what you get from DataOutputStream when your protocol is mainly string-based.

Today it simply went berserk, due to the need to write binary into a text SQL column: ObjectOutputStream -> utf8 -> ObjectInputStream.
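
The ObjectOutputStream case breaks immediately because every serialization stream starts with the magic bytes AC ED, which are not valid UTF-8. A hedged sketch of the same pipeline with a binary-safe text step in the middle, assuming Java 8+ for `java.util.Base64`. The class and method names are hypothetical, and the SQL column itself is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;

final class SerializeToText {

    /** ObjectOutputStream -> raw bytes. */
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Raw bytes -> ASCII text that is safe to store in a text column. */
    static String toText(Serializable value) {
        return Base64.getEncoder().encodeToString(serialize(value));
    }

    /** Text -> raw bytes -> ObjectInputStream. */
    static Object fromText(String text) {
        byte[] raw = Base64.getDecoder().decode(text);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> original = new ArrayList<>(Arrays.asList("Earth", "Moon", "Mars"));

        byte[] raw = serialize(original);
        // Stream header: the serialization magic number.
        System.out.println(String.format("%02X %02X", raw[0], raw[1])); // AC ED

        Object restored = fromText(toText(original));
        System.out.println(original.equals(restored)); // true
    }
}
```

Only deserialize data you wrote yourself; feeding untrusted input to ObjectInputStream is unsafe.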

pjt33
« Reply #5 - Posted 2009-04-21 10:02:32 »

Riven wrote: "I rather often abuse UTF8 to encode binary to pass it into a textbased API."
Why not use ISO-8859-1? That has 256 characters, so it's a lot more suitable.
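
A minimal sketch of the ISO-8859-1 suggestion, assuming Java 7+ for `StandardCharsets`. In Java this charset maps each byte value to exactly one char in the range U+0000 to U+00FF, so the round trip is lossless on the Java side. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1RoundTrip {

    /** Maps each byte to the char with the same numeric value. */
    static String toText(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    /** Maps each char (all in U+0000..U+00FF) back to a single byte. */
    static byte[] fromText(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Every possible byte value.
        byte[] original = new byte[256];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) i;
        }

        String encoded = toText(original);
        byte[] decoded = fromText(encoded);

        System.out.println(encoded.length());                 // 256
        System.out.println(Arrays.equals(original, decoded)); // true
    }
}
```

This only helps if the text-based API on the other end preserves every char, including NUL and the control characters; if it does not, Base64 is the safer choice.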