I rather often abuse UTF-8 to encode binary data so I can pass it through a text-based API.
Today, after years (!!), was the first time I got caught by the fact that UTF-8 decoding is not reversible for arbitrary bytes.
byte[] original = ....;
String encoded = new String(original, "UTF-8");
byte[] decoded = encoded.getBytes("UTF-8");
Gotta rewrite some stuff... shame on me
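For anyone wondering why this round trip is lossy: any byte sequence that isn't valid UTF-8 gets replaced with U+FFFD during decoding (the String constructor uses the charset's replacement action), so the original bytes are unrecoverable. A minimal sketch (the class name is mine):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {
    public static void main(String[] args) {
        // 0xFF can never appear in well-formed UTF-8, so decoding
        // silently replaces it with U+FFFD (the replacement character)
        byte[] original = { (byte) 0xFF, 0x41 };
        String encoded = new String(original, StandardCharsets.UTF_8);
        byte[] decoded = encoded.getBytes(StandardCharsets.UTF_8);
        // decoded now starts with the 3-byte encoding of U+FFFD, not 0xFF
        System.out.println(Arrays.equals(original, decoded)); // false
    }
}
```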
Presumably the cause of your problem is that 'byte[] original' contains a string encoded using modified UTF-8, rather than UTF-8? (Caused by improper use of dos.writeUTF elsewhere in your app.)
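To illustrate the modified-UTF-8 point: DataOutputStream.writeUTF never emits a raw zero byte; it encodes U+0000 as the overlong pair 0xC0 0x80 (after a two-byte length prefix). A quick sketch (class name is mine):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ModifiedUtf8Demo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(baos);
        dos.writeUTF("\u0000"); // NUL is written as the overlong pair C0 80
        byte[] out = baos.toByteArray();
        // out[0..1] is the big-endian length prefix; out[2..3] is the NUL
        System.out.printf("%02X %02X %02X %02X%n",
                out[0] & 0xFF, out[1] & 0xFF, out[2] & 0xFF, out[3] & 0xFF);
    }
}
```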
Though if that's the case I'm surprised you hadn't encountered a problem sooner; it's unusual for binary data to contain no zeros!
Though perhaps the UTF-8 decoder used by the String constructor is silently accepting an overlong encoding for zero, and you've only been caught out now because your data contains one of the UTF-16 surrogate pair byte values (which are also encoded overlong in modified UTF-8).
If that's the case the UTF-8 decoder used by Java is being very naughty, as accepting overlong encodings would mean it fails to meet the current Unicode conformance requirements!
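For what it's worth, a quick check suggests Java's decoder does reject overlong sequences rather than accepting them: the overlong NUL pair 0xC0 0x80 decodes to replacement characters, not to '\0'. A sketch (class name is mine):

```java
import java.nio.charset.StandardCharsets;

public class OverlongDemo {
    public static void main(String[] args) {
        // 0xC0 0x80 is the overlong (modified UTF-8) encoding of U+0000;
        // a conformant UTF-8 decoder must treat it as malformed
        byte[] overlong = { (byte) 0xC0, (byte) 0x80 };
        String s = new String(overlong, StandardCharsets.UTF_8);
        System.out.println(s.equals("\u0000"));                   // false
        System.out.println(s.chars().allMatch(c -> c == 0xFFFD)); // true
    }
}
```

So the non-reversibility is the decoder's replacement behaviour doing its job, not it being lax.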