You need to get your data to bytes somehow, no matter how you are doing your networking, so there will always be some overhead there.

KryoNet uses Kryo for serialization. When automatically serializing your objects, Kryo is almost as fast as hand-written serialization code. Where possible, it generates bytecode instead of using reflection. Kryo can optionally use Unsafe on Sun VMs to directly access object memory, which is extremely, ridiculously fast for arrays of primitive types.
Kryo serialization is pluggable, so you can hand-write serialization code, and Kryo has utilities for making this easy. Hand-written serialization code using Kryo is generally faster and smaller than hand-written serialization code using java.io.Externalizable.
Here are benchmarks. It's a lot of data, so I'll pluck out the interesting parts:

"kryo" in this chart is using automatic serialization. "java-built-in" is Java's automatic serialization. Here's another:

The "kryo-manual" in this chart is hand-written serialization code. The "java-manual" is hand-written Externalizable code. I won't spam more charts, but the "size" charts are also cool: they show that Kryo is size-efficient too.
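
To make the two Java entries a bit more concrete, here is a minimal sketch using only java.io (no Kryo involved). It is not the benchmark's code: the AutoPlayer and ManualPlayer classes, their fields, and the helper methods are made up for illustration. One class relies on Java's automatic serialization, the other writes its fields by hand through Externalizable, and both are serialized to a byte array so the sizes can be compared.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationSizes {
    // Hypothetical message type that relies on Java's automatic serialization.
    public static class AutoPlayer implements Serializable {
        private static final long serialVersionUID = 1L;
        public String name;
        public int score;

        public AutoPlayer(String name, int score) {
            this.name = name;
            this.score = score;
        }
    }

    // Same fields, but the class writes and reads them by hand.
    public static class ManualPlayer implements Externalizable {
        private static final long serialVersionUID = 1L;
        public String name;
        public int score;

        // Externalizable requires a public no-arg constructor for reading.
        public ManualPlayer() {}

        public ManualPlayer(String name, int score) {
            this.name = name;
            this.score = score;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(name);
            out.writeInt(score);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // Fields must be read in the same order they were written.
            name = in.readUTF();
            score = in.readInt();
        }
    }

    public static byte[] toBytes(Object object) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        return bytes.toByteArray();
    }

    public static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] auto = toBytes(new AutoPlayer("nate", 42));
        byte[] manual = toBytes(new ManualPlayer("nate", 42));
        ManualPlayer copy = (ManualPlayer) fromBytes(manual);

        System.out.println("automatic: " + auto.length + " bytes");
        System.out.println("manual:    " + manual.length + " bytes");
        System.out.println("round trip: " + copy.name + ", " + copy.score);
    }
}
```

The hand-written version comes out smaller here because the stream no longer carries per-field descriptors, only the class header and the raw values. A single tiny object says nothing about speed, of course.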

Of course, the benchmark data is small and string-heavy, so the results are likely not terribly meaningful unless you run the benchmark with your own actual data.
TL;DR: Kryo is super awesome, fast serialization.

Back on topic though, KryoNet does have some limitations, mostly with threading. KryoNet is limited to one network thread. Objects are serialized on any thread that calls send() and bytes are usually sent immediately from that thread, though they may be queued for sending later from the network thread. That part is fine, but bytes are always received and queued on the network thread, and deserialization happens on that thread as well. Once you get the deserialized object you can process it on another thread, but doing the deserialization on the network thread limits throughput. This starts to become an issue when throughput exceeds 1 Gbit/s.
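
The receive side described above can be sketched roughly like this, using only java.util.concurrent. This is not KryoNet's code: the NetworkThreadHandoff class, the deserialize and process methods, and the executors are stand-ins I made up to show the shape of the problem. A single "network" thread does all the deserialization in sequence, and only the processing of the finished objects is spread across worker threads.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class NetworkThreadHandoff {
    // Stand-in for deserialization, which always runs on the single network thread.
    public static String deserialize(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Stand-in for application logic, which is free to run on any thread.
    public static int process(String message) {
        return message.length();
    }

    // Deserializes every buffer on one thread, processes the results on a pool,
    // and returns the sum of the processing results.
    public static int run(List<byte[]> received, int workerCount) throws Exception {
        ExecutorService network = Executors.newSingleThreadExecutor();
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        try {
            Callable<List<Future<Integer>>> receiveLoop = () -> {
                List<Future<Integer>> results = new ArrayList<>();
                for (byte[] bytes : received) {
                    // Serial step: one buffer at a time, no matter how many cores exist.
                    String message = deserialize(bytes);
                    // Parallel step: hand the finished object to a worker thread.
                    Callable<Integer> task = () -> process(message);
                    results.add(workers.submit(task));
                }
                return results;
            };

            int total = 0;
            for (Future<Integer> result : network.submit(receiveLoop).get()) {
                total += result.get();
            }
            return total;
        } finally {
            network.shutdown();
            workers.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> received = new ArrayList<>();
        received.add("hello".getBytes(StandardCharsets.UTF_8));
        received.add("kryonet".getBytes(StandardCharsets.UTF_8));
        System.out.println("total processed length: " + run(received, 4));
    }
}
```

Adding more workers speeds up process() but does nothing for deserialize(), which is why the single network thread ends up being the ceiling.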
KryoNet is an easy-to-use API on top of NIO. The API doesn't really care how the networking is done though. I have a version of KryoNet that uses Netty under the covers, which allows for more threading flexibility. It doesn't have all the minor features of KryoNet though, and I haven't found the time to finish it.
