I once thought of using a Java Sound Clip and reading my WAVs as one big byte[]. Could I then pitch the sound up by removing every 2nd or 3rd byte, just as if I were scaling an image?
Simply removing samples (in audio or images) might get close to what you want, but it is "wrong": you will get aliasing unless you low-pass filter first. Do some research on digital signal processing and I'm sure you will find out how to build proper filters. This is not something that Java is well suited for. The fastest way to do it is with the vector instructions of whatever processor you are running on (MMX/SSE/SSE2 on x86, AltiVec on PPC, etc.); anything else will run MUCH slower than it should.
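To make the "filter, then drop samples" idea concrete, here is a rough plain-Java sketch rather than a tuned implementation. It assumes the WAV data has already been decoded from the byte[] into mono 16-bit samples in a short[] (with 16-bit audio each sample is two bytes, so dropping individual bytes would corrupt the samples rather than pitch them). The class name Decimator, the Hamming-windowed sinc filter, the tap count, and the 0.45 cutoff margin are all my own illustrative choices.

```java
final class Decimator {
    private Decimator() {}

    /**
     * Builds a Hamming-windowed sinc low-pass kernel.
     * cutoff is a fraction of the sample rate (0 < cutoff < 0.5); taps should be odd.
     */
    static double[] lowPassKernel(int taps, double cutoff) {
        double[] h = new double[taps];
        double mid = (taps - 1) / 2.0;
        double sum = 0.0;
        for (int i = 0; i < taps; i++) {
            double x = i - mid;
            double sinc = (x == 0.0)
                    ? 2.0 * cutoff
                    : Math.sin(2.0 * Math.PI * cutoff * x) / (Math.PI * x);
            double window = 0.54 - 0.46 * Math.cos(2.0 * Math.PI * i / (taps - 1));
            h[i] = sinc * window;
            sum += h[i];
        }
        // Normalize so a constant (DC) signal passes through unchanged.
        for (int i = 0; i < taps; i++) {
            h[i] /= sum;
        }
        return h;
    }

    /** Low-pass filters the input, then keeps every factor-th sample. */
    static short[] decimate(short[] in, int factor) {
        if (factor < 1) {
            throw new IllegalArgumentException("factor must be >= 1");
        }
        if (factor == 1) {
            return in.clone();
        }
        int taps = 16 * factor + 1;                       // always odd
        double[] h = lowPassKernel(taps, 0.45 / factor);  // just below the new Nyquist limit
        int half = taps / 2;
        short[] out = new short[in.length / factor];
        for (int n = 0; n < out.length; n++) {
            int center = n * factor;
            double acc = 0.0;
            // Only the samples we keep are computed; input past the edges counts as silence.
            for (int k = 0; k < taps; k++) {
                int idx = center + k - half;
                if (idx >= 0 && idx < in.length) {
                    acc += h[k] * in[idx];
                }
            }
            long rounded = Math.round(acc);
            // Clamp to the 16-bit range before narrowing.
            out[n] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, rounded));
        }
        return out;
    }
}
```

Calling Decimator.decimate(samples, 2) halves the sample count, so playing the result at the original sample rate sounds an octave higher and twice as fast. A tone at the highest representable frequency, which naive sample dropping would fold down into a constant offset, comes out close to silence here.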