Here is an interesting article that discusses some of the challenges of what you want to do, if you are doing it in pure Java.
http://quod.lib.umich.edu/cgi/p/pod/dod-idx?c=icmc;idno=bbp2372.2007.131 "Real-Time, Low Latency Audio Processing in Java"
NoiseFever's "nordOsc" synth (a nearby thread) has very solid rhythm, and he gives a few hints about how he achieves it near the end of that thread. For my part, I only got decent timing once I had a mixer I wrote running continually in the background, with sounds fed to it as needed. Setting something like this up is a fair bit of work, so if the goal is just to get it done, you might consider latching onto a library that can already do this.
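To give a rough idea of what "a mixer running continually in the background" can look like, here is a minimal sketch. It is not my actual mixer and not how nordOsc does it; the class name BackgroundMixerSketch, the Voice type, and the 512-frame block size are all made up for illustration. The idea is that one SourceDataLine stays open and is written to all the time (silence when nothing is playing), so starting a sound just means adding samples to the next block.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical minimal mixer: one always-running output line, sounds added on the fly. */
class BackgroundMixerSketch implements Runnable {
    static final float SAMPLE_RATE = 44100f;   // assumed output rate
    static final int FRAMES_PER_BUFFER = 512;  // assumed block size

    /** A playing sound: mono samples in [-1, 1] plus a read cursor. */
    static final class Voice {
        final float[] samples;
        int position;
        Voice(float[] samples) { this.samples = samples; }
    }

    private final List<Voice> voices = new CopyOnWriteArrayList<>();
    private volatile boolean running = true;

    /** Called from the game thread: the sound starts with the next mixed block. */
    void play(float[] samples) { voices.add(new Voice(samples)); }

    void stop() { running = false; }

    /** Sums the next block of every voice into 16-bit little-endian mono PCM. */
    static byte[] mixBlock(List<Voice> voices, int frames) {
        float[] sum = new float[frames];
        for (Voice v : voices) {
            for (int i = 0; i < frames && v.position < v.samples.length; i++) {
                sum[i] += v.samples[v.position++];
            }
        }
        byte[] pcm = new byte[frames * 2];
        for (int i = 0; i < frames; i++) {
            // Hard-clip so that overlapping sounds cannot wrap around.
            float clipped = Math.max(-1f, Math.min(1f, sum[i]));
            short s = (short) (clipped * Short.MAX_VALUE);
            pcm[2 * i] = (byte) s;
            pcm[2 * i + 1] = (byte) (s >> 8);
        }
        return pcm;
    }

    @Override
    public void run() {
        AudioFormat format = new AudioFormat(SAMPLE_RATE, 16, 1, true, false);
        try (SourceDataLine line = AudioSystem.getSourceDataLine(format)) {
            line.open(format, FRAMES_PER_BUFFER * 2 * 4); // room for about four blocks
            line.start();
            while (running) {
                voices.removeIf(v -> v.position >= v.samples.length);
                byte[] block = mixBlock(voices, FRAMES_PER_BUFFER);
                line.write(block, 0, block.length); // blocks, which paces the loop
            }
            line.drain();
        } catch (LineUnavailableException | IllegalArgumentException e) {
            System.err.println("No audio output line available: " + e.getMessage());
        }
    }
}
```

You would start it once with something like new Thread(new BackgroundMixerSketch()).start() and then call play(...) whenever a sound is due. The blocking write() is what keeps the loop in step with the sound card, so no sleep calls are involved.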
By the way, are you loading the Clip once and then playing/replaying it (best practice), or loading it anew with each playback (a common newbie error)? The former will perform better, but probably won't eliminate all the timing problems. Even loading and playing from file (using a SourceDataLine) starts the sound more reliably than loading and playing a Clip each time, but resetting and restarting an existing Clip should be the best.
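In case it helps, here is a small sketch of the "load once, then reset and restart" pattern. The class name ReusableClipSketch and the generated beep are my own stand-ins (I don't know what file you are loading), so treat it as an illustration rather than a drop-in fix. The part that matters is replay(): stop, rewind to frame 0, start, all on a Clip that was opened once up front.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Clip;
import javax.sound.sampled.LineUnavailableException;

/** Hypothetical example: open a Clip once, then rewind and restart it for each playback. */
class ReusableClipSketch {
    static final AudioFormat FORMAT = new AudioFormat(44100f, 16, 1, true, false);

    /** Generates a sine beep as 16-bit little-endian mono PCM (stand-in for a loaded file). */
    static byte[] beep(double hz, int millis) {
        int frames = (int) (FORMAT.getSampleRate() * millis / 1000);
        byte[] pcm = new byte[frames * 2];
        for (int i = 0; i < frames; i++) {
            double phase = 2 * Math.PI * hz * i / FORMAT.getSampleRate();
            short s = (short) (Math.sin(phase) * 0.5 * Short.MAX_VALUE);
            pcm[2 * i] = (byte) s;
            pcm[2 * i + 1] = (byte) (s >> 8);
        }
        return pcm;
    }

    /** Opens the Clip exactly once; this is the expensive step. */
    static Clip load(byte[] pcm) throws LineUnavailableException {
        Clip clip = AudioSystem.getClip();
        clip.open(FORMAT, pcm, 0, pcm.length);
        return clip;
    }

    /** Rewinds and restarts an already-open Clip; no loading involved. */
    static void replay(Clip clip) {
        clip.stop();
        clip.setFramePosition(0);
        clip.start();
    }

    public static void main(String[] args) throws Exception {
        Clip clip = load(beep(880, 100)); // load once, at startup
        for (int i = 0; i < 4; i++) {
            replay(clip);                 // cheap on every later playback
            Thread.sleep(500);
        }
        clip.close();
    }
}
```

With a real sound file you would open the Clip from an AudioInputStream instead of a byte array, but the replay step stays the same.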
There is also the problem that Microsoft OSes have a rather infrequent clock interrupt (once every 15 or 16 msec), which impacts the accuracy of sleep amounts, timers and game loops. Linux and Mac systems seem to have closer to 1 msec accuracy. But even so, there are issues with code going in and out of RAM as bytecode, garbage collection and other JVM timing issues, as the article discusses.
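If you want to see what your own machine does, a quick test like the following can help. It is only a sketch (the class and method names are hypothetical): it asks for a 1 msec sleep a number of times and reports how long each sleep really took, measured with System.nanoTime(). I'm not quoting expected numbers, since they depend on the OS, the JVM and whatever else is running.

```java
/** Hypothetical helper: measures how long Thread.sleep() really takes on this machine. */
class SleepJitterSketch {

    /** Returns the measured duration, in nanoseconds, of each requested sleep. */
    static long[] measureSleepNanos(long requestedMillis, int trials) {
        long[] elapsed = new long[trials];
        for (int i = 0; i < trials; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(requestedMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and carry on
            }
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] samples = measureSleepNanos(1, 50);
        long total = 0;
        long worst = 0;
        for (long ns : samples) {
            total += ns;
            worst = Math.max(worst, ns);
        }
        System.out.printf("requested 1 ms: average %.3f ms, worst %.3f ms%n",
                total / (double) samples.length / 1e6, worst / 1e6);
    }
}
```

If the average comes out far above the requested 1 msec, sleep-based scheduling alone is unlikely to give tight rhythm on that system.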
Good luck! There will be others who will have libraries to recommend, I'm pretty sure.