Out of interest, why do you consider this mechanism to be badly flawed?
(barring the obvious task of ensuring the canned result itself is loaded in correctly)
In the world of J2ME I very much wish Sun HAD done this kind of unit testing of the reference implementation's primitive and image drawing methods, as a lot of them misbehave in embarrassingly simple scenarios.
Well, for a game I don't want all my graphics unit tests to break just because I tweaked the shader and traded some accuracy for performance. Practically all PC-style hardware rendering can give slightly different results depending on drivers, hardware, colour depth and so on, but will all still look 'correct'. Then you start getting into the percentage difference between the test result and the reference, and it all gets a little icky.
However, now that you mention it, for JME and other low-level pixel-pushing APIs this method would probably work pretty well. You should be able to get much more reproducible results when you're just talking about basic sprite blitting.
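For what it's worth, here is a rough sketch of what the two cases might look like as one comparison routine. It assumes both the rendered frame and the canned reference have already been read back as packed 0xAARRGGBB ints (the format MIDP's Image.getRGB fills an int[] with). The names PixelDiff, countMismatches and channelTolerance are made up for illustration and don't come from any library. A tolerance of 0 gives the strict check that should suit plain sprite blitting; a non-zero tolerance plus a budget of allowed mismatches is the fuzzier, ickier variant for hardware rendering.

```java
public final class PixelDiff {
    private PixelDiff() {}

    /**
     * Counts pixels where any of the four 8-bit channels (A, R, G, B)
     * differs by more than channelTolerance. Both arrays are assumed to
     * hold packed 0xAARRGGBB pixels for images of the same size.
     */
    public static int countMismatches(int[] expected, int[] actual, int channelTolerance) {
        if (expected.length != actual.length) {
            throw new IllegalArgumentException("pixel buffers differ in size");
        }
        int mismatches = 0;
        for (int i = 0; i < expected.length; i++) {
            if (!closeEnough(expected[i], actual[i], channelTolerance)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    // Compares one pixel channel by channel: blue, green, red, then alpha.
    private static boolean closeEnough(int p, int q, int tolerance) {
        for (int shift = 0; shift <= 24; shift += 8) {
            int a = (p >>> shift) & 0xFF;
            int b = (q >>> shift) & 0xFF;
            if (Math.abs(a - b) > tolerance) {
                return false;
            }
        }
        return true;
    }
}
```

A blitting test would then just require countMismatches(reference, rendered, 0) == 0, while a shader test would have to pick some tolerance and some acceptable number of bad pixels, which is exactly where the arguing starts.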