As always when in doubt: implement it in the simplest possible way and profile your way out of bottlenecks later. Many of those so-called optimizations mean nothing compared to a few well-placed ones found during profiling.
This is certainly true for micro-optimizations. At the global level, if your program is performance-sensitive, IMHO you should think about performance from the start - but, to stress it once more, at the big-picture level, not in terms of specific micro solutions.
To give an example of the importance of profiling, here is a real one from my own work, a few months ago. One of the methods called on a user request was taking 2.5s to complete; it was called from a servlet. 2.5s doesn't seem like much for a web-based interface, but it was 2.5s of heavy processor usage, so it would kill performance if more than a few clients called it at the same time.
A LOT of reflection was used inside this method. This was my first 'sure' candidate for optimization, as the same Method objects were looked up again and again for the same calls. I implemented a classloader/class-aware cache and measured the speed again... 1900ms. On one hand, 600ms is not a bad gain, but what was taking the rest of the time? I ran a simple profiler and found the offender... inside the innermost loop, a log call was made (a leftover from the testing phase). It constructed a string with info and logged it, writing multiple megabytes to a synchronized log, line after line... After removing this line, the method now takes 20-30ms, which is on the same order of magnitude as the measurement error.
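The post does not show the cache itself, so here is a minimal sketch of what a class-aware Method cache could look like, assuming lookups of public methods via `Class.getMethod`. The names `MethodCache` and `getMethod` are hypothetical. It uses the JDK's `ClassValue` to keep one map per `Class` object, so two classes with the same name loaded by different classloaders never share entries; this is one possible approach, not necessarily the one the author used.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: caches reflective lookups per Class object.
final class MethodCache {
    // One map per Class; keyed by the Class itself (not its name), so
    // same-named classes from different classloaders get separate maps.
    private static final ClassValue<Map<String, Method>> CACHE =
            new ClassValue<Map<String, Method>>() {
                @Override
                protected Map<String, Method> computeValue(Class<?> type) {
                    return new ConcurrentHashMap<>();
                }
            };

    private MethodCache() {}

    /** Returns the public method, reflecting only on the first lookup per key. */
    static Method getMethod(Class<?> type, String name, Class<?>... paramTypes) {
        String key = name + Arrays.toString(paramTypes);
        Map<String, Method> methods = CACHE.get(type);
        Method m = methods.get(key);
        if (m == null) {
            try {
                m = type.getMethod(name, paramTypes);
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException(e);
            }
            // If another thread won the race, reuse its instance.
            Method prev = methods.putIfAbsent(key, m);
            if (prev != null) {
                m = prev;
            }
        }
        return m;
    }
}
```

`Class.getMethod` returns a fresh `Method` copy on every call, so repeated lookups through this cache hand back the same instance instead of repeating the search.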
Of course, if I had not implemented the cache, it would take 500-600ms, which is still too long. But without profiling, reflection itself was the main suspect for the slowness, and we had already considered rewriting everything from scratch without reflection (a MAJOR undertaking).
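In that story the fix was simply deleting the leftover log call. Where debug logging genuinely has to stay inside a hot loop, one common mitigation is to check the log level before building the message, so the string concatenation is skipped when the level is disabled. The post does not say which logging library was involved; `java.util.logging` and the hypothetical `LoopLogging.process` method below are assumptions for illustration. Note this only helps when the messages are filtered out: if they are actually written, the I/O cost remains.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example of a hot loop with guarded debug logging.
final class LoopLogging {
    private static final Logger LOG = Logger.getLogger(LoopLogging.class.getName());

    static long process(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
            // Unguarded, "LOG.fine("i=" + i + ...)" would build the string on
            // every iteration even when FINE messages are discarded.
            // Guarded: the message is only built if FINE is enabled.
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine("i=" + i + " sum=" + sum);
            }
        }
        return sum;
    }
}
```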
Back to your problem. There are a few problems with multi-dimensional arrays in Java. JITs cannot eliminate their bounds checks very well, each row is a separate array object so you pay the per-array space overhead for every row, and they are not cache friendly. The extra dereference is the least of these costs.
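To make the difference concrete, here is a small sketch contrasting a `float[][]` (an array of separate row arrays) with a single flat `float[]` in row-major order, where element (r, c) lives at index `r * cols + c`. The class and method names are hypothetical; this only shows the two layouts, not a benchmark.

```java
// Hypothetical comparison of two layouts for a rows x cols table of floats.
final class ArrayLayouts {
    // Array of arrays: each row is its own object with its own header and
    // length, so m[r][c] means two dereferences and two bounds checks
    // (hoisting the row reference, as here, saves the outer one per element).
    static float sumNested(float[][] m) {
        float sum = 0f;
        for (int r = 0; r < m.length; r++) {
            float[] row = m[r];
            for (int c = 0; c < row.length; c++) {
                sum += row[c];
            }
        }
        return sum;
    }

    // Flat row-major layout: one contiguous array, element (r, c) at r * cols + c.
    static float sumFlat(float[] m, int rows, int cols) {
        float sum = 0f;
        for (int r = 0; r < rows; r++) {
            int base = r * cols;
            for (int c = 0; c < cols; c++) {
                sum += m[base + c];
            }
        }
        return sum;
    }
}
```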
How important are these structures? For example, if you want to keep your vertex list in an array, float[vertices][3] is out of the question. If you just want to represent some data which is loaded from or sent to the network, then the structure is probably not important, as network access will be the only thing affecting the speed of the program.
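One common alternative to a two-dimensional vertex array is a single flat `float[]` with x, y and z interleaved, so vertex i occupies indices `3*i` to `3*i + 2`. The sketch below assumes three floats per vertex; the class name `FlatVertices` and the `translate` operation are hypothetical examples.

```java
// Hypothetical flat vertex storage: x0, y0, z0, x1, y1, z1, ...
final class FlatVertices {
    static final int STRIDE = 3; // floats per vertex (x, y, z)

    // Moves every vertex by (dx, dy, dz) in a single pass over one contiguous array.
    static void translate(float[] xyz, float dx, float dy, float dz) {
        for (int i = 0; i + 2 < xyz.length; i += STRIDE) {
            xyz[i] += dx;
            xyz[i + 1] += dy;
            xyz[i + 2] += dz;
        }
    }
}
```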
My solution? Abstract access to these arrays. Create a wrapper class representing your data (something like VertexList or Matrix), and then you can defer the choice of the actual implementation until the very last stage of the project. Current JVMs inline such small accessor calls almost perfectly, so you will get the same performance as without the wrapper.
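A minimal sketch of that idea, assuming a simple float matrix: callers program against a small `Matrix` interface, and the storage layout behind it can be swapped later. `Matrix`, `FlatMatrix`, `NestedMatrix` and `MatrixOps.trace` are hypothetical names chosen for illustration.

```java
// Hypothetical wrapper interface; callers never see the underlying arrays.
interface Matrix {
    int rows();
    int cols();
    float get(int r, int c);
    void set(int r, int c, float v);
}

// Flat row-major storage. Note: no per-column check, so an out-of-range c
// silently reads a neighbouring row (only the whole-array bound is checked).
final class FlatMatrix implements Matrix {
    private final int rows, cols;
    private final float[] data;

    FlatMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new float[rows * cols];
    }

    public int rows() { return rows; }
    public int cols() { return cols; }
    public float get(int r, int c) { return data[r * cols + c]; }
    public void set(int r, int c, float v) { data[r * cols + c] = v; }
}

// Array-of-arrays storage behind the same interface.
final class NestedMatrix implements Matrix {
    private final float[][] data;

    NestedMatrix(int rows, int cols) { data = new float[rows][cols]; }

    public int rows() { return data.length; }
    public int cols() { return data.length == 0 ? 0 : data[0].length; }
    public float get(int r, int c) { return data[r][c]; }
    public void set(int r, int c, float v) { data[r][c] = v; }
}

final class MatrixOps {
    // Written only against the interface, so it works with either layout.
    static float trace(Matrix m) {
        float t = 0f;
        int n = Math.min(m.rows(), m.cols());
        for (int i = 0; i < n; i++) {
            t += m.get(i, i);
        }
        return t;
    }
}
```

Switching layouts then means changing only the constructor call. If just one implementation is ever loaded, the accessor call sites stay monomorphic, which is the case HotSpot inlines most readily.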