Pretty much finished converting WSW to use JOML instead of LibGDX. Excellent library: faster implementations, thread-safety out of the box, and smaller. Still waiting on a few features before I can finish it and see how broken it is. =P
Did some profiling now that skeleton animation is over twice as fast thanks to JOML, and found a physics bottleneck. The physics simulation uses a 2D grid to quickly find nearby bodies for collision detection. Each "tile" in the grid is just a list of the bodies in that tile. To query nearby bodies, you pass in a position and a radius, and the grid returns all bodies that intersect the square formed by that position and radius (a circle test is too expensive). Since the tile size is bigger than the bodies, each body generally only needs to check 1 tile, or 2 or 4 if the query area crosses a tile boundary.

While optimizing away some unnecessary divides, I realized there was a bug that expanded the search area by 1 tile in each dimension, so a 1x1 check turned into a 2x2 and a 1x2 check turned into a 2x3. That had a huge impact on performance, since it basically did 2-4x as much work. Combined with getting rid of a few divides and Math.floor()/Math.ceil() calls, I almost managed to double the body-body collision testing performance.
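To make the grid query and the off-by-one concrete, here is a minimal Java sketch of a uniform-grid broadphase. It is not WSW's actual code: the class `SpatialGrid`, the `Body` type and the methods `insert`, `query`, `tilesVisited` and `tilesVisitedBuggy` are all hypothetical. It assumes each body is stored in the one tile containing its center, so the caller passes a query radius large enough to cover both bodies. It multiplies by a precomputed `1 / tileSize` instead of dividing, and uses an `(int)` cast plus clamping in place of `Math.floor()`. The "buggy" variant is just one plausible way such a bug can arise (taking `Math.ceil()` of the upper bound and then looping inclusively).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical uniform-grid broadphase sketch (not WSW's actual code).
class SpatialGrid {

    // Hypothetical body: a circle with center (x, y) and radius r.
    static final class Body {
        final float x, y, r;
        Body(float x, float y, float r) { this.x = x; this.y = y; this.r = r; }
    }

    private final int width, height;      // grid size in tiles
    private final float invTileSize;      // precomputed 1 / tileSize: multiply instead of divide
    private final List<List<Body>> tiles; // one body list per tile, row-major

    SpatialGrid(int width, int height, float tileSize) {
        this.width = width;
        this.height = height;
        this.invTileSize = 1f / tileSize;
        this.tiles = new ArrayList<>(width * height);
        for (int i = 0; i < width * height; i++) {
            tiles.add(new ArrayList<>());
        }
    }

    private static int clamp(int t, int size) {
        return Math.max(0, Math.min(size - 1, t));
    }

    // World coordinate -> tile coordinate. The (int) cast truncates toward zero,
    // which matches floor() for v >= 0; negative values clamp to 0 either way.
    private int tileCoord(float v, int size) {
        return clamp((int) (v * invTileSize), size);
    }

    // Assumption: store the body only in the tile containing its center.
    void insert(Body b) {
        tiles.get(tileCoord(b.y, height) * width + tileCoord(b.x, width)).add(b);
    }

    // Append every body in the tiles overlapped by the square
    // [x - radius, x + radius] x [y - radius, y + radius]. Both bounds are
    // floored and the loops are inclusive, so a square inside one tile
    // visits exactly one tile.
    void query(float x, float y, float radius, List<Body> out) {
        int minX = tileCoord(x - radius, width), maxX = tileCoord(x + radius, width);
        int minY = tileCoord(y - radius, height), maxY = tileCoord(y + radius, height);
        for (int ty = minY; ty <= maxY; ty++) {
            for (int tx = minX; tx <= maxX; tx++) {
                out.addAll(tiles.get(ty * width + tx));
            }
        }
    }

    // Number of tiles query() visits for the given square.
    int tilesVisited(float x, float y, float radius) {
        int spanX = tileCoord(x + radius, width) - tileCoord(x - radius, width) + 1;
        int spanY = tileCoord(y + radius, height) - tileCoord(y - radius, height) + 1;
        return spanX * spanY;
    }

    // Hypothetical buggy variant for comparison: ceil() on the upper bound
    // combined with an inclusive loop typically adds one extra tile per axis.
    int tilesVisitedBuggy(float x, float y, float radius) {
        int spanX = clamp((int) Math.ceil((x + radius) * invTileSize), width)
                - tileCoord(x - radius, width) + 1;
        int spanY = clamp((int) Math.ceil((y + radius) * invTileSize), height)
                - tileCoord(y - radius, height) + 1;
        return spanX * spanY;
    }
}
```

With a tile size of 4, a query at (1, 1) with radius 0.5 visits 1 tile, while the buggy variant visits 2x2 = 4. A query at (4, 1) with radius 0.5 straddles a tile boundary and visits 2 tiles, versus 3x2 = 6 for the buggy variant, which matches the 2-4x extra work described above.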
What kind of performance scaling do you get with increasing tile size? Many times I have found that making tiles much bigger than the bodies is a net win. It increases the number of hit tests, but it fits in cache much better.