I've started lightly experimenting with converting WSW and Insomnia to JOML. I will be editing this post as I find features I'm missing. KaiHH, you may want to add me on Skype so we can communicate through chat there (same username as here).
 Matrix3f/d.set(Matrix4f/d): copies rotation of 4D matrix.
 Matrix4f/d.getTranslation(Vector3f/d): opposite of setTranslation(). Stores result in provided vector.
 Matrix4f/d.getScale(Vector3f/d): gets the scale of the X, Y and Z axes. Stores result in provided vector.
 Vector3f/d.distanceSquared(Vector3f/d): calculates the squared distance between two points.
 Vector3f/d.lerp(Vector3f/d, float/double alpha): does linear interpolation between two vectors.
 Vector3f/d.project(Matrix4f/d): same as mul(Matrix4f/d), but also calculates resulting w value and divides by it at the end. Proposed implementation:
 public Vector3d prj(Matrix4d mat, Vector3d dest) {
     double w = mat.m03 * x + mat.m13 * y + mat.m23 * z + mat.m33;
     if (this != dest) {
         dest.x = (mat.m00 * x + mat.m10 * y + mat.m20 * z + mat.m30) / w;
         dest.y = (mat.m01 * x + mat.m11 * y + mat.m21 * z + mat.m31) / w;
         dest.z = (mat.m02 * x + mat.m12 * y + mat.m22 * z + mat.m32) / w;
     } else {
         dest.set((mat.m00 * x + mat.m10 * y + mat.m20 * z + mat.m30) / w,
                  (mat.m01 * x + mat.m11 * y + mat.m21 * z + mat.m31) / w,
                  (mat.m02 * x + mat.m12 * y + mat.m22 * z + mat.m32) / w);
     }
     return this;
 }
<<<FATAL BUG>>>: Vector3d.mul(Matrix4d mat, Vector3d dest) DOES NOT ADD THE TRANSLATION! It does NOT assume that w=1.0, even though the JavaDoc says it does! Fixed version:
 public Vector3d mul(Matrix4d mat, Vector3d dest) {
     if (this != dest) {
         dest.x = mat.m00 * x + mat.m10 * y + mat.m20 * z + mat.m30;
         dest.y = mat.m01 * x + mat.m11 * y + mat.m21 * z + mat.m31;
         dest.z = mat.m02 * x + mat.m12 * y + mat.m22 * z + mat.m32;
     } else {
         dest.set(mat.m00 * x + mat.m10 * y + mat.m20 * z + mat.m30,
                  mat.m01 * x + mat.m11 * y + mat.m21 * z + mat.m31,
                  mat.m02 * x + mat.m12 * y + mat.m22 * z + mat.m32);
     }
     return this;
 }
 Vector3f/d.setLength(float/double): rescales the vector to the given length, i.e. multiplies each component by lengthArgument/sqrt(x*x+y*y+z*z).
 Matrix*f/d.add(Matrix*f/d): sums up each element in the matrices.
 Matrix*f/d.sub(Matrix*f/d): you get it.
 Matrix*f/d.fma(Matrix*f/d): multiply and add version (EXTREMELY useful for skeleton animation).
 Matrix*f/d.scale(Vector*f/d): version of scale() that takes the per-axis factors as a vector.
 Matrix*f/d.scaling(Vector*f/d): version of scaling() that takes the per-axis factors as a vector.
 Matrix*f/d.get() functions that work on ByteBuffers instead of Float/DoubleBuffers.
 Vector3f.mul(Matrix3/4d): to multiply float vectors by double matrices.
 Vector3f.project(Matrix4d): to project float vectors by double matrices.
 A function to normalize the rotation part of a 3D or 4D matrix. This is useful since after generating a normal matrix, the axes are probably not unit.
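To make the last request concrete, here is a rough sketch of what I mean by normalizing the rotation part. It works on a plain column-major 3x3 array rather than on JOML's classes, and the class and method names are made up for illustration. It only rescales each axis to unit length; it does not try to re-orthogonalize the axes.

```java
final class NormalizeRotationSketch {
    // m is a column-major 3x3 matrix: column c occupies m[c * 3] .. m[c * 3 + 2].
    // Each column (basis axis) is rescaled to unit length in place.
    // Zero-length columns are left untouched to avoid dividing by zero.
    static void normalizeColumns(double[] m) {
        for (int c = 0; c < 3; c++) {
            int i = c * 3;
            double len = Math.sqrt(m[i] * m[i] + m[i + 1] * m[i + 1] + m[i + 2] * m[i + 2]);
            if (len > 0.0) {
                double inv = 1.0 / len;
                m[i] *= inv;
                m[i + 1] *= inv;
                m[i + 2] *= inv;
            }
        }
    }
}
```

For a 4D matrix the same thing would apply to the upper-left 3x3 block only, leaving the translation column alone.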
<<<FATAL BUG>>>: Matrix4d.rotate() resets the scale of the matrix somehow. I can't figure out how, but it does.
 System.out.println(new Matrix4d().scaling(0.0000001).rotate(0, 0, 1, 0));
System.out.println(new Matrix4d().rotation(0, 0, 1, 0).scale(0.0000001));
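For reference, this is roughly the behaviour I would expect from the two lines above, sketched with plain column-major 3x3 arrays instead of JOML (all names below are made up for illustration). With a uniform scale, scale-then-rotate and rotate-then-scale should give the same matrix, and every axis should keep a length equal to the scale factor.

```java
final class ScaleRotateCheck {
    // Column-major 3x3: element (row r, column c) is stored at index c * 3 + r.
    static double[] mul(double[] a, double[] b) {
        double[] out = new double[9];
        for (int c = 0; c < 3; c++) {
            for (int r = 0; r < 3; r++) {
                double sum = 0.0;
                for (int k = 0; k < 3; k++) {
                    sum += a[k * 3 + r] * b[c * 3 + k];
                }
                out[c * 3 + r] = sum;
            }
        }
        return out;
    }

    // Uniform scaling matrix.
    static double[] scaling(double s) {
        return new double[] { s, 0, 0,  0, s, 0,  0, 0, s };
    }

    // Rotation by 'angle' radians around the Y axis.
    static double[] rotationY(double angle) {
        double c = Math.cos(angle), s = Math.sin(angle);
        return new double[] { c, 0, -s,  0, 1, 0,  s, 0, c };
    }

    // Length of one basis axis (column) of the matrix.
    static double columnLength(double[] m, int col) {
        double x = m[col * 3], y = m[col * 3 + 1], z = m[col * 3 + 2];
        return Math.sqrt(x * x + y * y + z * z);
    }

    public static void main(String[] args) {
        double[] sr = mul(scaling(0.0000001), rotationY(0.7));
        for (int col = 0; col < 3; col++) {
            System.out.println(columnLength(sr, col));
        }
    }
}
```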

 Vector3f/d.rotate(Matrix4f/d): Multiplies a vector by only the rotational part of a 4D matrix (= assume w=0) (= what mul() does now due to the bug).
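To spell out the difference between the three vector-by-matrix operations I'm asking for, here is a small sketch on a plain column-major double[16] instead of JOML's classes. The class and method names are hypothetical; the array index is column * 4 + row, so m[12], m[13], m[14] correspond to m30, m31, m32 in JOML's naming.

```java
final class TransformKinds {
    // w = 1: rotation/scale part plus translation (what mul() should do).
    static double[] transformPosition(double[] m, double x, double y, double z) {
        return new double[] {
            m[0] * x + m[4] * y + m[8] * z + m[12],
            m[1] * x + m[5] * y + m[9] * z + m[13],
            m[2] * x + m[6] * y + m[10] * z + m[14]
        };
    }

    // w = 0: the translation column is ignored (what rotate() would do).
    static double[] transformDirection(double[] m, double x, double y, double z) {
        return new double[] {
            m[0] * x + m[4] * y + m[8] * z,
            m[1] * x + m[5] * y + m[9] * z,
            m[2] * x + m[6] * y + m[10] * z
        };
    }

    // w = 1, followed by a divide by the resulting w (what project() would do).
    static double[] project(double[] m, double x, double y, double z) {
        double w = m[3] * x + m[7] * y + m[11] * z + m[15];
        double[] p = transformPosition(m, x, y, z);
        return new double[] { p[0] / w, p[1] / w, p[2] / w };
    }
}
```

With a pure translation matrix, transformPosition() moves the point, transformDirection() returns the input unchanged, and project() matches transformPosition() as long as the bottom row is (0, 0, 0, 1).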
<<<FATAL BUG>>>: The Matrix4d.normal(Matrix3d dest) fast path produces incorrect results. Commenting out the fast path causes the function to calculate correct normal matrices.
<<<FATAL BUG>>>: The Quaternionf.slerp() tends to produce NaN every now and then. I replaced the implementation with that of LibGDX and my performance skyrocketed. You DEFINITELY need a better implementation.
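I haven't tracked down where the NaN comes from, but the usual suspects in a slerp are acos() of a dot product that rounding pushed slightly past 1, and dividing by sin(theta) when the two quaternions are nearly identical. Below is a minimal sketch that guards against both. It is not the LibGDX code and not JOML's; quaternions are plain {x, y, z, w} arrays assumed to be unit length, and the names and the 1e-6 threshold are my own picks.

```java
final class SlerpSketch {
    static double[] slerp(double[] a, double[] b, double t) {
        double dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
        // q and -q are the same rotation, so flip b if needed to take the shorter arc.
        double sign = dot < 0.0 ? -1.0 : 1.0;
        // Clamp so that rounding error can never feed acos() a value above 1.
        double absDot = Math.min(Math.abs(dot), 1.0);

        // Default to plain linear weights; used when the inputs are nearly parallel.
        double wa = 1.0 - t;
        double wb = t;
        if (absDot < 1.0 - 1e-6) {
            // Far enough apart for sin(theta) to be safely non-zero.
            double theta = Math.acos(absDot);
            double invSin = 1.0 / Math.sin(theta);
            wa = Math.sin((1.0 - t) * theta) * invSin;
            wb = Math.sin(t * theta) * invSin;
        }
        wb *= sign;

        double x = wa * a[0] + wb * b[0];
        double y = wa * a[1] + wb * b[1];
        double z = wa * a[2] + wb * b[2];
        double w = wa * a[3] + wb * b[3];
        // Renormalize; this mainly matters for the linear fallback.
        double invLen = 1.0 / Math.sqrt(x * x + y * y + z * z + w * w);
        return new double[] { x * invLen, y * invLen, z * invLen, w * invLen };
    }
}
```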