Hmm. That all looks correct.

- Are you trying to do some kind of conversion from Y-up to Z-up or something like that? It's not easy to do that with quaternions.

- You've shown the code for generating the bind pose matrix. Where's the code for computing an animated matrix?

The intuitive idea behind skeleton animation is to create a local coordinate system for each bone, which is constructed from a translation and a rotation (and a scale, but ignore that for the intuition). Basically, we want a way to calculate the position of a vertex relative to a bone. This is done the exact same way we compute a view matrix: Create a matrix with the transform of the camera, then invert it.

Example: We have a camera at position (1, 1, 1) and construct a matrix:

    matrix.translation(cameraPosition);

This is a matrix which simply adds the camera's position to each vertex. If we have a vertex at (0, 0, 0) relative to the camera and want to know where it is in the world, we just add the camera's position to it. Since the vertex is at the same point as the camera in this case (it is at (0, 0, 0) relative to the camera after all), the vertex is at (1, 1, 1) in the world too. Easy to understand. However, we already have vertices in world space, and want to know where they are relative to the camera so we can draw them on a screen. So, we just invert the matrix we made, which in this case is the same as

    matrix.translation(-cameraPosition);

since it's a pure translation, and the inverse of a translation is the opposite translation. It's clear that if we have a world space vertex at (1, 1, 1) and apply a (-1, -1, -1) translation to it, we end up at (0, 0, 0) relative to the camera again, as we should.
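To make the camera example concrete, here is a small sketch in plain Python. The `translation` and `transform` helpers are made up for illustration (they stand in for whatever matrix class you use), and matrices are stored row-major with column vectors.

```python
def translation(tx, ty, tz):
    """Return a 4x4 row-major matrix that translates by (tx, ty, tz)."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


camera_position = (1.0, 1.0, 1.0)

# Camera space -> world space: add the camera position.
camera_to_world = translation(*camera_position)
# World space -> camera space (the view matrix): the inverse, which for a
# pure translation is just the negated translation.
world_to_camera = translation(*(-c for c in camera_position))

local_vertex = [0.0, 0.0, 0.0, 1.0]  # vertex sitting exactly at the camera
world_vertex = transform(camera_to_world, local_vertex)
back_to_local = transform(world_to_camera, world_vertex)

print(world_vertex)   # [1.0, 1.0, 1.0, 1.0]
print(back_to_local)  # [0.0, 0.0, 0.0, 1.0]
```

The round trip landing back on (0, 0, 0) is the property we rely on for bones as well.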

We do the same thing when computing the "inverse bind pose matrix" or whatever you want to call it. We first calculate the default transform matrix of each bone in the bind pose, then invert each one to get a matrix that takes us from model space to a coordinate system relative to that bone. Simply put, it allows us to calculate where a given vertex is compared to a bone. Now, why is this useful? Once we know where a vertex is relative to a bone, we can move the bone and easily calculate the new position of every vertex affected by it: compute the vertex's position relative to the bone, then take it back to model space using a different (animated) bone matrix. The result is what we call skeleton animation.
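As a rough sketch of the inversion step for a bone that also has a rotation: the example below assumes the bind pose matrix contains only rotation and translation (no scale), in which case the inverse is the transposed rotation combined with the negated, un-rotated translation. The bone placement and the helper names (`rigid_inverse`, `transform`) are made up for illustration.

```python
def transform(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def rigid_inverse(m):
    """Invert a 4x4 matrix that holds only a rotation and a translation."""
    # The inverse of a rotation is its transpose.
    rt = [[m[c][r] for c in range(3)] for r in range(3)]
    t = [m[r][3] for r in range(3)]
    # The inverse translation is -(R^T * t).
    inv_t = [-sum(rt[r][c] * t[c] for c in range(3)) for r in range(3)]
    return [
        rt[0] + [inv_t[0]],
        rt[1] + [inv_t[1]],
        rt[2] + [inv_t[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


# Hypothetical bind pose: bone at (2, 0, 0), rotated 90 degrees around Z,
# so the bone's local X axis points along model-space +Y.
bind_pose = [
    [0.0, -1.0, 0.0, 2.0],
    [1.0,  0.0, 0.0, 0.0],
    [0.0,  0.0, 1.0, 0.0],
    [0.0,  0.0, 0.0, 1.0],
]
inverse_bind_pose = rigid_inverse(bind_pose)

model_vertex = [2.0, 1.0, 0.0, 1.0]
local_vertex = transform(inverse_bind_pose, model_vertex)
# One unit along the bone's own X axis: (1, 0, 0).
print(local_vertex)
# Going back through the bind pose returns the original model-space vertex.
print(transform(bind_pose, local_vertex))
```

If your bones carry scale, this shortcut no longer holds and you need a general matrix inverse instead.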

What you want to do is simply this:

    vec4 localPosition = inverseBindPoseMatrix * modelSpacePosition;
    vec4 newModelSpacePosition = animatedBoneMatrix * localPosition;

which can be rearranged like this:

    vec4 newModelSpacePosition = animatedBoneMatrix * (inverseBindPoseMatrix * modelSpacePosition);
    // matrix multiplication is associative, so this is the same as:
    vec4 newModelSpacePosition = (animatedBoneMatrix * inverseBindPoseMatrix) * modelSpacePosition;

In other words, you can precompute a single bone matrix which takes the vertex from its current model space position directly to its new model space position by precomputing (animatedBoneMatrix*inverseBindPoseMatrix).
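Here is a sketch of that precomputation in plain Python, checking that the combined matrix gives the same result as the two-step version. The bone setup is an assumption chosen to keep the numbers exact (bind pose is a pure translation to (1, 0, 0), and the animated pose rotates the bone 90 degrees around Z in place), and `mat_mul` / `transform` are illustrative helpers.

```python
def mat_mul(a, b):
    """Multiply two 4x4 row-major matrices (a * b)."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def transform(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


# Inverse of a bind pose that only translates the bone to (1, 0, 0).
inverse_bind_pose = [
    [1.0, 0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0,  0.0],
    [0.0, 0.0, 1.0,  0.0],
    [0.0, 0.0, 0.0,  1.0],
]
# Animated pose: same bone position, rotated 90 degrees around Z.
animated_bone = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0,  0.0, 0.0, 0.0],
    [0.0,  0.0, 1.0, 0.0],
    [0.0,  0.0, 0.0, 1.0],
]

model_vertex = [2.0, 0.0, 0.0, 1.0]

# Two steps: model space -> bone space -> new model space.
local_vertex = transform(inverse_bind_pose, model_vertex)
two_step = transform(animated_bone, local_vertex)

# One step: precompute the combined matrix once per bone per frame.
skin_matrix = mat_mul(animated_bone, inverse_bind_pose)
one_step = transform(skin_matrix, model_vertex)

print(two_step)  # [1.0, 1.0, 0.0, 1.0]
print(one_step)  # [1.0, 1.0, 0.0, 1.0]
```

The vertex started one unit along the bone's X axis, and after the bone turns 90 degrees it ends up one unit "above" the bone instead, which is what you'd expect.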

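Now, bone animation generally uses a weighted average of 4 different bones for each vertex, with weights that add up to 1. That just means that we compute the new position that each bone would give us and average together the results according to the weights. Keep in mind that each bone has its own inverse bind pose matrix; the snippets here write inverseBindPoseMatrix without a bone index only to keep them short.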

    vec4 newModelSpacePosition =
        ((animatedBoneMatrix1 * inverseBindPoseMatrix) * modelSpacePosition) * weight1 +
        ((animatedBoneMatrix2 * inverseBindPoseMatrix) * modelSpacePosition) * weight2 +
        ((animatedBoneMatrix3 * inverseBindPoseMatrix) * modelSpacePosition) * weight3 +
        ((animatedBoneMatrix4 * inverseBindPoseMatrix) * modelSpacePosition) * weight4;

which can be rearranged to:

    vec4 newModelSpacePosition =
        (animatedBoneMatrix1 * inverseBindPoseMatrix) * weight1 * modelSpacePosition +
        (animatedBoneMatrix2 * inverseBindPoseMatrix) * weight2 * modelSpacePosition +
        (animatedBoneMatrix3 * inverseBindPoseMatrix) * weight3 * modelSpacePosition +
        (animatedBoneMatrix4 * inverseBindPoseMatrix) * weight4 * modelSpacePosition;

and then to

    vec4 newModelSpacePosition =
        (
            ((animatedBoneMatrix1 * inverseBindPoseMatrix) * weight1) +
            ((animatedBoneMatrix2 * inverseBindPoseMatrix) * weight2) +
            ((animatedBoneMatrix3 * inverseBindPoseMatrix) * weight3) +
            ((animatedBoneMatrix4 * inverseBindPoseMatrix) * weight4)
        ) * modelSpacePosition;

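which is usually the most efficient way of doing it, since the four weighted matrices are summed once and the vertex is then transformed a single time. It is also what you're doing in your shader already.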
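If you want to convince yourself that blending the positions and blending the matrices really give the same answer, a quick check in plain Python could look like the sketch below. Everything in it is made up for illustration: the four skin matrices are assumed to be already precomputed (animated matrix times that bone's own inverse bind pose matrix) and are simple translations so the arithmetic stays exact, and the weights are chosen to sum to 1.

```python
def translation(tx, ty, tz):
    """Return a 4x4 row-major matrix that translates by (tx, ty, tz)."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


# Hypothetical per-bone skin matrices (animated * inverse bind pose).
skin_matrices = [
    translation(0.0, 2.0, 0.0),
    translation(0.0, 0.0, 0.0),
    translation(0.0, 4.0, 0.0),
    translation(0.0, 8.0, 0.0),
]
weights = [0.5, 0.25, 0.125, 0.125]  # must sum to 1
vertex = [1.0, 0.0, 0.0, 1.0]

# First form: transform the vertex by every bone, then blend the positions.
blended_positions = [0.0, 0.0, 0.0, 0.0]
for matrix, weight in zip(skin_matrices, weights):
    moved = transform(matrix, vertex)
    blended_positions = [acc + weight * x
                         for acc, x in zip(blended_positions, moved)]

# Last form: blend the matrices first, then transform the vertex once.
blended_matrix = [
    [sum(w * m[r][c] for m, w in zip(skin_matrices, weights))
     for c in range(4)]
    for r in range(4)
]
via_blended_matrix = transform(blended_matrix, vertex)

print(blended_positions)   # [1.0, 2.5, 0.0, 1.0]
print(via_blended_matrix)  # [1.0, 2.5, 0.0, 1.0]
```

Note that w stays at 1 only because the weights sum to 1. If your weights don't, the result gets scaled, which is a common source of "exploding" or shrunken meshes.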

This may all look complicated at a glance, but it's really just multiplication and addition, just on coordinates and matrices. If you can grasp how this works, then you should be able to debug your code and fix it. There really isn't a shortcut to getting skeleton animation to just work without understanding it, and if you can grasp it you'll be one of the few people in the world who fully understands how this works.