At first I thought this was easy to answer... but...
I'd actually say you'd do this after multiplying by the model matrix. But then you'd need to send the model matrix and the view-projection matrix to the shader separately, so you'd have more uniforms to upload and more matrices to multiply each vertex by...
So I suggest multiplying the vertices by the model matrix on the CPU and then uploading the result to the GPU. You don't even have to do this every frame: you don't rotate, scale or translate something all the time, so you can cache the transformed vertices and only recompute them when the model matrix changes.
On the GPU I'd then do the animation and finally multiply by the view-projection matrix.
(But this actually depends on the "animation".)
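To make the order of operations concrete, here is a rough sketch in plain Python standing in for both the CPU and the shader side. All names (`Mesh`, `wind`, `vertex_stage`, `translation`, `mat_vec`) are made up for illustration, and the wind formula is just a placeholder, not a recommendation:

```python
import math

def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def translation(tx, ty, tz):
    """Build a 4x4 translation matrix (a very simple model matrix)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

class Mesh:
    """Hypothetical mesh that caches its world-space vertices on the CPU."""

    def __init__(self, local_vertices, model):
        self.local_vertices = local_vertices
        self.model = model
        self._world = None  # None means the cache has to be rebuilt

    def set_model(self, model):
        # Changing the model matrix invalidates the cached vertices.
        self.model = model
        self._world = None

    def world_vertices(self):
        # Only redo the model-matrix multiplication when something changed.
        if self._world is None:
            self._world = [mat_vec(self.model, v) for v in self.local_vertices]
        return self._world

def wind(world_pos, time):
    """Placeholder for the GPU animation: sway x, seeded by world position."""
    x, y, z, w = world_pos
    return [x + 0.1 * math.sin(x + z + time), y, z, w]

def vertex_stage(world_pos, view_proj, time):
    """What the vertex shader would do: animate first, then view-projection."""
    return mat_vec(view_proj, wind(world_pos, time))
```

The point of the sketch is only the ordering: model matrix once on the CPU (cached), then animation, then view-projection per vertex.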
Here is an example:
First, I want to clarify what I think you mean by "animation": for example, a wind effect that transforms the vertices by some layered cosine or sine functions, like in Minecraft, where the leaf blocks are then affected by the wind.
So what if I want to translate or rotate the blocks?
If I applied the wind effect to the vertices before they are translated, I'd get the same wind effect for all leaf blocks, since the position of the vertex is the seed of the wind effect. -> All leaf blocks have the position 0, 0 in the beginning. -> They all do the same function lookup.
So what I do: I multiply all the leaf blocks' vertices by the model matrix. -> Each one gets translated to its own position, for example 16, 9, whereas others end up at other positions. -> Each one does a different sine-function lookup in the shader and you get a beautiful windy effect.
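A small sketch of why the seed matters, again in plain Python instead of shader code. The function `wind_offset`, its constants and the second block position 3, 5 are assumptions for illustration only:

```python
import math

def wind_offset(seed_x, seed_y, time):
    """Layered sine/cosine lookup, seeded by a 2D position (made-up constants)."""
    return (0.10 * math.sin(seed_x * 0.7 + time)
            + 0.05 * math.cos(seed_y * 1.3 + time * 2.0))

def translate(local_pos, block_pos):
    """Stand-in for the model matrix: move a local vertex to its block."""
    return (local_pos[0] + block_pos[0], local_pos[1] + block_pos[1])

local_vertex = (0.0, 0.0)   # every leaf block starts at 0, 0
block_a = (16.0, 9.0)       # position from the example above
block_b = (3.0, 5.0)        # hypothetical second block
time = 1.0

# Seeding with the untranslated position: both blocks get the same offset.
untranslated_a = wind_offset(*local_vertex, time)
untranslated_b = wind_offset(*local_vertex, time)

# Seeding with the translated position: each block gets its own offset.
translated_a = wind_offset(*translate(local_vertex, block_a), time)
translated_b = wind_offset(*translate(local_vertex, block_b), time)
```

Here `untranslated_a` and `untranslated_b` are identical, while `translated_a` and `translated_b` differ, which is exactly the difference between all leaves moving in lockstep and each block swaying on its own.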
But you can have other "animation" effects which shouldn't be in world space but in screen space.
For example, a waving screen effect (you'd actually do it with framebuffers and fragment shaders, but here is the example for vertex shaders):
You'd get the screen coords of all the vertices and then distort them by a sine function, depending on each vertex's position.
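Roughly, that could look like the following sketch (plain Python, not shader code). I'm assuming the distortion is seeded by the screen-space y coordinate, and `to_ndc`, `wave`, `amplitude` and `frequency` are names and values I made up:

```python
import math

def to_ndc(clip):
    """Perspective divide: clip space (x, y, z, w) -> normalized device coords."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)

def wave(ndc, time, amplitude=0.05, frequency=10.0):
    """Shift x by a sine of the screen-space y, so the image ripples sideways."""
    x, y, z = ndc
    return (x + amplitude * math.sin(y * frequency + time), y, z)

# A vertex after the view-projection multiplication (hypothetical values).
clip_pos = (0.4, 0.2, 0.6, 2.0)
distorted = wave(to_ndc(clip_pos), time=0.0)
```

In a real vertex shader the perspective divide happens after the shader has run, so there you would add the offset to the clip-space position scaled by `w` instead of dividing yourself.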