I just tested this, and the problem seems to be what I guessed earlier.
1. You render using GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA. This blends the foreground with its alpha.
2. You then read the pre-multiplied pixels with glReadPixels into a texture (or save it into a PNG and re-load it later).
3. You then render the pre-multiplied texture with the same blending function as before -- and because the alpha gets applied twice, the result comes out too transparent (sketched below).
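To make the capture half concrete, here is a rough sketch of what I mean (fbo, width, height, drawParticles and the glad loader header are placeholders I picked, not your code):

    #include <stdlib.h>
    #include <glad/glad.h>   /* or whatever GL loader/header you already use */

    void drawParticles(void);   /* placeholder for your particle draw calls */

    /* Render with the "normal" blend func into an FBO, then read the pixels
     * back. At this point the source alpha has already been baked into both
     * the RGB and the alpha of the captured image. */
    static GLubyte *captureParticles(GLuint fbo, int width, int height)
    {
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        glClearColor(0.0f, 0.0f, 0.0f, 0.0f);               /* transparent background */
        glClear(GL_COLOR_BUFFER_BIT);
        glEnable(GL_BLEND);
        glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);  /* step 1 */

        drawParticles();

        GLubyte *pixels = malloc((size_t)width * height * 4);
        glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);  /* step 2 */
        return pixels;   /* step 3 draws this again with the same blend func */
    }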
The blend func looks like this:
    finalColor.rgba = texColor.rgba * texColor.a + bgColor.rgba * (1 - texColor.a);
So if you render 50% white (1, 1, 1, 0.5) onto a transparent (0, 0, 0, 0) background, you end up with a finalColor of (0.5, 0.5, 0.5, 0.25) -- both the color and the alpha have already been multiplied by 0.5, which is not what you want.
Instead, what you want is GL_ONE, GL_ONE_MINUS_SRC_ALPHA:
    finalColor.rgba = texColor.rgba * (1, 1, 1, 1) + bgColor.rgba * (1 - texColor.a);
Which would lead to a final color of (1, 1, 1, 0.5) on that transparent background, and still allow you to blend multiple particles atop each other.
When you render with this blend func to the default frame buffer, you will probably just see a white box. However, when you use glReadPixels, you will get an image that is not premultiplied, which can then be rendered back with standard GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA blending.
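In code, the capture pass would look roughly like the earlier sketch, just with the blend func swapped, and then the captured image gets drawn with normal blending (same placeholder names and includes as above):

    /* Capture with GL_ONE so the readback keeps its original (straight) alpha. */
    static GLubyte *captureParticlesStraightAlpha(GLuint fbo, int width, int height)
    {
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        glClearColor(0.0f, 0.0f, 0.0f, 0.0f);          /* transparent background */
        glClear(GL_COLOR_BUFFER_BIT);
        glEnable(GL_BLEND);
        glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
        drawParticles();                               /* placeholder */

        GLubyte *pixels = malloc((size_t)width * height * 4);
        glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
        return pixels;   /* not premultiplied, per the reasoning above */
    }

    /* Later, when drawing the captured image (or the re-loaded PNG),
     * the standard blend func works again: */
    /* glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA); */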
Another solution would be to just tell the user to use pre-multiplied alpha (or have your app pre-process textures when loading a particle image); then you can use GL_ONE, GL_ONE_MINUS_SRC_ALPHA everywhere.
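If you go that route, the pre-processing is just a loop over the decoded pixel data. A minimal sketch, assuming 8-bit RGBA input (premultiplyAlpha is a name I made up):

    #include <stdint.h>
    #include <stddef.h>

    /* Convert straight-alpha RGBA8 pixels to premultiplied alpha in place,
     * e.g. right after decoding the particle PNG. */
    void premultiplyAlpha(uint8_t *rgba, size_t pixelCount)
    {
        for (size_t i = 0; i < pixelCount; ++i) {
            uint8_t a = rgba[i * 4 + 3];
            rgba[i * 4 + 0] = (uint8_t)((rgba[i * 4 + 0] * a + 127) / 255);
            rgba[i * 4 + 1] = (uint8_t)((rgba[i * 4 + 1] * a + 127) / 255);
            rgba[i * 4 + 2] = (uint8_t)((rgba[i * 4 + 2] * a + 127) / 255);
        }
    }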