The entire problem is taking in a stream of emissions/observations and predicting the next emission. The underlying states are irrelevant.

You mean it's irrelevant to your post because they've given you a structure? Out of interest, how many states are there and how many transitions with non-zero probability? If it's up to you to make the structure then this is a pretty critical step.

Quote

Baum-Welch seems to suck. If I repeatedly train my HMM with the entire sequence of emissions/observations, it eventually explodes into a HMM filled with 0s. The scarce information I've found says that this is due to overfitting.

I don't think that should be possible. The transition probabilities out of each state have to sum to 1, and that includes the "stay" transition back into the state itself. Overfitting can produce states with only a single non-zero transition in and out, but for the whole model to collapse to 0s you'd need either tons of states or repeated training on the same sequence. A more likely culprit is numerical underflow: the forward and backward probabilities shrink exponentially with sequence length, so the standard fix is to rescale them at each step (or work in log space).
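To make the underflow point concrete, here's a minimal sketch of a forward pass with per-step normalization (the 2-state model here is made up for illustration; the same scaling trick applies inside Baum-Welch):

```python
import numpy as np

# Hypothetical 2-state, 2-symbol model: A = transitions, B = emissions, pi = initial.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])

def forward_scaled(obs):
    """Forward pass with per-step normalization so long sequences don't underflow.

    Returns the scaled forward variables and log P(obs), recovered
    from the scaling factors.
    """
    alphas = []
    log_lik = 0.0
    for t, o in enumerate(obs):
        if t == 0:
            alpha = pi * B[:, o]
        else:
            alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()       # scaling factor for this step
        alpha = alpha / c     # unscaled, these values underflow to 0 on long sequences
        log_lik += np.log(c)
        alphas.append(alpha)
    return np.array(alphas), log_lik

obs = [0, 1, 1, 0] * 250      # a 1000-symbol sequence
alphas, log_lik = forward_scaled(obs)
```

Without the division by `c`, the raw probabilities on a 1000-symbol sequence are far below the smallest representable double, and everything downstream becomes 0.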

Quote

What is the optimal way of training an HMM given a stream of data?

Baum-Welch is pretty good. I'd stick with that until you get it to work.

Quote

Given the sequence of emissions [a, b, c], to calculate the probability of each possible next emission I compute the probability of the emission [a, b, c] occurring given the current HMM state and then the probabilities of [a, b, c, <X>] occurring, where X is each possible emission. The conditional probability of each emission is then

p([a, b, c, <X>]) / p([a, b, c])

Is this the correct way of calculating the probability of the next emission?

After seeing [a,b,c] you could be in any of your states. Call the set of states <N>. You first need the joint probability of the observations and each state n in <N>: P([a,b,c], n). This is the "forward" step, which is well documented (Wikipedia's forward algorithm article is fine for this).

In each of the states there will also be a probability of emitting each x in <X>: P(x|n). That's straight from the current model parameters.

For an emission x, P([a,b,c,x]) is the sum over all the ways x could be emitted, i.e. the sum over n of P([a,b,c], n)*P(x|n).
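A sketch of that recipe in NumPy (the model matrices are made up; note I've taken "the state you could be in after seeing [a,b,c]" to mean the state that will emit next, so the forward variable is propagated one step through the transition matrix before applying P(x|n)):

```python
import numpy as np

# Made-up 2-state model over the emission alphabet {a=0, b=1, c=2}.
A = np.array([[0.6, 0.4],
              [0.3, 0.7]])          # A[m, n] = P(next state n | current state m)
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])     # B[n, x] = P(emission x | state n)
pi = np.array([0.5, 0.5])

def forward(obs):
    """alpha[n] = P(obs, state at final step = n)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha

obs = [0, 1, 2]                     # the sequence [a, b, c]
alpha = forward(obs)                # P([a,b,c], state that emitted c)
state_next = alpha @ A              # P([a,b,c], n) for the state emitting next
p_abc = alpha.sum()                 # P([a,b,c])
p_abcx = state_next @ B             # P([a,b,c,x]) for every emission x at once
next_probs = p_abcx / p_abc        # conditional P(x | [a,b,c])
```

Dividing by `p_abc` is exactly the p([a,b,c,<X>]) / p([a,b,c]) ratio from the question, so `next_probs` sums to 1 over the alphabet.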

If you follow through with the method above to predict the next emission, I think you'll effectively be doing the forward half of Forward/Backward. If all you want is the single most likely state sequence, then it's much simpler to use Viterbi, which is just plain old dynamic programming applied to the HMM (and, for me at least, is easier to understand).
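For comparison, a compact Viterbi sketch (same made-up model shapes as above; done in log space to sidestep the underflow issue mentioned earlier):

```python
import numpy as np

# Hypothetical 2-state model over the alphabet {0, 1, 2}.
A = np.array([[0.6, 0.4],
              [0.3, 0.7]])
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
pi = np.array([0.5, 0.5])

def viterbi(obs):
    """Most likely state sequence for obs, via dynamic programming in log space."""
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))            # best log-prob of any path ending in each state
    back = np.zeros((T, N), dtype=int)  # which previous state achieved that best
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA   # scores[m, n]: come from m, land in n
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# State 0 favours symbol 0, state 1 favours symbol 2, so the
# decoded path follows the switch in the observations.
path = viterbi([0, 0, 2, 2])   # -> [0, 0, 1, 1]
```

The per-step work is just a max over incoming transitions instead of the sum used in the forward pass, which is why it tends to be the easier of the two to reason about.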