Well first of all you should debug your stuff before jumping into conclusion that it is an inifinite loop or a memory error (?).
Have you debugged to see if the size of the list is decreasing?
Secondly, you're putting a random factor in your benchmark for adding two types of elements ("hello" and "goodbye") which will directly affect the outcome of the remove phase. In the ArrayList run you might have 2 times more removes than the Bag removes, so you cannot conclude anything with that.
If you want to benchmark something against something else, they should operate on the exact same data, not on completely different sets!
With this random set benchmarking you might (and probably will) get different results even if you declared both lists as a Bag!
And finally, if you're measuring the add and remove, your loop should contain ONLY that, not a call to a random function which probably is more expensive than an add().
Let's assume your whole loop spends 1000ms in the Random function, and 200ms in the add() for ArrayList and 100ms for the add in the Bag.
You'll conclude that ArrayList performs 1100/1200=91% of Bag, when you should be concluding that it performs 100/200=50% comparing to Bag.
You're right; I probably should have taken all of those other factors into consideration! However, I wanted to also check randomness as part of their performance, because I would be using them in scenarios involving randomization.
EDIT: Another thing is that both the Bag and the ArrayList are given the same set of instructions, and yet the Bag will always finish all of its removes in less than a tenth of a second. I did some outputs on the ArrayList removes and it takes it an astronomical amount of time for it to do essentially the same thing, consistently; in fact, I haven't had the patience to let it complete a full run yet, but it takes many minutes.
Microbenching is hard. No offense intended, but everything is wrong.

No offense taken; I have much to learn!

Colton