The strange thing is that performance is acceptable for the first 3 seconds and then decreases dramatically.
Not sure if this applies here (I don't use Xith or know this demo), but it reminds me of a problem I had with my own collision system (a swept ellipsoid approach). If movement is decoupled from the frame rate, i.e. entities cover the same distance in n seconds regardless of the actual frame rate, the following can happen (time is simplified to "ticks" here to make it easier to follow):
Rendering takes 1 tick. Entities should move 10 units per tick, so in this case (1 tick) they move 10 units. On a slower CPU, the movement/collision detection for that may itself take 1 tick.
Rendering takes 1 tick again; add the 1 tick from the previous movement/collision detection and you get 2 ticks, so entities have to move 2*10 units to compensate for the low frame rate. Because of the increased movement, collision detection takes longer... let's say 2 ticks.
1 tick for rendering, 2 ticks for collision detection => 3 ticks overall => entities have to move 3*10 units => collision detection now takes 3 ticks.
...this can go on and on, and finally you'll end up with seconds/frame instead of frames/sec, because the whole system never recovers from this situation.
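To make the progression above concrete, here is a minimal sketch that just replays the arithmetic of the example. It assumes that collision cost grows linearly with the distance moved (10 units of movement cost 1 tick of collision detection), which is only what the made-up numbers above suggest, not a measurement. The class and constant names are hypothetical and have nothing to do with Xith.

```java
import java.util.ArrayList;
import java.util.List;

public class FrameTimeSpiral {
    static final int RENDER_TICKS = 1;     // rendering cost per frame (from the example)
    static final int UNITS_PER_TICK = 10;  // entity speed (from the example)
    // Assumption: every 10 units of movement cost 1 tick of collision detection.
    static final int UNITS_PER_COLLISION_TICK = 10;

    /** Returns the total duration in ticks of each of the first `frames` frames. */
    static List<Integer> simulate(int frames) {
        List<Integer> frameTicks = new ArrayList<>();
        int elapsed = 1; // time step used for the very first movement update
        for (int i = 0; i < frames; i++) {
            // Movement is scaled by the time the previous frame took.
            int distance = elapsed * UNITS_PER_TICK;
            // A longer movement means more collision work.
            int collisionTicks = distance / UNITS_PER_COLLISION_TICK;
            int total = RENDER_TICKS + collisionTicks;
            frameTicks.add(total);
            // The next frame has to compensate for this frame's duration.
            elapsed = total;
        }
        return frameTicks;
    }

    public static void main(String[] args) {
        System.out.println(simulate(5)); // prints [2, 3, 4, 5, 6]
    }
}
```

Every frame takes one tick longer than the one before, and nothing in the loop ever brings the frame time back down.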
This is, of course, a very simplified example, but it describes the basic problem. I just thought it could be helpful to mention it, because the description of the problem (fast startup, then slowing to a crawl for no obvious reason) reminds me of it. If the CPU is fast enough, this will never happen, because the collision detection finishes too quickly to have much influence on the frame time, but if it isn't, well...
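The "fast enough CPU" condition can be illustrated with the same kind of toy model. This is a sketch under one assumption: each tick of game time that has to be simulated costs a fixed number of ticks of collision work (`collisionTicksPerTick`, a hypothetical parameter; real collision cost is rarely this linear). If that factor is below 1, the frame time settles at a steady value; at 1 or above it keeps growing.

```java
public class SpiralThreshold {
    /**
     * Iterates frameTime = renderTicks + collisionTicksPerTick * previousFrameTime
     * and returns the frame time reached after `frames` frames.
     */
    static double frameTimeAfter(int frames, double renderTicks, double collisionTicksPerTick) {
        double frameTime = renderTicks;
        for (int i = 0; i < frames; i++) {
            frameTime = renderTicks + collisionTicksPerTick * frameTime;
        }
        return frameTime;
    }

    public static void main(String[] args) {
        // Fast CPU (factor 0.5): the frame time levels off close to 2 ticks.
        System.out.println(frameTimeAfter(50, 1.0, 0.5));
        // Slow CPU (factor 1.5): the frame time grows without bound.
        System.out.println(frameTimeAfter(50, 1.0, 1.5));
    }
}
```

In this model the steady frame time for a factor k below 1 is renderTicks / (1 - k), so the closer the collision cost gets to "one tick of work per tick of game time", the less headroom there is before the system tips over.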