I'd highly recommend separating your spatial tree (be it bsp, quad or oct tree) from your actual scene graph/description. If you attempt to mung both of them together you'll end up making compromises and end up with a structure thats inefficiant for both

Bearing in mind what kev said that a scenegraph can be pretty much anything. A simple linked list of game objects could be good enough, or you might need a proper DAG tree with transform nodes, geometry nodes etc. Equally, you might need a fancy scenegraph but if your levels are small or the whole thing is displayed constantly you might not need a proper spatial tree, and you could just have a list of visible objects with some quick and inaccurate frustum culling.
Xith might be a good idea to start with, you get to toy with a higher level API, but can go down to low level GL if need be. Other than that any are suitable (although i'd avoid Java3D..).