From a data structure point-of-view, I suggest that what you have is three basic structures to build your world.
At the most basic level you may need a heightmap, as there is always ground somewhere.
I would then generate an octree of the actual geometry that the heightmap produces upon loading the sector.
The octree solves the problem of having things on different levels.
The next data structure you need is probably directly going to be the geometry of overhangs and caves or other more three dimensional structures. Most of your map won't have any such structures in, I'm guessing, which is why you can stash the floor in the memory-and-bandwidth efficient heightmap format. However the overhangs may as well be stored directly as polygons, and shoved straight into the octree.
You can also add smaller lumps of repeating scenery at this point - trees, walls, rocks and things like that - and place them directly in the octree.
The complexity then comes from collision detection, made slightly easier by the octree, but made slightly more complex than a traditional BSP because you have to do a bit more checking against individual polygons.