Thanks for the in-depth reply! Here's some more information:
Based on the testing and timings that we've done, I figure it *has* to be something other than writing to the depth buffer itself that is causing the problem. Just to test it out (and I might end up keeping it this way), I used a free channel in my shadow frame buffer and manually wrote the depth of all the objects to it. Then, in the main rendering shader, I checked each pixel's depth against that channel and drew or skipped the pixel accordingly. On my laptop with an integrated GPU, it ran at around 250 FPS this way, while with the standard depth buffer it ran at 30-40 FPS. On my PC with dedicated graphics, it makes no real difference compared to standard depth testing.
There is no way the code I scraped together is 5 times more efficient than the built-in depth testing, so I figure there has to be something else we were doing wrong.
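
To make the workaround concrete, here is a rough CPU-side Python sketch of the idea. It is not my actual shader code: the function names, the `(x, y, depth, color)` fragment layout, the `bias` value, and the choice to keep the nearest depth per pixel in the first pass are all assumptions made for illustration.

```python
import math


def depth_prepass(fragments, width, height):
    """Pass 1: write the nearest depth per pixel into a spare channel."""
    channel = [[math.inf] * width for _ in range(height)]
    for x, y, depth, _color in fragments:
        # Assumption: smaller depth means closer to the camera.
        if depth < channel[y][x]:
            channel[y][x] = depth
    return channel


def main_pass(fragments, depth_channel, width, height, bias=1e-5):
    """Pass 2: draw a fragment only if it matches the stored depth."""
    color = [[None] * width for _ in range(height)]
    for x, y, depth, frag_color in fragments:
        # Equivalent of `discard` in a fragment shader: skip anything
        # farther away than the depth recorded in the spare channel.
        if depth > depth_channel[y][x] + bias:
            continue
        color[y][x] = frag_color
    return color


# Two fragments overlap at pixel (0, 0); one fragment covers pixel (1, 0).
fragments = [
    (0, 0, 0.8, "far"),
    (0, 0, 0.3, "near"),
    (1, 0, 0.5, "only"),
]
depth_channel = depth_prepass(fragments, width=2, height=1)
image = main_pass(fragments, depth_channel, width=2, height=1)
```

The small `bias` is there because comparing a recomputed depth against a stored one for exact equality tends to be fragile, so the sketch allows a little slack.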