You have a server with an authoritative game state from some time in the past Si. At set intervals, for any period of time in which it has inputs from all of its clients, it will simulate the game state to a new authoritative state at time Sf, and then send this to all of its clients.
The clients will then revert their game states to Sf, discard all cached inputs before Sf, and then replay the remaining ones to re-simulate the game state up to time N, the actual current game time. To simulate the other clients' players, they will simply guess based on the last input received from them.
Whenever the clients update, they will continue to cache inputs for that period of time, and also send their inputs to the server.
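To make the client side of this concrete, here is a minimal sketch of the revert-and-replay step. Everything in it is an assumption made for illustration: the 1-D position, the `PlayerInput` and `PredictingClient` names, and the rule that an input tagged with tick T is already included in the server state for tick T.

```python
from dataclasses import dataclass, field


@dataclass
class PlayerInput:
    tick: int   # simulation tick this input applies to
    dx: float   # hypothetical movement delta for that tick


@dataclass
class PredictingClient:
    x: float = 0.0      # predicted position (1-D to keep the sketch short)
    tick: int = 0       # tick the predicted state corresponds to
    pending: list = field(default_factory=list)  # inputs not yet confirmed by the server

    def apply_local_input(self, inp: PlayerInput) -> None:
        # Predict right away and cache the input so it can be replayed later.
        self.pending.append(inp)
        self.x += inp.dx
        self.tick = inp.tick

    def on_server_state(self, server_tick: int, server_x: float) -> None:
        # 1. Revert to the authoritative state (Sf).
        self.x = server_x
        self.tick = server_tick
        # 2. Discard inputs the server has already simulated.
        self.pending = [i for i in self.pending if i.tick > server_tick]
        # 3. Replay the remaining inputs to get back to the current time (N).
        for inp in self.pending:
            self.x += inp.dx
            self.tick = inp.tick
```

In a real game the replay step would call the same simulation function the server uses, so that both sides agree whenever the inputs agree.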
So does that sound about right? Is there anything I can do to improve how a client guesses what the other clients are doing? Or just improve it in general?
I don't think there's a specific answer to your question until you drill down to what kind of game you want to make. Some strategies work better for some kinds of games, and others for other kinds.
To be honest, I don't believe what you described would work very well, except for turn-based games, unless the tick rate of your server is extremely high (which would easily saturate your network).
In a real-time game I would recommend that the server run the game just as the client does: full logic at N ticks per second, processing player inputs as they come in and broadcasting the results to everybody.
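A rough sketch of that server loop, with the networking left out. The `GameServer` class, the 1-D positions and the "input is a velocity" model are all made up for the example; `receive` stands in for whatever your network layer calls when a packet arrives.

```python
from collections import deque


class GameServer:
    def __init__(self, tick_rate: int = 20):
        self.dt = 1.0 / tick_rate   # seconds of game time per tick
        self.tick = 0
        self.inbox = deque()        # (player_id, velocity) in arrival order
        self.velocities = {}        # player_id -> last velocity received
        self.positions = {}         # player_id -> x position

    def receive(self, player_id: str, velocity: float) -> None:
        # Called whenever an input arrives; nothing is simulated here.
        self.inbox.append((player_id, velocity))

    def step(self) -> dict:
        # 1. Process every input that arrived since the previous tick.
        while self.inbox:
            player_id, velocity = self.inbox.popleft()
            self.velocities[player_id] = velocity
            self.positions.setdefault(player_id, 0.0)
        # 2. Advance the simulation by one fixed time step.
        for player_id, velocity in self.velocities.items():
            self.positions[player_id] += velocity * self.dt
        self.tick += 1
        # 3. Return the snapshot that would be broadcast to all clients.
        return {"tick": self.tick, "positions": dict(self.positions)}
```

A real server would call `step()` from a timer N times per second and send the returned snapshot to every connected client.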
This is where the type of game comes in:
If it is an FPS, let's say Quake, players are likely to keep moving, rotating, dodging, and shooting at a very high rate, so you might want to send packets very often to update the players' positions.
However, if you have an action RPG, think of Diablo, you don't need such a high packet rate. (I'm speaking from experience here: this is exactly how I implemented my online game, Reign of Rebels.)
You basically just need a packet when an entity moves, attacks, or performs any other significant action.
Let's say entity A is standing at point (10, 23) and the player (or the AI) directs it to move to point (14, 25). The client sends a message saying it wants to move; the server receives it and broadcasts it. With just one packet exchange, the entity performs the exact same move on the client and on the server. When the entity reaches its target, it stays there until further orders are given. No need to send a shitload of packets saying that entity A is standing at (14, 25).
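Something like the following is the idea, as a sketch rather than the actual Reign of Rebels code. The `Entity` class, the straight-line movement and the speed value are assumptions; the point is that both sides run the same deterministic update after receiving the same single order.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Entity:
    x: float
    y: float
    speed: float = 2.0                            # units per second (made-up value)
    target: Optional[Tuple[float, float]] = None  # None means "standing still"

    def order_move(self, tx: float, ty: float) -> None:
        # This is the only thing that has to travel over the network.
        self.target = (tx, ty)

    def update(self, dt: float) -> None:
        # Runs identically on the client and on the server.
        if self.target is None:
            return  # idle entities need no packets and no work
        tx, ty = self.target
        dist = math.hypot(tx - self.x, ty - self.y)
        step = self.speed * dt
        if dist <= step:
            # Arrived: snap to the target and wait for further orders.
            self.x, self.y = float(tx), float(ty)
            self.target = None
        else:
            self.x += (tx - self.x) / dist * step
            self.y += (ty - self.y) / dist * step
```

If the server copy and the client copy both start at (10, 23) and both receive `order_move(14, 25)`, stepping them with the same `dt` ends with both at (14, 25) without any further traffic.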
This strategy might even work in some faster-paced games, as long as your physics/pathfinding is deterministic.
Of course you would use some interpolation tricks to make things smooth (some of which I discussed here), but this is definitely a solution for this kind of game.
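One common smoothing trick, as an example: instead of snapping an entity to the position the server reports, move the drawn position a fraction of the way there every frame. This is a generic sketch and not necessarily what the linked post describes; `smooth_towards` and the `alpha` value are made up.

```python
def lerp(a: float, b: float, t: float) -> float:
    # Linear interpolation: t=0 gives a, t=1 gives b.
    return a + (b - a) * t


def smooth_towards(displayed, authoritative, alpha=0.2):
    # Move the drawn position part of the way to the authoritative one.
    # Called once per rendered frame; a larger alpha corrects faster but looks jerkier.
    return tuple(lerp(d, a, alpha) for d, a in zip(displayed, authoritative))
```

Calling this every frame makes small corrections invisible, while large ones show up as a quick slide instead of a teleport.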
Maybe if you show us a video of a game similar to yours, we can give better suggestions on how to implement it.
As for TCP vs. UDP: just go for TCP. You already have too much to worry about when making a game, so let TCP handle some of the headache for you.
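One headache TCP leaves to you is message boundaries: it delivers a byte stream, so one read may return half a message or several at once. A simple way around that is to prefix each message with its length. The sketch below assumes a 4-byte big-endian length header; the `frame` and `FrameDecoder` names are made up.

```python
import struct


def frame(payload: bytes) -> bytes:
    # Prefix the payload with its length as a 4-byte big-endian unsigned int.
    return struct.pack("!I", len(payload)) + payload


class FrameDecoder:
    def __init__(self):
        self.buffer = b""   # bytes received but not yet turned into messages

    def feed(self, data: bytes) -> list:
        # Append whatever the socket returned and extract all complete messages.
        self.buffer += data
        messages = []
        while len(self.buffer) >= 4:
            (length,) = struct.unpack("!I", self.buffer[:4])
            if len(self.buffer) < 4 + length:
                break   # the rest of this message has not arrived yet
            messages.append(self.buffer[4:4 + length])
            self.buffer = self.buffer[4 + length:]
        return messages
```

You would pass every chunk returned by `recv` to `feed` and handle whatever complete messages come back.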