To quote Mr. Ollivander: "You ask deep questions, Mr. Potter."
It really depends on how sophisticated you want it to be. If you just want it to answer back with 'whatever', then you just write something that outputs a canned reply whenever it receives input. If you want it to actually parse the input and attempt to respond intelligently...
... First, you scan through the words the bot receives and attempt to identify the important ones. This means breaking the input down into its parts (words) and then checking each one against a catalog of words to see if it's there, perhaps with little flags that say whether to ignore a given word, such as 'is' or 'the'. The bot can then come up with candidate responses for the words it recognized and pick one.
Second, you can scan through for phrases instead of just words, which is even harder and slower.
Third, you attempt to identify the syntax of the language, and use that to construct the meaning of a sentence, which your bot can then respond to.
Fourth, you attempt to identify the semantics of the language, and use that to actually understand what a sentence means.
I would assume, though, that you were mostly asking about step one.
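For what it's worth, here is a minimal sketch of step one in Python. Everything in it is made up for illustration: the STOP_WORDS set, the CATALOG of words and replies, and the function names are all assumptions, not part of any real chatbot library. It just splits the input into words, skips the ones flagged to ignore, looks the rest up in the catalog, and picks one of the candidate replies.

```python
import random
import re

# Hypothetical "ignore this word" flags (illustrative, not exhaustive).
STOP_WORDS = {"is", "the", "a", "an", "of", "to", "and"}

# Hypothetical catalog mapping important words to candidate replies.
CATALOG = {
    "weather": ["I hear it might rain.", "Lovely day, isn't it?"],
    "wand": ["The wand chooses the wizard."],
}

# Canned reply used when nothing in the input is recognized.
FALLBACK = "whatever"


def tokenize(text):
    """Break the input into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


def candidate_replies(text):
    """Collect replies for every catalogued word that is not flagged to ignore."""
    replies = []
    for word in tokenize(text):
        if word in STOP_WORDS:
            continue
        replies.extend(CATALOG.get(word, []))
    return replies


def respond(text, rng=random):
    """Pick one candidate reply, or fall back to the canned answer."""
    replies = candidate_replies(text)
    return rng.choice(replies) if replies else FALLBACK
```

Calling respond("The wand is old") would look up 'wand' and 'old', ignore 'the' and 'is', and answer from the 'wand' entry. Anything with no catalogued word gets the 'whatever' fallback, which is the simplest bot described at the top.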
