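Sarsa is similar to Q-learning, but its update uses the value of the action actually performed in the next state, rather than the maximum value over the available actions. Its update equation is:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$$
where $\alpha$ is the learning rate, $\gamma$ is the discount factor, and $a_{t+1}$ is the action the agent actually selects in state $s_{t+1}$.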
Neutral characteristics
It is on-policy: its Q values approximate the value of the policy that is actually being followed, including the effects of exploration.
It learns state-action values (Q values).
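To illustrate what on-policy means here, the following minimal sketch contrasts the target Sarsa bootstraps from with the one Q-learning uses. The function names, the list-of-lists Q table, and the numbers are assumptions chosen for illustration only.

```python
def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: uses the action actually selected in next_state."""
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: uses the maximum-valued action in next_state."""
    return reward + gamma * max(q[next_state])


# Hypothetical Q table with 2 states and 2 actions.
q = [[0.0, 0.0], [1.0, 3.0]]

# Suppose exploration picked action 0 in state 1, although action 1 is greedy.
s_target = sarsa_target(q, reward=0.5, next_state=1, next_action=0, gamma=0.9)
q_target = q_learning_target(q, reward=0.5, next_state=1, gamma=0.9)

# The Sarsa target (about 1.4) reflects the exploratory choice,
# while the Q-learning target (about 3.2) ignores it.
print(s_target, q_target)
```

When the selected action happens to be the greedy one, the two targets coincide; they differ only on exploratory steps.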
Advantages
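Tabular Sarsa can be shown to converge to an optimal policy, provided that exploration decreases in a suitable manner: every state-action pair must still be tried infinitely often while the policy becomes greedy in the limit (Singh et al., 2000).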
As with Q-learning, there has been much research on Sarsa and there are many successful applications.
Has a more natural extension to eligibility traces than Q-learning.
Note that because knowledge of the next action is required, the algorithm is a little more complex than that of Q-learning or Expected-Sarsa.
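The need to know the next action before updating can be seen in a minimal tabular sketch. Everything here is an illustrative assumption rather than a reference implementation: the toy chain environment (`chain_step`), the epsilon-greedy helper, the 1/(episode+1) exploration schedule, and the parameter values.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties between equally valued actions at random.
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def chain_step(state, action, n_states):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.

    Reaching the rightmost state ends the episode with reward 1.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def run_sarsa(n_states=5, episodes=500, alpha=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for episode in range(episodes):
        epsilon = 1.0 / (episode + 1)  # assumed decaying exploration schedule
        state = 0
        action = epsilon_greedy(q[state], epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = chain_step(state, action, n_states)
            # Sarsa has to select the next action before it can update.
            next_action = epsilon_greedy(q[next_state], epsilon, rng)
            # Terminal states contribute no future value.
            next_value = 0.0 if done else q[next_state][next_action]
            td_error = reward + gamma * next_value - q[state][action]
            q[state][action] += alpha * td_error
            # The selected action is carried over and actually executed.
            state, action = next_state, next_action
    return q


if __name__ == "__main__":
    for s, row in enumerate(run_sarsa()):
        print(s, row)
```

Note that `next_action` is both used in the update and then executed on the following step; Q-learning would instead take the maximum over `q[next_state]` and could choose its action afterwards.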
Selected relevant publications:
G. Rummery and M. Niranjan (1994), "On-line Q-learning using Connectionist Systems", Technical Report no. 166, University of Cambridge, Engineering Department.
S. P. Singh, T. Jaakkola, M. L. Littman and C. Szepesvari (2000), "Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms", Machine Learning, volume 38, number 3, pages 287-308.