Example:
     L     R
  -------------
U | 1,0 | 1,1 |
  -------------
D | 2,1 | 1,0 |
  -------------
(U,R) is a weak Nash equilibrium - it has single outgoing arrows but no double outgoing arrows. (D,L) is a strict Nash equilibrium - it has no outgoing arrows.
Row can move the game from the weak to the strict equilibrium by changing from (U,R) to (D,R). This is modular rational and only requires the assumption that Column is practically rational.
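The strict/weak distinction in this example can be checked mechanically. The following is a minimal sketch, assuming each cell lists Row's payoff first and Column's second; the names payoffs and classify are illustrative and not part of the original notes.

```python
# payoffs[(row_act, col_act)] = (Row's payoff, Column's payoff), from the table above.
payoffs = {
    ("U", "L"): (1, 0), ("U", "R"): (1, 1),
    ("D", "L"): (2, 1), ("D", "R"): (1, 0),
}
ROWS, COLS = ("U", "D"), ("L", "R")

def classify(r, c):
    """Return 'strict', 'weak' or None for the pure-strategy profile (r, c)."""
    row_here, col_here = payoffs[(r, c)]
    # Change in payoff available to each player from a unilateral deviation.
    gains = [payoffs[(r2, c)][0] - row_here for r2 in ROWS if r2 != r]
    gains += [payoffs[(r, c2)][1] - col_here for c2 in COLS if c2 != c]
    if any(g > 0 for g in gains):
        return None       # someone strictly gains by deviating: not an equilibrium
    if any(g == 0 for g in gains):
        return "weak"     # nobody gains, but some deviation costs nothing
    return "strict"       # every deviation is strictly worse

equilibria = {(r, c): classify(r, c) for r in ROWS for c in COLS}
```

Here equilibria maps (U,R) to "weak" and (D,L) to "strict", while (U,L) and (D,R) are not equilibria.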
Example of escaping from an inefficient strict Nash equilibrium:
     L     R
  -------------
U | 0,0 | 1,1 |
  -------------
D | 2,1 | 0,0 |
  -------------
(U,R) is a strict Nash equilibrium, but Row can obtain a better payoff by changing to (D,R) - which is not modular rational - on the expectation that Column will then change to (D,L).
Define a player who can break an equilibrium in this way as 1-step rational (modular rational players are 0-step rational). To play an n-step rational strategy you must know that your opponent is (n-1)-step rational.
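The 1-step reasoning in the second example can be sketched as a one-move lookahead: Row evaluates a move by the payoff it yields after Column's best reply, not by its immediate payoff. This is only an illustration under the assumption that Column replies with a myopic best response; column_reply and one_step_value are hypothetical names.

```python
# payoffs[(row_act, col_act)] = (Row's payoff, Column's payoff), second example.
payoffs = {
    ("U", "L"): (0, 0), ("U", "R"): (1, 1),
    ("D", "L"): (2, 1), ("D", "R"): (0, 0),
}
COLS = ("L", "R")

def column_reply(r):
    """Column's best response to row act r (assumes Column's payoffs are not tied)."""
    return max(COLS, key=lambda c: payoffs[(r, c)][1])

def one_step_value(r):
    """Row's payoff from playing r once Column has replied."""
    return payoffs[(r, column_reply(r))][0]

stay = payoffs[("U", "R")][0]        # Row's payoff at the strict equilibrium
immediate = payoffs[("D", "R")][0]   # Row's payoff immediately after switching to D
after_reply = one_step_value("D")    # Row's payoff once Column has moved to L
```

The move to D is worse than staying when judged by immediate, and better when judged by after_reply, which is what separates a 1-step rational player from a 0-step rational one.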
Ratifiability:
Ratifiability is relevant when your choice is believed to influence the probabilities of the outcomes, independently of the act you actually perform.
Jeffrey uses the example of a prisoner's dilemma in which the players' choices are correlated: if the first player chooses to cooperate, then with probability p the second player also chooses to cooperate; if the first player chooses to defect, then with probability q the second player also chooses to defect. Jeffrey shows that if the correlation between the players' choices is strong enough, the best choice according to Bayesian decision theory is to cooperate. However, Jeffrey then argues that this choice is not ratifiable: a player on the verge of cooperating, realising that her opponent is likely to have made the same choice, will prefer to defect. The player's belief that defection would be preferable is based on a prediction that her opponent will cooperate, which is in turn based on her own choice to cooperate; thus her reasoning contains a contradiction unless it is somehow possible for her to choose one act and then perform another.
Jeffrey explicitly rules out the possibility of the player changing her mind after choosing. The only way for a player to choose one act and perform another is misimplementation due to circumstances beyond the player's control.
The possibility of misimplementation transforms the prisoner's dilemma, a game with two acts and four outcomes, into a game with two acts and sixteen outcomes, representing all possible combinations of choice and performance for the two players. If, for the sake of simplicity, we assume that all acts have the same probability m of misimplementation, the probability matrix is as follows:
                                   Acts performed
Acts chosen   CC             CD             DC             DD
CC            p(1-m)^2       pm(1-m)        pm(1-m)        pm^2
CD            (1-p)m(1-m)    (1-p)(1-m)^2   (1-p)m^2       (1-p)m(1-m)
DC            (1-q)m(1-m)    (1-q)m^2       (1-q)(1-m)^2   (1-q)m(1-m)
DD            qm^2           qm(1-m)        qm(1-m)        q(1-m)^2
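The matrix can be generated from p, q and m, which makes it easy to check that the entries are consistent. This is a rough sketch, assuming (as the matrix appears to) that each entry is conditional on the first player's choice and that the two players' misimplementations are independent; outcome_matrix is an illustrative name.

```python
from itertools import product

def outcome_matrix(p, q, m):
    """Probability of each (chosen, performed) pair, conditional on player 1's choice."""
    acts = ["".join(a) for a in product("CD", repeat=2)]   # CC, CD, DC, DD
    # Probability of player 2's choice given player 1's choice.
    choice_prob = {"CC": p, "CD": 1 - p, "DC": 1 - q, "DD": q}
    matrix = {}
    for chosen in acts:
        for performed in acts:
            # Number of players whose performed act differs from their chosen act.
            slips = sum(a != b for a, b in zip(chosen, performed))
            matrix[(chosen, performed)] = (
                choice_prob[chosen] * m**slips * (1 - m)**(2 - slips)
            )
    return matrix
```

For any p, q and m between 0 and 1, the eight entries in the rows where the first player chooses C sum to 1, as do the eight entries in the rows where the first player chooses D.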
Both players can make their choices according to standard Bayesian decision theory, without reference to ratifiability.
"Hence I learn to do a service to another, without bearing him any real kindness; because I foresee, that he will return my service, in expectation of another of the same kind" - David Hume
Trivers defines reciprocal altruism as "the trading of altruistic acts in which benefit is larger than cost, so that over a period of time both enjoy a net gain" [Triv1].
Wahl and Nowak [WN] describe the prisoner's dilemma in terms of the cost of cooperating, c, and the benefit of receiving cooperation, b. The restriction b > c > 0 leads to the familiar payoff structure of the prisoner's dilemma: b > b - c > 0 > -c, or in other words T > R > P > S, and 2(b - c) > b - c, or in other words 2R > T + S. This payoff structure also meets the requirements for the alternating prisoner's dilemma [NS2]: T + S = P + R.
Public goods problems, social dilemmas [Dawe] and reciprocal altruism [Triv] find natural expression in the form b > c > 0, but not all prisoner's dilemmas can be expressed in this way (eg the commonly used payoffs T = 5, R = 3, P = 1, S = 0). Those that can might be called "reciprocal dilemmas".
Roberts and Sherratt [RS] point out that this payoff structure can be described by a single parameter k = b/c, by normalising c to 1.
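The relationship between the b, c form and the T, R, P, S form can be illustrated with a short check. It is a sketch only, and it assumes that payoffs may be shifted and rescaled before comparison, in which case T + S = P + R is the condition that identifies a reciprocal dilemma. The function names are illustrative.

```python
def donation_payoffs(b, c):
    """T, R, P, S for the cost/benefit form of the prisoner's dilemma."""
    return b, b - c, 0, -c

def is_prisoners_dilemma(T, R, P, S):
    """The usual ordering plus the condition that mutual cooperation beats alternation."""
    return T > R > P > S and 2 * R > T + S

def is_reciprocal_dilemma(T, R, P, S):
    """True if the payoffs fit the b > c > 0 form after a change of origin and scale.

    Assumption: shifting payoffs so that P = 0 gives b = T - P and c = P - S,
    and R - P = b - c holds exactly when T + S = P + R.
    """
    return is_prisoners_dilemma(T, R, P, S) and T + S == P + R
```

For example, is_reciprocal_dilemma(*donation_payoffs(3, 1)) holds, whereas the payoffs 5, 3, 1, 0 form a prisoner's dilemma but fail the test because T + S = 5 and P + R = 4.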
WSLS recovers quickly from accidental defections when playing itself (CD DD CC), but falls into a punishment cycle when playing TFT (CD DD DC). TFT also falls into a punishment cycle when playing itself (CD DC). The average payoff per round for both players in the TFT/WSLS punishment cycle is (b - c)/3, whereas the average payoff for both players in the TFT/TFT punishment cycle is (b - c)/2. Thus in the presence of errors, TFT against TFT does better than either TFT against WSLS or WSLS against TFT, but WSLS against WSLS does best of all. Can clusters of WSLS invade TFT in a reciprocal dilemma with errors?
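The cycles and average payoffs quoted above can be reproduced with a small simulation. This is a minimal sketch that starts from the outcome CD (the second player has just defected by mistake) and assumes no further errors; the function names are illustrative.

```python
def tft(my_last, their_last):
    """Tit for tat: copy the opponent's last act."""
    return their_last

def wsls(my_last, their_last):
    """Win-stay, lose-shift: repeat after T or R, switch after P or S."""
    if their_last == "C":          # the opponent cooperated, so the payoff was T or R
        return my_last
    return "D" if my_last == "C" else "C"

def play_after_error(first, second, rounds):
    """Outcomes starting from CD, listed with the first player's act first."""
    x, y = "C", "D"
    history = [x + y]
    for _ in range(rounds - 1):
        x, y = first(x, y), second(y, x)
        history.append(x + y)
    return history

def average_payoff(history, b, c):
    """Average per-round payoff to each player, with T = b, R = b - c, P = 0, S = -c."""
    value = {"CC": b - c, "CD": -c, "DC": b, "DD": 0}
    n = len(history)
    return (sum(value[h] for h in history) / n,
            sum(value[h[::-1]] for h in history) / n)
```

With these definitions, play_after_error(wsls, wsls, 3) gives CD DD CC, play_after_error(tft, wsls, 3) gives CD DD DC with TFT as the first player, and play_after_error(tft, tft, 2) gives CD DC. Averaging over one full cycle yields (b - c)/3 and (b - c)/2 per player respectively.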
Cooperation on the basis of learned tags appears to be vulnerable to mimics: in a spatial model where each agent copies the tag and strategy of its most successful neighbour, a mimic copies the tag but not the strategy. The successful neighbour has earned either the temptation payoff or the reward payoff, which means its tag has persuaded other agents to cooperate with it. By copying the tag and then defecting, a mimic can earn the temptation payoff from members of the victim's cooperative group.
Mimics destroy cooperative groups in the same way as mutant defectors, but more quickly: destruction starts as soon as any member of the group encounters a mimic, rather than having to wait for a member's strategy to mutate. Long tags protect against mutant defectors by raising the tag mutation rate above the strategy mutation rate, allowing new cooperative groups to evolve more quickly than old cooperative groups are destroyed [MS]. However, long tags provide no protection against mimics.
When a mimic earns the temptation payoff, its non-mimic neighbours will copy its tag and its strategy (defection), and defect against their own neighbours in the next round. Defection will spread through the group following any member's encounter with a mimic.
It's not clear whether the same principle could be applied to a reproductive model as opposed to a learning model.
Evolutionary models in which agents use arbitrary tags to decide with whom to cooperate, or imitate successful neighbours in a lattice or other graph, or leave offspring at or near the parent's location in a lattice or other graph, are essentially models of kin selection [SN]. Cooperation emerges because it is likely that agents will play against other agents with the same strategy. Compared to random mixing this reduces the probability of the CD and DC outcomes in the prisoner's dilemma and increases the probability of the CC and DD outcomes, no matter what the mixture of strategies. Since CC earns a higher payoff than DD, cooperators reproduce more rapidly than defectors.
Oscillations between cooperation and defection may occur if the payoff for DC is much higher than the payoff for CC (enough to offset the increased probability that a defector attempting to play DC will be paired with another defector and earn DD).
Skyrms [Skyr] generalises this phenomenon using the concept of conditional proportions, which are the proportions of different strategies an agent can expect to interact with given its own strategy.
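The effect of conditional proportions on expected payoffs can be sketched as follows. The function and parameter names are illustrative, and the proportions are taken as given inputs rather than derived from any particular population structure.

```python
def expected_payoffs(p_c_given_c, p_c_given_d, CC, CD, DC, DD):
    """Expected payoff to a cooperator and to a defector.

    p_c_given_c: proportion of cooperators among a cooperator's partners.
    p_c_given_d: proportion of cooperators among a defector's partners.
    XY is the payoff for playing X against Y.
    """
    cooperator = p_c_given_c * CC + (1 - p_c_given_c) * CD
    defector = p_c_given_d * DC + (1 - p_c_given_d) * DD
    return cooperator, defector

# Hypothetical payoffs: CC = 2, CD = -1, DC = 3, DD = 0 (b = 3, c = 1).
random_mixing = expected_payoffs(0.5, 0.5, 2, -1, 3, 0)   # both types meet 50% cooperators
assortment = expected_payoffs(0.8, 0.2, 2, -1, 3, 0)      # like tends to meet like
```

Under random mixing the defector's expected payoff is higher; under the assumed degree of assortment the cooperator's is higher, even though the payoffs themselves are unchanged.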
Consider the set of two-player two-action general-sum games, where the order of the payoffs is the same for both players (ordinally symmetric). If one of the symmetric outcomes has a higher payoff than the other then label that action C (cooperation) and the other action D (defection).
CC is a strict Nash equilibrium when CC > DC and a weak Nash equilibrium when CC = DC. Cooperation is strictly dominant when CC > DC and CD > DD, and weakly dominant when either CC = DC and CD > DD, or CC > DC and CD = DD.
Defection is strictly dominant in only one game, the prisoner's dilemma: DC > CC > DD > CD. DD is a strict Nash equilibrium in this game. Defection is weakly dominant in two limiting cases of the prisoner's dilemma: DC > CC > DD = CD, where DD is a weak Nash equilibrium, and DC = CC > DD > CD, where DD is a strict Nash equilibrium. DD is not Pareto optimal in any of these games.
There is another limiting case of the prisoner's dilemma in which neither cooperation nor defection is dominant. Each player's payoff depends only on the other player's action, so every combination of strategies is a weak Nash equilibrium:
DC = CC > DD = CD
There are six mixed-motive games in which neither cooperation nor defection is dominant. Three are coordination games in which CC is a Pareto optimal strict Nash equilibrium:
CC > DD > CD ≥ DC
CC > DD > DC ≥ CD
CC > DC ≥ DD > CD (stag hunt)
CC is not a Nash equilibrium in the other three mixed-motive games:
DC > CC ≥ CD > DD (chicken)
CD ≥ DC > CC > DD (catch)
DC > CD ≥ CC > DD (lookout)
CC is Pareto optimal in chicken but not in catch or lookout, which might be called discoordination games.
When neither of the symmetric outcomes is preferable to the other, the actions can be labelled A and B instead of C and D. There are two such games in which neither A nor B is dominant:
AA = BB > AB = BA (coordination)
AB = BA > AA = BB (discoordination)
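The equilibrium and dominance conditions used in this classification can be written as a small checker. This is a sketch under the notation above, where XY is the payoff for playing X against Y; describe and its helper functions are illustrative names.

```python
def describe(CC, CD, DC, DD):
    """Equilibrium and dominance properties of a symmetric two-action game."""

    def nash(stay, switch):
        # A symmetric outcome is an equilibrium if a unilateral switch does not pay.
        if stay > switch:
            return "strict"
        return "weak" if stay == switch else None

    def dominance(mine_vs_c, other_vs_c, mine_vs_d, other_vs_d):
        if mine_vs_c > other_vs_c and mine_vs_d > other_vs_d:
            return "strict"
        never_worse = mine_vs_c >= other_vs_c and mine_vs_d >= other_vs_d
        sometimes_better = mine_vs_c > other_vs_c or mine_vs_d > other_vs_d
        return "weak" if never_worse and sometimes_better else None

    return {
        "CC equilibrium": nash(CC, DC),
        "DD equilibrium": nash(DD, CD),
        "C dominant": dominance(CC, DC, CD, DD),
        "D dominant": dominance(DC, CC, DD, CD),
    }
```

For instance, describe(CC=3, CD=0, DC=5, DD=1) reports strictly dominant defection and a strict DD equilibrium (prisoner's dilemma), while describe(CC=4, CD=0, DC=3, DD=2) reports strict equilibria at both CC and DD with neither action dominant (stag hunt).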
A linear status attribute (such as age [Mayn]) can be used to break symmetry in order to settle disputes without fighting. However, if players make mistakes when assessing each other's status then occasional fights will still occur. Whether it is worth risking a fight to avoid sharing, or vice versa, depends on the difference between the payoffs for DC and CC on one hand, and between CD and DD on the other. This should determine the direction in which players prefer to err when assessing each other's status.
Basing your probability of cooperation on the fraction of times the other player has cooperated places greater emphasis on early rounds (because of echo effects), which could encourage cooperative play early in the game. But if players are willing to take the risk of cooperating early in the game in order to obtain long-term benefits, will this benefit short-term exploiters?
"I am attempting to show how morality can be instrumentally efficient; how some mutually beneficial constraining principles are the best means to a player's ends. Following David Gauthier, I want to argue for the rationality of indirect choice, where principles constraining a player's immediate choices are his best means in some situations." (p. 61)
"Responsive moral agents need to solve a pair of problems. On the one hand, they need to commit themselves to constraint in a way that other players can determine. On the other hand, they need to determine other players' principles, in order to discriminate among them." (p. 155)
"Since iterated games can be solved by straightforwardly rational agents, they are not morally significant problems." (p. 45)
"TFT in the IPD is interesting because it often achieves joint cooperation without moral constraint ... the responsiveness of other players creates the environment in which Tit For Tat is rational." (p. 49) This sounds like a description of evolutionary stability, and yet "Our agents should care about scoring, but scoring need not be connected to reproduction" (p. 43)
Extending TFT to the multi-player PD: R. Hardin, "Individual Sanctions, Collective Benefits", in R. Campbell and L. Sowden (eds), "Paradoxes of Rationality and Cooperation", University of British Columbia Press, 1985.
"Theories that assume subjective preferences plus common knowledge must also assume some other interaction where subjective preference information is revealed. Preference revelation is non-trivial for mixed-motive games." (p. 38) On the other hand, "I simply stipulate a world in which some sorts of deception (claiming falsely that one is committed to CC, for example) are impossible." (p. 77)
"Conditional cooperators, although defending themselves directly against predators and free-riders, fail to defend themselves indirectly, by sustaining the unconditional cooperators whose presence enables straightforward maximizers to thrive." (David Gauthier, quoted on p. 94)
"Conditional cooperation, like Tit For Tat, teaches simple learners to become unconditional, not conditional cooperators since [Conditional Cooperation] punishes other players trying out the use of D." (p. 101)
"On the one hand, knowing that a particular player is a threatener seems to be in the interests of a straightforward maximizer. On the other hand, being unable to know that the other player is threatening is a form of threat resistance, which may be beneficial." (p. 81)
Chicken has the payoff structure DC > CC > CD > DD. Unlike the prisoner's dilemma, defection is not the best response to defection - it is costly to punish defectors. "In the PD, morality seemed to push us towards the broader cooperation of [Conditional Cooperation] and rationality to the narrower cooperation of [Reciprocal Cooperation]. In chicken, rationality indicates broader and morality narrower cooperation." (p. 171)
"Let us call a person who is disposed to cooperate in ways that, followed by all, yield nearly optimal and fair outcomes, narrowly compliant. And let us call a person who is disposed to cooperate in ways that, followed by all, merely yield her some benefit in relation to universal non-cooperation, broadly compliant. We need not deny that a broadly compliant person would expect to benefit in some situations in which a narrowly compliant person could not. But in many other situations a broadly compliant person must expect to lose by her disposition. For in so far as she is known to be broadly compliant, others will have every reason to maximize their utilities at her expense, by offering 'cooperation' on terms that offer her but little more than she could expect from non-cooperation." (David Gauthier, quoted on p. 172) The phrase "in so far as she is known to be broadly compliant" reveals the crucial assumption of transparency. Even without transparency, an auction can achieve an optimal outcome for a broadly compliant seller by pitting utility-maximising buyers against one another.
"It is crucial to morals by agreement that constrained maximizers should only comply with fair and optimal bargains, a disposition which Gauthier calls narrow compliance." (p. 172)
Narrow compliance requires refusing to cooperate with defectors in chicken. "The public good of morality benefits both narrow compliers and broad compliers, but the latter benefit more, since they avoid some enforcement costs." (p. 173) The game of chicken has been transformed into a free-riding problem, ie a prisoner's dilemma, at a higher level.
Standard conditions for an evolutionarily stable strategy: either it does better against itself than any other strategy does against it, or the other strategy does equally well against it but it does better against the other strategy than the other strategy does against itself. These conditions only apply to an infinite population with asexual inheritance and pairwise contests. (In particular, this rules out the sharer's dilemma and any other game in which an agent's pairwise interactions are not separable - thus there can be no ESS for such games.)
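The two conditions can be expressed as a direct check on a payoff table. This is a minimal sketch that only tests a pure strategy against other pure strategies; it does not consider mixed-strategy mutants, and is_ess is an illustrative name.

```python
def is_ess(E, i):
    """Check the standard ESS conditions for pure strategy i.

    E[x][y] is the payoff for playing strategy x against strategy y.
    Only pure-strategy alternatives j are tested (a simplifying assumption).
    """
    for j in range(len(E)):
        if j == i:
            continue
        if E[i][i] > E[j][i]:
            continue    # first condition: i does better against itself than j does against i
        if E[i][i] == E[j][i] and E[i][j] > E[j][j]:
            continue    # second condition: tie against i, but i beats j when facing j
        return False
    return True

# Hypothetical example: cooperate (index 0) and defect (index 1) with b = 3, c = 1.
E = [[2, -1],
     [3, 0]]
```

With this table, is_ess(E, 1) holds and is_ess(E, 0) does not, matching the usual result that unconditional defection is stable in the one-shot prisoner's dilemma.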
Bishop-Cannings theorem: if a mixed strategy is an ESS, the expected payoffs to the pure strategies comprising it must be equal.
If more than two pure strategies are available, it is possible for a mixed strategy to be stable while the corresponding polymorphism (mixed population of pure strategies) is unstable, or vice versa.
Asymmetric roles can help to establish an ESS by convention even if they don't alter the payoffs.
In a structured population, one way to find an ESS is to look for a strategy I such that, if all the neighbours of an individual are adopting I, the best strategy for the individual is also I. (This relates to the definition of a Nash equilibrium, and also applies to newcomers in a dynamic population, even if the population is unstructured.)
m.rogers@cs.ucl.ac.uk
Last modified 2007/02/25