the plan for the future

Model Thinking

26 March 2016 - 2 June 2016

Lecturer: Scott Page

Part II

11. Lyapunov Functions

11.1. Lyapunov Functions

Lyapunov functions map models into outcomes. We can take a model or a system and ask whether there is a Lyapunov function that describes it. If there is, then the system goes to equilibrium. If we can't construct a Lyapunov function, then we can't tell: the outcome can be of any class. For example, in physics you might track velocity over time. If the velocity, whenever it changes, goes down by at least some minimum amount, and there is a minimum velocity, for example zero, then the system has to stop at some point.

This can be formalised as follows. A system with a Lyapunov function F(x) has to stop at some point and go to equilibrium, meaning that x_{t+1} = x_t, if the following conditions hold:
- (1) F has a maximum value (or a minimum value if it is going down);
- (2) there is a k > 0 such that, if x_{t+1} ≠ x_t, then F(x_{t+1}) > F(x_t) + k (or F(x_{t+1}) < F(x_t) - k if it is going down).

The change must be of some minimum amount k because of Zeno's paradox, which states that if you cover half of the remaining route every day, you will never arrive at your destination because you travel a smaller distance every day. If you know the minimum amount k then you know the maximum number of steps. For example, if k is a quarter of the total distance then the maximum number of steps is 4. The tricky part is constructing the Lyapunov function F(x). Sometimes it is easy, sometimes it is hard, and sometimes it is impossible to do.
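
The role of the minimum amount k can be illustrated with a minimal sketch. The function name `steps_to_reach`, the total distance of 1.0 and the cap of 1000 steps are assumptions made for this illustration only.

```python
def steps_to_reach(total, step_rule, max_steps=1000):
    """Count steps until the remaining distance is zero, or return None."""
    remaining = total
    for n in range(1, max_steps + 1):
        remaining -= step_rule(remaining)
        if remaining <= 0:
            return n
    return None  # still not there after max_steps

total = 1.0      # assumed total distance
k = total / 4    # minimum step: a quarter of the total distance

# Zeno-style rule: always cover half of what is left, so steps keep shrinking.
halving = steps_to_reach(total, lambda remaining: remaining / 2)

# Lyapunov-style rule: every step covers at least k.
fixed = steps_to_reach(total, lambda remaining: k)

print(halving)  # None
print(fixed)    # 4
```

Without a minimum step the process is still moving after 1000 steps, while with k equal to a quarter of the distance it stops after 4.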

11.2. The organisation of cities

In any major city there is an amazing order. Restaurants have the right number of people in them, so do coffee shops. There are not huge lines behind dry cleaners. The interesting thing is that there is no central planner and the city self-organises in some way so that the right number of people are at the right places and there are no vacancies or crowds at particular points. What makes the city organise in this way?

Suppose that there are five locations everyone has to go to each week, which are the cleaners C, the grocery G, the deli D, the book store B and the fish market F, and there are five days to visit them, which are Monday Mo, Tuesday Tu, Wednesday We, Thursday Th and Friday Fr. Assume that individuals choose their order of visiting those places randomly. A route that someone might take during the week is: (Mo, Tu, We, Th, Fr) => (C, G, B, D, F). Suppose there are five people who each choose some random order to visit these locations. Assume that the behaviour of these people is to switch two locations so as to avoid crowds.

If people follow these rules, then we can put a Lyapunov function on the process and show that it goes to equilibrium. Assume those five people picked the following routes: 1. (C, G, D, B, F), 2. (G, C, D, B, F), 3. (C, D, G, F, B), 4. (C, B, F, G, D), 5. (C, F, D, B, G). Each week one person switches two locations to avoid other people, and takes the most efficient step, meaning that he or she makes the switch that avoids the most people. After one week person 1 might then switch C and F, meaning that the next week he goes to the fish market on Monday and to the cleaners on Friday to avoid the crowd at the cleaners.

Finding a Lyapunov function can be a matter of trial and error. For example, you may try the total number of people at each location per week. That doesn't work because that total is always 5. Another option is the number of people that the five people meet each week. In the first week person 1 meets 3 + 0 + 2 + 2 + 1 = 8 people. If this person switches the cleaners and the fish market, he will meet only 4 people next week, provided the others don't switch. The four people he now avoids (three at the cleaners and one at the fish market) no longer meet him either, so the total number of meetings in that week drops by 8.

This is a Lyapunov function because there is a minimum of 0 meetings during a week and people keep switching until there are no options to reduce the number of meetings. When that happens, the system enters into equilibrium. The value of k is 2 because if a person avoids meeting another person, this other person also meets one less person. This explains why cities are self-organising because people develop routines to avoid crowds. This model is simplistic, because people move, businesses start and stop, and more people may decide to change their route simultaneously. That is going to keep a city churning and somewhat complex.
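
The argument can be checked with a small simulation. This is a sketch under assumptions: the function names are made up for the illustration, meetings are counted once per person (so a pair in the same place counts twice), and each week the single switch that lowers the total the most is applied, which simplifies the rule that one person takes his or her most efficient step.

```python
from itertools import combinations

def total_meetings(routes):
    """Sum over people of how many others they meet during the week."""
    total = 0
    for day in zip(*routes):              # one tuple of locations per day
        for place in set(day):
            n = day.count(place)
            total += n * (n - 1)          # each of n people meets n - 1 others
    return total

def swap(route, i, j):
    """Return a copy of route with the visits on days i and j exchanged."""
    new = list(route)
    new[i], new[j] = new[j], new[i]
    return tuple(new)

def best_switch(routes):
    """Find the single switch that lowers total meetings the most, if any."""
    best, best_total = None, total_meetings(routes)
    for p, route in enumerate(routes):
        for i, j in combinations(range(len(route)), 2):
            trial = routes[:p] + [swap(route, i, j)] + routes[p + 1:]
            t = total_meetings(trial)
            if t < best_total:
                best, best_total = trial, t
    return best  # None means nobody can improve: equilibrium

# The five routes from the text, Monday to Friday.
routes = [tuple("CGDBF"), tuple("GCDBF"), tuple("CDGFB"),
          tuple("CBFGD"), tuple("CFDBG")]

history = [total_meetings(routes)]
while (nxt := best_switch(routes)) is not None:
    routes = nxt
    history.append(total_meetings(routes))

print(history)  # starts at 26 and falls by at least 2 per switch
```

The loop has to end because the total is a whole number that cannot drop below 0 and falls by at least k = 2 with every switch.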

11.3. Exchange economies and externalities

An exchange market has a Lyapunov function and therefore goes to equilibrium. There are other sorts of markets that don't go to equilibrium. It is sometimes possible to see why that is the case. What prevents us from constructing a really simple Lyapunov function and showing that the system goes to equilibrium is related to Chris Langton's lambda used in the one-dimensional cellular automata models.

An exchange market consists of a situation where people just bring stuff, for example fish, baskets or money, and trade these things. The question is whether that system goes to an equilibrium, or are people just going to keep on trading things throughout the day? The assumptions for the model of the exchange market are:
- (1) each person brings in an amount of stuff;
- (2) people only trade if this increases their happiness by at least some fixed amount k.

This fixed amount of k represents effort or transaction costs to make the trade. That is important for the Lyapunov function to work. A possible Lyapunov function could be the total happiness of the people. There is a maximum amount of happiness to be derived from a fixed amount of stuff. There is also some fixed amount k of transaction costs such that, if the process doesn't stop, happiness goes up by at least k.

If North Korea and Iraq exchange nuclear weapons for oil, then both countries would probably be happier, as Iraq has nuclear weapons and North Korea has oil. The United States, China, and many other countries, which were not part of the transaction, probably are not so pleased, so total happiness may not have increased with this trade. If total happiness went down, that may mean that others then have to make other trades as they try to make total happiness go up.

In that case we don't know for sure whether the system is ever going to stop because we can't put a Lyapunov function on the process. This may happen with political coalitions or merging firms. When party A merges with party B, then party C may be upset, and total happiness may not go up. The same is true of political alliances between countries. They could make other countries less secure. And that could mean that there is no Lyapunov function.
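
A minimal sketch of why externalities break the argument. All numbers here are hypothetical and chosen only for illustration, as is the function name `total_happiness_change`.

```python
def total_happiness_change(trader_gains, externalities):
    """Net change in total happiness from one trade, bystanders included."""
    return sum(trader_gains) + sum(externalities)

k = 1.0  # assumed minimum gain each trader needs before agreeing to trade

# Plain exchange: both traders gain at least k and nobody else is affected.
plain = total_happiness_change([k, k], [])

# Trade with negative externalities: the traders gain, two bystanders lose more.
with_spillover = total_happiness_change([k, k], [-1.5, -1.5])

print(plain >= k)           # True: total happiness rises by at least k
print(with_spillover >= k)  # False: total happiness fell
```

In the second case total happiness no longer goes up with every trade, so it cannot serve as the Lyapunov function.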

When two people decide to date, they are both happier. Or when two people break up, presumably they are both happier. But that could affect other people who are friends of those people, who maybe wanted to date one of those people, and it's not clear. Maybe dating has a Lyapunov function, maybe happiness is a Lyapunov function for dating, maybe it's not. It depends on the size of the externalities.

That can be related to Langton's lambda parameter from the simple cellular automata model. The cellular automata model tells us that systems where behaviour isn't influenced by others tend to go to equilibrium, while systems where behaviour is influenced by others tend to be complex or random. Similarly, we can apply a Lyapunov function where someone's actions don't materially affect others, or if they do, they make them happier, meaning that there are no negative externalities.

11.4. Time to convergence and optimality

There are two questions to Lyapunov functions:
- (1) How long does the process take to go to an equilibrium?
- (2) Does the process always stop at the minimum or the maximum?

How long does the process take to go to an equilibrium? If the process stops, it is in equilibrium. If it doesn't stop, then the value of F increases by at least k. Since there's a maximum, that means that at some point the process has to stop. Suppose that we start out with F(x1) = 100, k = 2, and the maximum is 200. Then the number of periods has to be at most 50.
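
The bound is simple arithmetic: the gap between the maximum and the starting value, divided by k. A minimal sketch, where `max_periods` is a name chosen for this illustration:

```python
import math

def max_periods(start, maximum, k):
    """Upper bound on the number of moves when each move raises F by at least k."""
    return math.floor((maximum - start) / k)

print(max_periods(100, 200, 2))  # 50
```

The process may well stop earlier, because steps can be larger than k and because it can get stuck below the maximum.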

Does the process always stop at the minimum or the maximum? Generally a process can get stuck someplace less than the max. That can be explained in two ways. First, it is possible using a model. The rugged landscape model has multiple peaks. A Lyapunov function can be seen as taking steps up at least some distance each period, but it doesn't necessarily mean that this is leading to the highest peak.

A second way is an actual example rather than an abstract model like a rugged landscape. Assume there are three persons, numbered 1, 2, and 3. The graph shows their preferences and they all own the item in the middle column of their corresponding row. Person 1 owns a banana and prefers apples to bananas and bananas to coconuts.

Assume that there is an exchange market. Person 1 likes the apple of person 3, but if he offers the banana to person 3, person 3 refuses. Person 2 likes the banana of person 1 but can't trade the coconut for it. Similarly, person 3 cannot trade the apple for the coconut of person 2. None of them can make a pairwise trade and be better off.
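
The example can be checked mechanically. In this sketch the full rankings of persons 2 and 3 are inferred from the trades described above, since the graph itself is not reproduced here, and the three-way rotation at the end is an added illustration of what a better outcome would look like.

```python
from itertools import combinations

# Rankings from most to least preferred; each person owns the middle item.
# The rankings of persons 2 and 3 are inferred from the text (assumption).
prefs = {1: ["apple", "banana", "coconut"],
         2: ["banana", "coconut", "apple"],
         3: ["coconut", "apple", "banana"]}
owns = {1: "banana", 2: "coconut", 3: "apple"}

def prefers(person, new, old):
    """True if person ranks item new strictly above item old."""
    return prefs[person].index(new) < prefs[person].index(old)

# A pairwise trade happens only if both sides end up strictly better off.
pairwise = [(a, b) for a, b in combinations(owns, 2)
            if prefers(a, owns[b], owns[a]) and prefers(b, owns[a], owns[b])]

# Three-way rotation: 1 gets the apple, 2 the banana, 3 the coconut.
rotation = {1: "apple", 2: "banana", 3: "coconut"}
all_better = all(prefers(p, rotation[p], owns[p]) for p in owns)

print(pairwise)    # []
print(all_better)  # True
```

No pairwise trade is possible, so the market is in equilibrium, yet everyone would be happier after the rotation. The process has stopped below the maximum.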

11.5. Lyapunov: fun and deep

Lyapunov functions are one technique for determining whether or not a system goes to equilibrium. But can we always tell, possibly with the help of other techniques? The question is does the system go to equilibrium or does it not, and can we even tell? We are going to do this in a fun way with some examples and then we will go a little bit deeper. We will see why some processes are very hard to figure out.

The fun example is called chairs and offices. A firm is moving to new offices with different types of chairs. An employee, who followed this course, suggests distributing the furniture by randomly assigning each person a chair and then letting people trade. The boss thinks that it is not a good idea. But the employee says that at some point the process will stop because it is an exchange market with a Lyapunov function, namely how happy people are with their chairs. With every trade happiness goes up. People will stop trading once they are satisfied, because it takes time and effort to trade.

Then the boss also wants to do this for offices: randomly assign people offices and then let them trade. The employee says that is a terrible idea. Offices are different because there are externalities. If a person moves because of the habits of another person, for example playing loud music, people in the new location may move because of this person's habits, for example wandering around. And so, there may be no increase in total happiness after a move, and the system may not go to equilibrium. Finding this out requires a lot of knowledge about people's preferences.

That leads to the deep question of when can you decide? And can you always decide? The answer is it depends on the problem. In some cases you can figure out some other way to show or prove that the process goes to equilibrium. In some other cases you can come up with a sophisticated Lyapunov function to show that the system goes to equilibrium. But other problems, even those that seem incredibly simple, turn out to be very hard to solve.

For example, the Collatz problem, or HOTPO (half or three plus one), works as follows. You pick a number. If it is even, you divide it by two; if it is odd, you multiply it by three and add one. You stop if you reach one. The question is, does this process always stop? For 5 you get: 16, 8, 4, 2, 1 (stop). For 7 you get: 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5 (stops because 5 stops). For 27 it takes 111 steps, climbing as high as 9,232 before it comes down to 1. Every starting number that has been tried eventually stops, but nobody has proved that all numbers do, so the Collatz problem hasn't been solved yet. So for some processes we can tell and for some we can't. For the chair problem we can tell, but for the office problem we can't.
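
The HOTPO rule is short enough to try out directly. A minimal sketch, where `collatz` is a name chosen for this illustration; note that the loop only ends because the starting numbers used here are known to reach 1.

```python
def collatz(n):
    """Return the HOTPO sequence starting at n and ending at 1."""
    seq = [n]
    while n != 1:
        # Half if even, three times plus one if odd.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        seq.append(n)
    return seq

print(collatz(5))            # [5, 16, 8, 4, 2, 1]
print(len(collatz(7)) - 1)   # 16 steps
print(len(collatz(27)) - 1)  # 111 steps
```

Running the rule tells you what happens for one number at a time. It does not tell you whether the process stops for every number, which is the hard part.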

[Figure: Lyapunov functions]
[Figure: Markov processes]

11.6. Lyapunov or Markov

There are some fundamental differences between the equilibria with Lyapunov functions and equilibria with Markov processes. Both Markov processes and Lyapunov functions give conditions under which we can determine that a system is going to equilibrium.

A Lyapunov function F has a maximum value. If the process isn't in equilibrium then it goes up by at least some fixed amount k. This has to stop at some point. A Markov process has a finite number of states, and if the probability of moving between those states stays fixed over time, and it is possible to get from any one state to any other, and it is not a simple cycle, the Markov convergence theorem states that the system goes to a unique equilibrium distribution that doesn't depend on the initial state.

This is a stochastic equilibrium so the system is still churning. In a Markov process, history doesn't matter. A Lyapunov function could depend a lot on the initial conditions. There could be many equilibria depending on where you start and where you go. It's also not a stochastic equilibrium. It is a fixed point.
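
That history doesn't matter in a Markov process can be seen in a small sketch. The two-state transition matrix `P` below is hypothetical, as are the function names; any matrix meeting the conditions of the convergence theorem would do.

```python
def step(dist, P):
    """Advance a probability distribution one period under transition matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def run(dist, P, periods=200):
    """Apply the transition matrix for a number of periods."""
    for _ in range(periods):
        dist = step(dist, P)
    return dist

# Hypothetical two-state process with fixed transition probabilities.
P = [[0.9, 0.1],
     [0.5, 0.5]]

from_a = run([1.0, 0.0], P)  # everyone starts in the first state
from_b = run([0.0, 1.0], P)  # everyone starts in the second state
print(from_a, from_b)        # both close to [5/6, 1/6]
```

Both starting points end at the same distribution, although individuals keep moving between states. A Lyapunov process started from two different points can instead settle at two different fixed points.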

If you can construct a Lyapunov function, then the system goes to equilibrium. If you can't, that doesn't mean it doesn't. If you can write down a Lyapunov function, you can figure out how long it is going to take at most to reach equilibrium. A Lyapunov equilibrium need not be unique or efficient. The reason a system may not go to equilibrium is externalities pointing in the other direction.

That is the same lesson we learned from Langton's lambda in the simple cellular automata model. When one person's action or one cell's action depends on the actions of others, the system is likely to churn. When what I do is unaffected by other people, then the system is likely to go to equilibrium.

12. Coordination and Culture

12.1. Coordination and culture

Coordination is doing the same thing somebody else is doing. Culture refers to differences between groups of interacting people. In order to have these differences between groups of people, there have to be similarities within those groups. That is where coordination matters. Cultural behaviour can be suboptimal in the sense of efficiency. Often it doesn't make sense from the outsider's perspective, but viewed from within that culture, it makes sense.

The pure coordination game enables us to understand why people do the same thing, and why sometimes they do the same thing even though it is not optimal. In the context of this game, an important question is whether the whole system coordinates if people are trying to coordinate. To understand that, a Lyapunov function can be used on a single coordination game on one behaviour.

It is also possible to have a many-action coordination game on multiple behaviours. This can be used to explain cultural differences. In particular, Bob Axelrod made a model that looks at how culture emerges, and why you might get thick boundaries between different cultures. There is also inconsistency within cultures. The coordination consistency model deals with both coordination between people and consistency within a person with the use of Markov processes.

12.2. What is culture and why do we care?

There are many definitions of culture. Culture is such a complex thing that no simple definition can capture it. Still, culture matters a lot for the success of countries. Tylor, who is the father of modern cultural anthropology, said in 1871 that culture is the complex whole which includes knowledge, belief, art, law, morals and customs. Boas extended this in 1911 by calling culture the totality of mental and physical reactions and activities that characterise behavioural responses to environment, others and to the self.

Boas was the first to get the idea that there should be some consistency across these mental and physical reactions to the world. This consistency could be partly brought on by the environment or it could be socially constructed. These dry academic definitions don't show exactly what culture is. Writers have also written definitions of culture, for example Lionel Trilling wrote in 1955:

When we look at people in the degree of abstraction which the idea of culture implies, we cannot but be touched and impressed by what we see, we cannot help but be awed by something mysterious at work, some creative power that seems to transcend any particular act or habit or quality that may be observed. To make a coherent life, to confront the terrors of the outer and the inner world, to establish the ritual and art, the pieties and duties which make possible the life of the group and the individual - these are culture, and to contemplate these various enterprises which constitute a culture is inevitably moving.

To make a coherent life refers to the idea of consistency. An essential reason for the existence of culture is to make possible the life of the group and the individual. In order for groups and individuals to function, they have to have some similarities within. They have to agree or coordinate on certain sets of behaviours, morals, laws, customs, and understandings, in order to confront the inner and outer world. The interesting thing about cultures is that people do this in different ways.

[Figure: Culture dimensions]

What could these differences be? And how extreme can these differences be? One way of figuring this out is doing experiments. The ultimatum game is an experiment that was done in different societies. Player 1 is given $10. Then player 1 is told to offer a split to player 2. The split might vary, for example $5 apiece, $2 for player 2 or even $0.01 for player 2. Player 2 can either accept or reject. If player 2 accepts the money, then they split it. If he rejects, they both get nothing.

Player 1 has to figure out what's the minimum amount he can offer player 2 to get it accepted. People from different cultures didn't play the game in the same way. One group were the Lamalera, who are Indonesian whale hunters that hunt collectively. Another group, the Machigenga, who are an Amazonian group that hunted individually and didn't even use names, was much more selfish. On average the Lamalera offered $5.70, while the Machigenga offered $2.60.

[Figure: Culture of individuals in Sweden]
[Figure: Culture of individuals in Zimbabwe]

Ron Inglehart, who spent decades surveying people all over the world for the World Values Survey, constructed a graph by ranking different cultures on two dimensions, which are survival versus self-expression and traditional versus secular-rational. If you divide countries in these two ways, then the whole world makes sense by geography. For example, Protestant Europe is in the self-expressive secular realm while Islamic and African countries are more in the religious and survival-based area.

This map might suggest that all people from Sweden or Zimbabwe score the same. That is not true. Each country has a population and within this population there are differences. The point on the graph is just an average. The individual data points are clouds. But there is very little overlap between the scores of individuals in Sweden and Zimbabwe, so there is a difference between people from Sweden and Zimbabwe, but also differences within the Swedish people and within the people from Zimbabwe.

Geert Hofstede, who focused on the business world, uses five dimensions to make sense of cultures, which are: (1) power distance or how much inequality is tolerated, (2) uncertainty avoidance versus risk taking, (3) individualism versus collectivism, (4) masculinity versus femininity, and (5) Confucian dynamism or how forward-looking you are.

The US scores are: power distance 32 (a low value signalling little tolerance of inequality), uncertainty avoidance 40 (low), individualism 90 (very high), and masculinity 60 (high). The scores of France are: power distance 61 (a high value signalling more tolerance of inequality), uncertainty avoidance 80 (very high), individualism 63 (high), and masculinity 32 (low). Those dimensions are useful because they capture differences. But those dimensions don't capture everything. For example, South Korea and El Salvador score practically the same on Hofstede's dimensions, but they are completely different countries.

Why should we care about culture? This is because the economy, political systems and society work through social exchanges. According to Kenneth Arrow:

Virtually every commercial transaction has within itself an element of trust, certainly any transaction conducted over a period of time. It can be plausibly argued that much of the economic backwardness in the world can be explained by the lack of mutual confidence.

[Figure: Wealth versus trust]

In cultures there are different levels of trust, and these different levels of trust have huge implications for how well political, economic, social, and religious institutions perform in terms of meeting the needs of the citizens. Bob Solow, who invented the growth model, wanted these notions of trust and social capital to be measurable in some way. That is one of the reasons why we have models like those of Inglehart and Hofstede.

How do you measure something like trust? This can be done using questions like: do you claim government benefits you are not entitled to? Do you avoid paying a fare on public transport? Do you cheat on your taxes? Do you keep money that you have found? Do you fail to report damage you've done accidentally to a parked vehicle? These sorts of questions indicate how trusting people are within a society.

You can also ask the question, do you think people in general can be trusted? There are massive differences across countries. In Sweden 70% of people answer this question positively. In Italy, it is 33%, and in Turkey it is only 10%. That is going to affect how well the economy is going to work.

It does matter a lot, because in general, rich countries have a higher level of trust. There is a correlation between trust and wealth, but that doesn't imply that trust causes higher levels of wealth even though that seems plausible. It might also be that higher levels of wealth make people more trustworthy.

12.3. Pure coordination game

To make sense of cultural differences, a pure coordination game can be used. Some examples help to understand the pure coordination game. The ketchup question is do you store your ketchup in the fridge or the cupboard? Most Americans store their ketchup in the fridge, but people from England tend to store it in the cupboard. Where you store your ketchup is a cultural decision. You store it in the same place other people store their ketchup. This is an example of a coordination game. It is a situation where you do the same thing other people do, otherwise they can't find the ketchup.

[Figure: Dagen H logo]

Other examples are electrical outlet plugs. Different countries have different plug configurations. Everybody in the country wants to coordinate on the same plug type, but different countries may coordinate on different types of plugs. The ramifications of failure to coordinate can be rather severe.

On the third of September in 1967 (Dagen H) at 5:00 AM, Sweden switched the side of the road they drove on from left to right. Before the switch, people from Sweden drove to neighbouring countries, where traffic keeps to the right, and caused accidents. It is also easier to make cars with steering wheels on the same side. England and some former British colonies still drive on the left. Those countries are mostly islands. On an island, you don't have to worry about crossing a border and causing an accident. That is a coordination game.

[Figure: Pure coordination game]

This can be turned into a formal game. A game is a set of players with actions and payoffs. In the pure coordination game there are two players, and each player chooses one of two actions. For example, they can put the ketchup in the fridge or they can put it in the cupboard. If they both put their ketchup in the fridge, both get a payoff of one. If they both put their ketchup in the cupboard, both also get a payoff of one. But if one of them puts it in the cupboard, and the other one puts it in the fridge, then each player gets zero because neither one can find the ketchup.

[Figure: N-person coordination game]

The game can be expanded to N persons who have to decide about something like driving left or right or whether the ketchup should be stored in the fridge or in the cupboard. Assume that they play this game against their two neighbours and switch their behaviour if both neighbours behave differently from them. Because people want to coordinate their behaviour to match other people, they change their behaviour to create the similarities within. After three steps you also see differences between groups emerging.

Does this game actually lead to similarities? Does this process stop or does it keep on churning? The answer can be found using a Lyapunov function. If the process can be represented by a function that has a maximum, and that goes up by at least some fixed amount each time the process moves, then the process is going to stop at some point.

If this pure coordination game is played asynchronously, that means having only one person moving at a time, which is unlike the example in the graph, then you can show that the number of coordinations is a Lyapunov function. For example, if someone switches to blue because both neighbours are blue, the number of coordinations goes up by two. There is also a maximum to the number of coordinations, which is reached when everyone has the same colour, so the process has to stop. The process doesn't have to stop with everybody being the same, as you can get a boundary between the colours.
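
A minimal sketch of the one-at-a-time game. It assumes that people sit in a ring, that colours are coded 0 and 1, and that the first person found with two differing neighbours is the one who moves; the starting state is made up for the illustration.

```python
def coordinations(state):
    """Number of neighbouring pairs on the ring that share a colour."""
    n = len(state)
    return sum(state[i] == state[(i + 1) % n] for i in range(n))

def move(state):
    """Let the first person whose two neighbours both differ switch colour."""
    n = len(state)
    for i in range(n):
        left, right = state[i - 1], state[(i + 1) % n]
        if left != state[i] and right != state[i]:
            new = list(state)
            new[i] = left  # with two colours, left equals right here
            return new
    return None  # nobody wants to switch: equilibrium

state = [0, 1, 0, 0, 1, 1, 0, 1]  # hypothetical starting colours
history = [coordinations(state)]
while (nxt := move(state)) is not None:
    state = nxt
    history.append(coordinations(state))

print(history)  # [2, 4, 6]
print(state)    # [1, 1, 0, 0, 1, 1, 1, 1]
```

Every switch adds exactly two coordinations, and the process stops at 6 out of a possible 8 because a block of the other colour survives.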

[Figure: Inefficient coordination]

There is a difference between coordination and the standing ovation problem. In the coordination model, for example choosing the side of the road to drive on, there is a measurable difference in payoffs so no-one would choose not to coordinate. In the standing ovation model it is more a psychological effect. It is no problem to differ from other people.

Inefficient coordination refers to behaviour that doesn't seem optimal. To make sense of why this happens, you can use the inefficient coordination game. It may apply to implementing the metric system M versus the obscure English system of measurements D. If both parties agree on M, they get a payoff of 2. If both agree on D, they get a payoff of 1. If they differ, they get a payoff of 0. If both parties were free to choose they would agree on M, but in England people are stuck with D.

How do you get inefficiency? The metric system wasn't around when the English measurements were made up. Situations can shift. This can be explained with the shake-bow game. Greeting people is a coordination game. It is important that we greet people the same way. If one shakes and the other bows, eyes get poked out. If both shake hands, they get a payoff of 1. If one shakes and one bows, both get 0. If both bow, both get a payoff of a.

The question is: which is better, bowing a > 1 or shaking a < 1? Shaking could be better than bowing because people can get a sense of the other's general health by feeling. However, recently the value of a has increased because people now travel all over the world in airplanes with germs over the seats and touching these seats spreads germs all over the hands. And then bowing is better than shaking because the world has changed. As a result, countries that are shaking could get stuck in a suboptimal way of greeting, just like England is stuck in a suboptimal system of measurements.
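
Why a country can stay stuck is visible in the payoffs themselves. The sketch below looks for pairs of actions where neither player gains by deviating alone; the function names and the two sample values of a are assumptions for the illustration.

```python
from itertools import product

def pure_equilibria(payoff, actions):
    """Action pairs where neither player gains by deviating alone."""
    result = []
    for a, b in product(actions, repeat=2):
        best_a = all(payoff(a, b) >= payoff(x, b) for x in actions)
        best_b = all(payoff(b, a) >= payoff(y, a) for y in actions)
        if best_a and best_b:
            result.append((a, b))
    return result

def shake_bow(a_value):
    """Symmetric payoff: 1 if both shake, a_value if both bow, 0 otherwise."""
    def payoff(mine, theirs):
        if mine != theirs:
            return 0
        return 1 if mine == "shake" else a_value
    return payoff

actions = ["shake", "bow"]
print(pure_equilibria(shake_bow(0.5), actions))  # shaking is better
print(pure_equilibria(shake_bow(2), actions))    # bowing is better
```

Both calls return the same two pairs, everyone shaking and everyone bowing. Even after a rises above 1, nobody gains by being the only one who bows, so a shaking country stays where it is.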

12.4. Emergence of culture

To understand why cultures differ, the pure coordination game is extended to multiple games. Culture is the result of lots of coordination games. Here are some example questions. Do people wear shoes in your house? Do you cross the street when the "don't walk" sign is flashing? Do you read the newspaper at the breakfast table or are you talking to people? Do you hug your friends when you see them? And do you interrupt someone who's talking?

Each of these situations is a pure coordination game. If everybody else is wearing shoes in the house, you don't want to take your shoes off because you will get dirt all over your feet. If you are walking with friends, and they don't cross the street when the sign is flashing and you do, then they will be standing on one side of the street and you will be standing on the other. If you read the newspaper at the breakfast table and the other person is hoping to engage in conversation, he is going to stare at your newspaper. If you're in a culture where people don't interrupt and somebody interrupts you, this disrupts the flow of the conversation.

People who give the same answers to these questions are culturally similar. If there are 20 coordination games with two possible answers to each, then you have 2^20, or more than 1,000,000, possible cultures. There are of course more coordination games, and many of them have more than two answers, so there are many more possible cultures. Why do cultures differ? The paradoxical answer is that they differ because everyone is trying to coordinate. People within a culture are most likely coordinating on different answers than people in other cultures.

[Figure: Axelrod's definitions]

Axelrod's culture model explains how cultures emerge, and why there are boundaries between cultures. Axelrod defines a set of features which are coordination games like whether you read the newspaper at the table or talk. Traits define what action you take on that feature, for example talk. There could be multiple options. A person is a vector of traits on features. For instance, person A wears shoes in the house, stores ketchup in the fridge, and reads the newspaper at the table.

[Figure: Social space]

Axelrod then puts people in social space that could be represented by a grid of boxes that represent people. The last assumption Axelrod makes, is that people interact with their four direct neighbours in the grid with a probability that is equal to the similarity between them. This interaction works like a coordination game.

If you don't agree on much with your neighbour then you are unlikely to interact and change your behaviour. If you agree on everything then you interact for sure, but then you don't have to change anything. If you agree on 70% of the traits, then you interact with a probability of 70%. If you interact, you randomly pick a feature and match the trait of that person on it. It could be that you already match that person on that trait, but if you don't then you will switch and match.

[Figure: Axelrod's game]

This can be worked out using a model. Suppose there are five features, ten values, four neighbours (North, South, East, West), and the similarity is the percentage of traits on which a person agrees with the other person. When two people meet, one of them is a leader while the other is a follower. For example, the leader may have the characteristics 53211 and the follower 51331.

The probability that these two people are going to interact is 40% because they agree on two traits out of five. If they interact, then they play a coordination game on one of the traits, for example the second trait. The follower then matches this trait to the leader so that his characteristics become 53331.
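
One meeting in this model can be sketched as follows. People are written as strings of digits as in the example; the function names and the optional `rng` argument are assumptions made for this illustration, not part of Axelrod's own description.

```python
import random

def similarity(a, b):
    """Fraction of features on which two trait strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def interact(leader, follower, rng=random):
    """One meeting: with probability equal to similarity, copy one random trait."""
    if rng.random() >= similarity(leader, follower):
        return follower                      # no interaction this time
    feature = rng.randrange(len(leader))     # may already match the leader
    return follower[:feature] + leader[feature] + follower[feature + 1:]

leader, follower = "53211", "51331"
print(similarity(leader, follower))  # 0.4
print(interact(leader, follower))    # unchanged, or one trait copied
```

If the second feature is the one picked, the follower 51331 becomes 53331, which raises the similarity to 60% and makes the next interaction more likely.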

[Figure: Start of game]
[Figure: End of game]

This can be simulated using NetLogo. In this example there are five features and twelve traits. In the centre of each square is a person. The boundary line between two neighbours shows the number of traits those two people agree on and thus the likelihood that they will interact. If that line is dark, there is little agreement, and if it is lighter, there is more agreement and a higher probability of interaction.

Over time the number of regions is falling and the cultures are becoming more similar. The largest region is also growing, but there is still some cultural heterogeneity. And if we let it run till it stops, you get a few separate cultures with thick boundaries around them. They have to have these thick boundaries because if they didn't, people would interact with their neighbours across those boundaries and become similar.

Axelrod's model shows that similar regions emerge, so that you get multiple cultures, but the boundaries between those cultures are thick. Axelrod then concludes that if our neighbours are like us, we tend to interact with them, and if they are not, we tend not to. This produces distinct cultures with thick boundaries, or vast differences, between these cultures. Thick boundaries emerge because if they weren't thick, people would interact and become more similar, so that the boundary would disappear. Axelrod's model shows how distinct cultures on multiple dimensions can emerge with self-reinforcing boundaries.

12.5. Coordination and consistency

Axelrod's model leaves out the notion of consistency within a culture. Jenna Bednar developed a model of culture based on coordination games that includes coherence and consistency as well as the heterogeneity within cultures. In Ron Inglehart's World Values Survey, there are a lot of consistencies across countries, for example Protestant Europe looks the same, Catholic Europe looks the same, and the Islamic countries look the same. On the other hand, not everybody has the same characteristics. For example, people in Sweden differ from each other. And the same is true for Zimbabwe.

To explain why there is consistency and heterogeneity within a culture we assume that:
- (1) the values, actions, and behaviours that people coordinate on have some meaning;
- (2) people desire some consistency, for example to avoid cognitive dissonance;
- (3) there are some innovations and errors.

The coordination rule is that two people, each with a vector of actions, beliefs or attitudes, meet, and the follower randomly picks a dimension and coordinates on it, for example he puts the ketchup where his friend puts the ketchup. The consistency rule is that a person looks at himself, and because the values in the vector have meaning, he can check whether they fit together. For example, a person with characteristics 5144 might think, I'm 5 on the first and 1 on the second, and that doesn't make sense, so I switch and become 5 on the second. And so the characteristics of this person become 5544.

For example, you come from a family that is pretty reserved, so you don't even hug your parents very often, and have hugging behaviour 1. Then you go to college and all your friends hug each other. Now you have hugging behaviour 5 with your friends. When you go home you realise you have been hugging someone you just met a month ago to say good-bye so you start hugging your mother and change your behaviour to 5. If you hug your friends, you might as well hug your mother. In this way you become more consistent in terms of how you behave.
Innovation spreading

If we assume that people try to coordinate with other people, and try to be consistent, it could be expected to lead to consistent coordinated behaviour. This did happen in the model, but the process took an incredibly long time to converge. When some tiny errors or innovations were introduced, there were big spreads in the characteristics of individuals like those in Sweden and Zimbabwe. Only a small number of people have to try new things to get that level of heterogeneity within a community.

That was a surprising result. The question is why? Suppose the model converges to everybody being 5 on everything and somebody decides to innovate a bit and change to 6 on one thing. We might think that this should go away because people try to be consistent with themselves or meet somebody else so that it gets corrected to 5.

Another possibility is that during an interaction somebody else copies the 6 or that people themselves switch other attributes to 6 in order to be more consistent. In this way the six can spread within a person (horizontally in the graph) and across people (vertically in the graph). If more of such innovations or errors are made, for example switching to 7, these can also spread. This means the process isn't going to converge very fast. There is a lot of spread because errors propagate in many directions. Most traits are going to be fives, but there will be sixes and sevens everywhere.

It is possible to mathematically understand why small amounts of error can cause a big spread. The simplest model consists of two agents, two games, and two actions. For example two people are deciding whether to hug or bow with their family, or hug or bow with their friends. Now it is possible to write down all possible states of the world. The columns represent the games. The rows represent the persons.
States of the world

The following could happen:
- (1) both do the same thing on both games, for example action red;
- (2) one person is taking one action on both games, but the other person is taking the other action on one game, which is called off by one;
- (3) they could be consistent but not coordinated, for example one person is green on both, and the other person is red on both;
- (4) they could be coordinated but not consistent, for example they could both play the red action on one game and the green action on the other game, but they would have a lot of cognitive dissonance because neither one is consistent;
- (5) they could be not coordinated and not consistent, which could be a total mess.
Consistency dynamic

If this system is without an error, the dynamics can be described. For example, in the state where people are off by one, the consistency dynamic is as follows:
- (1) person 2 could look at himself, check for consistency, and conclude that he is consistent, and nothing would change;
- (2) person 1 could look at herself, check for consistency, and conclude that she is not consistent, and switch both actions to red. Then both would be consistent, but neither would be coordinated;
- (3) person 1 could look at herself, check for consistency, and conclude that she is not consistent, and switch both actions to green. Then both would be consistent and coordinated. If person 1 finds herself to be inconsistent, then there is a 50% chance for (2) to happen as well as a 50% chance for (3) to happen.
Transition map

It is also possible to write down the coordination dynamic for all the different states, including the probability of moving from those states to other states. For example, if there is a total mess (top), and one of the persons looks at himself or herself and decides to become consistent, then they move to the off by one state (centre). Alternatively, suppose that the two people decide to coordinate. Then again, they move to the off by one state. No matter what happens, with probability 1 they move from the total mess state to the off by one state.

In the off by one state (centre), a lot could happen. They could stay in the off by one state, for example when person 2 looks at himself and decides that he is consistent, or if these two people decide to coordinate on action 1. Alternatively, it would be possible to move over to the state consistent but not coordinated (right) or coordinated but not consistent (bottom). Finally, they could move to the coordinated and consistent state (left). This is the only state that is stable because they will not move out.
Markov process

This is like a Markov process, except that they can't get from any state to any other state because they can get stuck in the equilibrium coordinated and consistent state. If there is innovation or error ε then they can move back to the off by one state and then it is a Markov process.

In this way it becomes possible to write a matrix with the horizontal axis representing the states at time t and the vertical axis representing the states at time t+1. The cells represent the odds of moving from one state to the other. This model shows that small innovation rates lead to substantial heterogeneity. This was a big surprise.
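The exact transition probabilities are not listed here, so the following is only a rough simulation sketch of the two-person, two-game model under assumptions of its own: in each period one person and one game are picked at random, an error flips that action with probability `eps`, and otherwise coordination and consistency updates are equally likely. All names and the 50/50 split are hypothetical choices for illustration.

```python
import random

def step(state, eps, rng):
    """Advance the system by one period.
    state[p][g] is the action (0 or 1) of person p in game g."""
    p, g = rng.randrange(2), rng.randrange(2)
    if rng.random() < eps:
        state[p][g] = 1 - state[p][g]      # error or innovation
    elif rng.random() < 0.5:
        state[p][g] = state[1 - p][g]      # coordinate with the other person
    elif state[p][0] != state[p][1]:
        state[p][1 - g] = state[p][g]      # become consistent with own action g
    return state

def is_stable(state):
    """True when both people are consistent and coordinated."""
    return len({action for person in state for action in person}) == 1

def share_stable(eps, periods=20000, seed=0):
    """Fraction of periods spent in the coordinated and consistent state."""
    rng = random.Random(seed)
    state = [[0, 1], [1, 0]]               # start in the total mess state
    hits = 0
    for _ in range(periods):
        step(state, eps, rng)
        hits += is_stable(state)
    return hits / periods

for eps in (0.0, 0.01, 0.05):
    print(eps, share_stable(eps))
```

With `eps = 0` the coordinated and consistent state is absorbing, so the share should approach one. With a positive `eps` the system keeps being knocked out of that state, which is the mechanism the Markov model captures.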

There are differences between cultures, so people from France behave differently than people from Mexico. There are also similarities within groups, meaning that people within an interacting group become similar. Their behaviour isn't identical and there is a lot of within-group heterogeneity, for example in Sweden or Zimbabwe. Some of that behaviour is interesting in the sense that it doesn't appear to be optimal from the outside. For example, people might do 5 to become consistent, even though it is not optimal.

There are three ways people can coordinate on a wrong action:
- (1) they could just idiosyncratically coordinate on the wrong thing;
- (2) payoffs can change over time, for example with bowing versus shaking hands;
- (3) in order to maintain consistency, people could choose a behaviour that is not optimal in one domain.

Culture can be seen as multiple coordination games, where people are trying to be consistent. To get good outcomes, it's important to coordinate behaviour, for example to greet people the same way. There is still a lot of within-group heterogeneity that could be explained by assuming that there is a small amount of error and experimentation, as errors propagate in two directions. This could be shown using a Markov model.

Pure coordinating behaviour creates a Lyapunov function, so this process converges quickly. If we include consistency and error, then we no longer have a Lyapunov function, but we could use a Markov model. The Lyapunov model and the Markov model can be used to explain other models like the culture model. Even though culture is complex, simple models allow us to understand some basic properties of cultures, such as differences between cultures, consistency within cultures, and heterogeneity within cultures.

13. Path Dependence

13.1. Path dependence

Path dependence means that what happens now depends on what happened along the path to get here, so history matters. This is different from Markov process models where history doesn't matter. Urn models can help to understand path dependence, and to distinguish between different types of path dependence. For example, an outcome such as what happens today could be path dependent, but also the equilibrium distribution over all possible outcomes of what happens in the long run could also be path dependent.

One of the most famous examples of path dependence involves the standard typewriter keyboard configuration called QWERTY. Initially, there were lots of different keyboard configurations, but it turned out that in the path through which history played out, QWERTY ended up getting locked in due to a process of increasing returns. The more people had QWERTY, the more people wanted it, and the more typewriters that were built with it, and so it became locked in.

Path dependence means that outcome probabilities depend on the sequence of previous outcomes, which is called the path. Phat dependence means that outcome probabilities depend on past outcomes but not on their order. The past impacts the outcome but it doesn't necessarily determine it. This happens, for example, with choices over technology such as keyboards, alternating current or direct current, or gasoline cars versus electric cars. Another example is the law. How the law evolves over time depends on what laws there have been in the past.

It is also the case with institutional choices like having a single payer healthcare system, or defined benefits pensions versus defined contributions. These choices can depend on previous institutional choices. Economic success can also depend on sequences of past outcomes. For example, Ann Arbor had the largest public university while Jackson had the largest prison. Population in Ann Arbor increased from 10,000 in 1910 to 110,000 in 2000, while in Jackson it first increased and then declined, and grew only from 31,000 in 1910 to 36,000 in 2000. The public university greatly helped Ann Arbor.

Path dependence often coincides with increasing returns. If you build a university, then other institutions like hospitals and law schools that weren't originally part of the university could join in. Eventually, you grow through a virtuous cycle. Increasing returns means building success on top of success. Path dependence isn't the same as increasing returns.

Chaos is often referred to as ESTIC, which stands for extreme sensitivity to initial conditions. Chaos means that, even when the starting points A and B are very close to each other, subsequent paths can diverge tremendously. Chaos deals with initial points while path dependence deals with the path. Path dependence in a dynamic process that has a state in each period differs from a Markov process. These path dependent processes violate the fixed transition probabilities assumption of the Markov process. In urn models transition probabilities change, and that is why history can matter.

Urn model

13.2. Urn models

Urn models can be used to distinguish between processes that are path-dependent and processes where the order doesn't matter. They can also be used to distinguish between processes that have outcomes in a given period that are path-dependent, and processes that have long-run equilibria of distributions over the set of outcomes that are path-dependent.

An urn model is a container with blue and red balls. You pick balls out of the urn. The ball you pick is the outcome. There is a probability of picking out balls of different colours. The simplest urn model is the Bernoulli model, where outcomes are independent of previous outcomes. The number of balls in the urn is fixed, for example three red balls and one blue ball. After picking a ball, you return it to the urn. The probability of picking a red ball is constant and always 3/4.

The most famous path-dependent process is called the Polya process. It starts with an urn with two balls, one of them red and one of them blue. You pick a ball, and after you pick that ball, you put it back and add another ball of the same colour you just picked. This can be repeated over and over again. Over time, probabilities change as more and more balls are added. This process is path-dependent as the probabilities depend on the path of previous choices.

The Polya process might apply to fashion. If more people are wearing a red shirt, this increases the probability that more people will be wearing red shirts. The interesting results of the Polya process are:
- (1) any probability of red balls is an equilibrium and equally likely, for example 3% is as likely as 88%;
- (2) any sequence of R red balls and B blue balls is equally likely, for example RBBRB is as likely as RRBBB.
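The second result can be checked with exact arithmetic. The sketch below multiplies the pick probabilities along a sequence, starting from one red and one blue ball; the function name `polya_sequence_probability` is just a label chosen for this example.

```python
from fractions import Fraction

def polya_sequence_probability(sequence):
    """Exact probability of a draw sequence such as "RBBRB",
    starting from an urn with one red and one blue ball."""
    counts = {"R": 1, "B": 1}
    prob = Fraction(1)
    for colour in sequence:
        prob *= Fraction(counts[colour], counts["R"] + counts["B"])
        counts[colour] += 1  # return the ball and add one of the same colour
    return prob

print(polya_sequence_probability("RBBRB"))  # 1/60
print(polya_sequence_probability("RRBBB"))  # 1/60
```

Both sequences contain two red and three blue picks, so they come out with the same probability, whatever the order.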

The opposite of the Polya process is the balancing process, where you return the ball but also add a new ball of the opposite colour to the ball you just picked. The balancing process converges to equal percentages of the two colours of balls. This might apply to situations where you want to keep different constituencies happy, for example a party convention in the United States might be held in a northern state or a southern state. If it is held in a northern state, this increases the probability that it is going to be held in a southern state four years later.

In the Polya Process, the equilibrium is incredibly path-dependent, but in the balancing process it is not. There is also a distinction between the period outcomes, which is what happens in a particular period, for example red or blue, versus what the distribution of balls in the urn looks like in the long run. Path-dependent outcomes means that what happens in a period depends on what happened in the past. That is true for both the Polya process and the balancing process.

Path-dependent equilibrium means that what happens in the long run depends on the process along the way. The Polya process has both path-dependent equilibria and path-dependent outcomes. The balancing process only has path-dependent outcomes. The equilibrium is always 1/2 and it doesn't depend on what happened along the way. In the balancing process, history matters at each step in time. However, that doesn't mean that what happens in the long run depends on the path.
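The contrast between the two processes shows up in a small simulation. This is a sketch only; the function `red_share` and its `process` argument are illustrative names, and the seeds are arbitrary.

```python
import random

def red_share(process, draws, seed):
    """Long-run share of red balls for the "polya" or "balancing" process,
    starting from one red and one blue ball."""
    rng = random.Random(seed)
    red, blue = 1, 1
    for _ in range(draws):
        picked_red = rng.random() < red / (red + blue)
        # Polya adds the picked colour, balancing adds the opposite colour.
        add_red = picked_red if process == "polya" else not picked_red
        if add_red:
            red += 1
        else:
            blue += 1
    return red / (red + blue)

for seed in range(5):
    print(seed,
          round(red_share("polya", 5000, seed), 3),
          round(red_share("balancing", 5000, seed), 3))
```

Across seeds, the balancing share should sit close to 1/2 every time, while the Polya share should settle at a different value on each run.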

Two examples from history can illustrate this. The United States had the idea of manifest destiny, meaning that the United States was likely to be a country that stretched from sea to sea. History played out in particular ways, but some people argue that it didn't matter as the United States was destined to be a country that stretched from sea to sea. Another example is the railroads. Some people argue that once the railroads were invented, building them became inevitable. The lay of each track was path-dependent, but the long-run outcome may not have been as the tracks were laid where it was economically efficient.

Path-dependence means that the outcome probabilities depend on the sequence of past outcomes. Phat-dependence means that outcome probabilities depend on past outcomes but not their order. The Polya process is phat-dependent. All that matters is the set of outcomes, not the order in which they appeared.

The sway process
Why is the difference between path-dependent and phat-dependent so important? If there are 30 picks, then there are 2^30, or over a billion, different paths. The number of possible sets is only 31, because the number of blue balls could have been 0, 1, ..., 30. If the process is set dependent, there are only 31 different possibilities. If it is path dependent, there are more than a billion possibilities.

The sway process is path-dependent. It starts with one blue and one red ball. When a ball is picked in period t, a ball of that same colour is added, just like in the Polya process. In addition, for each earlier period s < t, you add 2^(t-s) - 2^(t-s-1) balls of the colour of the ball chosen in period s.

For example, in period one, you pick a blue ball and you add a blue ball. In period two, you pick a red ball, so you add a red ball, and also add a blue ball for the blue ball you picked in period one. In period three, you pick a blue ball and add a blue ball, but also a red ball for the red ball you picked in period two, and two blue balls for the blue ball you picked in period one, so its weight is multiplied by two. In period four, you pick a red ball, so you add a red ball, a blue ball for period three, two red balls for period two, and four, or two times two, blue balls for period one.
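The bookkeeping in this example is easy to get wrong by hand, so here is a small sketch that applies the rule to a given list of picks. It assumes that in period t the extra balls for an earlier period s number 2^(t-s) - 2^(t-s-1), which is what the worked example implies; `sway_urn` is a made-up name.

```python
def sway_urn(picks):
    """Urn contents after the given picks, e.g. "BRBR",
    starting from one red and one blue ball."""
    counts = {"R": 1, "B": 1}
    for t, colour in enumerate(picks, start=1):
        counts[colour] += 1                      # Polya-style addition
        for s in range(1, t):                    # every earlier period s < t
            counts[picks[s - 1]] += 2 ** (t - s) - 2 ** (t - s - 1)
    return counts

print(sway_urn("BRBR"))  # {'R': 6, 'B': 11}
```

After four periods the pick from period one accounts for eight balls, the pick from period two for four, period three for two, and period four for one, so earlier picks weigh more.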

This process is called the sway because, as you go back in time, decisions take on more and more weight. In this way the path is taking on more and more influence. So, when people talk about path dependence, for example in law, institutional choices, or technological adoptions, they think about early movers having a bigger effect and the past having an increasing weight. One way you can get full path dependence, is by having that sort of process.

13.3. Mathematics on urn models

For the Polya process, each equilibrium is equally likely, and any history of R red balls and B blue balls is equally likely. This can be proven. The Polya process starts with one red ball and one blue ball. Then a ball is picked. Then the ball is returned and another ball of the same colour is added. This process is repeated. Let's first prove that any history of B blue balls and R red balls is equally likely, because this proof can be used to prove that each equilibrium is equally likely.

Proving that any history of R red balls and B blue balls is equally likely, is easy. For example, P(RBBB) = (1/2)(1/3)(2/4)(3/5) = (1*1*2*3)/(2*3*4*5) = 1/20 and P(BBBR) = (1/2)(2/3)(3/4)(1/5) = (1*2*3*1)/(2*3*4*5) = 1/20. Also P(RBRB) = (1/2)(1/3)(2/4)(2/5) = (1*1*2*2)/(2*3*4*5) = 1/30 and P(BBRR) = (1/2)(2/3)(1/4)(2/5) = (1*2*1*2)/(2*3*4*5) = 1/30.

You first make a pick out of 2 balls. Then you make a pick out of 3 balls. If you make N picks, the last pick is out of N+1 balls, so the denominator will be 2*3*...*(N+1) = (N+1)!. If you make N picks and end up with B blue picks, you have to pick N-B red balls. For blue balls, you pick first out of 1, then out of 2. This continues until you pick out of B. For red balls, you pick first out of 1, then out of 2. This continues until you pick out of N-B. The numerator will then be B!*(N-B)!. So, for any N picks with B blue balls, the probability is (B!*(N-B)!)/(N+1)!. For example, with N = 4 and B = 3 this gives (3!*1!)/5! = 6/120 = 1/20.

It is more difficult to prove that any probability of red balls is an equilibrium that is equally likely to happen. For example, P(BBBB) = (1/2)(2/3)(3/4)(4/5) = (1*2*3*4)/(2*3*4*5) = 1/5 and P(BBBR) = (1/2)(2/3)(3/4)(1/5) = (1*2*3*1)/(2*3*4*5) = 1/20. Now, there are four ways in which one red ball could emerge, which are RBBB, BRBB, BBRB and BBBR. They are equally likely to happen, so P(RBBB) + P(BRBB) + P(BBRB) + P(BBBR) = 1/20 + 1/20 + 1/20 + 1/20 = 1/5.

Now P(50B) = P(50R) = (1/2)(2/3)(3/4)...(49/50)(50/51) = 1/51. Now P(49B 1R) is a bit more complicated. The probability of picking 49 blue balls first and 1 red ball last is (1/50)*(1/51). There are 50 ways in which that one red ball could emerge, so P(49B 1R) = P(1B 49R) = 50*(1/(50*51)) = 1/51.

P(47B 3R) is even more complicated. Assume first that the three red balls are last. You first pick a blue ball with probability 1/2. Then a blue ball with probability 2/3. This goes on until you pick a blue ball with probability 47/48. Then you pick a red ball with probability 1/49, then another red ball with probability 2/50, and another red ball with probability 3/51. So you get a probability of (1/2)(2/3)...(47/48)(1/49)(2/50)(3/51) = (47!)(3!)/(51!) = (1*2*3)/(48*49*50*51).

There are 3 red balls, so there are 50 places where the first one could have been, then 49 places for the second one, and 48 places for the third one. But picking red balls on places 12, 29, and 38 is the same as 29, 12 and 38 or 38, 12 and 29, so we have to take into account the different orders. Three balls can be in 3*2*1 = 3! = 6 different orders. Now the probability is ((1*2*3)/(48*49*50*51))*((50*49*48)/(3*2*1)) = 1/51. In this way it is possible to arrive at the conclusion that any probability of red balls is an equilibrium that is equally likely.
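The same conclusion can be checked for every count at once. The sketch below builds the exact distribution of the number of red balls picked, one pick at a time; `polya_red_distribution` is an illustrative name.

```python
from fractions import Fraction

def polya_red_distribution(n_picks):
    """Exact distribution of the number of red balls picked in n_picks
    draws, starting from one red and one blue ball."""
    dist = {0: Fraction(1)}              # reds picked so far -> probability
    for t in range(n_picks):
        total = t + 2                    # balls in the urn before this pick
        nxt = {}
        for reds, p in dist.items():
            p_red = Fraction(reds + 1, total)
            nxt[reds + 1] = nxt.get(reds + 1, 0) + p * p_red
            nxt[reds] = nxt.get(reds, 0) + p * (1 - p_red)
        dist = nxt
    return dist

dist = polya_red_distribution(50)
print(len(dist), set(dist.values()))  # 51 {Fraction(1, 51)}
```

For 50 picks there are 51 possible counts of red balls, from 0 to 50, and each has probability 1/51.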

13.4. Path dependence and chaos

Path dependence means that the outcome probabilities depend on the sequences of past outcomes. There are path dependent outcomes and path dependent long run equilibria. Markov processes have a unique stochastic equilibrium that is not path-dependent. Markov processes are not path dependent because of the fixed transition probabilities. In the urn model where balls are added based on the colour of the ball picked, the transition probabilities change.

Tent map function
Path dependence can also be related to chaos. Chaos is often referred to as ESTIC, which stands for extreme sensitivity to initial conditions. Chaos means that, even when the starting points A and B are very close to each other, then after many iterations of the outcome function they differ by arbitrary amounts.
Tent map example

Recursive functions can be used to describe processes that are chaotic. In a recursive function, the outcome at time t depends on the outcomes of previous periods, so xt = f(x1, x2, ..., xt-2, xt-1), for example xt = xt-1 + 2, which starting from 1 would give 1, 3, 5, 7, 9, 11.

An example is the tent map. For x in (0,1) meaning that 0 < x < 1, if x < 1/2 then f(x) = 2x and if x ≥ 1/2 then f(x) = 2 - 2x. Starting with 0.21 it would give 0.42, 0.84, 0.32, 0.64. An example of the tent map, which starts with two points that are very similar, one being 0.4321, and the other being 0.4322, shows that after a few periods, the two points are a long way away from each other.
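The two tent map examples can be reproduced with a short sketch. It uses exact fractions rather than floats, which is a choice made here so that rounding does not get mixed up with the sensitivity being illustrated; the names `tent` and `orbit` are hypothetical.

```python
from fractions import Fraction

def tent(x):
    """Tent map on (0, 1): 2x below 1/2, otherwise 2 - 2x."""
    return 2 * x if x < Fraction(1, 2) else 2 - 2 * x

def orbit(x0, steps):
    """The starting point followed by `steps` iterations of the tent map."""
    xs = [Fraction(x0)]
    for _ in range(steps):
        xs.append(tent(xs[-1]))
    return xs

print([float(x) for x in orbit("0.21", 4)])   # [0.21, 0.42, 0.84, 0.32, 0.64]

a, b = orbit("0.4321", 13), orbit("0.4322", 13)
for t in range(14):
    print(t, float(abs(a[t] - b[t])))
```

The gap between the two orbits starts at 0.0001 and doubles at every step as long as both points lie on the same side of 1/2, so within roughly a dozen steps it is no longer small compared with the interval itself.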

The tent map ends up being chaotic because there is extreme sensitivity to the initial condition. This is not path dependence. Once the initial point is chosen, then it is possible to calculate exactly what's going to happen. So chaos, in its standard form means extreme sensitivity to initial conditions.

A process is independent if outcomes don't depend on the past history of outcomes. This independence is a probabilistic concept, for example a 50% chance of getting a red or blue ball each period. A process depends on the initial conditions if the outcomes depend only on the initial state, so that it is completely deterministic. Path dependence means that the outcome probabilities depend on what happens along the way, including the order in which things happen. Phat dependence means that outcome probabilities don't depend on the order in which things happen but only on the set of things that happened.

When historians or institutional scholars think about path dependence, they often think in terms of the sequence of events mattering, not just the set of things mattering. They also think that events aren't independent, and they think that although initial conditions matter, other things matter too.

If things happened independently then there is no structure in history and that doesn't make any sense. Extreme sensitivity to initial conditions in a deterministic process doesn't make sense either. That means that fate is completely predetermined by initial choices. It is also not phat, because then only the set of previous events matters, and not the order. Imagine America buying Louisiana from France before becoming independent. That doesn't make sense.

13.5. Path dependence and increasing returns

Increasing returns means that the more there is of something, or the more people do something, the more people want of it, or the more other people are going to do it. For example, the more people get QWERTY typewriters, the more other people want these typewriters. There are two reasons for this. First, people can then use somebody else's machine more easily. Second, from a production standpoint, it makes more sense to standardise. So there are positive feedbacks. More produces more.

This is like the Polya process. The more blue balls are picked, the more likely it is that blue balls are picked in the future. So, many people think that increasing returns cause path dependence. Both the sway and the Polya process have increasing returns. The balancing process, which doesn't produce path-dependent equilibria, didn't have increasing returns. In fact, it had decreasing returns. The more red balls were picked, the more blue balls were added to the urn.

Is increasing returns the same thing as path dependence? You can get increasing returns without path dependence. The gas/electric model represents the choice between gas cars and electric cars. Both had a positive feedback, for example more cars of a specific type meant more production and more filling stations of a specific type. However, when automobiles were first developed, gas cars had much larger increasing returns than electric cars because of the storage problems for electricity. Furthermore, you can add a gas engine to an electric car, so adding an electric engine also makes another gas engine more likely.

This can be represented by the following model. Assume that you start with five blue balls and one red ball. If you pick a red ball, you add one red ball and one blue ball. If you pick a blue ball, you add ten blue balls. The chance of picking a red ball first is 1/6. If that happens, the urn holds two red balls and six blue balls, so the chance of picking another red ball rises to 2/8. These are increasing returns. Increasing returns to blue balls are much larger. If you run this process, the blue balls win out every single time. This process isn't path dependent, but it has increasing returns, hence increasing returns are not the same as path-dependence.
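A quick simulation of this urn, as a sketch only, shows the blue share heading towards one regardless of the seed. The function name `gas_electric` and the choice of 2,000 draws are arbitrary.

```python
import random

def gas_electric(draws, seed):
    """Share of blue balls after `draws` picks, starting from
    five blue balls and one red ball."""
    rng = random.Random(seed)
    blue, red = 5, 1
    for _ in range(draws):
        if rng.random() < red / (red + blue):
            red += 1        # a red pick adds one red ball ...
            blue += 1       # ... and one blue ball
        else:
            blue += 10      # a blue pick adds ten blue balls
    return blue / (blue + red)

for seed in range(5):
    print(seed, round(gas_electric(2000, seed), 4))
```

Because every run should end at essentially the same place, the long-run outcome does not depend on the path, even though both colours benefit from positive feedback.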

You can also get path dependence without increasing returns, for example with symbiotes. You start with one red ball, one blue ball, one green ball, and one yellow ball in the urn. If a red ball is picked, you add a green one. If a green ball is picked, you add a red one. If a blue ball is picked, you add a yellow one. If a yellow ball is picked, you add a blue one. There are no increasing returns, because picking a red ball adds a green ball, which increases the odds of green rather than red. If you group the balls into two sets, {red, green} and {blue, yellow}, then the share of each set is path-dependent, but no single colour has increasing returns.

Increasing returns is logically distinct from path dependence but often in history path-dependence was caused by increasing returns. The urn models can be used to clarify that. Increasing returns is not the only way you get path dependence. To understand that, you have to move beyond urn models and look at externalities or the interdependencies between decisions. For example, public projects like building a nuclear power plant, or creating a national park, are huge economic decisions that affect a lot of other things, so they create externalities.

Projects and externalities
This can be illustrated using a model. Assume there are five projects A, B, C, D, and E. Each one has a value of 10 on its own, but they also create some externalities. The matrix represents the size of the externalities between projects, for example the externality between project A and project B is -20. If you start with A, the returns are 10. If you then plan to do B, then AB returns 10 + 10 - 20 = 0, so it doesn't make sense to do B after A.

Now AC returns 10 + 10 + 5 = 25, so it makes sense to do C after A. Doing ACD returns 10 + 10 + 5 + 10 - 10 + 0 = 25, so doing D after AC makes no difference. If you start with B, then it makes no sense to do A. After doing B, there is no benefit from doing C, so C might not happen. D will be done because of the externality of 30 with B.
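The sums above follow one rule: add 10 for every project done, plus the externality for every pair of projects done. The sketch below encodes that rule. The full matrix is not reproduced in these notes, so the table holds only the entries that the worked sums imply, with A-B taken as -20 (the value that makes AB sum to 0) and the -10 and 0 in the ACD sum assigned to A-D and C-D as an assumption. Pairs that are not listed are treated as 0 purely as a placeholder.

```python
from itertools import combinations

OWN_VALUE = 10

# Partial externality table; see the assumptions stated above.
EXTERNALITY = {
    frozenset("AB"): -20,
    frozenset("AC"): 5,
    frozenset("AD"): -10,   # assumed split of the "- 10 + 0" in the ACD sum
    frozenset("CD"): 0,
    frozenset("BD"): 30,
}

def total_return(projects):
    """Own values plus pairwise externalities for a set of projects."""
    own = OWN_VALUE * len(projects)
    ext = sum(EXTERNALITY.get(frozenset(pair), 0)   # unknown pairs count as 0
              for pair in combinations(projects, 2))
    return own + ext

for done in ("A", "AB", "AC", "ACD"):
    print(done, total_return(done))
```

Comparing `total_return` before and after adding a project gives the marginal value of that project, which is what decides whether it gets done given the projects chosen earlier.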

The projects that get done don't only depend on the initial conditions, in this case the project picked first, but also on the externalities of the projects that have been done in the past. If most of the externalities are positive, there is less path-dependence than when the externalities are negative, because positive externalities create increasing returns and make other projects more likely. A big cause of path-dependence is externalities.

13.6. Path dependent or tipping point

Path dependence and tipping points seem closely related concepts. Path dependence means that outcome probabilities depend on the sequence of past outcomes. There are path dependent outcomes, meaning that what happens in a given period depends on the path, and path dependent equilibria, meaning that what happens in the long run depends on what happens along the way. Tipping points are related to path-dependent equilibria.

Active tip
There are two types of tipping points. With a direct tip or active tip, a variable itself changes, which causes it to tip. With a contextual tip something changes in the environment that makes it possible for the system to move from one state to another. Path dependence can be related to direct tips.

The difference between path-dependence and tipping points is that path-dependence means that what happened along the way has an effect. Each step may have a small effect. A tipping point is a single instance in time where the equilibrium suddenly changes drastically. A singular event tips the system abruptly.

To measure tips, you can use measures of uncertainty like the diversity index, which shows the probability of different equilibria, and entropy, which measures how much information there is in the system. For the Polya process any probability distribution of red balls is an equilibrium and equally likely. When drawing four balls, five things could happen, which are drawing zero, one, two, three or four red balls. Each is equally likely, so the probability of each option is 1/5 so the diversity index is 5.

Suppose that the first ball is red. Then the following could happen. The probability of having four red balls is (2/3)(3/4)(4/5) = 2/5. The probability of having three red balls and one blue ball is (2/3)(3/4)(1/5)*3 = (1/10)*3 = 3/10 as the blue ball could be in three locations. The probability of having two red balls and two blue balls is (2/3)(1/4)(2/5)*3 = (1/15)*3 = 1/5 as the additional red ball could be in three locations. The probability of having one red ball and three blue balls is (1/3)(2/4)(3/5) = 1/10. Now, P(4R) = 4/10, P(3R) = 3/10, P(2R) = 2/10, and P(1R) = 1/10, so the diversity index is 1/((4/10)² + (3/10)² + (2/10)² + (1/10)²) = 1/(30/100) = 100/30 ≈ 3.33.
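The two diversity index values can be recomputed with a one-line function, taking the index to be one over the sum of squared probabilities as in the calculation above. The name `diversity_index` is chosen for this sketch.

```python
from fractions import Fraction

def diversity_index(probs):
    """Inverse of the sum of squared probabilities."""
    return 1 / sum(p * p for p in probs)

before = [Fraction(1, 5)] * 5                     # 0 to 4 red balls, all equally likely
after = [Fraction(k, 10) for k in (4, 3, 2, 1)]   # distribution once the first ball is red

print(diversity_index(before))         # 5
print(float(diversity_index(after)))   # about 3.33
```

The index drops from 5 to 10/3 after a single red pick, which is a noticeable but not abrupt reduction in uncertainty.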

This movement of the diversity index suggests that something happened along the path affecting the outcome probabilities. There is path dependence, but it is not abruptly tipped. Abruptly tipped would be moving from 5 to 1 or 1.2 so that one single event would get rid of a lot of uncertainty. The difference between path dependence and tipping points is one of degree. Path dependence is more gradual. Tipping points are more abrupt.

School friendship network

14. Networks

14.1. Networks

Networks have become a popular topic. The internet made it possible to make all sorts of network connections with people. There is also more and more data on networks. Networks are important for scientific innovation, the spread of ideas, or the decrease in smoking.

Adult friendship network
A graph of a middle school friendship network shows Caucasian students as white dots, African American students as green dots, and students of mixed race as red dots. This graph shows that middle school is segregated by race and gender as there are friendship clusters of black females, white males, black males, and white females. Adult friendship networks have little clusters of closely connected people but there is no segregation by race and gender.

Polarisation in US society
By looking at email networks within a corporation, it is possible to find people who tend to email a lot of other people. In this way it is possible to see how information flows within an organisation and who is important to the organisation.

The polarisation in American society can also be illustrated using a network. Liberal blogs are painted blue, conservative blogs are painted red, links between liberal blogs are blue lines, links between conservative blogs are red lines, links from a liberal blog to a conservative blog are yellow lines, and links from a conservative blog to a liberal blog are purple lines. The graph shows that the political discourse is very polarised.

The following issues with regard to networks are of interest:
- (1) the logic of networks, which is about how networks come to be and what rules people or organisations use to form connections to other people and organisations;
- (2) the structure of networks, which refers to the properties and measures used to compare one network to another, such as how connected a network is, how many nodes there are, and how many edges there are;
- (3) the functionality of networks, which is what the network does, for example, in social networks there is the issue of six degrees of separation, which means that you can get from any one person to any other person through six connections.

The interesting thing about functionality is that no one set out to form a network to provide this functionality. The functionality just emerges through the process. There is a logic by which the network forms, which creates a structure, and that structure has emergent functionalities such as the six degrees property.

14.2. The structure of networks

The structure of a network can be defined in terms of nodes, which are the items that are connected, represented by points, and edges, which are the connections, represented by lines. Edges can be either directed, represented by an arrow, or undirected, represented by a plain line. An example of an undirected network is bordering states. For example, if Germany borders Belgium then Belgium also borders Germany. A directed network might be students looking to other students for fashion advice. Person A might look to person B but person B might not look to person A. A lot of social networks are directed.

The structure can be represented by the measures:
- (1) degree, which is how many edges each node has on average;
- (2) path length, which is how far it is from each node to another node;
- (3) connectedness, which is whether the entire graph is connected to itself;
- (4) clustering coefficient, which is a measure of how tightly clustered the edges are.

Most people's friends are more popular
The degree of a node is the number of edges connected to that node. For example, in a social network it might be how many friends a person has. The degree of a network is the average degree across all nodes, or average degree = 2 * (edges/nodes), so for a network with 10 nodes and 9 edges the average degree = 1.8.

The neighbours of a node are all other nodes connected by an edge to the node. The average degree of neighbours of nodes will be at least as large as the average degree of the network. Typically, it is larger. What does this mean? In a social network this means that most people's friends are more popular than they are.

Connected network
Disconnected network
In the example in the graph, A has 2 friends while the average degree of his friends is 2.5, B has 3 friends while the average degree of her friends is 1.67, C has 2 friends while the average degree of his friends is 2.5, and D has 1 friend while the average degree of her friends is 3. In this case people on average have 2 friends while their friends on average have 2.4 friends.

The path length from A to B is the minimum number of edges that must be traversed to go from node A to node B. For the graph, the path lengths are AB = 1, AC = 1, AD = 2, BC = 1, BD = 1, CD = 2. There are 6 paths with a total length of 8. That means the average path length is 8/6 ≈ 1.33.
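The degree and path length figures for this four-node example can be reproduced with a few lines of code. The sketch below is only an illustration: the edge list is inferred from the path lengths listed above (an assumption, since the figure is not reproduced here), and the function names are made up for the example.

```python
from collections import deque
from itertools import combinations

# Edge list inferred from the path lengths given in the text (an assumption,
# since the figure itself is not reproduced here).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]


def build_adjacency(edges):
    """Map each node to the set of its neighbours (undirected edges)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def path_length(adj, start, goal):
    """Breadth-first search; returns None if goal is unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


adj = build_adjacency(EDGES)
degree = {n: len(adj[n]) for n in adj}
average_degree = 2 * len(EDGES) / len(adj)
# Average degree of each node's neighbours.
friends_degree = {n: sum(degree[m] for m in adj[n]) / degree[n] for n in adj}
average_friends_degree = sum(friends_degree.values()) / len(adj)
lengths = [path_length(adj, u, v) for u, v in combinations(sorted(adj), 2)]
average_path_length = sum(lengths) / len(lengths)

print(degree, average_degree)
print(friends_degree, round(average_friends_degree, 2))
print(lengths, round(average_path_length, 2))
```

A breadth-first search is used for the path length because it visits nodes in order of distance, so the first time the goal is reached the distance is the minimum number of edges.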

Connectedness means that it is possible to get from any node to any other node in the network. In a disconnected network there are nodes that cannot be reached from some other nodes. It is also possible to represent a Markov process as a network by drawing arrows showing the probability of movement from one state to another.

Possible edges
The clustering coefficient is the percentage of triples of nodes that have edges between all three nodes. In a network of four nodes, there are four such triples. So, if there is one triangle, then the clustering coefficient is 1/4. The maximum clustering coefficient is 1. It is possible to draw two different connected graphs with the same number of nodes and the same number of edges that have different clustering coefficients.
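The definition above can be turned into a short computation. The sketch below uses two illustrative four-node graphs, chosen for this example, that both have four edges and are connected but differ in their clustering coefficient.

```python
from itertools import combinations


def clustering_coefficient(nodes, edges):
    """Share of node triples in which all three pairs are connected."""
    edge_set = {frozenset(e) for e in edges}
    triples = list(combinations(nodes, 3))
    closed = sum(
        all(frozenset(pair) in edge_set for pair in combinations(triple, 2))
        for triple in triples
    )
    return closed / len(triples)


nodes = ["A", "B", "C", "D"]
# Two connected graphs with four nodes and four edges each (illustrative).
with_triangle = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
square = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

print(clustering_coefficient(nodes, with_triangle))  # 0.25
print(clustering_coefficient(nodes, square))         # 0.0
```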

Degree tells something about the density of connections. This could be a measure for social capital, for example the number of connections between people in a community. It could also say something about the speed of diffusion of information. One thing that will determine that is how connected people are.

Path length can be useful with airlines as it can tell something about the number of flights needed on average. For example, if the average path length from Miami to LA is 1.1, there probably is a direct flight. It can also tell something about social distance. If the average path length between employees is six, it probably takes five or six other employees to get from employee A to employee B. That means that information and ideas do not spread very fast within this organisation.

Connectedness matters for Markov processes because it must be possible to get from any state to any other state to apply the Markov convergence theorem. The abilities of a terrorist group may depend on their network. If it is not connected then it may be difficult for them to pass the information needed to carry out their activities. It is also relevant for internet or power failures. If the electricity network becomes disconnected, people don't have any power. Finally, people that are disconnected from other people are isolated in terms of information so they may not learn things.

The clustering coefficient can be used to figure out how redundant or robust a network is. If one person drops out of a triangle, the other two are still connected. It is also sometimes used as a measure of social capital. The clustering coefficient can also be used to capture how likely an innovation is to be adopted, for example a new word.

So, if three people, A, B and C, are all connected, and A uses some new word so that B hears it and tells it to C so that C uses the same new word when talking to A, the use of the word may be reinforced. As information is passed through the loop, it seems to be part of the new normal. If A wasn't connected to C, then if A said something new, it could just fan out and never come back to A so that A would be less likely to maintain the innovation. Clusters creating feedbacks can allow things to be maintained within a network.

With regard to networks, a picture is often worth a thousand words. The statistics of these networks, such as degree and path length, may be very similar but the graphs may look very different because they have different clustering coefficients. To distinguish between graphs, quite a few statistics are needed. An interesting book is Networks: An Introduction, written by Mark Newman [12].

14.3. The logic of network formation

Nodes can be seen as agents that make decisions about what other nodes to connect to that result in the structure of the network. There are three simple ways in which networks might form:
- (1) random connections, where each node randomly decides to connect to other nodes;
- (2) the small worlds model, where each person has some friends that belong to a clique and some other friends that are random, which is what a lot of social networks look like;
- (3) the preferential attachment network, which means that you are more likely to connect to nodes that are more connected, which applies to the internet.

Netlogo small worlds
With random attachment there are N nodes and there is a probability P that any two nodes are connected. Random networks have a contextual tipping point. For large values of N, the network almost always becomes connected if P > 1/(N-1), otherwise the network most likely is disconnected.

Real social networks are different. This led to the small worlds network idea where a percentage of friends are local and the other friends are more random. Local or clique friends could be the people at work or the people that live nearby. Random friends are people from summer camp or an old college roommate.

Preferred attachment
This can be simulated with Netlogo. In this model 40 people are put in a circle. Each person is connected to two people on either side, which creates clusters. In the simulation people are rewired randomly one at a time so that they add random friends. During the process the average path length falls, as does the clustering coefficient.

With preferential attachment, nodes arrive on the scene, and then they have to decide who to connect to. It could be about someone moving into a new city or creating a webpage. The webpage can be seen as a node, and the owner has to decide which other webpages to connect to. Assume that the probability of connecting to an existing node is proportional to its degree, that is, the number of connections it already has.

Long tail distribution
This can be simulated using Netlogo. The size of the nodes represents the number of connections. There are a few nodes that are far more connected than most others. The distribution of degree for the nodes is skewed. Most have a low degree and a few have a high degree. Different runs of the simulation produce different networks but with very similar degree distributions.

The exact network that arises is path dependent. The equilibrium distribution of degrees is not. The degree distribution has a long tail, meaning that a lot of nodes have a degree of one, and a small number have a large degree.
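The preferential attachment rule can also be sketched outside Netlogo. The version below is a simplified stand-in, not the Netlogo model itself: it assumes that the network starts from two connected nodes and that every new node makes exactly one link, chosen with probability proportional to degree. The function name and parameters are illustrative.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, seed=None):
    """Grow a network in which each new node links to one existing node,
    chosen with probability proportional to that node's degree."""
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}       # start from two connected nodes
    endpoints = [0, 1]          # each node appears once per edge it has
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)   # degree-proportional choice
        degree[target] += 1
        degree[new] = 1
        endpoints.extend([target, new])
    return degree


degree = preferential_attachment(2000, seed=1)
histogram = Counter(degree.values())
print("nodes with degree 1:", histogram[1])
print("largest degree:", max(degree.values()))
```

Keeping a list in which every node appears once per edge makes the degree-proportional choice a plain uniform draw. Running the sketch with different seeds gives different networks, while the shape of the degree histogram stays similar.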

14.4. Network function

The functions of networks are often emergent. When people form a network they are not thinking about the entire network but about their own connections. The properties of the network structure emerge from the logical process through which it forms. That structure has functionality so it can do particular things.

The six degrees phenomenon comes from two famous experiments in social science. In one experiment in the 1960s, 296 people from Nebraska were asked to get a letter to a stockbroker in Boston by only sending the letter to someone they knew on a first name basis. On average it took six steps for the letters to get there. Almost 40 years later, this experiment was redone with 48,000 people on the internet who had to send an email to someone they knew on a first name basis and get it to specific people all around the globe. Again, the average number of steps was six. So it typically takes six steps to get from one person to another.

Small worlds network
It is possible to understand how that can be by looking at a variant of the small worlds network. People form friendship networks. Assume that each person has C local or clique friends and R random friends. In the graph, people within a clique are friends with all of the other members, so these cliques have a very high clustering coefficient. Each person in the clique also has one random friend that belongs to some other clique.

The K-neighbours of a node are all nodes that are at a path length of K from that node but not at any shorter path length. For example, a 2-neighbour is a node that is connected to one of the node's direct connections but not to the node itself. This can be used to show how there could be six degrees of separation.

A person has C clique friends as well as R random friends. These are the 1-neighbours. The 2-neighbours are shown in the graph. These are the clique friends' random friends CR, the random friends' random friends RR, and the random friends' clique friends RC. The clique friends' clique friends are just the clique friends. So:
- 1-neighbours = C + R;
- 2-neighbours = CR + RR + RC as CC = C;
- 3-neighbours = RRR + RRC + RCR + CRR + CRC as RCC = RC, CCR = CR and CCC = C.

On average people have 140 clique friends and 10 random friends so the number of 1-neighbours is 150. The number of 2-neighbours is the clique friends' random friends (140 * 10 = 1,400), plus the random friends' clique friends (10 * 140 = 1,400) and the random friends' random friends (10 * 10 = 100), which gives 2,900. The number of 3-neighbours is RRR + RRC + RCR + CRR + CRC = 1,000 + 14,000 + 14,000 + 14,000 + 196,000 = 239,000 people.
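The counting rule can be written as a small function. The sketch below assumes, as the model does, that the only overlap between paths is that a clique friend of a clique friend is already a clique friend (CC = C), so any path containing two clique steps in a row is dropped. The function name is illustrative. With C = 140 and R = 10, the five three-step terms listed above sum to 239,000.

```python
from itertools import product


def count_k_neighbours(k, clique, rand):
    """Count k-neighbours in the clique/random friends model.

    A path is a string of C (clique friend) and R (random friend) steps.
    Paths with two C steps in a row are dropped because a clique friend
    of a clique friend is already a clique friend (CC = C).
    """
    total = 0
    for steps in product("CR", repeat=k):
        word = "".join(steps)
        if "CC" in word:
            continue
        size = 1
        for step in word:
            size *= clique if step == "C" else rand
        total += size
    return total


for k in (1, 2, 3):
    print(k, count_k_neighbours(k, clique=140, rand=10))
# 1 150
# 2 2900
# 3 239000
```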

In 1973 Mark Granovetter wrote a paper called The Strength of Weak Ties in which he found that the important things that happen in your life, like the job you get, who you marry, and where you live, do not depend on your 1-neighbours, but on your 2-neighbours and 3-neighbours. A 3-neighbour could be your roommate's (1) brother's (2) friend (3) or your mother's (1) co-worker's (2) daughter (3) or your high school roommate's (1) college roommate's (2) dad (3).

Scientist collaboration network
People on average have 150 1-neighbours, 2,900 2-neighbours, and 239,000 3-neighbours, so there are so many more 3-neighbours that it is much more likely that one of them gets you the job or introduces you to the person you're going to marry. This is the strength of weak ties. It becomes clear once you write down a model and do the math.

There are other network structures, such as the network of collaboration among scientists, where some people collaborate more with others, and the world wide web model, where some nodes were connected to a lot and most nodes were only connected to few. What are the functionalities of this sort of network?

The internet is robust to random node failures because most nodes have few connections. This robustness emerges from the structure of the network. However, the internet is not robust to targeted failures, because taking down a few highly connected nodes can shut down the internet. This functionality emerges from the preferential attachment rule. Nobody planned it. It just happened.

If you understand the functionality of networks then you can think about interventions. Suppose there is a disease spreading. You have to vaccinate as a function of R0. The higher R0 is, the more people you have to vaccinate, assuming that people were randomly connected. In real social networks some people are far more connected than others, for example school teachers. Instead of vaccinating 30% of the people based on R0, you could target connected people in the social network.

By combining the network model with the disease model, it is possible to lower vaccination rates and still stop the spread of diseases. This is the advantage of being a many-model thinker, because there are many models that can be combined in interesting ways.

15. Randomness and Random Walks

15.1. Randomness and random walk models

Random variables can be used to show how sometimes outcomes in the real world depend on skill and luck. Random walks are processes where the cumulative value of something depends on a sequence of random variables. Random walk models start at a point and then take random steps.

In the binary random walk model each step is either +1 or -1. In the normal random walk model each step is of random size based on a normal distribution or the bell curve of the central limit theorem. A normal random walk can be used to discuss the efficient market hypothesis which states that, in the stock market price movements, after correcting for the growth of the economy, are random.

In the finite memory random walk, the value only depends on the last set of random variables. So instead of having a walk that just goes forever, the value depends on random variables in the last few periods.

15.2. Sources of randomness

Randomness may mean that all possible outcomes are equally likely and uniformly distributed, but in most cases the outcomes are normally distributed so that there is a bell curve. A long tail is also possible, for example in network connectedness. There are a lot of different forms of randomness. When randomness is introduced in models, there is some value X, but also some error term ε that combine to X + ε.

Models in, for instance, economics, sociology, physics, biology, and engineering all use the error term ε for the following reasons:
- (1) noise and measurement deviations, for example a telescope might be affected by different levels of ambient light and humidity;
- (2) error, for example the weight of ice cream scoops can differ;
- (3) uncertainty, for example the cost of a building project is estimated but the real cost may differ because not everything can be known in advance;
- (4) complexity, because if the world is complex, it is difficult to guess what is going to happen;
- (5) capriciousness, because people are hard to predict.

15.3. Skill and luck

Someone's outcome depends on skill and luck or randomness. This applies to any sort of domains, such as sports, law, and politics. In his book, The Success Equation: Untangling Skill and Luck in Business, Sports, and Investing, Michael Mauboussin took a lot of different domains and tried to figure out how much in that domain performance is determined by skill and how much is luck [13]. In a formal model, it is outcome = a * luck + (1-a) * skill where a in (0,1).

To figure out how big a is, you can look at the sequence of outcomes. If the same team continues to perform well over time, then a must be fairly small. But if there are huge jumps from period to period then a probably is big. Suppose a = 0.5 and skill is fixed at 1, then outcome = 0.5 * luck + 0.5, because luck is a random variable while skill isn't. The outcome could vary between 0 and 1. Alternatively, if a = 0.1 then outcome = 0.1 * luck + 0.9, so that there is a much tighter distribution.
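The effect of a on the spread of outcomes can be illustrated with a small simulation. This is only a sketch: it assumes that luck is drawn uniformly between -1 and 1 and that skill is fixed at 1, which is one reading of the example above and not something the model prescribes. The function name is illustrative.

```python
import random
import statistics


def simulate_outcomes(a, skill, periods, seed=None):
    """outcome = a * luck + (1 - a) * skill, with luck redrawn each period.

    Luck is drawn uniformly from [-1, 1]; this range is an assumption
    made for the sketch, not something the model prescribes.
    """
    rng = random.Random(seed)
    return [a * rng.uniform(-1, 1) + (1 - a) * skill for _ in range(periods)]


high_luck = simulate_outcomes(a=0.5, skill=1.0, periods=1000, seed=0)
low_luck = simulate_outcomes(a=0.1, skill=1.0, periods=1000, seed=0)

print("spread with a = 0.5:", round(statistics.pstdev(high_luck), 3))
print("spread with a = 0.1:", round(statistics.pstdev(low_luck), 3))
```

Because both runs use the same seed, they receive the same luck draws, so the difference in spread comes only from the weight a.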

The influence of skills and luck on outcomes is important for the following reasons:
- (1) assess outcomes as the product of skill or luck;
- (2) there will be a reversion to the mean, if outcomes depend a lot on luck;
- (3) give good feedback, for example a manager can praise a person if a good outcome is the result of skill, but if it is the outcome of luck, the manager could say that she will not be disappointed when results are less good;
- (4) fair allocation of resources, for example if skill is the most important, employees that have better results could have better salaries, but if luck is the most important, employees could be rewarded more equally.
AL batting leaders

The paradox of skill means that when the very best people compete against each other, they tend to have fairly similar skills, so the winner is likely to be determined to a significant extent by luck, even in an environment where luck plays only a small role. If there is little variation in skill, small differences in luck can be decisive.

For example, Miguel Cabrera from Detroit won the American League batting title as he had hits in 34.4% of his at-bats, and behind him were two people at 33.8%. A few hits made a difference here. Skill brought him into the top 4 but it was luck that won him the batting title.

Binary random walk

15.4. Random walks

The binary random walk model works as follows. Assume that each period a coin is flipped. It could either come up heads or tails. The model has a variable X that represents winnings and starts with 0. If the coin comes up heads, X is increased by 1. If it comes up tails, X is decreased by 1. The variable X keeps track of the winnings.

There are three mathematical results about these binary random walks that are surprising:
- (1) after an even number of N flips, the outcome is expected to be 0;
- (2) for any number K, a random walk that goes on forever, passes +K and -K an infinite number of times, meaning that it is not going to take off into a direction;
- (3) for any number K, a random walk that goes on forever, will have a streak of K heads in a row or K tails in a row an infinite number of times.

For example, the odds of winning 16 times in a row are 1/2^16, which is 1/65,536. That may not seem likely, but if 65,000 people play the game 16 times, it is expected that one of them wins 16 times in a row.
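The binary random walk and the streak calculation can be sketched as follows. The coin is assumed to be fair, and the function names are illustrative.

```python
import random
from fractions import Fraction


def binary_random_walk(steps, seed=None):
    """Return the path of winnings X, starting at 0, after each coin flip."""
    rng = random.Random(seed)
    x, path = 0, [0]
    for _ in range(steps):
        x += 1 if rng.random() < 0.5 else -1   # heads: +1, tails: -1
        path.append(x)
    return path


def streak_probability(k):
    """Probability that k fair coin flips all come up heads."""
    return Fraction(1, 2 ** k)


path = binary_random_walk(1000, seed=42)
print("final winnings:", path[-1])
print("P(16 wins in a row):", streak_probability(16))
print("expected winners among 65,536 players:", 65536 * streak_probability(16))
```

Exact fractions are used for the streak probability so that the expected number of winners among 65,536 players comes out as exactly 1.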

There is regression to the mean, which means that a group that did well for a short time should do average in the long run. For example, there is a famous study called the hot hand study. In 1980 the Boston Celtics made 75% of their free throws. If there really were hot hands, then after missing the first free throw, a player would be less likely to make the second free throw. It turned out that the second free throw was made 75% of the time, regardless of the result of the first free throw. There was no evidence of a hot hand.

In 2001 Jim Collins wrote a book called Good to Great. He looked at companies that had been successful, and figured out the characteristics of those companies, such as humble leaders and confronting the facts. It was the best-selling business book of all time. The companies that were listed as great in 2001 did not do so well in the next decade. This can be related to the no free lunch theorem, which states that no algorithm is better than any other across all problems. Any heuristic, like humble leaders and confronting the facts, may work in one setting but may not work in another setting [14].

Related to the notion of streaks is clusters. If you just randomly throw things down on a graph, there will be clusters. Suppose there is a 1,000 by 1,000 checkerboard, and you fill in each square with a probability of 1/10, so that about 100,000 squares out of a million are filled. In that case there are 996 times 996 possible positions for a cluster of size 5 times 5. There are also 991 times 1,000 possible positions for a row of 10. With so many possible positions, there are many opportunities for clusters to arise by chance. Clusters can be a random phenomenon.

Normal random walks

15.5. Random walks and Wall Street

In the normal random walk model each step is of random size based on a normal distribution. A normal random walk can be used to discuss the efficient market hypothesis, which states that price movements in the stock market, after correcting for the growth of the economy, are random. The graph shows a few normal random walks. Since the mean of the normal distribution is zero, they average out around zero.
Dow versus normal distribution

The stock market random walk however doesn't have a normal distribution. With regard to the returns on the DOW Jones industrial average, there are relatively many days where nothing happens, and many days with big movements. Still, a normal random walk is not a bad first approximation. The model could be adjusted to match the actual distribution more closely if needed.

In the famous book A Random Walk Down Wall Street, Burton Malkiel argues that prices in the stock market are random walks. Many people think that there are trends. That idea can be evaluated using data. Suppose the DOW posted a gain the previous day, then if there really are trends, there is likely to be a gain the next day [15].
DOW trend

If the market is a random walk then there should be no trend. After 1975 this is true, but prior to 1975 it wasn't. So, the market may have become more efficient, so that there is no advantage to be had from following trends. This makes sense. Suppose that if prices went up today, they would go up tomorrow, so you should buy even more today, but that's going to drive prices up today so that they will not go up tomorrow and probably will be random [15].

The efficient market hypothesis states that prices reflect all relevant information, therefore any fluctuations should just be random, so that it is impossible to beat the market. The logic of the random walk idea is that the flow of information is unimpeded and information is immediately available in stock prices. News is by definition unpredictable so resulting price changes must be unpredictable and random.

Is that true? People noticed the January effect. Between 1927 and 2001, stock prices went up 4% in January, while in the other months they went up less, on average about 1%. In September and October they even fell. What does this mean? It means that it's good to invest in late December. If everybody knows this, then prices will go up in December, which means that prices in January will go down. In recent years this often happened, for instance in 2009, 2010 and 2011.

There are criticisms of the efficient market hypothesis. First, there is too much fluctuation, which makes it hard to believe that the prices are efficient. There must be other things going on. There are also consistent winners. If the market were efficient, then nobody could win 30 years in a row, and there wouldn't be people who systematically outperform the market as happens in reality. An example is Warren Buffett.

15.6. Finite memory random walks

In the random walk model the value depends on every previous movement. In the finite memory random walk, the value at time t only depends on the previous N movements, for example the previous five movements, so Vt = Xt + Xt-1 + Xt-2 + Xt-3 + Xt-4. For example at time 10, you have V10 = X10 + X9 + X8 + X7 + X6.

These X values could be employees. A firm could hire one new employee each period and let one employee go. The movement could reflect how good the new employee is. Alternatively these values could be products and the movements could reflect sales.

Sports championships
Sports is a good place for this model. A team consists of a set of players. Every year the teams draft new players and let others retire. The model consists of 28 teams of five players. The champion is whichever team has the highest value at time t. Now Vt = Xt + Xt-1 + Xt-2 + Xt-3 + Xt-4.
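A minimal sketch of this league model is given below. It assumes that each player's value is a standard normal draw and that every team replaces its oldest player each year. Neither detail is specified in the text, and the function names are illustrative.

```python
import random


def simulate_league(n_teams=28, memory=5, years=28, seed=None):
    """Finite memory random walk: a team's value is the sum of its last
    `memory` random draws; each year one draw is replaced by a new one.

    Draws are standard normal, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    teams = [[rng.gauss(0, 1) for _ in range(memory)] for _ in range(n_teams)]
    champions = []
    for _ in range(years):
        for players in teams:
            players.pop(0)                    # oldest player retires
            players.append(rng.gauss(0, 1))   # new player is drafted
        values = [sum(players) for players in teams]
        champions.append(values.index(max(values)))
    return champions


def longest_streak(champions):
    """Length of the longest run of consecutive titles by one team."""
    best = run = 1
    for prev, cur in zip(champions, champions[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


champions = simulate_league(seed=3)
print("distinct champions:", len(set(champions)))
print("longest streak:", longest_streak(champions))
```

Running the sketch with different seeds gives different lists of champions, which can then be compared on the number of distinct champions and the longest streak.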

This model can be used to predict other things as well as the number of distinct champions in the course of 28 years. The first three bars represent runs of the model that produced 16, 13 and 16 different champions. The other bars represent real sports leagues. The results are remarkably close. For the most championships won, it is also fairly hard to distinguish which is the model and which is the real world.

With regard to the longest streaks, the model always produced 5, but in reality this number didn't get above 3. That is not surprising because for sports teams it probably is really hard to win five times in a row because other teams really become motivated to beat the winner.

16. Colonel Blotto

16.1. Colonel Blotto game

Two games brought game theory into the mainstream from policy setting. These are the prisoner's dilemma and Colonel Blotto. The Colonel Blotto game involves strategic mismatch. It is about the positioning of troops versus those of the enemy on different fronts. The idea is to have your strengths go against the weaknesses of your opponent. This is a fertile model that can be applied in a lot of different settings.

16.2. Blotto: no best strategy

The Colonel Blotto game is about strategic mismatch. Both players have T troops at their disposal. They decide where to allocate them across N fronts. The number of troops is far greater than the number of fronts, so T >> N. Whoever has the most troops on a front wins on that front. Whoever wins the most fronts wins the game.

The Colonel Blotto game is a zero sum game. There is some sort of equilibrium where one player wins and the other loses. This makes the game very competitive, which means the player who has lost will try something different in order not to lose the next time. Successful business people try to stay out of zero sum games because zero sum games are straightforward. You have to work very hard just to break even. It is better to be in a positive sum where everybody can win.
           front 1   front 2   front 3
player 1     34        33        33
player 2     40        40        20
100 troops on 3 fronts (T = 100, N = 3)

Suppose T = 100 and N = 3. Player 1 might decide to evenly distribute the troops over the fronts. Player 2 may then think that this is the obvious thing for player 1 to do, and put 40 troops on front 1 and 2, and put 20 troops on front 3. Player 2 will then win 2 fronts and player 1 will win 1 front so player 2 will win the game.

Equally allocating troops therefore doesn't make sense because somebody can beat it so that you need a better strategy. The problem is that any strategy can be defeated. For example, if player 2 does 40-40-20, then player 1 might do 20-50-30 and defeat player 2. However 20-50-30 can be defeated by 40-20-40. Every strategy can be beaten so there is no real obvious best strategy to play.

Blotto strategies
Another insight is that a player doesn't need all troops to win. For example, 40-40-20 can be beaten with 62 troops in 0-41-21.
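The examples above can be checked with a small helper that counts fronts won. The sketch assumes that a tie on a front counts for neither player, which the text does not specify. The function names are illustrative.

```python
def fronts_won(mine, theirs):
    """Number of fronts on which `mine` has strictly more troops."""
    return sum(m > t for m, t in zip(mine, theirs))


def beats(mine, theirs):
    """True if `mine` wins more fronts than `theirs`."""
    return fronts_won(mine, theirs) > fronts_won(theirs, mine)


print(beats((40, 40, 20), (34, 33, 33)))  # True
print(beats((20, 50, 30), (40, 40, 20)))  # True
print(beats((40, 20, 40), (20, 50, 30)))  # True
print(beats((0, 41, 21), (40, 40, 20)))   # True, with only 62 troops
```

The first three calls form a chain in which each allocation beats the previous one, which illustrates why no single allocation is best.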

The character of the game is that there will be equilibrium when people choose their strategy randomly. This is called a mixed strategy equilibrium. The triangle in the graph shows how the troops can be allocated. Putting nearly all troops on one front is a bad idea.

The hexagon in the centre shows the viable strategies. Randomly picking strategies out of the hexagon cannot be beaten in the long run. The expected result or equilibrium is zero, which means an equal number of wins and losses. The winner of each individual game is random.

Is winning at Blotto skill or is it luck? If the number of troops is the ability of a player, then more ability gives the player a better chance of winning. If allocating those troops is strategic, then an increase in strategic ability may enable a player to beat the other sometimes. But if they are playing a mixed equilibrium strategy, then there isn't any strategic ability, and it all comes down to luck.

With really smart players with equal numbers of troops, Blotto probably is luck. If one player is smarter than the other, or one player has more troops than the other, then the outcome depends more on skill. In basic Blotto, where no party has an advantage over the other, anything can be beaten, and it really comes down to confusing the other player, hence to acting randomly.

16.3. Applications of Colonel Blotto

One of the reasons to construct models is fertility. The Colonel Blotto game can be used to understand other forms of competition. For example, to become President of the United States, you don't have to win the popular vote. What you have to do is win the most votes in the US electoral college. For that, you need to win states, hence you can think of each state as a front.

Electoral college
The candidates have to decide where to allocate their troops, in this case troops are money. This money can be used for hiring volunteers, airing television ads, and running rallies. Spending personal time can also be seen as allocating troops.

For example, California has the most votes, but a Republican candidate doesn't stand a chance there. Since the Republicans know that Democrats will win there, neither candidate campaigns there, so they don't put any troops there.

An example is the election between Bush and Gore, where George Bush put a lot of troops in Tennessee, which was Gore's home state. Gore hadn't put many troops in Tennessee because he thought that he would win it for sure. Bush then won Tennessee, which cost Gore the election. The Electoral College game is a lot like the game of Blotto. If you have more resources, strategically outsmart the other person, and figure out where best to put the troops, you can win.

Terrorism is another example. There are many places where a terrorist organisation could attack. The government has to put resources in different places like airports, train stations and bus stations, to prevent terrorist activities from taking place. In this case the terrorists might need to win on only one front. That seems easy, but the government has a lot more resources than the terrorist group has. In fact, if the government knows exactly how many resources the terrorist group has, it can put exactly enough resources on each front to prevent the group from being successful.

You could also apply it to trials. Suppose there are two lawyers of opposing parties, or alternatively, the prosecution and the defence. The fronts could be lines of defence or lines of prosecution. The lawyers have to decide how much time they are going to spend on each of those lines of defence or prosecution. If a lawyer doesn't spend much time on such a line, and the opponent does, then he or she could lose that argument. And whoever wins the most arguments, may win the case.

Hiring often looks a lot like Blotto. For example, if there are two applicants, their features such as education, work experience and social skills, can be seen as fronts. The winner isn't necessarily the one that is better. It could be the person who happens to have the right abilities, or the troops on the right fronts. The same applies to sports. A game like tennis or boxing is played on different fronts or dimensions. The player's ability consists of the skills on each of those fronts. Because of the different abilities, A could beat B, B could beat C and C could beat A.

16.4. Blotto: troop advantages

It is fairly clear that if a player has more troops than the other, this is a big advantage. Suppose there are three fronts. Player 1 has 180 troops while player 2 has 100. If player 1 evenly distributes the troops and goes 60-60-60, player 2 can win one front at best. As the number of fronts increases, a player needs a larger relative resource advantage to guarantee victory. The advantage of having more troops decreases if there are more fronts.

For example, if player A has 150 troops, and player B has 96 troops, player A can go 50-50-50, and there's nothing player B can do, because at best player B can do 48-48-0, so that he can't win. If there are five fronts, player A can do 30-30-30-30-30, and player B can do 32-32-32-0-0, and win.
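
As a rough check of these numbers, the sketch below computes how many fronts a weaker player can win against a known allocation by outbidding the cheapest fronts first. The function names are hypothetical, and it is assumed that a front is won with strictly more troops and that the game is won with a majority of fronts.

```python
def max_fronts_winnable(opponent, budget):
    """Most fronts a player with `budget` troops can win against a known allocation.

    Greedy approach: outbid the opponent's weakest fronts first, by one troop each.
    """
    won = 0
    for troops in sorted(opponent):
        cost = troops + 1  # one more troop than the opponent wins the front
        if cost > budget:
            break
        budget -= cost
        won += 1
    return won


def can_win(opponent, budget):
    """True if the player can win a majority of the fronts (assumed winning rule)."""
    return max_fronts_winnable(opponent, budget) > len(opponent) / 2


print(max_fronts_winnable((60, 60, 60), 100))  # 180 against 100 troops, three fronts
print(can_win((50, 50, 50), 96))               # 150 against 96 troops, three fronts
print(can_win((30, 30, 30, 30, 30), 96))       # 150 against 96 troops, five fronts
```

With three fronts the weaker player can win only one front in both cases, while with five fronts 96 troops are enough to take three of them.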

In case of a resource disadvantage, it can be better to create more fronts. So if you are the weaker player, you could add dimensions to the fight. In a sporting event, a player may add trick plays. In the business world, the weaker competitor may add a new dimension to the product. A terrorist organisation may want to attack in places where it is least expected. One of the key insights of the study of war and insurgencies is that the weaker parties tend to add new dimensions to the fight.
            front 1  front 2  front 3  front 4  front 5
player 1       4        6        2        6        2
player 2       5        0        8        7        0
player 3       6        5        1        5        3
Three players, five fronts

When player 1 plays against player 2, player 2 wins. When player 2 plays against player 3, player 3 wins. When player 3 plays against player 1, player 1 wins. In multiplayer Blotto, there can be quite a few cycles. That might not be expected based on the skill luck model. In Blotto, such cycles would be there almost all the time.
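
The cycle can be verified directly from the three allocations in the table. This is a small sketch with illustrative names; ties on a front are assumed to count for neither player.

```python
# Allocations from the table; each player uses 20 troops in total.
allocations = {
    "player 1": (4, 6, 2, 6, 2),
    "player 2": (5, 0, 8, 7, 0),
    "player 3": (6, 5, 1, 5, 3),
}


def winner(name_a, name_b):
    """Return the name of the player who wins more fronts, or None for a draw."""
    a, b = allocations[name_a], allocations[name_b]
    wins_a = sum(x > y for x, y in zip(a, b))
    wins_b = sum(y > x for x, y in zip(a, b))
    if wins_a == wins_b:
        return None
    return name_a if wins_a > wins_b else name_b


print(winner("player 1", "player 2"))  # player 2
print(winner("player 2", "player 3"))  # player 3
print(winner("player 3", "player 1"))  # player 1
```

Every match is decided three fronts to two, even though all three players have the same number of troops.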

16.5. Blotto and competition

The Colonel Blotto game is a way to analyse competition, where there are multiple fronts, and the idea is to strategically mismatch your opponent. There are two different types of competition:
- (1) competition for market share where one firm's gain is the other firm's loss;
- (2) win-loss records in sports competitions.

There are different models to make sense of that competition:
- (1) the pure random model where performance is random;
- (2) the skill plus luck model;
- (3) the finite memory random walk model;
- (4) the Blotto model.

These are all models of competition, and they show different aspects of it. We can think about which one of them fits a particular real world situation best. If performance is random, then we should expect:
- (1) equal wins or regression to the mean;
- (2) no time dependency so that somebody who did well one period shouldn't necessarily do well in the next period.

The world of investment people and mutual funds looks a lot like this. There isn't a lot of time dependency. Who won last year doesn't determine who wins this year.

In the skill plus luck model, we would expect to see:
- (1) unequal wins as some people would be consistent winners;
- (2) semi-consistent rankings as people with higher ability would be better, combined with the paradox of skill where people who are close in ability switch places;
- (3) little time dependency as performance during the last period wouldn't have a big influence on this period.

This may be a decent model for industry market shares or some sports competitions, but other models may work as well.

In the finite memory random walk model, we would expect to see:
- (1) unequal wins;
- (2) semi-consistent rankings because if you happen to get a sequence of good draws in a row, you are likely to continue to get those;
- (3) time dependency, because the outcome depends on outcomes in previous periods;
- (4) movement from top to bottom and regression to the mean because the values that made performance good at a certain point in time will not be there in the future, which is unlike the skill plus luck model where skill is a kind of stabiliser.

In the Blotto game with equal troops, we would expect to see:
- (1) outcomes that are random;
- (2) lots of manoeuvring.

In the Blotto game with unequal troops, we would expect to see:
- (1) outcomes similar to the skill plus luck model, because a player that has more troops is more likely to win;
- (2) lots of manoeuvring, so that sometimes the inferior party may win.

This may apply to American football, where there is a salary cap, so teams can only spend so much on their players. Teams with better management happen to have better players so that the outcomes may look a lot like the skill plus luck model. There is a lot of manoeuvring in professional football, which suggests that there may be a Blotto like character to the game, in the sense of getting players that match up well against the strengths and weaknesses of opponents.

Blotto with unequal troops and limited movement means that troops can't be reallocated every period and that resources can be traded with others like in a football league or the hiring of a firm. In the Blotto game with unequal troops and limited movement, we would expect to see:
- (1) outcomes similar to the finite memory random walk model;
- (2) lots of manoeuvring;
- (3) cycles where A beats B, B beats C, and C beats A.

In general the differences between a Blotto game and a skill luck game are:
- (1) dimensionality, so when players are making high dimensional strategic decisions, it is more like Blotto;
- (2) zero sum, so if actions are only good relative to other actions, it is more like Blotto, while in a skill luck game both can get better.

High dimensional sports like football may be more like Blotto, while the 100 metre dash and marathon running may be more skill luck. It is also possible to apply this to the presidential elections in the United States. You can think of what these different models tell us and which one makes the most sense:
- random: the outcome depends on the economy and economic shocks determine the outcome;
- skill luck: candidates vary in their experience and skills to communicate, but there are also random events, and both determine the outcome;
- random walk: the outcome depends on a sequence of random events such as what is happening in the economy, social movements at the time, international developments and domestic developments;
- Blotto: the outcome depends on the allocation of resources across fronts.

All these models capture some aspects of what is going on during a presidential election. By having all these different models, we will get a deeper understanding of the factors that determine the outcome of an election.

17. Prisoners' Dilemma and Collective Action

17.1. Intro: the prisoners' dilemma and collective action

The most famous game in game theory is the prisoner's dilemma. It is a very simple model of the two by two interaction in which each person has the possibility of being cooperative or defecting. There are two players. Each player has two options. They can either cooperate or they can defect.
Prisoners' dilemma

In the example they both get a payoff of 4 if they cooperate. And if they both defect, they both get a payoff of 2. Collectively, they are better off if they cooperate and they are worse off if they defect. Individually they are better off if they defect. That creates a tension.

The prisoner's dilemma can be applied in lots of settings such as arms races, price competition among firms, decisions about whether to adopt a new technology or not, attacking opponents in political campaigns versus staying positive, food sharing, and hedonic treadmills, where people buy things just to keep up with their neighbours.

There are ways to overcome the prisoner's dilemma. There are seven ways to get cooperation in the prisoner's dilemma called Cooperation Times Seven. Five of them come from the natural world. These are ways in which different species and even parts of bodies learn to cooperate with one another. Two of them are ways humans have figured out to get cooperation in the prisoner's dilemma.

Collective action problems are an extension of the prisoner's dilemma. Instead of just involving two people, it's a game that involves a lot of people. In a collective action problem, people must decide whether to contribute to something. There is an incentive to free ride, and this creates some of the same tensions as those in the prisoner's dilemma.

Common pool resource problems are the inverse of collective action problems in the sense that here there is a limited resource like trees, water or lobster. People have to decide how much of this resource they will pull out. The more they pull out, the more they are defecting. Everyone would be better off if people pulled out less of the resource, but individually they can't help themselves from doing it.

There is no panacea for solving prisoner's dilemmas like collective action problems and common pool resource problems.

17.2. The prisoners' dilemma game

The prisoner's dilemma is by far the most famous game in game theory. There are only two players. Each player can cooperate or defect. If both players cooperate and play nice, each one gets a payoff of 4. If player 2 is being nice, and player 1 defects, player 1 will get a payoff of 6. If player 1 is being nice, and player 2 defects, player 2 will get a payoff of 6. Each player has an incentive to defect, but if they both defect, they get a payoff of 2. It is in their collective interest to cooperate. It is in their individual interest to defect. But if they both defect then they are both worse off.
Not a prisoners' dilemma

It is not a prisoner's dilemma when the total payoff when one player defects is higher than the total payoff when both cooperate, for instance when both players get 4 when cooperating and a player gets 9 when defecting against a cooperating player. If the players alternate in cooperating and defecting, they could get a better payoff than by cooperating all the time. This game is called weak alternation because there is an incentive to alternate as opposed to cooperate.
Formal prisoners' dilemma

In formal definitions of the prisoner's dilemma, F > T > R, and 2T > F, where F > 0, T > 0 and R >= 0.

The most efficient outcome is C-C. Pareto efficiency means that there is no way to make every single person better off. This is not only the case for C-C, but also for D-C and C-D, because the person that switches from defecting to cooperating will see a reduction in payoff. The only outcome that isn't Pareto efficient is D-D, because C-C will make every single person better off.

In terms of Pareto efficiency, only D-D isn't a good outcome. In game theory, and the notion of Nash equilibrium, people optimise and follow strategies. If player 2 cooperates, the best thing player 1 can do is to defect, as 6 is a better outcome than 4. If player 2 defects, the best thing player 1 can do is also to defect, as 2 is a better outcome than 0. The same is true for player 2 acting on player 1. So the Nash equilibrium here is D-D, with payoffs 2-2.
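
The reasoning about Nash equilibrium and Pareto efficiency can be sketched in Python using the payoffs 4, 6, 2 and 0 from the example. The function names are illustrative, and Pareto efficiency is taken in the sense used in the text: no other outcome makes every player strictly better off.

```python
from itertools import product

# payoffs[(move of player 1, move of player 2)] = (payoff to 1, payoff to 2)
payoffs = {
    ("C", "C"): (4, 4),
    ("C", "D"): (0, 6),
    ("D", "C"): (6, 0),
    ("D", "D"): (2, 2),
}
moves = ("C", "D")


def is_nash(m1, m2):
    """Neither player can gain by changing only his or her own move."""
    p1, p2 = payoffs[(m1, m2)]
    best_1 = all(p1 >= payoffs[(alt, m2)][0] for alt in moves)
    best_2 = all(p2 >= payoffs[(m1, alt)][1] for alt in moves)
    return best_1 and best_2


def is_pareto_efficient(m1, m2):
    """No other outcome makes both players strictly better off."""
    p1, p2 = payoffs[(m1, m2)]
    return not any(q1 > p1 and q2 > p2 for q1, q2 in payoffs.values())


outcomes = list(product(moves, moves))
nash = [o for o in outcomes if is_nash(*o)]
efficient = [o for o in outcomes if is_pareto_efficient(*o)]
print(nash)       # [('D', 'D')]
print(efficient)  # [('C', 'C'), ('C', 'D'), ('D', 'C')]
```

The only Nash equilibrium is the only outcome that is not Pareto efficient, which is the disconnect between individual and collective preferences.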
Self interest game

The incentives produce the worst possible outcome. There is a disconnect between individual and collective preferences. That disconnect is what makes the prisoner's dilemma so interesting.

That is not true for most games. For example, in the self interest game both players have an incentive to do strategy B. Player 1 would do B if player 2 does A and player 1 would do B if player 2 does B. Player 2 would do B if player 1 does A and player 2 would do B if player 1 does B. This gives the best possible outcome and it is also the only Pareto efficient outcome.
Education versus arms control

The prisoner's dilemma applies to a lot of settings, for example arms control. Countries can spend money on education or on bombs. Both countries would be better off if they were both spending their money on education, but they can't help themselves and spend money on bombs so that everyone is worse off. The same applies to price competition among firms, where corporations could cooperate to keep prices high. However, in this case defection, meaning lower prices, is not so bad for consumers.

It applies to decisions about whether to adopt a new technology, for example ATM machines for banks. If one bank makes the investment, it could attract more customers, but other banks would follow suit, so that the effect is gone. It may also increase price competition among banks, because people don't need to have an account at the nearest bank.

It applies to attacking opponents in political campaigns versus staying positive, because if both parties go for negative campaigning this may not affect the outcome in the end, but it may tarnish the reputations of the candidates. For both candidates it may seem optimal to go negative, whether the other party goes negative or not. It also applies to food sharing, and hedonic treadmills, where people buy things just to keep up with their neighbours.

17.3. Seven ways to cooperation

There are seven ways in which cooperation can emerge in the prisoner's dilemma, where it is in the collective interest to cooperate but in each player's individual interest to defect. A simpler model of this situation is that cooperating costs me c and gives others a total benefit b, with b > c. In his book Super Cooperators Martin Nowak describes five ways in which the prisoner's dilemma has been overcome in nature [16].

These five ways are:
- (1) repetition or direct reciprocity;
- (2) reputation or indirect reciprocity;
- (3) network reciprocity;
- (4) group selection;
- (5) kin selection.

Repetition or direct reciprocity means that the game is played many times so that it is in everyone's interest to cooperate, because if both parties meet next time, they will cooperate again. A simple strategy that can induce cooperation is called tit for tat: as long as the other keeps cooperating, you cooperate. If that person ever defects, then you defect. This very simple strategy can keep both parties cooperating provided that they meet often enough.

This can be formalised. Suppose p is the probability that both parties meet again and the payoff of defection is 0. The expected payoff of cooperation is then pb - c, so you cooperate when pb - c > 0 => p > c/b. For example, in a huge city people are less likely to let someone else with fewer items go ahead in a grocery store, because it is less likely that they will meet again so that the favour can be returned.

Reputation or indirect reciprocity works as follows. Instead of directly meeting again, I can tell other people how nice she is so that she gets a reputation. This can be formalised. Suppose q is the probability that her reputation gets out, the cost to her of letting me go ahead is c, and the benefit of her reputation getting out is b. Then it is beneficial for her to be nice if bq - c > 0 => q > c/b. The benefit of reputation is indirect. It can create a virtuous cycle of people cooperating with one another.
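
Direct and indirect reciprocity share the same threshold, which the following minimal sketch makes explicit. The function names and the values b = 5 and c = 2 are hypothetical and chosen only for illustration.

```python
def cooperation_pays(probability, benefit, cost):
    """Compare the expected payoff of cooperating with the payoff 0 of defecting.

    `probability` stands for p (meeting again) or q (reputation getting out).
    """
    return probability * benefit - cost > 0


def threshold(benefit, cost):
    """Smallest probability above which cooperation pays: c / b."""
    return cost / benefit


b, c = 5, 2  # hypothetical benefit and cost
print(threshold(b, c))              # 0.4
print(cooperation_pays(0.5, b, c))  # True, since 0.5 > 0.4
print(cooperation_pays(0.3, b, c))  # False, since 0.3 < 0.4
```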
Network reciprocity B=5, C=2, K=2
Network reciprocity B=5, C=3, K=2

Network reciprocity makes it possible for nodes in a network to cooperate with one another. Assume that there is a regular graph where every node has the same number of neighbours. If each node has k neighbours, b is the benefit of cooperation, and c is the cost of cooperation, then cooperation is beneficial if k < b/c.

If b = 5, c = 2 and k = 2, it is beneficial to cooperate even if one of your neighbours is not cooperating. In this case k = 2 < b/c = 5/2. However, if b = 5, c = 3 and k = 2, this isn't true any more. In this case k = 2 > b/c = 5/3 and cooperation will die out. In the first example there is no incentive for defectors to cooperate, so that cooperation will not spread.
Network reciprocity B=5, C=2, K=4

Suppose now that k = 4. A person is now playing against four people. If this person is defecting and all his friends are defecting then the payoff is 0. If a person is cooperating and all four friends are cooperating, the payoff is 4 * 5 - 4 * 2 = 12. If one friend is defecting, the payoff will be 7.

However, a defector with three friends cooperating will see a payoff of 15. This means that a person that has one friend defecting is likely to defect. With two friends defecting, the payoff of defecting is 10. So, when the benefits of cooperating are 5 and the costs are 2, this person wants to defect. As the network becomes more connected, there are more incentives to defect, because the condition k < b/c is harder to satisfy as k increases.

In general, if you have k neighbours that are cooperating, your payoff of cooperating is k(b-c), and your payoff of being a boundary defector is (k-1)b. So, for you to cooperate, you need k(b-c) > (k-1)b => b > kc => b/c > k. With regard to reputation, a denser network is better because reputation is more likely to spread. With regard to network reciprocity, a less dense network is better because there will be less incentive to defect.
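
The payoffs in the network examples can be reproduced with a short sketch. The function names are illustrative; it is assumed, as in the k = 4 example, that a cooperator pays c for each of its k neighbours and receives b from each neighbour that cooperates.

```python
def cooperator_payoff(cooperating_neighbours, k, b, c):
    """Receive b from each cooperating neighbour, pay c for each of the k neighbours."""
    return cooperating_neighbours * b - k * c


def defector_payoff(cooperating_neighbours, b):
    """Receive b from each cooperating neighbour and pay nothing."""
    return cooperating_neighbours * b


def boundary_cooperation_pays(k, b, c):
    """Compare k(b - c) with (k - 1)b; this is equivalent to k < b / c."""
    return k * (b - c) > (k - 1) * b


print(cooperator_payoff(4, k=4, b=5, c=2))  # 12
print(cooperator_payoff(3, k=4, b=5, c=2))  # 7
print(defector_payoff(3, b=5))              # 15
print(defector_payoff(2, b=5))              # 10

for k, b, c in [(2, 5, 2), (2, 5, 3), (4, 5, 2)]:
    print(k, b, c, boundary_cooperation_pays(k, b, c))
```

Only the first of the three cases, b = 5, c = 2 and k = 2, satisfies the condition for cooperation.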

Group selection means that the selection of the winner is based on group performance and not on the performance of individuals, so that groups of cooperators win out. Within the group defectors might do better. Between groups, the group that has the most cooperators will win more often. For example, in a war the group that cooperates best probably has more food and better technology. If there is enough competition between groups, there is a force towards cooperation.

Kin selection means that different members of a species have different amounts of relatedness and people tend to care about their next of kin. Formally, players can be related and they care about other people based on their relatedness r, so that they will cooperate when rb > c. For a child, that relatedness would be 1/2 genetically, so if something could benefit my child 10 while it costs me 2, then based on this calculation my benefit would be 1/2 * 10 = 5, which is more than my cost of 2, so I cooperate. This model has been used a lot in ecology, because there are some species like ants and bees, where r is very high, so it is not surprising that there is a lot of cooperation within those species.

There are two additional ways to get cooperation in human societies:
- (1) laws and prohibitions or by making things illegal;
- (2) incentives.

Laws and prohibitions can enforce cooperation. For example, it might be in someone's interest to talk on the cell phone when driving, but it is not in the interest of society, because it increases the probability that somebody else is going to get injured. Incentives can induce people to cooperate. For example, a community could impose a fine on people that don't shovel the sidewalk in front of their home after snowfall, even though it isn't illegal not to shovel the sidewalk.

17.4. Collective action and common resource pool problems

The prisoner's dilemma is a two player game. More generally, there can be an N person prisoner's dilemma, where lots of people benefit from the cooperation of one person. In the collective action problem each person can make the choice to contribute or not. This can be formalised. Let x_j be the cost of the action of person j, where x_j can be between 0 and 1, so x_j ∈ [0, 1]. The benefit is proportional to the sum of the individual contributions x_i of all people in society. As there are N people, the payoff to person j is -x_j + β Σ_{i=1..N} x_i, where β ∈ (0, 1).

Collective action problem
If β > 1 then it would be in my own best interest to cooperate, so there would be no collective action problem. The benefit each individual gets from his or her own contribution must be less than the individual cost, so that β < 1. For example, assume there are 10 people and β = 0.6, so that the payoff to person j is -x_j + 0.6 * Σ_{i=1..10} x_i. If all others are cooperating with the full effort of 1, the payoff of not cooperating would be 0 + 0.6 * 9 = 5.4. The payoff of cooperating would be -1 + 0.6 * 10 = 5.
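
The example with 10 people and β = 0.6 can be reproduced with a few lines of Python. This is a minimal sketch; the function name and the list representation of contributions are illustrative choices.

```python
def payoff(j, contributions, beta):
    """Payoff to person j: own cost plus beta times the sum of all contributions."""
    return -contributions[j] + beta * sum(contributions)


beta = 0.6
everyone_contributes = [1.0] * 10
person_0_free_rides = [0.0] + [1.0] * 9
nobody_contributes = [0.0] * 10

print(round(payoff(0, everyone_contributes, beta), 2))  # contribute: 5.0
print(round(payoff(0, person_0_free_rides, beta), 2))   # free ride: 5.4
print(round(payoff(0, nobody_contributes, beta), 2))    # everyone defects: 0.0
```

Free riding pays 0.4 more than contributing, yet everybody gets 5 when all contribute and 0 when nobody does.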

For example, it is in our collective interest to reduce carbon emissions, but it is costly for people, corporations and countries to do that, so carbon emissions rose over time. The collective action problem is sometimes called the free rider problem because it is in each individual's interest to free ride off the effort of everyone else.

Common pool resource problem
Another type of multiple person prisoner's dilemma is the common pool resource problem where there is some resource like cod in the ocean. And if we fish those too much, the population gets smaller as the population can't reproduce itself fast enough. It is better to manage that resource so that it has the possibility of reproducing itself.

This can be formalised. Suppose each person decides how much cod to eat. Let x_j be the amount of cod consumed by person j, let C_t be the total amount available in period t, and let X_t be the total amount consumed in period t. Assume further that the amount available next period will be C_{t+1} = (C_t - X_t)^2. If the amount of cod available in period 1 is 25, and the amount consumed in period 1 is 20, there will be (25 - 20)^2 = 25 in period 2. This is an equilibrium: the stock stays at 25 as long as 20 is consumed each period. If the amount consumed is 21, there will be (25 - 21)^2 = 16 in the next period. This means that there will not be enough cod in period 2 to consume the same amount again.
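
The dynamics of the cod example can be simulated as follows. This is a sketch under one extra assumption that is not in the lecture: people cannot consume more than the stock that is available, so consumption is capped at the current stock.

```python
def simulate(stock, consumption, periods):
    """Stock per period when next period's stock is (stock - consumption) squared."""
    history = [stock]
    for _ in range(periods):
        eaten = min(consumption, stock)  # assumption: cannot eat more than exists
        stock = (stock - eaten) ** 2
        history.append(stock)
    return history


print(simulate(25, 20, 3))  # [25, 25, 25, 25]
print(simulate(25, 21, 3))  # [25, 16, 0, 0]
```

Consuming 20 per period keeps the stock at 25, while consuming just one unit more wipes out the resource within two periods.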

There have been lots of cases of this happening. Jared Diamond wrote about it in his book called Collapse. Through overfishing and overconsumption a resource goes away. In the case of Easter Island the population was growing bigger and bigger. The people were harvesting too much wood, and the forests couldn't reproduce themselves. When the forests were completely destroyed, this led to a collapse of society [17].

17.5. No panacea

Models like prisoner's dilemma, collective action problems and common pool resource problems can help, but to solve these problems, more information about the real world is needed to understand the problem. In the collective action problem, we want to make sure that people are going to contribute in some way. In the common pool resource problem, we want to make sure that people aren't going to over harvest the resource. Particulars matter.

Cattle grazing on a common is a standard common pool resource problem. To prevent overgrazing, there could be a rotation scheme where different people can graze their cattle on different days. The cattle have to be marked to make clear who owns them.

For lobster fishing rotating doesn't work because there is a fundamental difference between cattle grazing on a common and harvesting lobster. In the case of cattle grazing, it is clear how much grass has been eaten and how much there is left. In the lobster fishing case, the harvest of an individual is a random variable, and it is difficult to gauge the overall population. There must be mechanisms to monitor the total population of lobster.

Another example is drawing water from a stream. This issue is different because it is asymmetric. People at the head of the stream can draw water out and this affects everybody down the stream. When thinking about the mechanisms needed to induce cooperation in this setting, the most attention must be given to people at the head of the stream.

18. Mechanism Design

18.1. Mechanism design

One reason to model is to design better institutions, which is called mechanism design. The idea behind mechanism design is to formalise an institution. This is done by deciding what set of actions people can take, and what the payoffs or outcomes associated with those actions are. In a sense this is defining the rules of the game: the actions people can take, and what the payoffs are going to be.

In constructing these mechanisms, two features often arise in a social context that may need to be overcome through the proper writing of a mechanism:
- (1) hidden actions, most notably in work environments you don't see the actions people take, but only the outcomes, so that you would like to write employment contracts in such a way that people take the actions that you want;
- (2) hidden information, for example, someone could have a high ability or a low ability, or someone could be a risky driver or a safe driver, and you would like to write a contract so that these people are separated out.

The idea is to figure out how to run institutions to overcome these two problems. This can be applied to real world examples like auctions and public goods. There are a lot of different ways in which things can be auctioned off, such as ascending bid auctions, sealed bid auctions, or second price auctions. These different rules are mechanisms. Models can be used to figure out which of those auction rules will work best.

Public goods are like a collective action problem. Everybody benefits from them, but you would like everybody else to pay for them. Examples are roads, clean air and parks. It is possible to use mechanism design to come up with solutions to public good problems.

There are three basic ways of modelling people, which are the rational choice model, psychological models, and rule based models. The standard way of doing mechanism design is to assume that people are rational. There are some differences when people have psychological biases or when they just follow simple rules.

18.2. Hidden action and hidden information

Mechanism design is designing incentive structures so that we get the outcomes we want by trying to induce people into taking the right kinds of effort. For example, an employer would like to write a contract so that people put forth a lot of effort in their work as opposed to slacking off. Alternatively, if someone is auctioning something off, he or she would like people to reveal their information about how much they value something. So, revelation of information is another feature to construct mechanisms for. Mechanism design is about dealing with hidden actions and hidden information.

Pottery example
Hidden actions are called moral hazard problems because if employees are not being watched, they could cheat or slack off. So the question is, how do people write contracts to overcome these moral hazard problems? For example, in a pottery, employees may make an effort or slack off. The outcome can be either good or bad. If they make an effort the probability of a good outcome is 1. If they slack off, the probability of a good outcome is p, so that workers may get away with making no effort. The cost of making the effort is c, so that workers may prefer to slack off.

The employer wants to induce this effort level of 1 by writing a contract so that they put forth the right level of effort. This is not very complicated. The contract can state that the worker is paid an amount of money M if the pot is good and 0 if the pot is bad.

Incentive compatibility means that it makes sense for the worker to put in the effort. This is incentive compatible if M is sufficient. The payoff for making an effort of 1 is M - c, and for effort 0 the payoff is pM. Incentive compatibility means that M - c >= pM => (1-p)M >= c => M >= c/(1-p).

Comparative statics is the comparison of two different outcomes, before and after a change in some underlying parameter. The inequality M >= c/(1-p) tells us that if c or p goes up, M needs to go up. If M > c/(1-p) then the worker prefers to make the effort, so the action is in effect no longer hidden but known.
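
The incentive compatibility condition and its comparative statics can be illustrated with a small sketch. The function names and the numbers for c and p are hypothetical and serve only as an example.

```python
def minimum_payment(c, p):
    """Smallest payment M for a good pot that makes effort worthwhile: c / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be a probability below 1")
    return c / (1 - p)


def effort_is_incentive_compatible(M, c, p):
    """Effort pays M - c, slacking pays p * M in expectation."""
    return M - c >= p * M


print(minimum_payment(c=10, p=0.5))                     # 20.0
print(minimum_payment(c=10, p=0.75))                    # 40.0, higher p needs higher M
print(minimum_payment(c=20, p=0.5))                     # 40.0, higher c needs higher M
print(effort_is_incentive_compatible(20, c=10, p=0.5))  # True
print(effort_is_incentive_compatible(19, c=10, p=0.5))  # False
```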

Hidden information or information asymmetry means that the two parties to a contract do not have the same information. This issue is related to hidden action or moral hazard. For example, an insurance provider would prefer to have safe drivers and not to have risky drivers, or to have risky drivers pay more for their insurance. An employer would like to hire people with a high ability and not those with a low ability. It is often difficult to tell in advance who are the high ability workers and who are the low ability workers.

This can be formalised. Suppose there are two types of workers, high ability workers H and low ability workers L. For high ability workers the cost to make an effort CH is relatively low. For low ability workers the cost to make an effort CL is relatively high so that CH < CL. Suppose the employees get a fixed wage, then an employer could ask prospective employees to work K hours for a pay of M. This is incentive compatible when M > K*CH and M < K*CL.

Comparative statics on the condition M < K*CL, or K > M/CL, shows that if M goes up then K has to go up as well. If CL goes up then the required K goes down.

These models are sometimes called costly signalling models because the high ability workers have to signal, by working K hours, that they really are of high ability. The number of hours K must therefore satisfy K > M/CL, so that the low ability workers will not take the job but the high ability workers will. In this way the high ability workers are separated out and the information is no longer hidden.
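
The two conditions M > K*CH and M < K*CL together give a range of hours K that separates the two types. The sketch below computes that range; the function names and the values of M, CH and CL are hypothetical.

```python
def separating_hours(M, cost_high, cost_low):
    """Open interval of K with M > K * cost_high and M < K * cost_low."""
    return M / cost_low, M / cost_high


def accepts(M, K, cost_per_hour):
    """A worker takes the job if the pay exceeds the total cost of effort."""
    return M > K * cost_per_hour


M, CH, CL = 100, 5, 20  # hypothetical pay and effort costs per hour
lower, upper = separating_hours(M, CH, CL)
print(lower, upper)       # 5.0 20.0

K = 10  # any K strictly between the two bounds
print(accepts(M, K, CH))  # True: the high ability worker takes the job
print(accepts(M, K, CL))  # False: the low ability worker does not
```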

18.3. Auctions

Auctions are used in a lot of settings. There are different types of auctions, such as ascending bid auctions, where people call out prices and keep bidding up, sealed bid auctions, where people just write down an amount, and second price auctions, which are sealed bid auctions where the highest bidder gets the item but at the price of the second highest bid. From the perspective of the seller, the objective of an auction is to get as much money as possible.
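
Own bid   Highest other bid   Net
70        60                  20
70        75                  0
70        85                  0
80        60                  20
80        75                  5
80        85                  0
90        60                  20
90        75                  5
90        85                  -5
Second price auction, rational (the net figures correspond to a bidder whose true value is 80)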

In ascending bid auctions individuals call out bids until no one is willing to bid a higher price, and whoever has bid the highest price gets the object. There are different models to think about how that would work:
- (1) rational: people bid up to their value and the highest bidder ends up paying the value of the second highest bidder;
- (2) rule following: people will have a heuristic, for example starting with half their value and using fixed increments, and probably will bid up to their value;
- (3) psychological: people might end up bidding in a frenzy and end up bidding above their value because they get the feeling of winning the auction.

In second price auctions individuals submit a bid and the highest bidder gets the object at the second highest price. That might work out as follows:
- (1) rational: people bid up to their value and the highest bidder ends up paying the value of the second highest bidder;
- (2) rule following: people may overbid or underbid somewhat;
- (3) psychological: people may overbid or underbid by a larger margin.

In ascending bid auctions and second price auctions, rational people should bid their true value despite others possibly being irrational. It is not like a race to the bottom game in which a rational person should take into account the irrationality of other people. That can create a general tendency towards people being more rational over time.
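The claim that bidding your true value is the best choice in a second price auction can be checked with a few lines of code. This is a minimal sketch that assumes a true value of 80, competing bids of 60, 75 and 85, and that a tie counts as a loss; the function name is made up for the illustration.

```python
def second_price_net(value, bid, highest_other_bid):
    """Net gain for one bidder in a second price auction.

    The bidder wins only with a strictly higher bid (ties are treated as a
    loss, which is an assumption), and then pays the highest competing bid.
    """
    if bid > highest_other_bid:
        return value - highest_other_bid
    return 0


value = 80  # assumed true value of the bidder
for bid in (70, 80, 90):
    row = [second_price_net(value, bid, other) for other in (60, 75, 85)]
    print(bid, row)
# 70 [20, 0, 0]
# 80 [20, 5, 0]
# 90 [20, 5, -5]
```

Underbidding at 70 misses a gain of 5, overbidding at 90 risks a loss of 5, and bidding the true value of 80 is never worse than either.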

Sealed bid auctions are the most complicated. In sealed bid auctions individuals submit bids and the highest bidder gets the object and pays his or her own bid. In this setting it makes sense to bid somewhat less than your value. Psychological bidders or rule following bidders may also bid somewhat less. The strategy of a rational bidder can be formalised. This strategy depends on a lot of things, including the number of other people in the auction.

Let's assume a simple case with two bidders, bidder 1 and bidder 2. The higher bidder 1's bid is, the more likely it is that he gets the object. But bidder 1 also likes to pay as little as possible. Suppose the value of bidder 2 is uniformly distributed between $0 and $1, and that she bids her true value. The probability that bidder 1 wins if he bids $0.60 is 60%. This can be formalised.

Suppose bidder 1 has value V and bids some amount B between $0 and $1. The probability that bidder 1 wins is B. If bidder 1 gets the object, his surplus is V - B. The expected winnings are B(V-B) = BV - B². Set the derivative with respect to B to 0 to get the maximum. Then you get: V - 2B = 0 => B = V/2. If V = 1 the optimal bid is $0.50.

Suppose now that bidder 1 assumes that bidder 2 is also rational and that the value of bidder 2 is uniformly distributed between $0 and $1. Then bidder 2 bids half her value, so that her bid is uniformly distributed between $0 and $0.50. The probability that bidder 1 wins with a bid B of at most $0.50 is 2B. The expected winnings are 2B(V-B) = 2BV - 2B². And the optimal bid is still V/2. For any rational bidder it is best to bid half the value. The outcome is that the highest value bidder will get the object at half of his or her value.
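The optimal bid of V/2 can also be found numerically instead of with a derivative. The sketch below assumes two bidders and a rival bid that is uniformly distributed between $0 and some maximum; the function names, the parameter rival_max_bid and the grid search are choices made for this illustration.

```python
def expected_winnings(bid, value, rival_max_bid=1.0):
    """Expected surplus when the rival's bid is uniform on [0, rival_max_bid]."""
    win_probability = min(bid / rival_max_bid, 1.0)
    return win_probability * (value - bid)


def best_bid(value, rival_max_bid=1.0, steps=1000):
    """Grid search over bids between 0 and the bidder's own value."""
    candidates = [value * i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda b: expected_winnings(b, value, rival_max_bid))


# Rival bids her true value, uniform between 0 and 1.
print(best_bid(1.0))                     # 0.5
# Rival is rational and bids half her value, uniform between 0 and 0.5.
print(best_bid(1.0, rival_max_bid=0.5))  # 0.5
```

In both cases the search lands on half the value, in line with B = V/2.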

In an ascending bid auction or a second price auction, the highest value bidder wins at the second highest value. In a sealed bid auction the highest value bidder wins at half his value, which is also the expected value of the second highest bidder if the value of the second highest bidder is a uniform distribution between zero and the value of the highest bidder.

The outcomes are nearly the same. The highest bidder gets the object at the exact value of the second highest bidder or at the expected value of the second highest bidder. The revenue equivalence theorem states that with rational bidders a wide class of auction mechanisms including sealed bid, second price and ascending bid, produce identical expected outcomes.

We may not have rational bidders. We could have psychological bidders or rule following bidders. For example, if multinational firms are bidding on oil leases, they probably are fairly close to rational, and every auction method is going to give the same revenue. If we care about transparency, for instance we want to know each bidder's true value, then we might choose a sealed bid auction.

In a charity auction people often haven't participated much in auctions before, so they may be suffering from psychological biases or may be following some simple rules that don't make sense. This can affect the outcome of a sealed bid auction or a second price auction, because these auction types are complicated for unsophisticated bidders.

The ascending bid auction makes a lot of sense in that setting, because even if people are biased or rule following, they will probably bid as long as the current bid is lower than their value for the object. No one is going to underbid and not get something they want. In addition, if the objective is to make as much money as possible, there could be some psychological bias of people wanting to win, so that the seller may make more money.

The revenue equivalence theorem states that it doesn't matter which auction mechanism we use if people are rational. But if we think about how people actually behave, we can start to make some distinctions about which auction mechanism might work best. In some cases we might want a sealed bid, in some cases we might want ascending bid, and in other cases we might prefer second price.

18.4. Public projects

A public project is something everybody can benefit from, such as a road or a park. For example, there could be a coffee machine in the office that everybody can use. Assume it costs $80 and there are three people. Person 1 is willing to contribute $40. Person 2 is willing to contribute $50. Person 3 is willing to contribute $20. If you add up all these amounts, you get $110. The problem is that the amounts people are willing to contribute are private information. The challenge from a design standpoint is to figure out how to reveal that information.

The pivot mechanism means that you only have to pay the marginal amount, which is the minimal amount that you would have to pay given other people's bids in order to make the project viable. So if without you, people are willing to pay $60, and it costs $80, then you only have to pay $20. This creates an incentive to tell the truth: it doesn't hurt you to say how much you really value the project, because you are only going to pay your marginal amount.

Claim   Sum of others   Pay          Net
30      30              not bought
30      90              0            40
40      30              not bought
40      40              40           0
40      45              35           5
40      90              0            40
50      30              50           -10
50      40              40           0
50      45              35           5
50      90              0            40
Pivot mechanism (person 1's true value is $40)

Formally the Clarke-Groves-Vickrey pivot mechanism works as follows. Each person claims a value, for example V1, V2, V3. The real values are hidden information, and the idea behind mechanism design is to get them to claim their true value. If the sum of those values covers the cost C, so V1 + V2 + V3 ≥ C, then the coffee machine is bought. Person 1 pays the cost minus the values of the other two people, so Max {C - V2 - V3, 0}.

But does this really work out right? Suppose person 1 gives his true value of $40. One possibility is that the sum of the other people's bids is only $30. That only adds up to $70, so that the machine is not bought and nothing is paid. If the sum of the others is $45 then person 1 only pays $35, so on net he ends up $5 ahead. If the sum of the others is $90, person 1 is not going to pay anything and ends up $40 ahead.

If person 1 decides to cheat a little bit and claims a value of only $30, then it is more likely that the coffee machine will not be bought. Compared to the case where person 1 tells the truth, he will be worse off when the sum of the others is $45. Person 1 now gets nothing while he could have gained $5. When the sum of the others is $40 his net gain is $0 either way. If you under claim, the project might not get done in cases where you would like to get it done. Over claiming is also not a good idea. For instance, if person 1 claims $50, the project could get done if the sum of the others is $30, so that person 1 will lose $10 in that situation.

This is incentive compatible as each person has an incentive under this mechanism to give his or her true value, provided that they are rational. But there is a problem. If it is applied to all three persons, if the cost is $80, and the true values for person 1, 2, and 3 are $40, $50, and $20, then person 1 pays $10, person 2 pays $20 and person 3 pays $0, so that the total revenue is $30, and there is not enough money for the coffee machine.
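The payments under the pivot mechanism are easy to compute. The following is a minimal sketch, not a reference implementation: the function names are made up, and it assumes that the project goes ahead when the claims at least cover the cost.

```python
def pivot_payments(claims, cost):
    """Return (built, payments) under the pivot mechanism.

    Assumption: the project is built when the claims at least cover the cost.
    Each person then pays the gap left by the others' claims, or nothing
    if the others already cover the cost.
    """
    total = sum(claims)
    if total < cost:
        return False, [0] * len(claims)
    payments = [max(cost - (total - claim), 0) for claim in claims]
    return True, payments


def net_gain(true_value, claim, sum_of_others, cost):
    """Net gain of one person, treating all the others as a single claim."""
    built, payments = pivot_payments([claim, sum_of_others], cost)
    return true_value - payments[0] if built else 0


built, payments = pivot_payments([40, 50, 20], cost=80)
print(built, payments, sum(payments))  # True [10, 20, 0] 30

# Person 1 has a true value of 40 and the others claim 45 in total.
print(net_gain(40, 40, 45, 80))  # 5 when telling the truth
print(net_gain(40, 30, 45, 80))  # 0 when under claiming
# Over claiming when the others claim only 30 in total.
print(net_gain(40, 50, 30, 80))  # -10
```

The first result shows the budget problem: the payments add up to $30 while the machine costs $80.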

A mechanism for public projects should be:
- (1) efficient, which means that the project is only done if it is worthwhile;
- (2) voluntary, which means that people always want to join, so that they are not coerced into doing it;
- (3) incentive compatible, so that people tell the truth;
- (4) balanced, so that the project can be paid in full.

It is impossible to satisfy all four requirements at once. The alternative is to go for second best and sacrifice one of these things. In the pivot mechanism, the restriction of balanced is sacrificed. There are other mechanisms that sacrifice other restrictions. Sometimes, people are coerced to participate.

19. Replicator Dynamics

19.1. Replicator dynamics

Replicator dynamics are used in psychology to model learning, in economics to model populations of people learning, and in ecology to model evolution. The basic idea is that there is a set of N types {1, 2, 3, ..., N}. Each type has a payoff π(i). There is a proportion of each type Pr(i), which means that there are populations of types, and those populations are succeeding at different levels. There is an evolution of that process. The distribution across types changes. The types that are doing worse start copying the types that are doing better. That process is called replicator dynamics.

For example, with regard to learning, the people in the population are trying different strategies. The most used may not have the highest payoff. If people decide on what strategy to use, they may opt for the most common strategy, but rational people would opt for the highest payoff. Replicator dynamics can be used to model situations where both of those dynamics are in play. People copy more common strategies and they are also copying strategies that tend to do better.

That same replicator dynamics model can be used for evolution. Instead of strategies, there can be phenotypes. For example, for frogs the phenotypes could be different lengths of tongues. The more there is of a type, the more likely this type will reproduce. Suppose now that these types have different levels of fitness. Then the types with a higher level of fitness are more likely to reproduce.

In both cases there is a distribution across the types, and that distribution is changing in response to the payoffs that those different types get. Fisher's fundamental theorem states that the rate of adaptation is proportional to the variation of that population. That contradicts the idea of six sigma. Fisher's fundamental theorem states that more variation is better because it makes it possible to adapt faster. Six sigma states that more variation is worse because it means more errors.

19.2. The replicator equation

The basic idea of replicator dynamics is that there is a set of N types, actions or strategies {1, 2, 3, ..., N}. Each has a payoff π(i). There is a proportion of each of them Pr(i). There are populations of these types, actions or strategies, and those populations are succeeding at different levels. People can learn in different ways. One of them is just copying other people because others may already have figured out how to act. In conformity models like the standing ovation model, people just copy what others do. People may also want to hill climb, and choose actions that have a better payoff.

The model of replicator dynamics is a way of capturing the dynamics of that process. The rational model states that people choose the strategy with the highest payoff. The sociological model is a rule based model where people are just copying the behaviour of others and assuming that they have chosen it for some good reason, so that the odds of copying are based on the proportion of other people choosing the strategy.

Replicator dynamics balances the rational process and the copying process. In the model the weight of strategy i depends on the payoff π(i) and the proportion Pr(i). The weight is π(i)Pr(i). If the proportion Pr(i) is 0 then nobody is thinking of the option, even when the payoff π(i) is very high, so that the weight must be a product of those variables.

Replicator equation:

$$Pr_{t+1}(i) = \frac{Pr_t(i)\,\pi(i)}{\sum_{j=1}^{N} Pr_t(j)\,\pi(j)}$$

The replicator equation is a way of figuring out how many people use the strategy in the next period. The idea is that the probability of using a strategy in period t+1 is the ratio of its weight to the total weight of all the strategies. In this way replicator dynamics shows how the population moves over time as a function of the payoffs and the proportions.

For example, you have strategies {1, 2, 3} with payoffs {2, 4, 5} and proportions {1/3, 1/6, 1/2}. The weights of {1, 2, 3} are therefore {2/3, 2/3, 5/2} so that the total weight is 4/6 + 4/6 + 15/6 = 23/6. The probabilities of {1, 2, 3} in the next period are {4/23, 4/23, 15/23}.
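The calculation in this example can be reproduced with exact fractions. This is a small sketch of one replicator step; the function name replicator_step is made up for the illustration.

```python
from fractions import Fraction


def replicator_step(payoffs, proportions):
    """Next-period proportions: each weight divided by the total weight."""
    weights = [pay * share for pay, share in zip(payoffs, proportions)]
    total = sum(weights)
    return [w / total for w in weights]


payoffs = [2, 4, 5]
proportions = [Fraction(1, 3), Fraction(1, 6), Fraction(1, 2)]
print(replicator_step(payoffs, proportions))
# [Fraction(4, 23), Fraction(4, 23), Fraction(15, 23)]
```

Strategy 3 grows from 1/2 to 15/23 because it combines the highest payoff with the largest proportion.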

[Figure: Shake bow payoff]
It is possible to model people as rational. Replicator dynamics is an alternative way of modelling people, in which they can reach rational outcomes by learning with a simple rule. This can be applied to games, for example shaking hands versus bowing. Assume that when two people meet, the payoff is 2 for both if both shake hands and 1 for both if both bow, while the payoff is 0 for both if one person bows and the other shakes.

Suppose the game starts out with 50% shakers and 50% bowers. The expected payoff is (1/2)*2 = 1 for shaking and (1/2)*1 = 1/2 for bowing. The weight is (1/2)*1 = 1/2 for shaking and (1/2)*(1/2) = 1/4 for bowing. The proportion of shakers in the next period is (1/2)/(1/2 + 1/4) = 2/3 and the proportion of bowers in the next period is (1/4)/(1/2 + 1/4) = 1/3. If we run this process multiple times, then we end up with only shakers.

[Figure: SUV compact game]
Replicator dynamics doesn't always produce the optimal outcome. For example, in the SUV/Compact game, you can either drive an SUV or a compact car. If you drive an SUV, your payoff is 2 regardless of what the other does because you feel safe. If you drive a compact car, and run into someone who is driving an SUV, your payoff is 0 because you feel unsafe. If you drive a compact car, and the other person is driving a compact car, your payoff is 3 because you're getting better gas mileage and feel safe.

In this game replicator dynamics doesn't produce the outcome that is best for everyone. Assume that the process starts with 50% driving an SUV and 50% driving a compact car. The payoff of driving an SUV is 2 but for driving a compact car it is (1/2)*0 + (1/2)*3 = 1.5. The weight of SUVs is (1/2)*2 = 1. The weight of compact cars is (1/2)*(3/2) = 3/4. In the next period, the probability of someone driving an SUV is 1/(1 + 3/4) = 4/7 and for driving a compact car it is 3/7. In this game people end up driving SUVs, even though everyone would get a payoff of 3 if all drove compact cars.
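Both games can be run forward with the same few lines of code. The sketch below assumes a symmetric two-strategy game in which the payoff of a strategy is its expected payoff against a randomly drawn member of the population; the function names and the choice of 20 periods are made up for the illustration.

```python
from fractions import Fraction


def step(payoff_matrix, shares):
    """One replicator step for a symmetric game.

    payoff_matrix[i][j] is the payoff to strategy i when it meets strategy j.
    """
    expected = [sum(p * s for p, s in zip(row, shares)) for row in payoff_matrix]
    weights = [e * s for e, s in zip(expected, shares)]
    total = sum(weights)
    return [w / total for w in weights]


def run(payoff_matrix, shares, periods):
    """Apply the replicator step repeatedly."""
    for _ in range(periods):
        shares = step(payoff_matrix, shares)
    return shares


shake_bow = [[2, 0], [0, 1]]    # rows and columns: shake, bow
suv_compact = [[2, 2], [0, 3]]  # rows and columns: SUV, compact
half = [Fraction(1, 2), Fraction(1, 2)]

print(step(shake_bow, half))    # [Fraction(2, 3), Fraction(1, 3)]
print(step(suv_compact, half))  # [Fraction(4, 7), Fraction(3, 7)]

# Floats are used for longer runs to keep the numbers small.
print(run(shake_bow, [0.5, 0.5], 20))
print(run(suv_compact, [0.5, 0.5], 20))
```

After 20 periods nearly everyone shakes hands in the first game and nearly everyone drives an SUV in the second, which matches the outcomes described in the text.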

Keith Bradsher wrote a book, High and Mighty: The Dangerous Rise of the SUV, to explain why people drive big SUVs even though it doesn't make sense. He makes the argument that it happened through an evolution of choices that has led people to drive SUVs while they would collectively be better off driving compacts. The picture on the cover suggests that people in small cars feel unsafe because of people driving big cars [18].

[Figure: Fitness wheel]

19.3. Fisher's theorem

Replicator dynamics can also be applied in an ecological context. There are different phenotypes of a species, and those phenotypes have different levels of fitness. Replicator dynamics is a way to capture the dynamics of the population of this species. Instead of assuming a payoff to each type, there is a fitness to each type. Using the same logic one can estimate how many of each type will be reproduced into the next population based on the fitness and the proportion of each type.

This can be captured in a fitness wheel. When an animal is choosing a mate, it is as if a wheel of fortune is spun. There are animals of different types. In the example, there are 4 of type 1, 2 of type 2 and 2 of type 3. The number of slices represents the number of individuals of that type, and the size of the slice an individual gets is proportional to its fitness, so that fitter animals have more opportunities to reproduce. This is replicator dynamics.

[Figure: Replicator landscape]
Fisher's fundamental theorem gives an insight about the role that variation plays in adaptation. The theorem combines three different models:
- (1) there is no cardinal, meaning that there is no single representative cardinal but a lot of genetic and phenotypic variation in the population;
- (2) rugged landscapes, meaning that when you encode a function, it can be seen as a rugged landscape in which you are trying to climb hills, so if you plot different cardinals on a landscape, they have different levels of fitness;
- (3) replicator dynamics, which can be used to understand the evolution of the species in terms of selecting the cardinals that are higher up on the landscape.

Fisher's theorem states that higher variance makes it possible to adapt faster in the sense of climbing the landscape faster. If there is low variation then selective pressure only makes it possible to climb up a little bit, while with high variation it is possible to climb a lot faster. An example can illustrate this.

Let's start with the population that has 1/3 of type 1 people at fitness 3, 1/3 of type 2 at fitness 4 and 1/3 of type 3 at fitness 5. The average fitness is 4. The variation is (3-4)² + (4-4)² + (5-4)² = 2. The weights are 1, 4/3 and 5/3. The proportions in the next period are Pr(1) = 1/(1 + 4/3 + 5/3) = 3/12, Pr(2) = (4/3)/(1 + 4/3 + 5/3) = 4/12, Pr(3) = (5/3)/(1 + 4/3 + 5/3) = 5/12. The new average fitness is 3*(3/12) + 4*(4/12) + 5*(5/12) = 50/12 ≈ 4.167, a fitness gain of 1/6.

In the case of medium variance, there is 1/3 at fitness 2, 1/3 at fitness 4 and 1/3 at fitness 6, so that the average is 4 and the variation is 8. The proportions in the next period will be 1/6, 1/3 and 1/2, and the new average will be 56/12 ≈ 4.667, a fitness gain of 4/6. In the case of high variance, there is 1/3 at fitness 0, 1/3 at fitness 4 and 1/3 at fitness 8, so that the average is 4 and the variation is 32. The proportions in the next period will be 0, 1/3 and 2/3, and the new average will be 20/3 ≈ 6.667, a fitness gain of 16/6. In every case the gain is proportional to the variation: it equals the variation divided by 12.
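The three cases can be checked with exact arithmetic. The sketch below assumes equal starting proportions and measures variation as the sum of squared deviations from the average, as in the examples; the function name fitness_gain is made up for the illustration.

```python
from fractions import Fraction


def fitness_gain(fitnesses):
    """Return (variation, gain) for one replicator step from equal shares.

    Variation is the sum of squared deviations from the average fitness,
    as in the worked examples (not divided by the number of types).
    """
    n = len(fitnesses)
    shares = [Fraction(1, n)] * n
    average = sum(f * s for f, s in zip(fitnesses, shares))
    weights = [f * s for f, s in zip(fitnesses, shares)]
    total = sum(weights)
    new_shares = [w / total for w in weights]
    new_average = sum(f * s for f, s in zip(fitnesses, new_shares))
    variation = sum((f - average) ** 2 for f in fitnesses)
    return variation, new_average - average


for case in ([3, 4, 5], [2, 4, 6], [0, 4, 8]):
    variation, gain = fitness_gain(case)
    print(case, variation, gain, gain / variation)
# [3, 4, 5] 2 1/6 1/12
# [2, 4, 6] 8 2/3 1/12
# [0, 4, 8] 32 8/3 1/12
```

The ratio of gain to variation is 1/12 in all three cases, which is the proportionality that Fisher's theorem describes for this example.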

Fisher's fundamental theorem states that the change in average fitness due to selection, in the case of replicator dynamics, is proportional to the variation. More variation means more adaptation.

19.4. Variation or six sigma

Fisher's fundamental theorem states that the more variation, the faster a species can adapt, so it makes sense to encourage variation. This contradicts six sigma, most notably if the average is on the top of the landscape, so that the highly variant elements have a low fitness. Six sigma works. Atul Gawande wrote a book called The Checklist Manifesto: How to Get Things Right, which shows how you can do a lot better in the medical profession by reducing variation. For example, six sigma techniques helped to realise a massive reduction in the number of infections [19].

There is also a lot of evidence from ecology and biology that more variation means more adaptation. The problem is that these two ideas are opposites, like the proverbs "you are never too old to learn" and "you can't teach an old dog new tricks". How do we make sense of these things when they contradict? Here models are useful while proverbs are not, because models have assumptions. The no free lunch theorem states that no algorithm, such as reducing variation or increasing variation, is going to work in all settings.

If there is a fixed landscape, and you have figured out the peak, then it is better to use six sigma. If the landscape changes, then there is no possibility of adapting to the new situation with six sigma. The distinction between when to use six sigma and when to use Fisher's fundamental theorem comes down to the nature of the environment. Fisher's fundamental theorem is about ecological environments and dynamic learning environments. If things are fixed and in equilibrium then six sigma is better. Equilibrium processes have a fixed landscape. Complex, periodic and random processes have an adaptive landscape or a dancing landscape.

In a fixed landscape it is better to get to the peak and use six sigma. In a dancing landscape it is better to maintain variation. Atul Gawande's book The Checklist Manifesto is about things like getting a plane off the ground and cleaning tubes that go into people's bodies. Those problems are fixed and so you want to reduce variation using a checklist so that nobody makes mistakes. Dancing landscapes appear in ecologies or the business world. For example, marketing strategy and product innovation are not checklists because consumer preferences change.

With multiple models, we can determine which models can work in which setting by looking at the assumptions of those models. One of the advantages of becoming a many model thinker is being able to look at the assumptions of different models, and then apply the right models based on the problem.

20. Prediction and the Many Model Thinker

20.1. Prediction

Individuals can make predictions but collections of individuals can also make predictions. The wisdom of crowds is the phenomenon that collections of people can sometimes make incredibly accurate predictions that individuals are not able to make on their own. By using categories to make predictions it is possible to reduce variation. The R² measure captures how much of the variation can be explained.

Linear models can be even better at predicting patterns of variation in data. There are limits to linear models and sometimes it is better to use Markov models. There are many models that can be used to make predictions. Different people use different models so that collectively people might be better at making predictions. The diversity prediction theorem states that many models together are better than the average individual model, even if the individual models are all equally good. The wisdom of crowds effect is caused by the diversity of the models and categories that people use.

20.2. Linear models

Predictive models enable us to get estimates of something that is going to happen in the real world. Let's talk about a basic predictive task. Suppose you have to estimate the height of the Saturn V rocket. Typically you may think that the Saturn V rocket is like something else. If you think the rocket is like a water tower, you might guess that it is 30 metres high. If you think it is like the Statue of Liberty, you might guess that it's about 120 metres high. If you think it is like the Eiffel Tower, you might guess it's 300 metres high.

[Figure: Calories sandwich]
Different people use different categories and therefore make different predictions. The idea that people put things in different categories is captured in the phrase lump to live. People use categories to make sense of the world and to predict. In the example about different food items like apples, pears, cakes, pies, and bananas, they all had different calories. There was a lot of variation in the caloric values. By using categories, such as fruit and dessert, it was possible to make better predictions and reduce variation. The R² measure captures how much of the variation can be explained by using the model.

There are also linear models like Z = aX + bY + c that can be used to predict. Here Z is called the dependent variable and X and Y are called independent variables. Here X and Y vary and they determine the value of Z. For example, it is possible to write a linear model of the calories in a sandwich based on its ingredients and the weights of those ingredients. For example, there might be 50 grammes of cheese on it, and cheese is 20 calories per gramme. You can weigh the ingredients, use a calorie estimate per ingredient, and add them up. In the example, the model predicted a total of 665 calories, while in reality it was 683, so that the model was pretty accurate.

20.3. Diversity prediction theorem

People have different models to make predictions, such as categories, linear models, Markov models or any other type of model. These predictions have some level of accuracy that can be expressed in terms of R², the proportion of the variation explained. So how can collective wisdom come from a lot of different models? The diversity prediction theorem relates the wisdom of the crowd to the wisdom of the individual. The crowd's wisdom depends on the individual accuracy and the diversity of the individuals. The question is how much do individual accuracy and diversity matter?

Diversity prediction theorem:

$$(c - \theta)^2 = \frac{1}{n}\sum_{i=1}^{n}(s_i - \theta)^2 - \frac{1}{n}\sum_{i=1}^{n}(s_i - c)^2$$

The diversity prediction theorem states that the crowd's error equals the average error minus the diversity. For example, assume there are three people, Amy, Belle and Carlos, who each make an estimate of how many people will enter the shop today. The predictions are 10, 16 and 25. The average is 17. Assume that the real outcome is 18. The squared errors of the individuals are (10-18)² = 64, (16-18)² = 4, and (25-18)² = 49. The average error of the individuals is (64 + 4 + 49)/3 = 39. The error of the crowd is (17-18)² = 1, which is better than any of the individuals.

Diversity can be used to make sense of this. Diversity is the variation in the predictions, which is the variation of each person's prediction relative to the mean prediction, and not from the true value. The mean prediction was 17. The contributions to the diversity of Amy, Belle and Carlos are (10-17)² = 49, (16-17)² = 1, (25-17)² = 64, so that the diversity is (49 + 1 + 64)/3 = 38. The crowd's error was 1, the average error was 39, and the diversity was 38. The crowd's error in this case equals the average error minus the diversity. This is just a made up example that illustrates the point, but the relation turns out to be always true.

This can be formalised. In the formula c is the crowd's prediction, θ is the true value, s_i is the prediction of individual i, and n is the number of individuals in the crowd. If you expand all the terms and cancel everything out, you will find this to be a mathematical identity so that it is always true. If there is some kind of bias, then diversity decreases.
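The identity can be verified for the shop example with a short calculation. This is a minimal sketch; the function name diversity_prediction is made up for the illustration.

```python
def diversity_prediction(predictions, truth):
    """Return (crowd_error, average_error, diversity) as squared errors."""
    n = len(predictions)
    crowd = sum(predictions) / n  # the crowd's prediction c
    crowd_error = (crowd - truth) ** 2
    average_error = sum((s - truth) ** 2 for s in predictions) / n
    diversity = sum((s - crowd) ** 2 for s in predictions) / n
    return crowd_error, average_error, diversity


crowd_error, average_error, diversity = diversity_prediction([10, 16, 25], truth=18)
print(crowd_error, average_error, diversity)  # 1.0 39.0 38.0
print(abs(crowd_error - (average_error - diversity)) < 1e-9)  # True
```

Trying other predictions or another true value leaves the last line unchanged, because the relation is an identity.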

In his book, The Wisdom of Crowds, James Surowiecki discusses the 1906 West of England Fat Stock and Poultry Exhibition where 787 people guessed the weight of an ox. The crowd had guessed that the ox would weigh 1,197 pounds while it was in fact 1,198 pounds [20]. The diversity prediction theorem holds here as the crowd's error was 0.6, the average error was 2956, and the diversity was 2955.4.

The average squared error is 2956. So people miss the mark by about 55 pounds. That makes sense because people could probably guess the weight of an ox within about 50 to 60 pounds because an ox is five times the size of a person. If you can guess the weight of a person within about 10 pounds, you can probably guess the weight of an ox to about 50 pounds. The individuals are not geniuses but they are also not crazy. They are not guessing 50,000 pounds. The crowd is wise because individuals are moderately accurate and also diverse. It is accuracy plus diversity that makes the crowd do so well.

The crowd error equals the average error minus diversity. For the wisdom of crowds to exist, the crowd error must be small and the average error has to be fairly large, so that the diversity must also be fairly large. That diversity comes from people using different models. Collective predictions lead to accurate crowds when individuals are reasonably accurate and the crowd is reasonably diverse. Madness of crowds occurs when the average error is high and diversity is small. This happens when there is a group of like-minded people who are all wrong.

20.4. The many model thinker

The reasons to study models and to be a many model thinker are:
- (1) to be an intelligent citizen of the world;
- (2) to become a clearer thinker;
- (3) to use and understand data;
- (4) to make better decisions.

Models can help us to become intelligent citizens of the world. Growth models show that countries can have rapid growth rates just by investing in capital, but at some point, when they get to the frontiers of possibilities, they need innovation. Innovation has a doubling effect. The Colonel Blotto game made us understand that it makes sense to add new dimensions in some sorts of strategic competition. It helped us to understand some of the tactics people used in war and terrorism. Markov models helped us to understand situations in which history doesn't matter and intervention doesn't help in the long run.

Models also help us to become clearer and better thinkers. An example of that is the discussion of tipping points. A kink in a graph doesn't have to be a tip. It could be an exponential growth model. A tip is a situation where the likelihood of different outcomes drastically changes at a point in time. There is also a difference between tipping points and path dependence. Path dependence means gradual changes in what is going to happen as events unfold. Some models are linear, some are exponential, and others are s-shaped. Different models give different ways of understanding why something happens.

Models can be used to understand data. There are category models, linear models, and prediction models like the wisdom of crowds. It is even possible to use Markov models and link them to data. Some models like the game of life are abstract, and do not relate to data. Many other models like the growth models, and the linear models do relate to existing data so that they can help to make sense of the information that is out there.

Models can help us to decide, strategise and design. There are game theory models like the prisoner's dilemma, collective action problems, and mechanism design. Mechanism design can be used to construct models of a particular situation to help with designing institutions, writing contracts, and designing policies, so that we get the desired outcomes. Incentive compatibility can make people take the right actions or reveal their hidden information.

The most important thing in model thinking is how people behave. There are three different models of behaviour, which are people being rational, people following rules, and people having psychological biases. In some cases the assumptions made about behaviour have huge effects on the outcomes, for example in the race to the bottom, and in other cases, like exchange markets, they don't matter at all.

By using models it can be discovered that things don't always aggregate the way you would expect. When aggregating, all sorts of interesting things can happen. In particular, systems can go to equilibria, systems can produce patterns, systems can get almost random, and systems can be complex. By constructing models, it became clear why different systems produce different kinds of outcomes, how we can intervene in these systems, what actions people are likely to take in these systems, and how events are likely to unfold.


1. Superforecasting: The Art and Science of Prediction, Philip E. Tetlock and Dan Gardner, 2015, Crown Publishers
2. The Big Sort: Why the Clustering of Like-Minded America is Tearing Us Apart, Bill Bishop, 2008, Houghton Mifflin Harcourt
3. Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives, Nicholas A. Christakis and James H. Fowler, 2009, Hachette Book Group
4. A New Kind of Science, Stephen Wolfram, 2002, Wolfram Media
5. Thinking, Fast and Slow, Daniel Kahneman, 2011, Farrar, Straus and Giroux
6. Nudge: Improving Decisions About Health, Wealth and Happiness, Cass R. Sunstein and Richard H. Thaler, 2009, Penguin
7. The Tipping Point: How Little Things Can Make a Big Difference, Malcolm Gladwell, 2002, Back Bay Books
8. Why Nations Fail: The Origins of Power, Prosperity, and Poverty, Daron Acemoglu and James Robinson, 2012, Crown Publishers
9. The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies, Scott E. Page, 2007, Princeton University Press
10. The 7 Habits of Highly Effective People: Powerful Lessons in Personal Change, Stephen Covey, 2004, Simon & Schuster
11. The Gifts of Athena: Historical Origins of the Knowledge Economy, Joel Mokyr, 2002, Princeton University Press
12. Networks. An Introduction, Mark Newman, 2010, Oxford University Press
13. The Success Equation: Untangling Skill and Luck in Business, Sports, and Investing, Michael Mauboussin, 2012, Harvard Business School Publishing
14. Good to Great: Why Some Companies Make the Leap and Others Don't, Jim Collins, 2001, Harper Business
15. A Random Walk Down Wall Street: The Time-Tested Strategy for Successful Investing, Burton Malkiel
16. Super Cooperators: Altruism, Evolution and Mathematics (or, Why We Need Each Other to Succeed), Martin Nowak and Roger Highfield, 2011, The Text Publishing Company
17. Collapse: How Societies Choose to Fail or Succeed, Jared Diamond, 2004, Penguin Books Ltd. (London)
18. High and Mighty: The Dangerous Rise of the SUV, Keith Bradsher, 2002, Public Affairs
19. The Checklist Manifesto: How to Get Things Right, Atul Gawande, 2009, Metropolitan Books
20. The Wisdom of Crowds, James Surowiecki, 2004, Anchor Books