Fiveable

Game Theory and Economic Behavior Unit 14 Review

14.3 Learning models in game theory

Written by the Fiveable Content Team • Last updated September 2025

Learning models in game theory explore how players adapt and improve their strategies over time. These models range from simple reinforcement learning to complex belief-based approaches, each offering unique insights into strategic behavior.

Reinforcement learning rewards successful actions, while belief-based models update expectations about opponents. Social learning and imitation add another layer, showing how strategies spread through populations. These models help explain real-world strategic behavior and decision-making processes.

Reinforcement and Q-Learning Models

Reinforcement Learning Fundamentals

  • Reinforcement learning involves agents learning optimal strategies through trial and error interactions with their environment
  • Agents receive rewards or punishments based on the outcomes of their actions and aim to maximize their cumulative reward over time
  • Key components of reinforcement learning include states, actions, rewards, and a value function that estimates the long-term value of being in a particular state or taking a specific action
  • Reinforcement learning algorithms update the value function based on the observed rewards and use it to guide the agent's decision-making process (Q-learning, SARSA)
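A minimal sketch of these components in the Roth–Erev style of reinforcement learning often used in game theory: each action carries a propensity, realized payoffs reinforce the chosen action, and choice probabilities are proportional to propensities. The play history and payoff values here are illustrative:

```python
def choice_probs(propensities):
    """Choice probabilities proportional to each action's propensity."""
    total = sum(propensities)
    return [p / total for p in propensities]

def reinforce(propensities, action, payoff):
    """Law of effect: the chosen action's propensity grows by its realized payoff."""
    propensities[action] += payoff

# Illustrative play history of (action, payoff) pairs; in a full simulation
# each action would be sampled from choice_probs every round.
history = [(0, 1.0), (1, 3.0), (1, 3.0), (0, 1.0)]
propensities = [1.0, 1.0]  # uniform initial propensities
for action, payoff in history:
    reinforce(propensities, action, payoff)

probs = choice_probs(propensities)  # action 1 earned more, so it is now favored
```

After this history the propensities are [3, 7], so the agent now plays the higher-paying action 70% of the time: past success directly tilts future choice.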

Q-Learning Algorithm

  • Q-learning is a model-free reinforcement learning algorithm that learns an optimal action-value function, denoted as Q(s,a), representing the expected cumulative reward of taking action a in state s
  • The Q-function is updated iteratively based on the Bellman equation: $Q(s,a) \leftarrow Q(s,a) + \alpha[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$
    • $\alpha$ is the learning rate, $\gamma$ is the discount factor, $r$ is the immediate reward, and $s'$ is the next state
  • Q-learning is an off-policy algorithm, meaning it learns the optimal policy independently of the policy being followed during the learning process
  • The agent selects actions based on an exploration-exploitation trade-off, typically using strategies like $\epsilon$-greedy or softmax to balance exploring new actions and exploiting the current best action
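The update rule and the $\epsilon$-greedy selection above can be sketched together on a toy single-state problem (the state/action setup and parameter values are illustrative, not from the text):

```python
import random

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Bellman update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

def epsilon_greedy(Q, s, epsilon, rng):
    """Explore a random action with probability epsilon, else exploit the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(Q[s]))
    return max(range(len(Q[s])), key=lambda a: Q[s][a])

# Toy single-state problem: action 1 always pays 1, action 0 pays 0,
# so Q(s, 1) should approach r / (1 - gamma) = 10 over many updates.
rng = random.Random(0)
Q = [[0.0, 0.0]]  # Q[state][action]
for _ in range(500):
    a = epsilon_greedy(Q, 0, epsilon=0.1, rng=rng)
    r = 1.0 if a == 1 else 0.0
    q_learning_step(Q, 0, a, r, 0)  # next state is again state 0
```

Exploration is what discovers action 1 in the first place here: pure exploitation would tie-break to action 0 forever and never see the reward.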

Experience-Weighted Attraction (EWA) Learning

  • EWA learning is a hybrid model that combines elements of reinforcement learning and belief-based learning
  • Each strategy is assigned an attraction value, which is updated based on the payoffs received and the agent's experience
  • The attraction update rule incorporates both the realized payoff and the forgone payoff of unchosen strategies, weighted by an experience factor
  • The experience factor determines the relative importance of past and recent experiences in shaping the attraction values
  • EWA learning can capture both the law of effect (reinforcement) and the power law of practice (experience-based learning) observed in human learning
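The attraction update described above can be sketched as follows, with the chosen strategy's realized payoff weighted fully and forgone payoffs weighted by $\delta$; the parameter values and payoffs are illustrative:

```python
def ewa_update(attractions, N, chosen, payoffs, phi=0.9, delta=0.5, rho=0.9):
    """One EWA step. `payoffs[j]` is what strategy j would have earned this
    round; unchosen strategies' forgone payoffs count at weight delta."""
    N_new = rho * N + 1.0  # experience weight grows, discounting the past
    new_attractions = []
    for j, A in enumerate(attractions):
        weight = 1.0 if j == chosen else delta
        new_attractions.append((phi * N * A + weight * payoffs[j]) / N_new)
    return new_attractions, N_new

A, N = [0.0, 0.0], 1.0
for _ in range(50):
    # The agent keeps choosing strategy 0 even though strategy 1 pays more;
    # the forgone payoff still raises strategy 1's attraction.
    A, N = ewa_update(A, N, chosen=0, payoffs=[1.0, 3.0])
```

With $\delta = 0.5$ the unchosen strategy's forgone payoff ($0.5 \times 3.0 = 1.5$) outweighs the chosen one's realized payoff ($1.0$), so the attraction of strategy 1 ends up higher even though it was never played. Setting $\delta = 0$ would recover pure reinforcement learning.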

Belief-Based Learning Models

Fictitious Play

  • Fictitious play is a belief-based learning model where agents form beliefs about their opponents' strategies based on observed actions
  • Each agent maintains a belief distribution over the possible strategies of their opponents and best-responds to these beliefs
  • The belief distribution is updated using empirical frequencies of observed actions, assuming opponents are playing stationary strategies
  • Fictitious play can converge to Nash equilibria in certain classes of games (zero-sum, potential, and supermodular games) but may fail to converge in general
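A sketch of fictitious play in matching pennies (a zero-sum game, so empirical frequencies are known to converge to the mixed Nash equilibrium of 50/50); the initial counts are arbitrary:

```python
def best_response(payoff, opp_counts):
    """Best reply to the empirical mixed strategy implied by the opponent's
    observed action counts."""
    total = sum(opp_counts)
    expected = [sum(payoff[a][b] * opp_counts[b] for b in range(len(opp_counts))) / total
                for a in range(len(payoff))]
    return max(range(len(expected)), key=lambda a: expected[a])

# Matching pennies: the row player wants to match, the column player to mismatch.
row_pay = [[1, -1], [-1, 1]]   # indexed [row action][column action]
col_pay = [[-1, 1], [1, -1]]   # indexed [column action][row action]
row_counts, col_counts = [1, 0], [0, 1]  # arbitrary initial observations
for _ in range(2000):
    a = best_response(row_pay, col_counts)
    b = best_response(col_pay, row_counts)
    row_counts[a] += 1
    col_counts[b] += 1

row_freq = [c / sum(row_counts) for c in row_counts]
col_freq = [c / sum(col_counts) for c in col_counts]
```

Actual play cycles deterministically, but the empirical frequencies both drift toward (1/2, 1/2), illustrating that it is the beliefs, not the period-by-period actions, that converge.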

Best Response Dynamics

  • Best response dynamics is a learning model where agents repeatedly best-respond to their opponents' current strategies
  • At each time step, one or more agents update their strategies to the best response given their opponents' current strategies
  • Best response dynamics can be deterministic (agents always choose the best response) or stochastic (agents choose best responses with high probability)
  • The convergence properties of best response dynamics depend on the game structure and the specific update rules used (simultaneous or asynchronous updates)
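A sketch of asynchronous best response dynamics in a two-player coordination game (the payoff values and starting profile are illustrative):

```python
def best_reply(payoff, player, profile):
    """The player's best action holding the other player's current action fixed."""
    other_action = profile[1 - player]
    return max(range(len(payoff)), key=lambda a: payoff[a][other_action])

# Symmetric pure-coordination game: (A, A) pays 2 to each player, (B, B) pays 1,
# miscoordination pays 0.  Indexed [my action][opponent's action]; 0 = A, 1 = B.
pay = [[2, 0], [0, 1]]
profile = [0, 1]  # start miscoordinated: player 0 plays A, player 1 plays B
for step in range(10):
    player = step % 2  # asynchronous updates: players revise in turn
    profile[player] = best_reply(pay, player, profile)
```

From this starting point the dynamics lock in at (B, B), which is a Nash equilibrium but not the efficient one, illustrating how the outcome depends on the initial profile and the update order.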

Bayesian Learning

  • Bayesian learning is a belief-based model where agents update their beliefs about opponents' strategies using Bayes' rule
  • Agents start with prior beliefs over the possible strategies of their opponents and update these beliefs based on observed actions
  • The posterior belief distribution is obtained by multiplying the prior beliefs with the likelihood of observed actions, normalized by the total probability
  • Bayesian learning allows agents to incorporate prior knowledge and handle uncertainty in a principled manner
  • The convergence properties of Bayesian learning depend on the prior beliefs and the true strategies being played by the opponents
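The posterior update described above can be sketched with two hypothesized opponent types (the type names, strategies, and observed actions are illustrative):

```python
def bayes_update(prior, likelihoods):
    """Posterior over opponent types after one observed action.
    `likelihoods[k]` = probability that type k plays the observed action."""
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnorm)  # normalizing constant (total probability of the action)
    return [u / total for u in unnorm]

# Two hypotheses about the opponent: "mostly cooperates" vs. "mostly defects".
types = {"cooperator": {"C": 0.8, "D": 0.2},
         "defector":   {"C": 0.1, "D": 0.9}}
belief = [0.5, 0.5]  # uniform prior over (cooperator, defector)
for action in ["D", "D", "C", "D"]:  # observed play
    belief = bayes_update(belief, [types["cooperator"][action],
                                   types["defector"][action]])
```

After three defections and one cooperation, the posterior puts roughly 92% weight on the defector type; note that the single "C" observation pulled the belief back noticeably, showing how evidence in either direction is incorporated.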

Imitation and Social Learning

Imitation Learning Mechanisms

  • Imitation learning involves agents learning by observing and copying the behavior of other agents in the population
  • Agents may imitate successful strategies based on their payoffs or popularity within the population
  • Imitation can be random (copying a randomly selected agent) or guided by specific rules (copying the best-performing agent or the most common strategy)
  • Imitation learning can lead to the spread of successful strategies and the emergence of social norms and conventions
  • However, imitation can also result in suboptimal outcomes if agents copy inferior strategies or get stuck in local optima
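A sketch of "copy the best-performing agent" imitation in a small population, where payoffs depend on how common a strategy is (the payoff function and population are illustrative):

```python
def imitate_best(strategies, payoff_of):
    """Copy-the-best imitation: every agent adopts the strategy of the
    highest-payoff agent this round (ties broken by lowest index)."""
    payoffs = [payoff_of(s, strategies) for s in strategies]
    best = strategies[max(range(len(strategies)), key=lambda i: payoffs[i])]
    return [best] * len(strategies)

# Toy payoff: a strategy's base value scaled by its share of the population,
# so payoffs depend on the current population mix.
def payoff_of(s, pop):
    share = pop.count(s) / len(pop)
    base = 2.0 if s == 1 else 1.0
    return base * share

pop = [0, 0, 0, 1, 1]  # strategy 1 starts in the minority
for _ in range(5):
    pop = imitate_best(pop, payoff_of)
```

Here the minority strategy earns the single highest payoff in the first round, so imitation spreads it through the whole population in one step, illustrating how a successful strategy can propagate much faster than under individual trial-and-error learning.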

Social Learning and Cultural Evolution

  • Social learning refers to the process of acquiring knowledge, skills, or behaviors through observation, interaction, or communication with other agents
  • Social learning can occur through various mechanisms, such as imitation, teaching, or language-mediated learning
  • Cultural evolution studies how social learning shapes the distribution of behaviors, norms, and beliefs in a population over time
  • Social learning can facilitate the accumulation of knowledge across generations and the adaptation of populations to changing environments
  • However, social learning can also lead to the spread of maladaptive behaviors or the persistence of outdated practices (conformity bias, cultural inertia)
  • The interplay between individual learning, social learning, and environmental factors determines the dynamics of cultural evolution in a population