866458aa16409bcd7da89a51ff20722d.ppt
- Number of slides: 8
xkcd.com IN SEARCH OF VALUE EQUILIBRIA By Christopher Kleven & Dustin Richwine
Group Mentor: Dr. Michael L. Littman, Chair of the Computer Science Dept., specializing in AI and Reinforcement Learning. Grad Student Mentor: Michael Wunder, Ph.D. student studying with Dr. Littman.
Game Theory
Study of interactions of rational utility-maximizing agents and prediction of their behavior.
An action profile is a Nash Equilibrium of a game if every player's action is a best response to the other players' actions.
Normal Form Game (each cell lists the Row player's payoff, then the Column player's payoff):
                Column A    Column B
    Row A       a, b        c, d
    Row B       e, f        g, h
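The Nash Equilibrium definition above can be checked mechanically for pure actions. The following is a minimal sketch, not the presenters' code: the function name `pure_nash_equilibria` and the payoff numbers in `EXAMPLE_GAME` are hypothetical, chosen only to illustrate the best-response test.

```python
from itertools import product

def pure_nash_equilibria(payoffs):
    """Return all (row, col) action pairs that are pure Nash equilibria.

    payoffs[r][c] is a (row_payoff, col_payoff) tuple, as in the
    normal-form table: a cell is an equilibrium when neither player
    can gain by changing only their own action.
    """
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        # Row player: no other row does better against column c.
        row_best = all(payoffs[r][c][0] >= payoffs[r2][c][0]
                       for r2 in range(n_rows))
        # Column player: no other column does better against row r.
        col_best = all(payoffs[r][c][1] >= payoffs[r][c2][1]
                       for c2 in range(n_cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria

# Hypothetical illustrative payoffs (not taken from the slides).
EXAMPLE_GAME = [[(3, 3), (0, 5)],
                [(5, 0), (1, 1)]]
print(pure_nash_equilibria(EXAMPLE_GAME))  # [(1, 1)]
```

A game can have no pure equilibrium at all, in which case the function returns an empty list and the players must mix between actions.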
Example: Spoiled Child Game Analysis
Payoffs are listed as (Parent, Child):
                        Child: Behave    Child: Misbehave
    Parent: Spoil       1, 2             0, 3
    Parent: Punish      0, 1             2, 0
Let Child be a Reinforcement Learner.
Parent's intent to play towards Nash Equilibrium outcome: (1/2) Spoil & (1/2) Punish, which gives the Child an expected payoff of 1.5.
Child's intent to play towards Nash Equilibrium outcome: (2/3) Behave & (1/3) Misbehave, which gives the Parent an expected payoff of 0.667.
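The mixed strategies on this slide follow from indifference conditions: each player mixes so that the other player gets the same expected payoff from either action. The sketch below assumes the payoff layout (Parent, Child) with rows Spoil/Punish and columns Behave/Misbehave, and assumes a fully mixed equilibrium exists (non-zero denominators). The helper name `mixed_equilibrium_2x2` is hypothetical.

```python
from fractions import Fraction

# Assumed layout: payoffs as (Parent, Child).
# Rows: Spoil, Punish. Columns: Behave, Misbehave.
SPOILED_CHILD = [[(1, 2), (0, 3)],
                 [(0, 1), (2, 0)]]

def mixed_equilibrium_2x2(game):
    """Fully mixed equilibrium of a 2x2 game via indifference conditions.

    Returns (p, q, row_value, col_value), where p is the probability of
    the first row action and q the probability of the first column action.
    Assumes such an interior equilibrium exists.
    """
    R = [[Fraction(cell[0]) for cell in row] for row in game]  # row payoffs
    C = [[Fraction(cell[1]) for cell in row] for row in game]  # column payoffs
    # p makes the column player indifferent between the two columns.
    p = (C[1][1] - C[1][0]) / (C[0][0] - C[1][0] - C[0][1] + C[1][1])
    # q makes the row player indifferent between the two rows.
    q = (R[1][1] - R[0][1]) / (R[0][0] - R[0][1] - R[1][0] + R[1][1])
    row_value = q * R[0][0] + (1 - q) * R[0][1]
    col_value = p * C[0][0] + (1 - p) * C[1][0]
    return p, q, row_value, col_value

p_spoil, q_behave, parent_value, child_value = mixed_equilibrium_2x2(SPOILED_CHILD)
print(p_spoil, q_behave, parent_value, child_value)  # 1/2 2/3 2/3 3/2
```

Under these assumptions the Parent spoils half the time, the Child behaves two thirds of the time, and the expected payoffs are 2/3 (about 0.667) for the Parent and 3/2 for the Child.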
Reinforcement Learning Def: Subarea of machine learning concerned with how an agent ought to take actions so as to maximize some notion of long-term reward. Reference: Michael Wunder, Michael Littman, and Monica Babes, "Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration."
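Q-Learning Assign arbitrary Q-values to each strategy A and B. We will refer to these values as Q(A) and Q(B), respectively. After receiving reward R, update with learning rate α: Q(action) = (1 - α) Q(action) + α R. ε-greedy exploration: with a probability ε, the Q-learner will choose a random action; otherwise it chooses the action with the highest Q-value.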
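The update rule and the exploration rule on this slide can be sketched in a few lines. This is an illustrative stateless learner over two actions, not the presenters' implementation. The function names, the parameter defaults, the starting Q-values of 0.0, and the fixed rewards in `FIXED_REWARDS` are all assumptions made for the example.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)

def q_update(q_values, action, reward, alpha):
    """Apply Q(action) = (1 - alpha) * Q(action) + alpha * R in place."""
    q_values[action] = (1 - alpha) * q_values[action] + alpha * reward

def run(reward_fn, steps=1000, alpha=0.1, epsilon=0.1, seed=0):
    """Run the learner against a reward function and return final Q-values."""
    rng = random.Random(seed)
    q_values = {"A": 0.0, "B": 0.0}  # arbitrary starting values (assumed 0)
    for _ in range(steps):
        action = epsilon_greedy(q_values, epsilon, rng)
        q_update(q_values, action, reward_fn(action), alpha)
    return q_values

# Hypothetical fixed rewards, only to exercise the learner.
FIXED_REWARDS = {"A": 1.0, "B": 2.0}
final_q = run(lambda action: FIXED_REWARDS[action])
print(final_q)
```

In a game setting the reward function would depend on the other player's action as well, so each Q-value tracks a moving target rather than a fixed number.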
Goals Understand the behavior of the Q-learning algorithm in games with more actions, more players, or more states. Try to formalize the notion of "value based equilibria". Develop new algorithms that learn effectively in a wide variety of games. Find a machine learner that elicits different behavior from different learning agents for possible use in diagnosing how people and monkeys learn.
Importance The internet serves as a place where learning robots can serve as a proxy for human interaction. Their use could be effective in auctions, making online purchases, tracking goods, or even playing online poker. Learning the state that results from interactions of AI can lead us to predict the long-term value of these interactions. A successful algorithm may prove conducive to the understanding of the brain’s ability to learn.