- Number of slides: 28
Modeling the Process of Collaboration and Negotiation with Incomplete Information
Katia Sycara, Praveen Paruchuri, Nilanjan Chakraborty
Collaborators: Roie Zivan, Laurie Weingart, Geoff Gordon, Miro Dudik
[Figure: project overview diagram. Activity labels: Surveys & Interviews; Data Analysis; Identify Cultural Factors; Theory Formation; Cross-Cultural Interactions (Common task, Subgroup task); Computational Models; Virtual Humans; Implementation; validation. Research products: Validated Theories, Modeling Tools, Briefing Materials, Scenarios, Training Simulations. Institutions named: USC, CMU, CUNY, Georgetown, U Mich, U Pitt.]
MURI 14 Program Review - September 10, 2009
Problem
• Computational model of reasoning in Cooperation and Negotiation (C&N)
• Capture the rich process of C&N
  – Not just outcome
  – Not just offer-counteroffer but additional communications
• Account for cultural and social factors
• Rewards of other agents not known
• Uncertain and dynamic environment
Contributions
• Created an initial model from real human data. The model:
  – Is applicable in a uniform way to both collaboration and negotiation
  – Derives sequences of actions for an agent from real transcripts, as opposed to state-of-the-art work where action selection is constructed heuristically
  – Adapts its beliefs during the course of the interaction
  – Learns elements of the negotiation (e.g. other party type) as the interaction proceeds
  – Produces optimal activity sequences considering also the other agents
  – Has only incomplete information about others
POMDP: Partially Observable Markov Decision Process
[Diagram: the Agent sends an Action to the World (other agents) and receives an Observation back]
• Agent has initial beliefs
• Agent takes an action
• Gets an observation
• Interprets the observation
• Updates beliefs
• Decides on an action
• Repeats
Agent takes optimal action considering world/other agents
Elements: {States, Actions, Transitions, Rewards, Observations}
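The "gets an observation, updates beliefs" step in the loop above can be sketched as a standard Bayes-filter update. This is a minimal illustration, not the authors' implementation: the function name, the table layout, the observation labels, and every probability are assumptions made for the example.

```python
def update_belief(belief, action, observation, T, O):
    """One Bayes-filter step.

    b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s).
    belief: dict state -> probability
    T[(s, a)]: dict next_state -> probability
    O[(s_next, a)]: dict observation -> probability
    """
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        # Prediction: probability of landing in s_next before seeing the observation.
        predicted = sum(T[(s, action)].get(s_next, 0.0) * belief[s] for s in states)
        # Correction: weight by how likely the observation is in s_next.
        unnormalized[s_next] = O[(s_next, action)].get(observation, 0.0) * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical two-state example: the hidden state is the Buyer's type,
# assumed here not to change during the interaction.
T = {
    ("coop", "Concede 1"): {"coop": 1.0},
    ("noncoop", "Concede 1"): {"noncoop": 1.0},
}
# Hypothetical observation likelihoods (not taken from the slides).
O = {
    ("coop", "Concede 1"): {"buyer_concedes": 0.8, "buyer_holds": 0.2},
    ("noncoop", "Concede 1"): {"buyer_concedes": 0.3, "buyer_holds": 0.7},
}
prior = {"coop": 0.5, "noncoop": 0.5}
posterior = update_belief(prior, "Concede 1", "buyer_concedes", T, O)
```

With these made-up numbers, seeing the Buyer concede shifts the Seller's belief toward the cooperative type.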
Why POMDP-based modeling?
• Decentralized algorithm
• Incorporated in an agent that interacts with others
• Can represent communication (arguments, offers, preferences etc.)
• Many conversational turns
• Learns e.g. the model of the other player
• Adaptive best response
• Computationally efficient for realistic interactions
• Extendable to more than two agents
• Natural way to represent cultural and social factors in C and N
Output of POMDP
• The output is a policy matrix
• Policy: Optimal action to take, given current state (observations and other’s model)
• At run-time, agent consults the matrix and takes appropriate action
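A run-time lookup of this kind could look like the sketch below. It assumes the policy is stored as a table keyed by a discretized belief about the other player's type plus the two current offers. The table entries, the function names, and the fallback action are all hypothetical, chosen only to show the consultation step.

```python
# Discretization levels for the other player's type
# (0 = cooperative, 1 = non-cooperative), as in the simplified example.
TYPE_LEVELS = (0.0, 0.5, 1.0)


def discretize_type(p_noncoop):
    """Snap a probability of 'non-cooperative' to the nearest discretization level."""
    return min(TYPE_LEVELS, key=lambda level: abs(level - p_noncoop))


# Hypothetical policy table: (type level, seller price, buyer price) -> action.
policy = {
    (0.5, 10, 0): "Concede 1",
    (0.0, 9, 1): "Concede 2",
    (1.0, 9, 0): "Concede 0",
}


def choose_action(policy, p_noncoop, seller_price, buyer_price, default="Reject"):
    """Consult the policy table; the default for missing entries is an assumption."""
    key = (discretize_type(p_noncoop), seller_price, buyer_price)
    return policy.get(key, default)
```

For example, choose_action(policy, 0.45, 10, 0) snaps the belief to 0.5 and returns the entry stored for the opening state.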
Simplified Example
• Two agents negotiating
  – Seller S (POMDP Agent)
  – Buyer B (Other player)
• Single item negotiation
• Initially buyer at price 0 and seller at max = 10
Example: State Space
• State composed of 2 parts
  – Seller type, Buyer type
  – Negotiation status: current offers
• Agent types: cooperative or non-cooperative
• Negotiation modeled from Seller’s perspective
  – Initially high uncertainty of Buyer type
• Seller’s belief about Buyer, and state of negotiation are dynamic
Example: POMDP State
• Agent Type: cooperative vs non-cooperative
  – 0 cooperative, 1 non-cooperative
  – Discretized to {0, 0.5, 1}
• Price discretized to the set {0, 1, ..., 9, 10}
• Sample state:
  Me (Seller) Type = Coop
  You (Buyer) = Unknown
  Negotiation status: <S price = $10; B price = $0>
• State space = Number of Buyer types * Negotiation states = 363
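The slide gives the total of 363 without spelling out the factors. One reading that reproduces it is 3 Buyer-type levels times 11 Seller prices times 11 Buyer prices (3 * 121 = 363). The enumeration below assumes that factorization; the tuple layout is illustrative only.

```python
from itertools import product

BUYER_TYPES = (0.0, 0.5, 1.0)   # 0 = cooperative, 1 = non-cooperative, 0.5 = unknown
PRICES = tuple(range(0, 11))    # {0, 1, ..., 10}

# Assumed state layout: (buyer type level, seller price, buyer price).
states = list(product(BUYER_TYPES, PRICES, PRICES))

# Opening state of the example: Buyer type unknown, Seller at $10, Buyer at $0.
initial_state = (0.5, 10, 0)
```

The Seller's own type is not a factor in this count because the example fixes it (Me = Coop).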
Example: Action & Transition
• Action set: {Concede 2, Concede 1, Concede 0, Accept, Reject}
• Transition: Probability of ending in some state if agent takes a particular action in current state
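As a rough sketch of how one row of such a transition function might be represented, the code below applies a Seller concession and spreads probability over possible Buyer counter-moves. The helper names and the response probabilities are assumptions for illustration; prices are not clamped to the 0 to 10 range here.

```python
import random

ACTIONS = ("Concede 2", "Concede 1", "Concede 0", "Accept", "Reject")


def seller_price_after(action, seller_price):
    """Lower the Seller's price by the conceded amount; other actions leave it unchanged."""
    if action.startswith("Concede"):
        return seller_price - int(action.split()[1])
    return seller_price


def transition_row(seller_price, buyer_price, action, buyer_raise_probs):
    """Return {(seller price, buyer price): probability} for one state-action pair.

    buyer_raise_probs maps how much the Buyer raises the offer to a probability
    (hypothetical values; in the deck these come from transcript frequencies).
    """
    new_seller = seller_price_after(action, seller_price)
    return {(new_seller, buyer_price + up): p for up, p in buyer_raise_probs.items()}


def sample_next_state(row, rng):
    """Draw one successor state according to the row's probabilities."""
    next_states = list(row)
    weights = [row[s] for s in next_states]
    return rng.choices(next_states, weights=weights, k=1)[0]


# Hypothetical row for the opening state ($10, $0) under "Concede 1".
row = transition_row(10, 0, "Concede 1", {0: 0.2, 1: 0.6, 2: 0.2})
next_state = sample_next_state(row, random.Random(0))
```

The resulting successors are the offer pairs ($9, $0), ($9, $1), and ($9, $2).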
[Figure: example transition graph. The start state is Me = Coop, You = Unknown, My price = $10, Your price = $0. Edges labeled Concede 0, Concede 1, Concede 2, and Agree carry probabilities (values between 0.05 and 0.75) and lead to states that pair a belief about the Buyer (Me = Coop, You = Coop or You = Ncoop) with an offer pair: ($9, $0), ($9, $1), ($9, $2), ($8, $0), ($8, $1), ($8, $2), ($7, $0), ($7, $1), ($7, $2), and further on ($4, $6), ($6, $4), ($5, $5).]
Building Initial Simplified POMDP
• Human negotiation transcripts
  – 2 players (Grocer and Florist) with 4 issues
• Mapped dialogues to 14 base codes (actions)
• Other player’s type known for each transcript
  – Used for training and validation of the model
• Transition: Frequency of reaching some state, given a code
• Observation: Frequency of observing a code given some negotiation state
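The two frequency definitions above amount to normalized counts. The sketch below shows one way to compute them from coded steps; the state names s0, s1, s2 and the step list are hypothetical placeholders, and the function is not taken from the authors' model generator.

```python
from collections import Counter, defaultdict


def estimate_conditional(pairs):
    """Estimate P(outcome | condition) as relative frequencies of (condition, outcome) pairs."""
    counts = defaultdict(Counter)
    for condition, outcome in pairs:
        counts[condition][outcome] += 1
    probs = {}
    for condition, outcome_counts in counts.items():
        total = sum(outcome_counts.values())
        probs[condition] = {o: n / total for o, n in outcome_counts.items()}
    return probs


# Hypothetical coded steps: (negotiation state, code, next negotiation state).
steps = [
    ("s0", "OS", "s1"),
    ("s0", "OS", "s1"),
    ("s0", "OS", "s0"),
    ("s1", "RPO", "s2"),
]

# Transition: frequency of reaching a state, given the current state and a code.
transitions = estimate_conditional(((s, code), s_next) for s, code, s_next in steps)
# Observation: frequency of observing a code, given a negotiation state.
observations = estimate_conditional((s, code) for s, code, _ in steps)
```

With only a handful of transcripts, many (state, code) pairs will have few or no samples, so some form of smoothing may be needed in practice.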
POMDP construction
[Diagram: Grocer-Florist transcript, as <Player, Action code> pairs → Model Generator (initially empty) learns → Model generated → Reasoning over model → Prescription of optimal actions given state of interaction]
Codes used
Code  Definition
OFFER
  OS   Single-Issue
  OM   Multi-Issue
PROVIDE INFORMATION
  IP   Issue Preferences
  IR   Priorities
  IB   Bottom-line
REACTIONS
  RPO  Agreement to offer made
  RPS  Agreement with statement
  RNO  Disagreement with offer
  RNS  Disagreement with statement
MISCELLANEOUS
  SBF  Substantiation
  Q    Question
  PC   Procedural Comment
  INT  Summarizing
  TP   Threat/Power
Courtesy of Laurie Weingart
Sample Grocer-Florist Transcript
Speaker  Code  Unit
Florist  PC    So let’s start with temperature
Grocer   RPS   Okay
Florist  OS    So I would suggest a temperature of 64 degrees
Grocer   RPS   Okay
Florist  Q     How does that work for you?
Grocer   IP    Well personally for the grocery I think it is better to have a higher temperature
Grocer   SBF   Just because I want the customers to feel comfortable
Grocer   SBF   And if it is too cold that might turn the customers away a little bit
Florist  RPS   Okay
Grocer   SBF   And also if it is warm, people are more apt to buy cold drinks to keep themselves comfortable and cool
Florist  RPS   That's true.
Grocer   OS    I think 66 would be good.
Grocer   SBF   That way it is not too cold and it is not too hot as well.
Grocer   SBF   And its good for the customers.
Florist  RPO   Okay, yeah
• Assumed Florist is Cooperative
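To make the <Player, Action code> representation concrete, the sketch below lists the fifteen coded units of the sample transcript above and tallies codes per speaker. The pairs are copied from the transcript; the helper functions and variable names are illustrative assumptions.

```python
from collections import Counter, defaultdict

# <Player, Action code> pairs, in order, from the sample transcript.
coded_units = [
    ("Florist", "PC"), ("Grocer", "RPS"), ("Florist", "OS"), ("Grocer", "RPS"),
    ("Florist", "Q"), ("Grocer", "IP"), ("Grocer", "SBF"), ("Grocer", "SBF"),
    ("Florist", "RPS"), ("Grocer", "SBF"), ("Florist", "RPS"), ("Grocer", "OS"),
    ("Grocer", "SBF"), ("Grocer", "SBF"), ("Florist", "RPO"),
]


def code_counts_by_speaker(units):
    """Count how often each speaker uses each code."""
    counts = defaultdict(Counter)
    for speaker, code in units:
        counts[speaker][code] += 1
    return counts


def relative_frequencies(counter):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counter.values())
    return {code: n / total for code, n in counter.items()}


counts = code_counts_by_speaker(coded_units)
florist_freq = relative_frequencies(counts["Florist"])
```

In this excerpt the Grocer mostly substantiates (SBF) while the Florist mostly agrees (RPS, RPO), which fits the slide's assumption that the Florist is cooperative.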
Grocer POMDP generated
[Figure: path through the generated Grocer POMDP for the temperature issue. Every state shown has Me = Coop, You = Coop; the offer pairs are (70 F, 62 F), (70 F, 64 F), (66 F, 64 F), and (66 F, 66 F). Edge labels include "Discuss preferences and support their positions", "Florist agrees without committing", "Doesn’t commit", "Grocer substantiates his offer", and "Agrees to 66 F"; proposals of 64 F and 66 F are marked. Reward: 60 points for both Grocer and Florist.]
Negotiation Game
[Diagram: the Agent (Grocer), following the optimal POMDP policy, sends a Grocer Action to the Human (Florist), who responds with a Florist Action]
• Sequential
• Process oriented
• Blends computational and social science results
Initial results – Classification of Florist
[Plot: Uncertainty of belief]
• 10 transcripts for training: 4 cooperative, 6 non-cooperative
• 5 for testing – average of correctly classified
• X axis – Number of communications
• Y axis – Uncertainty of Grocer’s belief about Florist
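The slide does not say how "uncertainty of belief" is measured. The sketch below assumes Shannon entropy of the Grocer's belief over the Florist's type, tracked after each observed code. The per-type code likelihoods are invented for the example and are not results from the deck.

```python
import math


def entropy_bits(belief):
    """Shannon entropy of a belief, in bits (assumed uncertainty measure)."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def track_uncertainty(prior, likelihoods, codes, floor=1e-6):
    """Update the belief over the other player's type code by code.

    likelihoods[type][code] is P(code | type); codes missing from a type's
    table get a small floor probability. Returns the final belief and the
    entropy before any code and after each one.
    """
    belief = dict(prior)
    trace = [entropy_bits(belief)]
    for code in codes:
        unnormalized = {t: belief[t] * likelihoods[t].get(code, floor) for t in belief}
        total = sum(unnormalized.values())
        belief = {t: v / total for t, v in unnormalized.items()}
        trace.append(entropy_bits(belief))
    return belief, trace


# Hypothetical likelihoods of Florist codes under each type.
likelihoods = {
    "coop": {"RPS": 0.5, "RPO": 0.3, "RNO": 0.15, "TP": 0.05},
    "noncoop": {"RPS": 0.2, "RPO": 0.1, "RNO": 0.4, "TP": 0.3},
}
prior = {"coop": 0.5, "noncoop": 0.5}
belief, trace = track_uncertainty(prior, likelihoods, ["RPS", "RPS", "RPO"])
```

Plotting trace against the number of communications gives a curve of the kind described on the slide. With these invented likelihoods the entropy falls as agreement codes accumulate.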
Modeling Cultural Factors
• How do we model cultural factors for C and N in a POMDP?
• How do we validate the model?
• Is the model general enough to exhibit plausible culturally-specific human behavior?
Culture and POMDP
• Initial beliefs about others’ social value orientation and behavior usually reflect one’s own culture’s beliefs about the interaction
• Culture influences frequency of particular actions and communications
• Interpretation of each observation refines the agent’s model of others
• Interpretation is influenced by culture
  – Model can capture cultural misinterpretations and their consequences in terms of strategy and outcomes
• Agents from different cultures can have different rewards for the same actions
Other’s type
• Includes factors such as:
  – Social Value Orientation
    • Pro-social/cooperative, individualistic, competitive, altruistic
  – Trust, Reputation etc.
  – Cultural factors
    • Individualist vs Collectivist
    • Egalitarian vs Hierarchy
    • Direct vs Indirect communication
[Figure: "Cognitive Schema of A" alongside "POMDP". For each of the agents A and B the diagram shows its culture, its history with the other agent, its real intent, its schema, its behavior, and its interpretation of the other’s intent, all within a Context. POMDP labels: State Space (A’s schema, B’s schema), Initial Beliefs, Actions, Transition, Observations, Reward.]
[Same figure, annotated "Capturing initial state of model": Survey experiments and Observer Experiments.]
[Same figure, annotated "Capturing model dynamics": Intercultural transcripts.]
Plans for Next Year
• Initial beliefs from Observer Experiment and from surveys (US, Turkey, Egypt, Qatar)
• Collect intra-cultural negotiation transcripts
  – US, Turkey, Egypt
• Build POMDPs from inter-cultural negotiation transcripts
  – US-Hong Kong, US-German, US-Israeli (have) (courtesy of Wendi Adair and Jeanne Brett)
  – US-Turkish, US-Egyptian, US-Qatari (collect)
Plans for Next Year
• Validate the predictive behavior of the models
  – Using the transcripts for training and testing
• Use the models in negotiation with humans
• Use the models in what-if scenarios
• Use the models to generate hypotheses to test with human subjects
• Initial models for collaboration scenarios using POMDP
Thank You
Any questions?