Faculty Advisor or Committee Member

Jie Fu, Advisor

Faculty Advisor or Committee Member

Zhi (Jane) Li, Committee Member

Faculty Advisor or Committee Member

Carlo Pinciroli, Committee Member

Identifier

etd-052818-100711

Abstract

In multi-agent Markov Decision Processes, a controllable agent must perform optimal planning in a dynamic and uncertain environment that includes another unknown and uncontrollable agent. Given a task specification for the controllable agent, its ability to complete the task can be impeded by an inaccurate model of the intent and behavior of the other agent. In this work, we introduce an active policy inference algorithm that allows a controllable agent to infer the policy of the environmental agent through interaction. Active policy inference is data-efficient and is particularly useful when data are time-consuming or costly to obtain. The controllable agent synthesizes an exploration-exploitation policy that incorporates the knowledge learned about the environment's behavior. Whenever possible, the agent also tries to elicit behavior from the other agent to improve the accuracy of the environmental model. This is done by mapping the uncertainty in the environmental model to a bonus reward, which encourages the most informative exploration and allows the controllable agent to return to its main task as quickly as possible. Experiments demonstrate the improved sample efficiency of active learning and the convergence of the controllable agent's policy.
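The central mechanism described above, mapping uncertainty in the learned model of the environmental agent to a bonus reward, could be sketched roughly as follows. This is an illustrative assumption rather than the thesis's actual algorithm: it uses a count-based (add-one) estimate of the other agent's policy and an entropy-based bonus that shrinks as observations accumulate; the function names, the toy reward values, and the `beta` scaling parameter are all hypothetical.

```python
import numpy as np

def estimate_policy(counts):
    # Count-based estimate of the environmental agent's policy per state,
    # with an add-one (uniform Dirichlet) prior -- an assumed model choice.
    alpha = counts + 1.0
    return alpha / alpha.sum(axis=1, keepdims=True)

def uncertainty_bonus(counts, beta=1.0):
    # Map model uncertainty to a per-state bonus reward: entropy of the
    # estimated policy, shrunk as more observations of that state arrive.
    policy = estimate_policy(counts)
    entropy = -(policy * np.log(policy + 1e-12)).sum(axis=1)
    n_obs = counts.sum(axis=1)
    return beta * entropy / np.sqrt(1.0 + n_obs)

# Toy example: 4 states, 3 possible actions for the environmental agent.
rng = np.random.default_rng(0)
counts = rng.integers(0, 5, size=(4, 3)).astype(float)

task_reward = np.array([0.0, 0.0, 1.0, 0.0])   # hypothetical task reward per state
augmented_reward = task_reward + uncertainty_bonus(counts, beta=0.5)
print("augmented reward used for planning:", augmented_reward)
```

In such a sketch, the controllable agent would plan with the augmented reward instead of the task reward alone, so states where the environmental model is still uncertain become temporarily attractive, the most informative behavior is elicited, and the agent can return to its main task once the bonus has decayed.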

Publisher

Worcester Polytechnic Institute

Degree Name

MS

Department

Robotics Engineering

Project Type

Thesis

Date Accepted

2018-05-01

Accessibility

Unrestricted

Subjects

active learning; Markov decision process; softmax; Boltzmann; policy gradient

