Live from MLcon 2019

LIVESTREAM: "Human / AI Interaction Training: A New Approach to Reinforcement Learning" – Machine Learning Conference 2019

The Machine Learning Conference 2019 in Berlin brings together AI, deep learning and voice experts to discuss the latest trends and practical experience in machine learning. We are there on site and are streaming the keynotes live!

Human / AI Interaction Loop Training as a new Approach for interactive Learning with Reinforcement-Learning Agents

The Wednesday keynote will be given by Dr. Neda Navidi. The ML expert presents the latest research results on combining human interaction with reinforcement learning systems.

Wednesday, 11 December, 13:30 – 14:00


Session abstract:

Human / AI interaction loop training as a new approach for interactive learning with reinforcement learning: In various machine learning (ML) decision-making tasks, reinforcement learning (RL) produces effective results, with an agent learning from a stand-alone reward function. However, RL faces particular challenges with large state and action spaces, as well as in determining rewards. The high dimensionality and continuous nature of the environments considered here mean that RL needs a large number of learning trials to learn about the environment. Imitation learning (IL) offers a promising solution to these challenges by using a teacher's feedback. In IL, the learning process can take advantage of human-sourced assistance and/or control over the agent and environment. In this study, we considered a human teacher and an agent learner. The teacher takes part in training the agent to deal with the environment, tackle a specific objective, and achieve a predefined goal. Within that paradigm, however, existing IL approaches have the drawback of requiring extensive demonstration information in long-horizon problems. In this work, we propose a novel approach that combines IL with different types of RL methods, namely State-Action-Reward-State-Action (SARSA) and Proximal Policy Optimization (PPO), to take advantage of both IL and RL. We address how to effectively leverage the teacher's feedback, whether direct binary or indirect detailed feedback, for the agent learner to learn sequential decision-making policies. The results of this study on various OpenAI Gym environments show that this algorithmic method can be incorporated into different RL-IL combinations at different levels, leading to significant reductions in both teacher effort and exploration costs.
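The abstract does not say exactly how the teacher's feedback enters the learning update. Purely as an illustration of the general idea, here is a minimal sketch assuming one simple option: a tabular SARSA agent whose environment reward is augmented with a direct binary (+1/-1) signal from a simulated teacher. The corridor environment, the `teacher` function and the `feedback_weight` parameter are hypothetical and chosen for the example; this is not the speaker's method, and the PPO variant mentioned in the abstract is not shown.

```python
import random

# Hypothetical toy environment: a 1-D corridor with states 0..N_STATES-1.
# The agent starts at state 0; reaching the last state ends the episode with reward +1.
N_STATES = 6
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Apply an action and return (next_state, env_reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def teacher(state, action):
    """Hypothetical simulated teacher giving direct binary feedback:
    +1 if the action moves toward the goal, -1 otherwise."""
    return 1.0 if action == +1 else -1.0


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = [q[(state, a)] for a in ACTIONS]
    best = max(values)
    return rng.choice([a for a, v in zip(ACTIONS, values) if v == best])


def train_sarsa(episodes=200, alpha=0.1, gamma=0.95, epsilon=0.1,
                feedback_weight=0.5, seed=0, max_steps=100):
    """Tabular SARSA on env_reward + feedback_weight * teacher feedback.
    Setting feedback_weight=0 recovers plain SARSA on the environment reward."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    steps_per_episode = []
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(q, state, epsilon, rng)
        for t in range(1, max_steps + 1):
            next_state, env_reward, done = step(state, action)
            # Shaped reward: environment reward plus the weighted binary teacher signal.
            reward = env_reward + feedback_weight * teacher(state, action)
            if done:
                target = reward  # no bootstrapping from a terminal state
            else:
                next_action = epsilon_greedy(q, next_state, epsilon, rng)
                target = reward + gamma * q[(next_state, next_action)]
            # SARSA update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))
            q[(state, action)] += alpha * (target - q[(state, action)])
            if done:
                break
            state, action = next_state, next_action
        steps_per_episode.append(t)
    return q, steps_per_episode


if __name__ == "__main__":
    q, steps = train_sarsa()
    greedy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
    print("greedy actions:", greedy)
    print("steps in last 10 episodes:", steps[-10:])
```

Adding teacher feedback directly to the reward is the simplest way to combine the two signals, but it is a design choice with a known caveat: reward shaping that is not potential-based can change which policy is optimal, so in this sketch the teacher is deliberately consistent with the goal.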


The Speaker

Dr. Neda Navidi completed her PhD in the field of autonomous driving at École de Technologie Supérieure (ÉTS) and her postdoctoral studies at HEC Montréal, McGill University and Polytechnique Montréal. She has worked as a machine learning (ML) researcher, applied research scientist and data scientist in various research teams. She is also an expert in deep learning, reinforcement learning, supervised/unsupervised learning, natural language processing, computer vision, and time series data. She now works in AI research and development at AI Redefined Inc.