ANITI’s first Reinforcement Learning Virtual School

My notes
reinforcement learning
Published

April 1, 2021

https://rlvs.aniti.fr/

The schedule:


This condensed schedule does not include class breaks and social events. Times are Central European Summer Time (UTC+2).

| Date | Time | Session | Speaker(s) |
|------|------|---------|------------|
| March 25th | 9:00-9:10 | Opening remarks | S. Gerchinovitz |
| March 25th | 9:10-9:30 | RLVS Overview | E. Rachelson |
| March 25th | 9:30-13:00 | RL fundamentals | E. Rachelson |
| March 25th | 14:00-16:00 | Introduction to Deep Learning | D. Wilson |
| March 25th | 16:30-17:30 | Reward Processing Biases in Humans and RL Agents | I. Rish |
| March 25th | 17:45-18:45 | Introduction to Hierarchical Reinforcement Learning | D. Precup |
| March 26th | 10:00-12:00 | Stochastic bandits | T. Lattimore |
| March 26th | 14:00-16:00 | Monte Carlo Tree Search | T. Lattimore |
| March 26th | 16:30-17:30 | Multi-armed bandits in clinical trials | D. A. Berry |
| April 1st | 9:00-15:00 | Deep Q-Networks and its variants | B. Piot, C. Tallec |
| April 1st | 15:15-16:15 | Regularized MDPs | M. Geist |
| April 1st | 16:30-17:30 | Regret bounds of model-based reinforcement learning | M. Wang |
| April 2nd | 9:00-12:30 | Policy Gradients and Actor Critic methods | O. Sigaud |
| April 2nd | 14:00-15:00 | Pitfalls in Policy Gradient methods | O. Sigaud |
| April 2nd | 15:30-17:30 | Exploration in Deep RL | M. Pirotta |
| April 8th | 9:00-11:00 | Evolutionary Reinforcement Learning | D. Wilson, J.-B. Mouret |
| April 8th | 11:30-12:30 | Evolving Agents that Learn More Like Animals | S. Risi |
| April 8th | 14:00-16:00 | Micro-data Policy Search | K. Chatzilygeroudis, J.-B. Mouret |
| April 8th | 16:30-17:30 | Efficient Motor Skills Learning in Robotics | D. Lee |
| April 9th | 9:00-13:00 | RL tips and tricks | A. Raffin |
| April 9th | 14:30-15:30 | Symbolic representations and reinforcement learning | M. Garnelo |
| April 9th | 15:45-16:45 | Leveraging model-learning for extreme generalization | L. P. Kaelbling |
| April 9th | 17:00-18:00 | RLVS wrap-up | E. Rachelson |

(4/1/21) - Deep Q-Networks and its variants

Speaker is Bilal Piot.

The Deep Q-Network as a practical solution to a control problem.

Introduction of the ALE (Arcade Learning Environment).

DQN is (almost) end-to-end: from raw observations to actions. Bilal explains the preprocessing pipeline: from 160x210x3 RGB frames to 84x84, stacking 4 frames, and downsampling to 15 Hz.
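A rough sketch of the grayscale conversion and frame stacking (the 84x84 resize and frame skipping are omitted; the helper names and luminance weights below are mine, not from the talk):

```python
import numpy as np
from collections import deque

def to_grayscale(frame: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 RGB frame to HxW grayscale using luminance weights."""
    return (frame @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)

class FrameStacker:
    """Keep the last k preprocessed frames as the agent's observation."""

    def __init__(self, k: int = 4):
        self.frames = deque(maxlen=k)

    def push(self, frame: np.ndarray) -> np.ndarray:
        gray = to_grayscale(frame)
        if not self.frames:
            # episode start: fill the buffer with copies of the first frame
            for _ in range(self.frames.maxlen):
                self.frames.append(gray)
        else:
            self.frames.append(gray)
        return np.stack(self.frames, axis=-1)  # HxWxk observation
```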

Value Iteration (VI) algorithm: an iterative algorithm to compute \(Q^*\): \(Q_{k+1}=T^*Q_k\), where \(T^*\) is the optimal Bellman operator.

But it is not practical in a real-world setting. What we can do instead is use interactions with the real world and estimate \(Q^*\) with a regression.
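The VI iteration can be sketched on a toy MDP (all the numbers below are invented for illustration):

```python
import numpy as np

# Toy MDP: 2 states, 2 actions.
# P[s, a, s'] is the transition probability, R[s, a] the expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

Q = np.zeros((2, 2))
for _ in range(500):
    # Bellman optimality operator: (T*Q)(s,a) = R(s,a) + gamma * E_s'[max_a' Q(s',a')]
    Q = R + gamma * P @ Q.max(axis=1)

# Q now approximates Q*; the greedy policy is argmax_a Q(s, a)
```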

Would be interesting to have slides. I like the link between regression notations and VI notation.

From Neural Fitted-\(Q\) to DQN. The main differences are the data collection (in DQN the interaction data is continually updated, which allows exploration) and the size of the architecture.

With DQN we have an acting part and a learning part. Acting is the data collection (using an \(\epsilon\)-greedy policy).
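A minimal sketch of \(\epsilon\)-greedy action selection (the function name is mine):

```python
import numpy as np

def epsilon_greedy(q_values: np.ndarray, epsilon: float,
                   rng: np.random.Generator) -> int:
    """With probability epsilon act uniformly at random (explore),
    otherwise take the greedy action (exploit)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))
```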

Hands-on session based on the DQN tutorial notebook.

Had to `export LD_LIBRARY_PATH=/home/explore/miniconda3/envs/aniti/lib/`

Nice introduction to JAX and Haiku. Haiku provides modules similar to PyTorch's and can turn a neural network into a pure function, which is useful for JAX.

overview of the literature

(4/2/21) - From Policy Gradients to Actor Critic methods

Olivier Sigaud is the speaker.

He has pre-recorded his lecture in videos. I have missed the start so I will have to watch them later.

Policy Gradient in practice

Don’t become an alchemist ;)

Among stochastic policies, the squashed Gaussian is interesting because it handles continuous variables while respecting action bounds.
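A rough sketch of sampling from a squashed Gaussian policy (sample a Gaussian, squash through tanh, rescale to the action bounds; the names are mine, not any library's API):

```python
import numpy as np

def squashed_gaussian_sample(mean: float, log_std: float,
                             rng: np.random.Generator,
                             low: float = -1.0, high: float = 1.0) -> float:
    """Sample a Gaussian, squash through tanh into (-1, 1),
    then rescale to the bounded action range [low, high]."""
    raw = rng.normal(mean, np.exp(log_std))
    squashed = np.tanh(raw)  # bounded, but still continuous
    return low + 0.5 * (squashed + 1.0) * (high - low)
```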

Exploration in Deep RL

(4/8/21) - Evolutionary Reinforcement Learning

A PDF version of the slides is available here

then Evolving Agents that Learn More Like Animals

This morning was more about what we can do with virtually unlimited compute and data.

The afternoon will be the opposite.

(4/9/21) - RL in Practice: Tips and Tricks and Practical Session With Stable-Baselines3

Abstract: The aim of the session is to help you do reinforcement learning experiments. The first part covers general advice about RL, tips and tricks and details three examples where RL was applied on real robots. The second part will be a practical session using the Stable-Baselines3 library.

Pre-requisites: Python programming, RL basics, (recommended: Google account for the practical session in order to use Google Colab).

Additional material:
Website: https://github.com/DLR-RM/stable-baselines3
Doc: https://stable-baselines3.readthedocs.io/en/master/

Outline: Part I: RL Tips and Tricks / The Challenges of Applying RL to Real Robots

  1. Introduction (3 minutes)
  2. RL Tips and tricks (45 minutes)
    1. General Nuts and Bolts of RL experimentation (10 minutes)
    2. RL in practice on a custom task (custom environment) (30 minutes)
    3. Questions? (5 minutes)
  3. The Challenges of Applying RL to Real Robots (45 minutes)
    1. Learning to control an elastic robot - DLR David Neck Example (15 minutes)
    2. Learning to drive in minutes and learning to race in hours - Virtual and real racing car (15 minutes)
    3. Learning to walk with an elastic quadruped robot - DLR bert example (10 minutes)
    4. Questions? (5 minutes+)

Part II: Practical Session with Stable-Baselines3

  1. Stable-Baselines3 Overview (20 minutes)
  2. Questions? (5 minutes)
  3. Practical Session - Code along (1h+)

action space

When using a continuous action space, you need to normalize! (normalized action space in [-1, 1])

there is a checker for that in Stable-Baselines3 (`check_env`).
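The rescaling idea can be sketched as follows (not SB3 code, just a minimal illustration of mapping normalized actions back to the environment's bounds):

```python
import numpy as np

def unscale_action(action: np.ndarray, low: np.ndarray,
                   high: np.ndarray) -> np.ndarray:
    """Map a policy action from the normalized [-1, 1] range
    back to the environment's [low, high] range."""
    return low + 0.5 * (action + 1.0) * (high - low)
```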

reward

start with reward shaping.
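A toy illustration of reward shaping for a hypothetical reaching task (the numbers and names are invented): a dense distance term gives a learning signal everywhere, whereas the sparse success bonus alone gives none until the goal is first reached.

```python
import numpy as np

def sparse_reward(pos: np.ndarray, goal: np.ndarray, tol: float = 0.05) -> float:
    """Sparse signal: 1 only when the goal is reached -- hard to learn from."""
    return float(np.linalg.norm(pos - goal) < tol)

def shaped_reward(pos: np.ndarray, goal: np.ndarray, tol: float = 0.05) -> float:
    """Shaped signal: same success bonus plus a dense distance penalty."""
    dist = np.linalg.norm(pos - goal)
    return float(dist < tol) - 0.1 * dist
```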

termination condition

early stopping makes learning faster (and safer for robots)
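A sketch of an early-termination check for a hypothetical walking robot (the thresholds and names are invented):

```python
def should_terminate(tilt_angle: float, step: int,
                     max_tilt: float = 0.5, max_steps: int = 1000) -> bool:
    """End the episode early when the robot tilts too far
    (unrecoverable, and unsafe on real hardware) or the time budget runs out."""
    return abs(tilt_angle) > max_tilt or step >= max_steps
```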

for hyperparameter tuning, Antonin recommends Optuna.

about the Henderson paper: Deep Reinforcement Learning that Matters

and then the controller uses the latent representation / current speed + history as the observation space.

Learning to drive then takes 10 minutes, and learning to race 2 hours.

hands-on

slides: https://araffin.github.io/slides/rlvs-sb3-handson/

notebook: https://github.com/araffin/rl-handson-rlvs21

RL zoo: https://github.com/DLR-RM/rl-baselines3-zoo

documentation for SB3, useful for completing the exercises: https://stable-baselines3.readthedocs.io/en/master/

https://excalidraw.com/