Guillaume’s blog - Logbook for September 21

Week 35 - September 21

Thursday 9/2

Reviewed the arXiv paper Continuous Control With Deep Reinforcement Learning (Lillicrap et al., 2015), arXiv:1509.02971, which introduced DDPG. It builds on an earlier paper by David Silver et al., Deterministic Policy Gradient Algorithms (ICML 2014), which is not easy to read. Here is a review from Towards Data Science that presents Deep Deterministic Policy Gradient (DDPG) for readers who want to understand the algorithm.
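To keep the core of DDPG in mind, here are two of its update rules in a few lines: the critic's TD target, computed with the *target* actor and critic, and the "soft" (Polyak) update of the target networks. This is a minimal sketch, not the paper's implementation: the networks are replaced by hypothetical toy Python functions and parameter lists, and the `gamma` and `tau` values are illustrative. The actor update itself (following the gradient of Q with respect to the action) is not shown.

```python
# Two DDPG ingredients (Lillicrap et al., 2015), without a deep learning library:
#   1) critic TD target:  y = r + gamma * (1 - done) * Q'(s', mu'(s'))
#   2) soft target update: theta' <- tau * theta + (1 - tau) * theta'
from typing import Callable, List


def critic_target(
    reward: float,
    next_state: float,
    done: bool,
    target_actor: Callable[[float], float],
    target_critic: Callable[[float, float], float],
    gamma: float = 0.99,
) -> float:
    """TD target for the critic, using the target actor and target critic."""
    next_action = target_actor(next_state)  # deterministic policy mu'(s')
    bootstrap = 0.0 if done else target_critic(next_state, next_action)
    return reward + gamma * bootstrap


def soft_update(target: List[float], online: List[float], tau: float = 0.005) -> List[float]:
    """Polyak averaging: target parameters slowly track the online parameters."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


# Hypothetical toy "networks": a linear actor and a linear critic.
def toy_actor(s: float) -> float:
    return 0.5 * s


def toy_critic(s: float, a: float) -> float:
    return s + 2.0 * a


y = critic_target(reward=1.0, next_state=2.0, done=False,
                  target_actor=toy_actor, target_critic=toy_critic, gamma=0.9)
new_target = soft_update(target=[0.0, 0.0], online=[1.0, 2.0], tau=0.1)
print(y, new_target)
```

The small `tau` is the design choice that matters: target networks move slowly, which keeps the bootstrapped target `y` from chasing the critic it is training.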

Week 36 - September 21

Monday 9/6

Installed Barrier to share keyboard and mouse between Linux and Windows. Works nicely combined with a USB KVM switch.

Moved WSL to another drive with move-wsl.

Wednesday 9/8

Created a custom gym environment and trained agents on it with DQN, then DDPG, using Stable-Baselines3. It takes around 50,000 steps to solve an ultra-simple grid problem… No success with DDPG: is something missing?
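For reference, a custom environment mostly comes down to `reset` and `step`. Below is a minimal sketch of an ultra-simple grid using the Gymnasium-style API, where `reset` returns `(obs, info)` and `step` returns `(obs, reward, terminated, truncated, info)`. The class name, grid size, and reward values are hypothetical, and it deliberately does not subclass `gym.Env` or declare `observation_space`/`action_space`, so it runs without gym installed. Note that the 2021 gym releases used by Stable-Baselines3 at the time had an older API: `step` returned a 4-tuple `(obs, reward, done, info)` and `reset` returned only `obs`.

```python
# Hypothetical 1-D "ultra simple grid" following the Gymnasium-style API.
# A real environment would subclass gymnasium.Env (or gym.Env) and define
# observation_space / action_space so that Stable-Baselines3 can use it.
from typing import Dict, Tuple


class LineGridEnv:
    """The agent starts in cell 0 and must reach the last cell of a 1-D grid."""

    LEFT, RIGHT = 0, 1  # discrete actions

    def __init__(self, size: int = 5, max_steps: int = 50) -> None:
        self.size = size
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self) -> Tuple[int, Dict]:
        self.pos = 0
        self.steps = 0
        return self.pos, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict]:
        move = 1 if action == self.RIGHT else -1
        self.pos = min(max(self.pos + move, 0), self.size - 1)  # stay inside the grid
        self.steps += 1
        terminated = self.pos == self.size - 1    # goal reached
        truncated = self.steps >= self.max_steps  # time limit hit
        reward = 1.0 if terminated else -0.01     # small penalty per step
        return self.pos, reward, terminated, truncated, {}


# Roll out one episode with a policy that always moves right.
env = LineGridEnv(size=5)
obs, info = env.reset()
total, done = 0.0, False
while not done:
    obs, reward, terminated, truncated, info = env.step(LineGridEnv.RIGHT)
    total += reward
    done = terminated or truncated
print(obs, total)
```

One thing worth checking on the DDPG side: in Stable-Baselines3, DQN only supports discrete action spaces, while DDPG, TD3 and SAC only support continuous (`Box`) ones, so a grid with discrete moves like this one cannot be used unchanged by both DQN and DDPG.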

Thursday 9/9

Still playing with gym and Stable-Baselines3. A2C, PPO and SAC work, but DDPG and TD3 do not (and I don’t know why).
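One possible explanation, not verified here: DDPG and TD3 learn a deterministic policy, and in Stable-Baselines3 their `action_noise` argument defaults to `None`, so the agent may explore poorly unless noise is supplied; the library's own examples pass a `NormalActionNoise` from `stable_baselines3.common.noise`. The idea behind it is sketched below in plain Python: add Gaussian noise to the deterministic action, then clip back into the action bounds. The function name and the `sigma` value are illustrative assumptions. The original DDPG paper used Ornstein-Uhlenbeck noise instead; Gaussian noise is a simpler, common alternative.

```python
# Gaussian exploration noise for a deterministic policy (DDPG/TD3 style).
import random
from typing import List, Optional


def noisy_action(
    action: List[float],
    low: float,
    high: float,
    sigma: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Add N(0, sigma^2) noise to each action component, then clip to [low, high]."""
    rng = rng or random.Random()
    return [min(max(a + rng.gauss(0.0, sigma), low), high) for a in action]


rng = random.Random(0)
deterministic = [0.95, -0.2]  # e.g. the actor's output mu(s) for some state
explored = noisy_action(deterministic, low=-1.0, high=1.0, sigma=0.1, rng=rng)
print(explored)
```

Clipping matters: without it, noise near the bounds (like the `0.95` above) would push actions outside the `Box` limits the environment accepts.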

Week 38 - September 21

Monday 9/20

Back to the ANITI RL virtual school. Looking for material I can use to explain RL to my colleagues, and for a proper way to describe the experiment I am running with gym.

I will certainly start with the DeepMind lectures: 2021 DeepMind x UCL RL Lecture Series.

Thursday 9/23

Started the Plotly course on DataCamp as part of my DataCamp learning path. I need basic interactivity and 3D plots to illustrate reward functions.
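As a target for the course, here is a minimal sketch of a 3D reward-function plot. It builds a Plotly figure as a plain dict (Plotly accepts figures as dictionaries) with a single `surface` trace, so the data part runs without plotly installed. The reward function, a Gaussian bump centred on a goal, and the grid size are hypothetical choices for illustration.

```python
# Build a Plotly 3-D surface of a hypothetical reward function r(x, y).
import math
from typing import Dict, List, Tuple


def reward(x: float, y: float, goal: Tuple[float, float] = (0.0, 0.0)) -> float:
    """Hypothetical shaped reward: 1 at the goal, decaying with squared distance."""
    return math.exp(-((x - goal[0]) ** 2 + (y - goal[1]) ** 2))


def reward_surface_figure(n: int = 41, extent: float = 3.0) -> Dict:
    step = 2 * extent / (n - 1)
    xs: List[float] = [-extent + i * step for i in range(n)]
    ys: List[float] = list(xs)
    # Surface traces expect z[i][j] to be the value at (x[j], y[i]).
    zs = [[reward(x, y) for x in xs] for y in ys]
    return {
        "data": [{"type": "surface", "x": xs, "y": ys, "z": zs}],
        "layout": {
            "title": {"text": "Reward r(x, y)"},
            "scene": {
                "xaxis": {"title": {"text": "x"}},
                "yaxis": {"title": {"text": "y"}},
                "zaxis": {"title": {"text": "reward"}},
            },
        },
    }


fig = reward_surface_figure()
```

To render it interactively (rotate, zoom, hover), the dict can be passed to `plotly.graph_objects.Figure(fig)` and displayed with `.show()`.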