~enan/ros-rl

td3: fix handling of data when using gpu device
td3: fix indexing in select_action
Update requirements
Non-technical changes
fix(readme): activate venv before installing pkgs
Add remark on unwieldy 2d action space in readme
td3,ddpg: let algo choose action dimension in env
td3: don't save the last episode model
td3: add some trained models for quick testing

The saved model is that of the actor network. The naming follows this
convention:

    td3_<checkpoint>_<episode>_actor.pth

<checkpoint>==1 is for the 1d actor and <checkpoint>==2 is for the 2d actor.
NOTE that these numbers matching the action-space dimension is
coincidental; <checkpoint>==3 won't necessarily mean the action space
is 3d. <checkpoint> is just that, a checkpoint, where we have made
enough progress to warrant saving the models.

Use the model with the highest <episode>. The model is saved only when
the total episodic return is greater than the max of the episodic
returns of all previous episodes, which means the model with the
highest <episode> number yields the greatest total episodic return.
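
For quick testing, a minimal sketch of loading one of these checkpoints
follows; the Actor layout, the state/action sizes, and the episode
number are placeholders, not this repo's actual values:

    import torch
    import torch.nn as nn

    # Hypothetical TD3 actor; the real network in this repo may differ.
    class Actor(nn.Module):
        def __init__(self, state_dim, action_dim, max_action):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, action_dim), nn.Tanh(),
            )
            self.max_action = max_action

        def forward(self, state):
            return self.max_action * self.net(state)

    # Placeholder dims/bound; pick the 2d-actor file with the highest <episode>.
    actor = Actor(state_dim=4, action_dim=2, max_action=1.0)
    episode = 900  # hypothetical: the highest <episode> found on disk
    actor.load_state_dict(torch.load(f"td3_2_{episode}_actor.pth"))
    actor.eval()

    with torch.no_grad():
        action = actor(torch.zeros(1, 4))  # greedy action for a dummy state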
Update readme with remarks
Update pip requirements
td3: accommodate multi-dimensional action space
td3: use a simpler Gaussian action noise
td3: fix: no seed function in our environment
td3: save only actor model for deploying

Critic models aren't used when testing the agent. BUT if/when we have to
train over multiple days or across multiple environments and break the
session up, we will have to save the critic models too.
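
A minimal sketch of the two save modes, assuming an agent object that
exposes actor, critic_1, and critic_2 modules (the names are stand-ins
for whatever this repo's TD3 agent actually uses):

    import torch

    def save_for_deploy(agent, path):
        # Testing only needs the policy, so save just the actor weights.
        torch.save(agent.actor.state_dict(), path)

    def save_for_resume(agent, path):
        # Splitting training across sessions would also need the critics
        # (and ideally optimizer states and the replay buffer).
        torch.save({
            "actor": agent.actor.state_dict(),
            "critic_1": agent.critic_1.state_dict(),
            "critic_2": agent.critic_2.state_dict(),
        }, path)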
td3: remove redundant comments

No reason to leave comments for self-explanatory code. They just clutter
up the file.
td3: plot scores/losses in rows instead of columns
td3: fix: our env doesn't have a close function
td3: plot avg score over total score of episode
td3: save model if agent does better than before

Save the model if the agent accumulates a higher score in the current
episode than in any of the previous episodes.
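
A minimal sketch of that rule, assuming hypothetical training-loop names
(env, agent, run_episode, n_episodes) and the checkpoint naming
convention noted earlier:

    import torch

    def train_and_save_best(env, agent, n_episodes, run_episode):
        """Save the actor only when the current episode beats every previous one."""
        best_score = float("-inf")
        for episode in range(n_episodes):
            score = run_episode(env, agent)  # total return for this episode
            if score > best_score:           # strictly better than all previous episodes
                best_score = score
                # <checkpoint> fixed to 1 here purely for illustration
                torch.save(agent.actor.state_dict(),
                           f"td3_1_{episode}_actor.pth")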