
~enan/ros-rl

td3: fix handling of data when using gpu device
td3: fix indexing in select_action
Update requirements
Non-technical changes
fix(readme): activate venv before installing pkgs
Add remark on unwieldy 2d action space in readme
td3,ddpg: let algo choose action dimension in env
td3: don't save the last episode model
td3: add some trained models for quick testing

The saved model is that of the actor network. The naming follows this
convention:

    td3_<checkpoint>_<episode>_actor.pth

<checkpoint>==1 is for the 1d actor and <checkpoint>==2 is for the 2d actor. NOTE
that these numbers matching the action-space dimension is coincidental,
i.e., <checkpoint>==3 won't necessarily mean the action space is 3d.
<checkpoint> is just that, a checkpoint: a point where we have made
enough progress to warrant saving the models.

Use the model with the highest <episode>. The model is saved only when
the total episodic return is greater than the max of the episodic
returns of all previous episodes, which means the model with the
highest <episode> number yields the greatest total episodic return.
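
For reference, a minimal sketch of loading such a checkpoint for
testing. The Actor class below is only a stand-in for the repo's actual
network, and the sizes and file name are illustrative, not real values
from the repo:

    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        # Stand-in for the repo's actor; the real architecture must match
        # the one the checkpoint was trained with.
        def __init__(self, state_dim, action_dim, max_action=1.0):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, action_dim), nn.Tanh(),
            )
            self.max_action = max_action

        def forward(self, state):
            return self.max_action * self.net(state)

    state_dim, action_dim = 24, 2      # assumed sizes, depend on the env
    actor = Actor(state_dim, action_dim)
    # "td3_2_850_actor.pth" is an illustrative name following the convention above.
    actor.load_state_dict(torch.load("td3_2_850_actor.pth", map_location="cpu"))
    actor.eval()

    with torch.no_grad():
        obs = torch.zeros(1, state_dim)  # placeholder observation
        action = actor(obs)              # deterministic action for deployment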
Update readme with remarks
Update pip requirements
td3: accommodate multi-dimensional action space
td3: use a simpler Gaussian action noise
td3: fix: no seed function in our environment
td3: save only actor model for deploying

Critic models aren't used when testing the agent. BUT if/when we have to
train it over multiple days or in multiple environments and have to
break the session up, we will have to save critic models too.
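
Roughly, the difference between the two save paths looks like the sketch
below; the attribute names (agent.actor, agent.critic_1, agent.critic_2)
are assumptions, not necessarily the repo's actual names:

    import torch

    def save_for_deployment(agent, path_prefix):
        # Only the actor is needed to act deterministically at test time.
        torch.save(agent.actor.state_dict(), f"{path_prefix}_actor.pth")

    def save_for_resuming(agent, path_prefix):
        # To resume training later, the twin critics (and ideally the target
        # networks and optimizer states) have to be saved as well.
        torch.save(agent.actor.state_dict(), f"{path_prefix}_actor.pth")
        torch.save(agent.critic_1.state_dict(), f"{path_prefix}_critic1.pth")
        torch.save(agent.critic_2.state_dict(), f"{path_prefix}_critic2.pth")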
td3: remove redundant comments

No reason to leave comments for self-explanatory code; they just clutter
up the file.
td3: plot scores/losses in rows instead of columns
td3: fix: our env doesn't have a close function
td3: plot avg score over total score of episode
td3: save model if agent does better than before

Save the model if the agent accumulates a higher score in the current
episode than in any of the previous episodes.
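
In sketch form (the names here are illustrative, not the repo's actual
code):

    import torch

    def train(agent, env, run_episode, num_episodes, checkpoint_id):
        # Checkpoint the actor only when this episode's total return beats
        # the best return seen in any previous episode.
        best_score = float("-inf")
        for episode in range(num_episodes):
            score = run_episode(agent, env)
            if score > best_score:
                best_score = score
                torch.save(agent.actor.state_dict(),
                           f"td3_{checkpoint_id}_{episode}_actor.pth")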