Hello,
I am looking for advice on how to approach using a scene created in Godot within a reinforcement learning project. I made an attempt at this that comes very close to working, but has one problem that causes poor performance. I am hoping that there might be a known workaround, or that someone might know a better strategy. In particular, I am only using the physics simulation and control, but I am sure we can all imagine cases where game logic would need to be present as well.
Here is what I attempted:
For my case, I used the Python RL framework OpenAI Gym. I think this would work similarly for any RL framework, but Gym gives a place for clean separation between the RL framework and the environment in which the agent acts. A Gym environment has a "step" method which takes the action the agent performs and returns the observation (the state of the system after the action is performed). The framework handles most of the rest of the complexity, so for the purposes of this discussion I will omit the description of the other parts of the RL infrastructure.
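To make the interface concrete, here is a minimal sketch of the kind of environment I mean. The inheritance from gym.Env (and the action/observation space declarations) is omitted so the snippet is self-contained, and the body of step is a placeholder; in my setup the marked lines are where the communication with Godot happens.

```python
class GodotEnv:
    """Skeleton of a Gym-style environment. In practice this would
    subclass gym.Env and declare action_space / observation_space."""

    def reset(self):
        # Would tell the running Godot scene to reset itself,
        # then return the initial observation.
        self.t = 0
        return [0.0, 0.0]

    def step(self, action):
        # Would send `action` to Godot, wait for the physics step,
        # and read the resulting state back. Placeholder arithmetic here.
        self.t += 1
        obs = [float(action[0]) * 0.1, float(action[1]) * 0.1]
        reward = -sum(abs(o) for o in obs)
        done = self.t >= 100          # episode length limit, for example
        info = {}
        return obs, reward, done, info

env = GodotEnv()
obs = env.reset()
obs, reward, done, info = env.step([1.0, 0.0])
```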
Inside the RL environment's step method, I use a socket connection to an already running instance of my Godot environment. The action is sent to Godot and performed, and the relevant state of the system is gathered and sent back from Godot to the RL environment. From the Godot side, this is relatively easy to implement and leaves all of the complexity in the RL framework, which seems appropriate.
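The request/reply exchange looks roughly like the following. The JSON message format and the in-process stand-in for the Godot server are my own illustrative choices, not part of any Godot API; in the real setup the server-side logic lives in GDScript inside the running Godot instance.

```python
import json
import socket
import threading

def fake_godot_server(ready, port_holder):
    """Stand-in for the Godot side: read one action, pretend to apply it
    over a physics step, and reply with the resulting state."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))               # let the OS pick a free port
    srv.listen(1)
    port_holder.append(srv.getsockname()[1])
    ready.set()
    conn, _ = srv.accept()
    msg = json.loads(conn.recv(4096).decode())
    # Pretend one physics step moved the agent by the commanded velocity.
    obs = {"position": [a * 0.1 for a in msg["action"]], "collided": False}
    conn.sendall(json.dumps(obs).encode())
    conn.close()
    srv.close()

ready, port_holder = threading.Event(), []
threading.Thread(target=fake_godot_server,
                 args=(ready, port_holder), daemon=True).start()
ready.wait()

# Client side: roughly what runs inside the Gym environment's step().
sock = socket.create_connection(("127.0.0.1", port_holder[0]))
sock.sendall(json.dumps({"action": [1.0, 0.0]}).encode())
observation = json.loads(sock.recv(4096).decode())
sock.close()
print(observation)   # → {'position': [0.1, 0.0], 'collided': False}
```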
The problem I have run into is that I have to wait for the physics step to occur before I can sample the environment (detect collisions, etc.) and send the state back to the RL framework. If there were manual control over the physics step, this would not be a problem; without it, this strategy is extremely inefficient. For a relatively simple model, I built the scene in pybullet (which does allow stepping the simulation manually) and only used Godot for display, but for more complicated scenes, building the scene and control logic twice seems like it will be a lot of work.
I hope I described this well enough to make it understandable. If not, please let me know so that I can update it. Also, I am not set on using this strategy if a better one exists. Any/all advice on how to proceed is greatly appreciated.