Ray Cluster does not work across multiple docker containers #45252
Labels
bug: Something that is supposed to be working; but isn't
core: Issues that should be addressed in Ray Core
@external-author-action-required: Alternate tag for PRs where the author doesn't have labeling permission.
P2: Important issue, but not time-critical
What happened + What you expected to happen
Without Docker, my two computers communicate correctly. It also works if Ray runs in a Docker container on one computer and the other computer connects to it without Docker. But if both computers run Ray in Docker containers, or if the Docker container is not the head, it works for a while and then the worker Docker container stops connecting to the head. I can see this by running `ray status`. More details and the steps to reproduce are below.
Versions / Dependencies
ray==2.20.0
Reproduction script
How to easily reproduce
This works (straight computer to computer):
This semi-works:
DOCKERFILE
What is Docker doing that does not happen on a "normal" computer? Is it putting the process to sleep? As a side note, stopping a worker instance while it is connected to the head usually stops 2 Ray processes. However, stopping Ray in the Docker container after Ray only sees one node stops only one process.
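To pin down when the worker drops out, rather than re-running `ray status` by hand, something like the following could be run on the head node. This is only a sketch: `summarize_nodes` and `watch_cluster` are hypothetical helper names, the polling interval and round count are arbitrary, and it assumes the `Alive` and `NodeManagerAddress` keys that `ray.nodes()` returns for each node.

```python
import time
from typing import Iterable


def summarize_nodes(nodes: Iterable[dict]) -> dict:
    """Split node records (as returned by ray.nodes()) into alive/dead address lists."""
    alive, dead = [], []
    for node in nodes:
        addr = node.get("NodeManagerAddress", "unknown")
        (alive if node.get("Alive") else dead).append(addr)
    return {"alive": sorted(alive), "dead": sorted(dead)}


def watch_cluster(interval_s: float = 10.0, rounds: int = 30) -> None:
    """Poll the running cluster and print a timestamped line whenever membership changes."""
    import ray  # imported lazily so summarize_nodes can be used without Ray installed

    ray.init(address="auto")  # attach to the already running cluster on this machine
    previous = None
    for _ in range(rounds):
        current = summarize_nodes(ray.nodes())
        if current != previous:
            print(time.strftime("%H:%M:%S"), current)
            previous = current
        time.sleep(interval_s)
```

Calling `watch_cluster()` on the head after the Docker worker has joined should print a new line at the moment the worker's address moves from `alive` to `dead`, which can then be lined up against the container's logs.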
Issue Severity
None