In DDP, pretrained model won't be loaded. #11769
Comments
@comlhj1114 hello! Thanks for reporting this issue with loading pretrained models in DDP mode. It looks like you're correct about the behavior you've observed. For Case 1 and Case 2, where the model isn't loaded properly, our DDP setup likely needs additional handling so that pretrained weights can be loaded when a model is initialized from a YAML file. The common workaround for now, as you've found, is to load the model directly from the pretrained weights file (Case 3). Meanwhile, I'll forward this issue to our development team so they can consider improving the handling of pretrained models in DDP configurations. If you have further insights or would like to contribute a solution, we encourage you to follow and contribute to this discussion on GitHub. Your feedback is invaluable! 😊
@glenn-jocher Thank you for your rapid and valuable feedback. I will try to contribute!
@comlhj1114 thanks for the PR! A review by @Laughing-q is pending, and if everything looks good we should have it merged this week :)
Search before asking
YOLOv8 Component
Train, Multi-GPU
Bug
In DDP mode, the pretrained model cannot be loaded.
There are three cases.
Case 1: CLI (NOT Working)
yolo detect train model=yolov8s.yaml pretrained=my_model.pt device=0,1
Case 2: Python (NOT Working)
from ultralytics import YOLO
model = YOLO('yolov8s.yaml').load('my_model.pt')
model.train(...., device=[0,1])
Case 3: Python (ONLY Working)
from ultralytics import YOLO
model = YOLO('my_model.pt')
model.train(...., device=[0,1])
I think the Python file generated for DDP causes the problem, because it uses
trainer = DetectionTrainer(cfg=cfg, overrides=overrides)
but DetectionTrainer.__init__ contains no code to load a pretrained model. This is a problem for multi-GPU experiments, because there is then no way to initialize training from the weights of a similar, but not identical, model.
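The mechanism described above can be illustrated with a toy simulation. This is a minimal sketch, not Ultralytics code: ToyTrainer and run_in_worker are hypothetical names, the JSON round-trip only stands in for however the generated DDP file passes overrides to each worker, and storing the checkpoint path stands in for actually loading weights. The point it shows is that a worker rebuilt from overrides alone only sees what is in that dict.

```python
import json
from typing import Optional


class ToyTrainer:
    """Hypothetical stand-in for a trainer rebuilt inside each DDP worker.

    Only the `overrides` dict reaches the worker; nothing else from the
    parent process (such as weights loaded in memory with `.load()`) survives.
    """

    def __init__(self, overrides: dict, honor_pretrained: bool):
        self.model_cfg = overrides["model"]  # architecture YAML only
        self.weights: Optional[str] = None
        pretrained = overrides.get("pretrained")
        # Assumed fix (hypothetical): use a checkpoint path passed via `pretrained`.
        if honor_pretrained and isinstance(pretrained, str) and pretrained.endswith(".pt"):
            self.weights = pretrained  # placeholder for real checkpoint loading


def run_in_worker(overrides: dict, honor_pretrained: bool) -> ToyTrainer:
    # Serialize and deserialize to mimic crossing the process boundary.
    payload = json.loads(json.dumps(overrides))
    return ToyTrainer(payload, honor_pretrained)


# Case 2: weights were loaded in memory in the parent, so no path is in overrides.
case2 = {"model": "yolov8s.yaml"}
# Case 1: the checkpoint path is passed explicitly as `pretrained`.
case1 = {"model": "yolov8s.yaml", "pretrained": "my_model.pt"}

print(run_in_worker(case2, honor_pretrained=True).weights)   # None
print(run_in_worker(case1, honor_pretrained=False).weights)  # None (reported behavior)
print(run_in_worker(case1, honor_pretrained=True).weights)   # my_model.pt
```

In this toy, honoring pretrained in the trainer's constructor fixes Case 1, but Case 2 stays broken, because .load() only changes the parent's in-memory model. Under the same assumptions, Case 2 would additionally need .load() to record the checkpoint path in the overrides so the workers can reload it.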
Environment
No response
Minimal Reproducible Example
No response
Additional
No response
Are you willing to submit a PR?