In DDP, pretrained model won't be loaded. #11769

Closed
1 of 2 tasks
comlhj1114 opened this issue May 8, 2024 · 3 comments · Fixed by #11787
Labels
bug Something isn't working

Comments

@comlhj1114
Contributor

comlhj1114 commented May 8, 2024

Search before asking

  • I have searched the YOLOv8 issues and found no similar bug report.

YOLOv8 Component

Train, Multi-GPU

Bug

In DDP mode, pretrained weights are not loaded when the model is built from a YAML file.

There are three cases.

Case 1: CLI (NOT Working)
yolo detect train model=yolov8s.yaml pretrained=my_model.pt device=0,1

Case 2: Python (NOT Working)
model = YOLO('yolov8s.yaml').load('my_model.pt')
model.train(...., device=[0,1])

Case 3: Python (the only working case)
model = YOLO('my_model.pt')
model.train(...., device=[0,1])

I think the Python file generated for DDP causes the problem: it uses trainer = DetectionTrainer(cfg=cfg, overrides=overrides), but there is no code in DetectionTrainer.__init__ to load the pretrained model.
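
To illustrate the suspected mechanism, here is a toy sketch in plain Python. It is not Ultralytics code: the Trainer class and both helper functions are hypothetical stand-ins. It assumes that the DDP worker rebuilds the trainer from serialized overrides only, so any weights attached in memory by the parent process never reach the worker.

```python
import json


class Trainer:
    """Hypothetical stand-in for a trainer built from config overrides only."""

    def __init__(self, overrides):
        self.overrides = dict(overrides)
        self.weights = None  # stays None unless a caller assigns it later

    def describe(self):
        source = self.weights or "random init"
        return f"{self.overrides['model']} <- {source}"


def single_process(overrides, loaded_weights):
    # The parent process can attach in-memory state after construction.
    trainer = Trainer(overrides)
    trainer.weights = loaded_weights
    return trainer.describe()


def ddp_worker(serialized_overrides):
    # The worker only sees what was serialized into the generated file.
    trainer = Trainer(json.loads(serialized_overrides))
    return trainer.describe()


overrides = {"model": "yolov8s.yaml", "device": [0, 1]}
single = single_process(overrides, "my_model.pt")
worker = ddp_worker(json.dumps(overrides))
print(single)
print(worker)
```

In this toy model the single-process path reports the pretrained file, while the worker path falls back to random initialization, which matches the behavior seen in Cases 1 and 2.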

This is a problem for multi-GPU experiments, because there is no way to load weights from a similar, but not identical, model.
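
To make "similar, but not identical" concrete, here is a minimal sketch of a partial weight transfer. It assumes that only layers whose names and shapes match are copied. The layer names and shapes are made up for illustration, and the helper is hypothetical, not the library's implementation.

```python
def intersect_by_name_and_shape(source, target):
    """Keep source entries whose name exists in target with an equal shape."""
    return {
        name: shape
        for name, shape in source.items()
        if name in target and target[name] == shape
    }


# Shapes stand in for tensors; the head differs (80 classes vs. 5 classes).
pretrained = {
    "backbone.conv.weight": (32, 3, 3, 3),
    "head.cls.weight": (80, 128, 1, 1),
}
new_model = {
    "backbone.conv.weight": (32, 3, 3, 3),
    "head.cls.weight": (5, 128, 1, 1),
}

transferable = intersect_by_name_and_shape(pretrained, new_model)
skipped = sorted(set(new_model) - set(transferable))
print(f"transferred {len(transferable)}/{len(new_model)} layers, skipped {skipped}")
```

With Case 3 as the only working path, this kind of transfer is unavailable in DDP, because loading the .pt file directly also brings in its original architecture.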

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@comlhj1114 comlhj1114 added the bug Something isn't working label May 8, 2024
@glenn-jocher
Member

@comlhj1114 hello! Thanks for reporting this issue with loading pretrained models in DDP mode. It looks like you're correct about the behavior you've observed.

For your Case 1 and Case 2 where the model isn't loading properly, it seems there might be a need for additional handling in our DDP setup to enable loading pretrained weights directly when initializing a model from a YAML file.

The workaround for now, as you've found, is to load the model directly from the pretrained .pt file before calling .train(), as shown in your Case 3. That's a valid approach when training in a multi-GPU setting.

Meanwhile, I'll forward this issue to our development team to consider improving the handling of pretrained models in DDP configurations. If you have further insights or would like to contribute to a solution, we encourage you to follow and possibly contribute to this discussion on GitHub. Your feedback is invaluable! 😊

@comlhj1114
Contributor Author

@glenn-jocher Thank you for your rapid and valuable feedback. I will try to contribute!

comlhj1114 added a commit to comlhj1114/ultralytics that referenced this issue May 9, 2024
comlhj1114 added a commit to comlhj1114/ultralytics that referenced this issue May 9, 2024
@glenn-jocher
Member

@comlhj1114 thanks for the PR, we have a review by @Laughing-q pending on it, and if everything looks good we should have it merged this week :)
