In DDP, pretrained model won't be loaded. #11769

comlhj1114 · 2024-05-08T14:25:49Z

Search before asking

I have searched the YOLOv8 issues and found no similar bug report.

YOLOv8 Component

Train, Multi-GPU

Bug

In DDP mode, pretrained weights are not loaded when the model is built from a YAML file.

There are three cases.

Case 1: CLI (NOT Working)
yolo detect train model=yolov8s.yaml pretrained=my_model.pt device=0,1

Case 2: Python (NOT Working)
model = YOLO('yolov8s.yaml').load('my_model.pt')
model.train(...., device=[0,1])

Case 3: Python (the ONLY case that works)
model = YOLO('my_model.pt')
model.train(...., device=[0,1])

I think the Python file generated for DDP causes the problem: it calls trainer = DetectionTrainer(cfg=cfg, overrides=overrides), but DetectionTrainer.__init__ contains no code to load a pretrained model.
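A toy illustration of the hypothesis above, using stand-in classes rather than any Ultralytics code. It assumes the DDP worker rebuilds its trainer from the overrides dictionary alone, so weights loaded in the parent process are not carried over. The names ToyModel, ToyTrainer, spawn_ddp_worker, and the honor_pretrained flag are all hypothetical and exist only for this sketch.

```python
# Toy stand-ins only: none of these classes are Ultralytics code.


class ToyModel:
    def __init__(self, source):
        # A ".pt" source carries trained weights; a ".yaml" source only
        # describes the architecture, so weights start out random.
        self.source = source
        self.weights = "trained" if source.endswith(".pt") else "random"

    def load(self, weights_path):
        # In-memory weight loading, analogous to Case 2.
        self.weights = "trained"
        return self


class ToyTrainer:
    def __init__(self, overrides, honor_pretrained=False):
        # The trainer only sees the overrides dict, never the parent's
        # in-memory model object.
        self.model = ToyModel(overrides["model"])
        pretrained = overrides.get("pretrained")
        # Hypothetical fix: load the weights named by "pretrained".
        if honor_pretrained and isinstance(pretrained, str):
            self.model.load(pretrained)


def spawn_ddp_worker(overrides, honor_pretrained=False):
    # Mimics a generated DDP script: a fresh process rebuilds the trainer
    # from a copy of the overrides and nothing else.
    return ToyTrainer(dict(overrides), honor_pretrained).model.weights


yaml_overrides = {"model": "yolov8s.yaml", "pretrained": "my_model.pt"}

case1 = spawn_ddp_worker(yaml_overrides)
case3 = spawn_ddp_worker({"model": "my_model.pt"})
fixed = spawn_ddp_worker(yaml_overrides, honor_pretrained=True)

print(case1, case3, fixed)  # random trained trained
```

In this toy model, Case 3 works because the weights path itself travels inside the overrides, while Cases 1 and 2 depend on the worker acting on the pretrained entry.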

This is a problem for multi-GPU experiments, because there is no way to load weights from a model that is similar to, but not exactly the same as, the one being trained.
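To make "similar, not exactly the same" concrete, here is a minimal sketch of partial weight transfer: keep only the entries whose name and shape match in both models. It uses plain Python with a FakeTensor stand-in instead of real tensors, and the layer names are hypothetical. Whether .load() in Case 2 works exactly this way is an assumption.

```python
from collections import namedtuple

# Stand-in for a tensor: only the shape matters for this illustration.
FakeTensor = namedtuple("FakeTensor", ["shape"])


def intersect_by_key_and_shape(source_state, target_state):
    """Return source entries whose key exists in target with an identical shape."""
    return {
        key: value
        for key, value in source_state.items()
        if key in target_state and value.shape == target_state[key].shape
    }


# Hypothetical layer names: same backbone, different number of classes in the head.
pretrained_state = {
    "backbone.conv.weight": FakeTensor((32, 3, 3, 3)),
    "head.cls.weight": FakeTensor((80, 64, 1, 1)),
}
new_model_state = {
    "backbone.conv.weight": FakeTensor((32, 3, 3, 3)),
    "head.cls.weight": FakeTensor((5, 64, 1, 1)),
}

transferable = intersect_by_key_and_shape(pretrained_state, new_model_state)
print(sorted(transferable))  # ['backbone.conv.weight']
```

With real PyTorch modules, the same filter could be applied to the two state_dict() results and the outcome passed to load_state_dict(..., strict=False), so that mismatched layers such as the classification head keep their fresh initialization.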

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

Yes I'd like to help by submitting a PR!


glenn-jocher · 2024-05-08T18:37:07Z

@comlhj1114 hello! Thanks for reporting this issue with loading pretrained models in DDP mode. It looks like you're correct about the behavior you've observed.

For your Case 1 and Case 2 where the model isn't loading properly, it seems there might be a need for additional handling in our DDP setup to enable loading pretrained weights directly when initializing a model from a YAML file.

The workaround for now, as you've found, is to load the model directly from the pretrained .pt file before calling .train(), as shown in your Case 3. That is a valid approach for multi-GPU training.

Meanwhile, I'll forward this issue to our development team to consider improving the handling of pretrained models in DDP configurations. If you have further insights or would like to contribute to a solution, we encourage you to follow and possibly contribute to this discussion on GitHub. Your feedback is invaluable! 😊

comlhj1114 · 2024-05-08T23:36:48Z

@glenn-jocher Thank you for your rapid and valuable feedback. I will try to contribute!

glenn-jocher · 2024-05-11T16:02:17Z

@comlhj1114 thanks for the PR, we have a review by @Laughing-q pending on it, and if everything looks good we should have it merged this week :)

comlhj1114 added the bug label May 8, 2024

comlhj1114 added a commit to comlhj1114/ultralytics that referenced this issue May 9, 2024

fix: issue ultralytics#11769 - case 1

4f8915f

comlhj1114 added a commit to comlhj1114/ultralytics that referenced this issue May 9, 2024

fix: issue ultralytics#11769 - case 1 and cas 2

35ccf70

comlhj1114 mentioned this issue May 9, 2024

ultralytics 8.2.16 DDP pretrained argument fix #11787

Merged

glenn-jocher linked a pull request May 11, 2024 that will close this issue

ultralytics 8.2.16 DDP pretrained argument fix #11787

Merged

glenn-jocher closed this as completed in #11787 May 15, 2024
