Hi @dusty-nv, I'm trying to run train_ssd.py on Open Images data downloaded with:
python3 open_images_downloader.py --max-images=500 --class-names "Apple,Orange,Banana,Strawberry,Grape,Pear,Pineapple,Watermelon" --data=data/fruit
This is the output I get when training. Can you tell what's wrong? Thanks in advance.
python3 train_ssd.py --data=data/fruit --model-dir=models/fruit --batch-size=1 --num-workers=1 --epochs=1
2024-04-15 10:05:38 - Using CUDA...
2024-04-15 10:05:38 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=1, checkpoint_folder='models/fruit', dataset_type='open_images', datasets=['data/fruit'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, log_level='info', lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=1, num_workers=1, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resolution=300, resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, validation_mean_ap=False, weight_decay=0.0005)
2024-04-15 10:06:45 - model resolution 300x300
2024-04-15 10:06:45 - SSDSpec(feature_map_size=19, shrinkage=16, box_sizes=SSDBoxSizes(min=60, max=105), aspect_ratios=[2, 3])
2024-04-15 10:06:45 - SSDSpec(feature_map_size=10, shrinkage=32, box_sizes=SSDBoxSizes(min=105, max=150), aspect_ratios=[2, 3])
2024-04-15 10:06:45 - SSDSpec(feature_map_size=5, shrinkage=64, box_sizes=SSDBoxSizes(min=150, max=195), aspect_ratios=[2, 3])
2024-04-15 10:06:45 - SSDSpec(feature_map_size=3, shrinkage=100, box_sizes=SSDBoxSizes(min=195, max=240), aspect_ratios=[2, 3])
2024-04-15 10:06:45 - SSDSpec(feature_map_size=2, shrinkage=150, box_sizes=SSDBoxSizes(min=240, max=285), aspect_ratios=[2, 3])
2024-04-15 10:06:45 - SSDSpec(feature_map_size=1, shrinkage=300, box_sizes=SSDBoxSizes(min=285, max=330), aspect_ratios=[2, 3])
2024-04-15 10:06:51 - Prepare training datasets.
2024-04-15 10:06:51 - loading annotations from: data/fruit/sub-train-annotations-bbox.csv
2024-04-15 10:06:52 - annotations loaded from: data/fruit/sub-train-annotations-bbox.csv
num images: 404
2024-04-15 10:06:54 - Dataset Summary:Number of Images: 404
Minimum Number of Images for a Class: -1
Label Distribution:
Apple: 261
Banana: 113
Grape: 136
Orange: 599
Pear: 191
Pineapple: 47
Strawberry: 550
Watermelon: 50
2024-04-15 10:06:54 - Stored labels into file models/fruit/labels.txt.
2024-04-15 10:06:54 - Train dataset size: 404
2024-04-15 10:06:54 - Prepare Validation datasets.
2024-04-15 10:06:54 - loading annotations from: data/fruit/sub-test-annotations-bbox.csv
2024-04-15 10:06:54 - annotations loaded from: data/fruit/sub-test-annotations-bbox.csv
num images: 73
2024-04-15 10:06:55 - Dataset Summary:Number of Images: 73
Minimum Number of Images for a Class: -1
Label Distribution:
Apple: 11
Banana: 9
Grape: 21
Orange: 62
Pear: 6
Pineapple: 10
Strawberry: 73
Watermelon: 11
2024-04-15 10:06:55 - Validation dataset size: 73
2024-04-15 10:06:55 - Build network.
2024-04-15 10:06:58 - Init from pretrained SSD models/mobilenet-v1-ssd-mp-0_675.pth
2024-04-15 10:07:01 - Took 2.97 seconds to load the model.
2024-04-15 10:07:02 - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2024-04-15 10:07:02 - Uses CosineAnnealingLR scheduler.
2024-04-15 10:07:02 - Start training from epoch 0.
/usr/local/lib/python3.6/dist-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
warnings.warn(warning.format(ret))
Traceback (most recent call last):
  File "train_ssd.py", line 406, in <module>
    train(train_loader, net, criterion, optimizer, device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 149, in train
    loss.backward()
  File "/usr/local/lib/python3.6/dist-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 149, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: CUDA error: too many resources requested for launch
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
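The last lines of the error suggest rerunning with CUDA_LAUNCH_BLOCKING=1, which makes CUDA kernel launches synchronous so the reported stack trace should point at the kernel that actually failed. A minimal sketch of relaunching the same training command that way from Python, assuming train_ssd.py is in the current directory (the helper names `blocking_env`, `build_command` and `rerun_with_blocking` are hypothetical, not part of the repo):

```python
import os
import subprocess
import sys

# Same arguments as the failing run above.
TRAIN_ARGS = [
    "train_ssd.py",
    "--data=data/fruit",
    "--model-dir=models/fruit",
    "--batch-size=1",
    "--num-workers=1",
    "--epochs=1",
]


def blocking_env(base=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1 set.

    The input mapping is not modified. With this variable set, CUDA kernel
    launches run synchronously, so an error should surface at the failing
    kernel instead of at some later API call.
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def build_command(python=sys.executable):
    """Return the argv list used to relaunch training."""
    return [python] + TRAIN_ARGS


def rerun_with_blocking():
    """Relaunch train_ssd.py in a fresh process and return its exit code.

    A subprocess is used so the variable is set before CUDA is initialized.
    """
    return subprocess.run(build_command(), env=blocking_env()).returncode


if __name__ == "__main__":
    sys.exit(rerun_with_blocking())
```

From the shell, the equivalent is simply prefixing the original command with CUDA_LAUNCH_BLOCKING=1. Training will run slower in this mode, so it is only meant for getting a more accurate trace.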