
Image augmentation: missing rotation #800

Open
ivanquirino opened this issue Jun 7, 2019 · 6 comments
Labels
enhancement New feature or request

Comments

@ivanquirino

The image augmentations in both the GluonCV and Gluon APIs are great, but rotation utilities are missing: something to randomly rotate images along with their bounding boxes. These would be a nice feature to have.

Another gap: Gluon has the nice Compose API for chaining transforms together, and it would be cool if the GluonCV transforms could be mixed into it.

@zhreshold zhreshold added the enhancement New feature or request label Jun 7, 2019
@zhreshold
Member

  1. The existing networks and losses don't respect rotated bounding boxes, so the best bet is to adjust the bounding boxes to the rotated objects (a sketch of this adjustment follows below). That is doable, but expensive. Still, as you said, it's a nice feature to have.

  2. For image classification we already use Compose in GluonCV, but object detection is more complicated; I will rethink it.
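
A minimal, untested sketch of the bounding-box adjustment idea, assuming plain NumPy and OpenCV (the function name `rotate_with_bboxes` is just for illustration, not an existing API):

```python
import cv2
import numpy as np

def rotate_with_bboxes(img, bboxes, angle):
    """Rotate an image and return the axis-aligned boxes enclosing
    the rotated ground-truth boxes.

    img    : HxWxC array
    bboxes : Nx4 array of [xmin, ymin, xmax, ymax]
    angle  : rotation in degrees, counter-clockwise
    """
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(img, M, (w, h))

    # Rotate all four corners of each box, then take the enclosing
    # axis-aligned box. This is the "expensive" part: the adjusted
    # boxes are looser than the true rotated extents.
    x0, y0, x1, y1 = [bboxes[:, i] for i in range(4)]
    corners = np.stack([
        np.stack([x0, y0], axis=1),
        np.stack([x1, y0], axis=1),
        np.stack([x1, y1], axis=1),
        np.stack([x0, y1], axis=1),
    ], axis=1)  # N x 4 x 2
    ones = np.ones((*corners.shape[:2], 1))
    rotated_corners = np.concatenate([corners, ones], axis=2) @ M.T  # N x 4 x 2
    new_boxes = np.concatenate([
        rotated_corners.min(axis=1),
        rotated_corners.max(axis=1),
    ], axis=1)
    # clip to the image bounds
    new_boxes = np.clip(new_boxes, 0, [w - 1, h - 1, w - 1, h - 1])
    return rotated, new_boxes
```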

@ivanquirino
Author

@zhreshold thanks for answering.

  1. Adjusting the bounding boxes is exactly what I was thinking.

  2. I've built a custom transform class for my own training script, which is based on the VOC dataset training script example. It uses Compose internally, because I needed a YOLOv3 transform without random flipping or cropping (a usage sketch follows this list):

import copy

import numpy as np
from mxnet import autograd, nd
from mxnet.gluon.data.vision import transforms
from gluoncv.data.transforms import bbox
from gluoncv.model_zoo.yolo.yolo_target import YOLOV3PrefetchTargetGenerator


class CustomTransform(object):
    # SIZE is a constant defined elsewhere in the training script
    def __init__(self, net=None, size=(SIZE, SIZE), mean=(0.485, 0.456, 0.406),
                 std=(0.229, 0.224, 0.225), **kwargs):
        self._size = size
        self._mean = mean
        self._std = std

        # image-only transforms chained with Compose; no random flipping
        # or cropping, so the labels only ever need a resize
        self._img_transform = transforms.Compose([
            transforms.Resize(size, keep_ratio=True),
            transforms.RandomColorJitter(0.1, 0.1, 0.1, 0.1),
            transforms.RandomLighting(0.1),
            transforms.ToTensor(),
            transforms.Normalize(mean=mean, std=std)
        ])

        self._target_generator = None

        if net is None:
            return

        # run one fake forward pass to record anchors/offsets/feature maps;
        # work on a CPU copy in case the network has reset_ctx to GPU
        self._fake_x = nd.zeros((1, 3, size[0], size[1]))
        net = copy.deepcopy(net)
        net.collect_params().reset_ctx(None)

        with autograd.train_mode():
            _, self._anchors, self._offsets, self._feat_maps, _, _, _, _ = net(self._fake_x)

        self._target_generator = YOLOV3PrefetchTargetGenerator(
            num_class=len(net.classes), **kwargs)

    def __call__(self, src, label):
        h, w, _ = src.shape

        img = self._img_transform(src)
        label = bbox.resize(label, (w, h), (self._size[0], self._size[1]))

        # without a network, behave like a plain validation transform
        if self._target_generator is None:
            return img, label.astype(img.dtype)

        gt_bboxes = nd.array(label[np.newaxis, :, :4])
        gt_ids = nd.array(label[np.newaxis, :, 4:5])
        gt_mixratio = None

        objectness, center_targets, scale_targets, weights, class_targets = self._target_generator(
            self._fake_x, self._feat_maps, self._anchors, self._offsets,
            gt_bboxes, gt_ids, gt_mixratio)

        return (img, objectness[0], center_targets[0], scale_targets[0], weights[0],
                class_targets[0], gt_bboxes[0])
  3. It would be nice if GluonCV provided official training scripts for custom datasets on the computer vision models: classification, detection, and segmentation. There's a tutorial for preparing custom data for object detection, but we end up having to find a script on the web or adapt our own from the scripts suited to the standard datasets.
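
For reference, a rough sketch of how I wire the transform into a dataset; `VOCDetection` and `yolo3_darknet53_voc` are stand-ins here for my custom dataset and network:

```python
from gluoncv import model_zoo
from gluoncv.data import VOCDetection  # stand-in for the custom dataset

net = model_zoo.get_model('yolo3_darknet53_voc', pretrained_base=True)
net.initialize()  # initializes the non-pretrained head parameters

train_dataset = VOCDetection(splits=[(2007, 'trainval')])
# transform() applies CustomTransform lazily to every (image, label) pair
train_data = train_dataset.transform(CustomTransform(net=net))
img, objectness, center_t, scale_t, weights, class_t, gt_bboxes = train_data[0]
```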

@ivanquirino
Author

I have a problem with my training script at the validation stage; I get something like the following output for both VOCMApMetric and VOC07MApMetric:

CLASS_1=NaN
CLASS_2=NaN
CLASS_3=NaN
CLASS_4=99%
CLASS_5=98%
CLASS_6=96%
mAP=97%

I think this problem may be because:

  1. my dataset is really small,
  2. CLASS_3 and CLASS_6 are too similar and should be merged into a single class, or
  3. the VOCMApMetric classes are not suited to custom data and I should write my own metrics.

I would appreciate some input on this since I'm kinda new to ML in general, but I'm looking into this for an object detection application.

@zhreshold
Member

The NaNs look suspicious to me; maybe you don't have those classes in your validation set?
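
One quick way to check, assuming pads.val.txt follows the GluonCV LST detection format (the root path here is just a placeholder):

```python
from collections import Counter

from gluoncv.data import LstDetection

# LstDetection yields labels in [xmin, ymin, xmax, ymax, cls_id, ...] layout
val_dataset = LstDetection('pads.val.txt', root='.')
counts = Counter()
for _, label in val_dataset:
    counts.update(label[:, 4].astype(int).tolist())
print(counts)  # any class id missing here will score NaN in the mAP metric
```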

@ivanquirino
Author

That's my validation LST file. It contains all classes.
pads.val.txt

What's strange is that after training, the network correctly detects every image in the validation set.

@ivanquirino
Author

Correction: there is one image in the validation set that it classifies wrongly. I think the main problem is in my labeling: every label in classes 3 and 6 relates to an object that has no left/right distinction, yet those objects are being labeled as left and right. The other four classes have left and right labeled correctly. The correct way to label is to merge classes 3 and 6 into a single class, and have 5 classes instead of 6 (a remapping sketch follows below).

Thanks for your input, @zhreshold, you helped me see the error more clearly.
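
For anyone fixing a similar labeling mistake, a hedged sketch of the merge, assuming labels in the parsed [xmin, ymin, xmax, ymax, cls_id] layout (the ids 2 and 5 are hypothetical zero-based stand-ins for my classes 3 and 6):

```python
import numpy as np

def merge_classes(label, src_id, dst_id):
    """Relabel every box of class src_id as dst_id, then shift the ids
    above src_id down so they stay contiguous. Assumes dst_id < src_id."""
    label = label.copy()
    ids = label[:, 4]
    ids[ids == src_id] = dst_id
    ids[ids > src_id] -= 1
    return label

# e.g. fold class 5 into class 2 before regenerating the LST files:
# new_label = merge_classes(label, src_id=5, dst_id=2)
```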
