Data augmentation options in TensorFlow object recognition

Recently I’ve been playing a bit with machine learning and TensorFlow, and I struggled with the library’s myriad parameters. So, as a reminder to myself and for everybody’s convenience, I’m posting some notes here.

I started by using pretrained detection models such as SSD MobileNet, Inception, etc. Each of these comes with a sample config. When training on your own data, you may need to change some parameters. In my case, for example, I didn’t want to generate training data by flipping the image horizontally, so I edited the data augmentation section of the config file.

This is the section for horizontal flip as found in the config file:

data_augmentation_options {
  random_horizontal_flip {
  }
}

In my data the color of the object is not that important, so I decided to replace the flip with a random conversion of the image to grayscale. To do so, change the config file as follows:

data_augmentation_options {
  random_rgb_to_gray {
  }
}

However, the preprocessor has a default value for the probability that an image is converted. For random_rgb_to_gray the default is 0.1. It took me some time to figure out how to format the config file to set the desired value without raising a parser error. This seems to work:

data_augmentation_options {
  random_rgb_to_gray {
    probability: 0.5
  }
}
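
If you want more than one augmentation step, here is a rough sketch of how the pieces seem to combine. As far as I can tell, data_augmentation_options is a repeated field inside the train_config section, so each step gets its own block. The example keeps the horizontal flip and adds the grayscale conversion; the 0.5 is only an illustrative value, and the comment stands in for whatever else your sample config already has in train_config.

train_config {
  # ... other train_config fields from your sample config, unchanged ...
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_rgb_to_gray {
      probability: 0.5
    }
  }
}

Putting both steps inside a single data_augmentation_options block did not look right to me, since each block appears to hold exactly one step.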

Finally, preprocessor.proto has a list of all the possible options and an explanation of what they do.
I hope this helps anyone who, like me, sometimes feels lost when using TensorFlow.

edit 19/10/2017:

The TensorFlow GitHub repo structure changed, so the links were broken. I updated them, but there is no way to know when something will change again. If the links break again you can search for the files yourself in the TensorFlow repo; in the meantime, these are the steps accepted by the preprocessor at present and the default values.

NormalizeImage normalize_image = 1;
RandomHorizontalFlip random_horizontal_flip = 2;
RandomPixelValueScale random_pixel_value_scale = 3;
RandomImageScale random_image_scale = 4;
RandomRGBtoGray random_rgb_to_gray = 5;
RandomAdjustBrightness random_adjust_brightness = 6;
RandomAdjustContrast random_adjust_contrast = 7;
RandomAdjustHue random_adjust_hue = 8;
RandomAdjustSaturation random_adjust_saturation = 9;
RandomDistortColor random_distort_color = 10;
RandomJitterBoxes random_jitter_boxes = 11;
RandomCropImage random_crop_image = 12;
RandomPadImage random_pad_image = 13;
RandomCropPadImage random_crop_pad_image = 14;
RandomCropToAspectRatio random_crop_to_aspect_ratio = 15;
RandomBlackPatches random_black_patches = 16;
RandomResizeMethod random_resize_method = 17;
ScaleBoxesToPixelCoordinates scale_boxes_to_pixel_coordinates = 18;
ResizeImage resize_image = 19;
SubtractChannelMean subtract_channel_mean = 20;
SSDRandomCrop ssd_random_crop = 21;
SSDRandomCropPad ssd_random_crop_pad = 22;
SSDRandomCropFixedAspectRatio ssd_random_crop_fixed_aspect_ratio = 23;
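
Other steps in this list take their parameters the same way as random_rgb_to_gray. As a sketch only: from my reading of preprocessor.proto, random_adjust_brightness has a max_delta field with a default of 0.2, but treat both the field name and the default as assumptions and check the proto in your own checkout before relying on them. The 0.1 below is just an illustrative value.

data_augmentation_options {
  random_adjust_brightness {
    max_delta: 0.1
  }
}

If the parser complains about an unknown field, the message definition for that step in preprocessor.proto shows which names it accepts.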