How much data do I need? Data augmentation tips for training a custom YOLOv5 model
Hey folks!
I’m working on a project using YOLOv5 to detect various symbols in images (see example below). Since labeling is pretty time-consuming, I’m planning to use the albumentations library to augment my manually labeled dataset. The idea is to apply different transforms so the model generalizes better, especially to symbols that appear in different orientations.
My main goals:
- Increase dataset size
- Balance the different classes
A bit more context: Each image can contain multiple classes and several tagged symbols. With that in mind, I’d love to hear your thoughts on how to determine the right number of annotations per class to achieve a balanced dataset. For example, should I augment until each class has roughly 1.5 times as many annotations as the largest class has now, or is there a better approach?
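For reference, this is roughly how I’d measure the imbalance before deciding on a target. It’s a minimal sketch that assumes the usual YOLO label layout (one object per line, class id first); `count_instances` and `deficit_per_class` are names I made up, and using the largest class as the target is only one possible choice.

```python
from collections import Counter


def count_instances(label_texts):
    """Count annotated objects per class across the contents of YOLO label files."""
    counts = Counter()
    for text in label_texts:
        for line in text.splitlines():
            parts = line.split()
            if parts:  # skip blank lines; empty files contribute nothing
                counts[int(parts[0])] += 1
    return counts


def deficit_per_class(counts, target):
    """Return how many extra annotations each class needs to reach the target."""
    return {cls: max(0, target - n) for cls, n in sorted(counts.items())}


# Toy example: the contents of three label files, the last one empty.
label_texts = [
    "0 0.5 0.5 0.1 0.1\n1 0.2 0.2 0.1 0.1\n",
    "0 0.3 0.3 0.1 0.1\n",
    "",
]
counts = count_instances(label_texts)
missing = deficit_per_class(counts, target=max(counts.values()))
print(dict(counts), missing)
```

Since each image holds several classes, augmenting an image to boost a rare class also adds annotations for every other class in that image, so the counts would need to be rechecked after each round.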
Also, I’ve read that including negative samples is important and that they should make up about 50% of the data. What do you all think about this strategy?
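To put numbers on that, here is a small sketch of how many negative (background) images a given share would require. It assumes the share is measured over the total image count, labeled plus background; `backgrounds_needed` is a hypothetical helper.

```python
def backgrounds_needed(n_labeled, fraction):
    """Number of background images so they make up `fraction` of the whole dataset.

    Solves b / (n_labeled + b) = fraction for b.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return round(n_labeled * fraction / (1.0 - fraction))


# With 200 labeled images, a 50% share means 200 extra background images.
print(backgrounds_needed(200, 0.5))
# A 10% share of the same dataset needs far fewer.
print(backgrounds_needed(900, 0.1))
```

So a 50% share would double the number of images I need to collect, which is part of why I’m asking whether it’s worth it.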
Thanks!!