Hi Fatih!
A quick summary: It does sound like you are seeing overfitting. Try more
data, augmented data, `Dropout`, pre-trained models, smaller models (if
they work for your use case), and SGD with momentum.
This sounds reasonable. It should be straightforward to build a classifier
that works on your use case (for some machine-learning definition of
“straightforward.”)
This is not an unreasonably small amount of data, but it’s not a lot. In general,
the best way to address overfitting is to get more training data if it is practical
to do so. (Of course it isn’t always.) A factor of ten more data – say 5000 to
10000 samples with ground-truth labels – could prove very helpful.
Pytorch’s resnet, say resnet34, consistent with the original resnet paper, does
not include `Dropout`, but you can add `Dropout` layers. (There is some debate
about whether `Dropout` works well with `BatchNorm` layers and where they
should be located relative to one another, but `Dropout` is, in general, a helpful
technique for reducing overfitting.)
Furthermore, if you’re not doing it already, you might try using a pre-trained
resnet model and only fine-tune the last few layers. Or you might first fine-tune
the last few layers and then fine-tune the whole model a little bit. The basic idea
is that the more (trainable) parameters a model has, the more likely it is to
overfit, so reusing the pre-trained parameter values as much as possible (they
know nothing about your data, so they can’t already be overfitting it) could very
well help limit overfitting.
Because of the `Sigmoid`, I assume that you are using `BCELoss`. (If not, that’s
a problem in and of itself.) This is unlikely to be your problem, but, for reasons
of numerical stability, you should get rid of the `Sigmoid` and use
`BCEWithLogitsLoss` as your loss criterion (it has `logsigmoid()` built into it).
There is some credible lore that `Adam` (as well as `RMSprop`) can be more
subject to overfitting than non-adaptive optimizers. Try using `SGD` (with
momentum).
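For example (the learning rate and momentum below are typical starting values, not tuned for your problem, and the `Linear` is just a stand-in for your classifier):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)  # stand-in for your actual classifier

# Instead of: optimizer = optim.Adam(model.parameters(), lr=1e-3)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```

Expect to re-tune the learning rate when switching from `Adam` to `SGD`; the two optimizers rarely share a good value.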
Based on the images you posted, I assume that the white circular arcs in
your images are always in the same place. This would seem to limit the kinds
of data augmentation you could use – for example, `RandomCrop` would cause
the arcs to appear in somewhat different locations in your image, which might
not work for you.
However, it does look to me like you could reflect the images about the diagonal
that runs from the upper left to the lower right. That would only give you a factor
of two augmentation, but every little bit helps.
At the cost of interpolating pixel values (because pixels line up only for rotations
of 90 degrees), you might try augmenting by randomly rotating your plate images
around the center of the plate. (Whether the pixel-interpolation would hurt more
than the augmentation helps, I don’t know.)
Best.
K. Frank