Loss functions commonly used in computer vision tasks. For general losses (Cross-Entropy, MSE, KL Divergence), see General losses. For CV evaluation metrics, see Computer vision metrics.
When to use which loss
| Loss | When to use |
|---|---|
| Focal | Class imbalance in detection or segmentation. |
| Dice | Segmentation overlap — medical imaging, semantic/instance segmentation. |
| IoU / Jaccard | Bounding-box quality, detection. |
| Perceptual | Feature-level supervision for super-res, style transfer, image translation. |
| Adversarial | GAN training — generator vs discriminator. |
| SSIM | Image restoration, compression, super-res — structural similarity. |
Focal Loss
Addresses class imbalance by down-weighting the loss contributed by easy, well-classified examples, so that training concentrates on hard, misclassified ones.
$$
\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)
$$
Where:
- $p_t$ is the probability of the correct class.
- $\alpha_t$ is a balancing factor.
- $\gamma$ is a focusing parameter.
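A minimal NumPy sketch of binary focal loss, $\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$, assuming per-element probabilities for the positive class and labels in {0, 1}. The function name, the clipping `eps`, and the defaults `alpha=0.25, gamma=2.0` (the values reported for RetinaNet) are illustrative choices, not a reference implementation.

```python
import numpy as np

def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss averaged over elements (illustrative sketch).

    probs:   predicted probability of the positive class, shape (N,)
    targets: ground-truth labels in {0, 1}, shape (N,)
    """
    # Clip to avoid log(0)
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    targets = np.asarray(targets)
    # p_t: probability the model assigns to the correct class
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    # alpha_t: alpha for positives, (1 - alpha) for negatives
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    return loss.mean()
```

As a sanity check, `gamma=0` and `alpha=0.5` reduce this to half the ordinary binary cross-entropy; increasing `gamma` shrinks the loss of confident correct predictions much faster than that of poor ones.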
Applications: Object detection (RetinaNet), segmentation with imbalanced classes, medical image analysis.
Dice Loss
Based on the Dice coefficient, which measures the overlap between the predicted and ground-truth segmentations.
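$$
\mathcal{L}_{\text{Dice}} = 1 - \frac{2 \sum_i p_i g_i}{\sum_i p_i + \sum_i g_i}
$$
Where:
- $p_i$ is the predicted probability for pixel $i$.
- $g_i$ is the ground truth binary mask value for pixel $i$.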
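A minimal NumPy sketch of the soft Dice loss $1 - 2\sum_i p_i g_i / (\sum_i p_i + \sum_i g_i)$ for a single image. The small smoothing term `eps` is an assumption added to keep the ratio defined for empty masks; it is not part of the formula itself.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for one image (illustrative sketch).

    pred:   predicted foreground probabilities, any shape
    target: binary ground-truth mask, same shape as pred
    eps:    smoothing term (assumption) so empty masks do not divide by zero
    """
    p = np.asarray(pred, dtype=float).ravel()
    g = np.asarray(target, dtype=float).ravel()
    intersection = np.sum(p * g)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(p) + np.sum(g) + eps)
```

A perfect prediction gives a loss near 0 and a prediction with no overlap gives a loss near 1; because the loss is a ratio over the whole mask, a small foreground object is not swamped by the background pixels.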
Applications: Medical image segmentation, semantic segmentation, instance segmentation.
Variants:
- Tversky Loss: generalization of Dice loss that weights false positives and false negatives separately, allowing a trade-off between precision and recall.
- Combo Loss: combination of Dice loss and weighted cross-entropy.
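A minimal sketch of the Tversky loss, $1 - \mathrm{TP} / (\mathrm{TP} + \alpha\,\mathrm{FP} + \beta\,\mathrm{FN})$, assuming soft counts $\mathrm{TP} = \sum_i p_i g_i$, $\mathrm{FP} = \sum_i p_i (1 - g_i)$, $\mathrm{FN} = \sum_i (1 - p_i) g_i$. With $\alpha = \beta = 0.5$ it reduces to the Dice loss; the function name and `eps` are illustrative.

```python
import numpy as np

def tversky_loss(pred, target, alpha=0.5, beta=0.5, eps=1e-6):
    """Tversky loss (illustrative sketch).

    alpha weights false positives, beta weights false negatives.
    alpha = beta = 0.5 gives the Dice loss; beta > alpha favours recall.
    """
    p = np.asarray(pred, dtype=float).ravel()
    g = np.asarray(target, dtype=float).ravel()
    tp = np.sum(p * g)          # soft true positives
    fp = np.sum(p * (1.0 - g))  # soft false positives
    fn = np.sum((1.0 - p) * g)  # soft false negatives
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)
```

Raising `beta` above `alpha` penalizes missed foreground more heavily than spurious foreground, which is the usual setting when recall matters most (for example, small lesions).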
IoU (Intersection over Union) / Jaccard Loss
Based on the IoU metric; directly optimizes the overlap between predicted and ground-truth regions (bounding boxes or masks), typically as $1 - \mathrm{IoU}$.
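A minimal sketch of the IoU loss for one pair of axis-aligned boxes, $1 - |A \cap B| / |A \cup B|$, assuming the `(x1, y1, x2, y2)` corner convention with `x2 > x1` and `y2 > y1`. The function name and `eps` are illustrative.

```python
def box_iou_loss(pred, target, eps=1e-9):
    """1 - IoU for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    # Intersection rectangle; width/height clamp to 0 when boxes do not overlap
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)
    union = area_p + area_t - inter
    return 1.0 - inter / (union + eps)
```

For example, boxes `(0, 0, 2, 2)` and `(1, 1, 3, 3)` overlap in a 1×1 square with a union of 7, so the loss is $1 - 1/7$. Note that the loss stays at exactly 1 for any pair of non-overlapping boxes, so it gives no gradient there; extensions such as GIoU were proposed to address this.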
Applications: Object detection, instance segmentation, bounding box regression.
Perceptual Loss
Compares high-level feature representations extracted by a pre-trained CNN rather than raw pixel values.
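$$
\mathcal{L}_{\text{perc}} = \frac{1}{C_j H_j W_j} \left\lVert \phi_j(y) - \phi_j(\hat{y}) \right\rVert_2^2
$$
Where:
- $\phi_j$ is the feature map from the $j$-th layer of a pre-trained network.
- $y$ is the ground truth image.
- $\hat{y}$ is the generated image.
- $C_j$, $H_j$, $W_j$ are the channel, height, and width dimensions of the feature map.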
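A minimal sketch of the normalized feature-space distance $\frac{1}{C_j H_j W_j}\lVert \phi_j(y) - \phi_j(\hat{y}) \rVert_2^2$. To stay self-contained it takes the feature extractor as an argument; `toy_features` (a fixed random 1×1 projection followed by `tanh`) is a hypothetical stand-in for $\phi_j$, whereas a real setup would use activations from one layer of a pre-trained CNN such as VGG.

```python
import numpy as np

def perceptual_loss(feature_extractor, y, y_hat):
    """Squared L2 distance between feature maps, normalized by C*H*W.

    feature_extractor: callable mapping an image to a (C, H, W) feature map,
                       playing the role of phi_j.
    """
    f_y = feature_extractor(y)
    f_hat = feature_extractor(y_hat)
    c, h, w = f_y.shape
    return np.sum((f_y - f_hat) ** 2) / (c * h * w)


# Hypothetical stand-in for phi_j: fixed random 1x1 "convolution" + tanh.
# It maps a (3, H, W) image to an (8, H, W) feature map.
_rng = np.random.default_rng(0)
_weights = _rng.standard_normal((8, 3))

def toy_features(img):
    return np.tanh(np.einsum("oc,chw->ohw", _weights, img))
```

Swapping `toy_features` for a real network layer changes only the extractor; the normalization by $C_j H_j W_j$ keeps losses from different layers on comparable scales.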
Applications: Super-resolution, style transfer, image-to-image translation, image generation.
Adversarial Loss
Originates from Generative Adversarial Networks (GANs) and is defined by a minimax game between a generator and a discriminator.
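$$
\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
$$
Where:
- $D$ is the discriminator.
- $G$ is the generator.
- $p_{\text{data}}$ is the real data distribution.
- $p_z$ is the noise distribution.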
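A minimal NumPy sketch of the two sides of the minimax objective $V(D, G) = \mathbb{E}[\log D(x)] + \mathbb{E}[\log(1 - D(G(z)))]$, assuming the discriminator outputs are already probabilities (batch means replace the expectations). It also includes the non-saturating generator loss $-\mathbb{E}[\log D(G(z))]$, the practical alternative suggested in the original GAN paper; the function names and `EPS` are illustrative.

```python
import numpy as np

EPS = 1e-7  # clipping to keep log() finite

def discriminator_loss(d_real, d_fake):
    """D maximizes V, so it minimizes -V.

    d_real: D(x) on a batch of real samples, probabilities in (0, 1)
    d_fake: D(G(z)) on a batch of generated samples
    """
    d_real = np.clip(d_real, EPS, 1 - EPS)
    d_fake = np.clip(d_fake, EPS, 1 - EPS)
    return -(np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake)))

def generator_loss_minimax(d_fake):
    """G minimizes the only term of V that depends on it: E[log(1 - D(G(z)))]."""
    d_fake = np.clip(d_fake, EPS, 1 - EPS)
    return np.mean(np.log(1 - d_fake))

def generator_loss_nonsaturating(d_fake):
    """Non-saturating alternative: G minimizes -E[log D(G(z))]."""
    d_fake = np.clip(d_fake, EPS, 1 - EPS)
    return -np.mean(np.log(d_fake))
```

When $D$ outputs 0.5 everywhere (the equilibrium where generated and real data are indistinguishable), `discriminator_loss` equals $\log 4$. The non-saturating form is usually preferred early in training, when $D(G(z))$ is near 0 and the minimax generator loss has very small gradients.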
Applications: Image generation, image-to-image translation, domain adaptation, text-to-image generation.
Variants:
- WGAN Loss: uses the Wasserstein distance, estimated by a critic network, to provide more stable gradients.
- LSGAN Loss: uses a least-squares objective instead of the log-likelihood for more stable training.
- Hinge Loss: trains the discriminator with a margin-based hinge objective; used in image-generation models such as SAGAN and BigGAN.
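A minimal sketch of the discriminator (or critic) and generator losses for the three variants, assuming raw, unbounded discriminator outputs rather than probabilities. The LSGAN targets (1 for real, 0 for fake, 1 for the generator) are one common choice; the WGAN critic additionally needs a Lipschitz constraint such as weight clipping or a gradient penalty, which is omitted here. Function names are illustrative.

```python
import numpy as np

# d_real = D(x), d_fake = D(G(z)): raw discriminator/critic scores.

def wgan_losses(d_real, d_fake):
    d_real, d_fake = np.asarray(d_real, float), np.asarray(d_fake, float)
    # Critic maximizes E[D(x)] - E[D(G(z))]; generator maximizes E[D(G(z))]
    critic = -(np.mean(d_real) - np.mean(d_fake))
    gen = -np.mean(d_fake)
    return critic, gen

def lsgan_losses(d_real, d_fake):
    d_real, d_fake = np.asarray(d_real, float), np.asarray(d_fake, float)
    # Least-squares regression of D onto 1 (real) and 0 (fake)
    disc = 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)
    # Generator pushes D(G(z)) toward the "real" target 1
    gen = 0.5 * np.mean((d_fake - 1.0) ** 2)
    return disc, gen

def hinge_losses(d_real, d_fake):
    d_real, d_fake = np.asarray(d_real, float), np.asarray(d_fake, float)
    # Real scores are only penalized below +1, fake scores only above -1
    disc = np.mean(np.maximum(0.0, 1.0 - d_real)) + np.mean(np.maximum(0.0, 1.0 + d_fake))
    gen = -np.mean(d_fake)
    return disc, gen
```

The hinge discriminator loss is zero once real scores exceed +1 and fake scores fall below -1, so confidently separated samples stop contributing gradient.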
SSIM (Structural Similarity Index) Loss
Measures the similarity between two images by comparing their luminance, contrast, and structure, rather than their per-pixel error.
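$$
\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}, \qquad \mathcal{L}_{\text{SSIM}} = 1 - \mathrm{SSIM}(x, y)
$$
Where:
- $\mu_x$, $\mu_y$ are the average pixel values.
- $\sigma_x^2$, $\sigma_y^2$ are the variances.
- $\sigma_{xy}$ is the covariance.
- $C_1$, $C_2$ are small constants that stabilize the division when the denominators are close to zero.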
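A minimal NumPy sketch of $\mathcal{L}_{\text{SSIM}} = 1 - \mathrm{SSIM}(x, y)$ for grayscale images, assuming the commonly used constants $C_1 = (0.01L)^2$ and $C_2 = (0.03L)^2$ with $L$ the data range. As a simplification it computes the local statistics over a uniform 7×7 sliding window and averages the resulting SSIM map; the original SSIM formulation uses an 11×11 Gaussian window, so values will not match library implementations exactly.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def ssim(x, y, data_range=1.0, win=7, k1=0.01, k2=0.03):
    """Mean SSIM over all win x win windows (uniform window, illustrative)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    # Shape (H-win+1, W-win+1, win, win): one patch per window position
    px = sliding_window_view(x, (win, win))
    py = sliding_window_view(y, (win, win))
    mu_x = px.mean(axis=(-2, -1))
    mu_y = py.mean(axis=(-2, -1))
    var_x = px.var(axis=(-2, -1))
    var_y = py.var(axis=(-2, -1))
    cov_xy = ((px - mu_x[..., None, None]) * (py - mu_y[..., None, None])).mean(axis=(-2, -1))
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return np.mean(num / den)

def ssim_loss(x, y, **kwargs):
    return 1.0 - ssim(x, y, **kwargs)
```

Identical images give a loss of 0; noise, blur, or contrast changes lower SSIM and raise the loss. Computing the statistics locally rather than over the whole image lets the loss respond to structural differences in small regions.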
Applications: Image restoration, super-resolution, image compression, image quality assessment.