Loss functions commonly used in computer vision tasks. For general losses (Cross-Entropy, MSE, KL Divergence), see General losses. For CV evaluation metrics, see Computer vision metrics.

When to use which loss

Loss             When to use
Focal            Class imbalance in detection or segmentation.
Dice             Segmentation overlap: medical imaging, semantic/instance segmentation.
IoU / Jaccard    Bounding-box quality, detection.
Perceptual       Feature-level supervision for super-res, style transfer, image translation.
Adversarial      GAN training: generator vs discriminator.
SSIM             Image restoration, compression, super-res (structural similarity).

Focal Loss

Addresses class imbalance by down-weighting the contribution of easy examples.

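$$\mathrm{FL}(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \, \log(p_t)$$

Where:

  • $p_t$ is the predicted probability of the correct class.
  • $\alpha_t$ is a balancing factor that weights the positive and negative classes.
  • $\gamma \ge 0$ is a focusing parameter; larger values down-weight easy examples more strongly, and $\gamma = 0$ recovers $\alpha$-weighted cross-entropy.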
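
To make the down-weighting concrete, here is a minimal sketch of the binary focal loss for a single prediction, using only the standard library. The function name `focal_loss`, the clamping constant `eps`, and the sample probabilities are illustrative assumptions; the defaults `alpha=0.25` and `gamma=2.0` are the values reported for RetinaNet.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one prediction (illustrative sketch).

    p: predicted probability of the positive class, in [0, 1]
    y: true label, 0 or 1
    """
    p = min(max(p, eps), 1.0 - eps)             # clamp so log() stays finite
    p_t = p if y == 1 else 1.0 - p              # probability of the correct class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class balancing factor
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.95, 1)  # well-classified positive: loss is strongly damped
hard = focal_loss(0.10, 1)  # misclassified positive: loss stays large
print(easy, hard)
```

With `alpha=1.0` and `gamma=0.0` the function reduces to plain binary cross-entropy, which is a convenient sanity check.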

Applications: Object detection (RetinaNet), segmentation with imbalanced classes, medical image analysis.

Dice Loss

Based on the Dice coefficient, which measures the overlap between predicted and ground truth segmentation.

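$$\mathcal{L}_{\text{Dice}} = 1 - \frac{2 \sum_i p_i g_i}{\sum_i p_i + \sum_i g_i}$$

Where:

  • $p_i$ is the predicted probability for pixel $i$.
  • $g_i$ is the ground truth binary mask value for pixel $i$.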
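
As a rough illustration, the Dice loss over a flattened mask could be computed as below. The `smooth` term is an assumption added here (it is not part of the definition above); it keeps the ratio defined when both the prediction and the mask are empty. The function name and sample values are hypothetical.

```python
def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss over flattened predictions and a binary mask (sketch).

    pred:   iterable of predicted probabilities in [0, 1]
    target: iterable of ground truth labels (0 or 1), same length as pred
    smooth: assumed smoothing constant; avoids 0/0 on empty masks
    """
    intersection = sum(p * g for p, g in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)

pred = [0.9, 0.8, 0.1, 0.2]
mask = [1, 1, 0, 0]
print(dice_loss(pred, mask))
```

Setting `smooth=0.0` gives exactly the unsmoothed formula, at the cost of a division by zero when both inputs sum to zero.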

Applications: Medical image segmentation, semantic segmentation, instance segmentation.

Variants:

  • Tversky Loss: generalization of Dice loss that weights false positives and false negatives separately, so the precision/recall trade-off can be tuned.
  • Combo Loss: combination of Dice loss and weighted cross-entropy.
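
The Tversky variant listed above can be sketched in a few lines. This assumes the common convention in which `alpha` weights false positives and `beta` weights false negatives; the function name and the `smooth` term are illustrative assumptions.

```python
def tversky_loss(pred, target, alpha=0.5, beta=0.5, smooth=0.0):
    """Soft Tversky loss over flattened inputs (sketch).

    alpha: assumed weight on false positives
    beta:  assumed weight on false negatives
    """
    tp = sum(p * g for p, g in zip(pred, target))          # soft true positives
    fp = sum(p * (1 - g) for p, g in zip(pred, target))    # soft false positives
    fn = sum((1 - p) * g for p, g in zip(pred, target))    # soft false negatives
    return 1.0 - (tp + smooth) / (tp + alpha * fp + beta * fn + smooth)

pred = [0.9, 0.8, 0.1, 0.2]
mask = [1, 1, 0, 0]
print(tversky_loss(pred, mask))                        # equals the Dice loss
print(tversky_loss(pred, mask, alpha=0.3, beta=0.7))   # penalizes misses more
```

With `alpha = beta = 0.5` the expression reduces to the Dice loss; raising `beta` above `alpha` penalizes missed foreground pixels more heavily, which favors recall.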

IoU (Intersection over Union) / Jaccard Loss

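Based on the IoU metric; optimizes the overlap between predicted and ground truth boxes (or masks) directly, rather than through a coordinate-wise proxy such as L1 or L2 regression.

$$\mathcal{L}_{\text{IoU}} = 1 - \frac{|A \cap B|}{|A \cup B|}$$

Where:

  • $A$ is the predicted box or mask.
  • $B$ is the ground truth box or mask.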
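
For bounding boxes, a minimal sketch of the loss as one minus the IoU might look like this. The `(x1, y1, x2, y2)` corner format and the function names are assumptions made for the example.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if boxes are disjoint
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred_box, true_box):
    """IoU loss: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - box_iou(pred_box, true_box)

print(iou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

Note that the loss is flat at 1 for any pair of non-overlapping boxes, so in this plain form it carries no signal about how far apart they are.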

Applications: Object detection, instance segmentation, bounding box regression.

Perceptual Loss

Compares high-level feature representations extracted by a pre-trained CNN instead of pixel-wise differences.

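$$\mathcal{L}_{\text{perc}}(\hat{y}, y) = \frac{1}{C_j H_j W_j} \left\lVert \phi_j(\hat{y}) - \phi_j(y) \right\rVert_2^2$$

Where:

  • $\phi_j$ is the feature map from the $j$-th layer of a pre-trained network.
  • $y$ is the ground truth image.
  • $\hat{y}$ is the generated image.
  • $C_j, H_j, W_j$ are the dimensions (channels, height, width) of the feature map.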
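
The sketch below shows only the shape of the computation: extract features from both images, then take the squared distance normalized by the feature map size. A real perceptual loss would take its features from a pre-trained CNN; `toy_features` is a hypothetical stand-in (horizontal pixel differences) so the example runs without any framework.

```python
def toy_features(image):
    """Hypothetical stand-in for a CNN layer: horizontal pixel differences.

    image: 2-D list of pixel values, shape (H, W)
    Returns a feature map of shape (C=1, H, W-1) as nested lists.
    """
    return [[[row[k + 1] - row[k] for k in range(len(row) - 1)] for row in image]]

def perceptual_loss(generated, target, phi=toy_features):
    """Squared feature distance normalized by C * H * W of the feature map."""
    f_gen, f_tgt = phi(generated), phi(target)
    c, h, w = len(f_gen), len(f_gen[0]), len(f_gen[0][0])
    sq_dist = sum(
        (f_gen[i][r][k] - f_tgt[i][r][k]) ** 2
        for i in range(c) for r in range(h) for k in range(w)
    )
    return sq_dist / (c * h * w)

image = [[0, 1, 2], [3, 4, 5]]
brighter = [[v + 10 for v in row] for row in image]
print(perceptual_loss(brighter, image))  # feature-space distance, not pixel distance
```

Because the comparison happens in feature space, two images that differ pixel-wise can still be close under this loss when the chosen features agree; which differences are ignored depends entirely on the feature extractor.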

Applications: Super-resolution, style transfer, image-to-image translation, image generation.

Adversarial Loss

Comes from Generative Adversarial Networks (GANs) and involves a minimax game between a generator and discriminator.

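$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

Where:

  • $D$ is the discriminator.
  • $G$ is the generator.
  • $p_{\text{data}}$ is the real data distribution.
  • $p_z$ is the noise distribution.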
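
To show how the two players read the same objective, here is a minimal sketch that computes both losses from discriminator outputs on a batch. It assumes the discriminator already returns probabilities in (0, 1); the function names and the clamping constant `eps` are illustrative assumptions.

```python
import math

def discriminator_loss(d_real, d_fake, eps=1e-7):
    """Negated value function: D maximizes log D(x) + log(1 - D(G(z))).

    d_real: discriminator outputs on real samples
    d_fake: discriminator outputs on generated samples
    """
    real_term = sum(math.log(max(d, eps)) for d in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - d, eps)) for d in d_fake) / len(d_fake)
    return -(real_term + fake_term)

def generator_loss(d_fake, eps=1e-7):
    """Minimax generator loss: G minimizes log(1 - D(G(z)))."""
    return sum(math.log(max(1.0 - d, eps)) for d in d_fake) / len(d_fake)

print(discriminator_loss([0.9, 0.8], [0.2, 0.1]))
print(generator_loss([0.2, 0.1]))
```

In practice the generator is often trained to maximize log D(G(z)) instead, since the minimax form gives weak gradients early in training when the discriminator rejects generated samples confidently.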

Applications: Image generation, image-to-image translation, domain adaptation, text-to-image generation.

Variants:

  • WGAN Loss: replaces the log loss with an estimate of the Wasserstein distance, computed by a Lipschitz-constrained critic, which gives more stable gradients.
  • LSGAN Loss: uses a least-squares objective instead of the log-likelihood, for more stable training.
  • Hinge Loss: margin-based alternative formulation that has shown good results for image generation.

SSIM (Structural Similarity Index) Loss

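Measures the similarity between two images by comparing their luminance, contrast, and structure. The loss is typically taken as $1 - \mathrm{SSIM}$.

$$\mathrm{SSIM}(x, y) = \frac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

Where:

  • $\mu_x$, $\mu_y$ are the average pixel values.
  • $\sigma_x^2$, $\sigma_y^2$ are the variances.
  • $\sigma_{xy}$ is the covariance.
  • $C_1$, $C_2$ are small constants that stabilize the division when the denominator is close to zero.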
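
A simplified sketch of the formula follows. It assumes a single global window over flattened images, whereas SSIM is normally computed over local windows and averaged; the constants use the common choice $C_1 = (0.01 L)^2$ and $C_2 = (0.03 L)^2$, with $L$ the dynamic range. Function names are illustrative.

```python
def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM over one global window (simplifying assumption).

    x, y: flattened images as equal-length lists of pixel values
    """
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def ssim_loss(x, y, data_range=1.0):
    """SSIM loss: 0 for identical images, larger for dissimilar ones."""
    return 1.0 - ssim_global(x, y, data_range)

x = [0.1, 0.5, 0.9, 0.3]
inverted = [1.0 - v for v in x]
print(ssim_loss(x, x), ssim_loss(x, inverted))
```

Because the covariance can be negative, SSIM ranges down to about -1, so this loss can exceed 1 for images with inverted structure.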

Applications: Image restoration, super-resolution, image compression, image quality assessment.