Computer vision losses

Loss functions commonly used in computer vision tasks. For general losses (Cross-Entropy, MSE, KL Divergence), see General losses. For CV evaluation metrics, see Computer vision metrics.

When to use which loss

Loss	When to use
Focal	Class imbalance in detection or segmentation.
Dice	Segmentation overlap — medical imaging, semantic/instance segmentation.
IoU / Jaccard	Bounding-box quality, detection.
Perceptual	Feature-level supervision for super-res, style transfer, image translation.
Adversarial	GAN training — generator vs discriminator.
SSIM	Image restoration, compression, super-res — structural similarity.

Focal Loss

Addresses class imbalance by down-weighting the contribution of easy (already well-classified) examples, so that training concentrates on the hard ones.

$$L_{focal} = -α_{t} (1 - p_{t})^{γ} \log(p_{t})$$

Where:

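$p_{t}$ is the model's predicted probability for the correct class.
$α_{t}$ is a balancing factor that weights the correct class (for example, a larger weight for the rare class).
$γ$ is a focusing parameter; $γ = 0$ reduces the loss to $α$-weighted cross-entropy, and larger values down-weight easy examples more strongly.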

Applications: Object detection (RetinaNet), segmentation with imbalanced classes, medical image analysis.
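
As a rough illustration of the formula above, here is a minimal sketch of the binary case for a single prediction in plain Python. The function name `focal_loss`, the clamping constant `eps`, and the default values `alpha=0.25` and `gamma=2.0` are assumptions made for this example, not values taken from this page.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one prediction (illustrative sketch).

    p: predicted probability of the positive class.
    y: true label, 0 or 1.
    alpha, gamma, eps: assumed defaults chosen for this example.
    """
    p = min(max(p, eps), 1.0 - eps)             # clamp so log() stays finite
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # balancing weight of the true class
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.95, 1)  # confident and correct: heavily down-weighted
hard = focal_loss(0.30, 1)  # poorly classified: keeps most of its weight
print(easy, hard)
```

Compared with plain cross-entropy, the easy example is scaled by `alpha_t * (1 - p_t) ** gamma`, which is much smaller than the factor applied to the hard example, so the hard example dominates the sum.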

Dice Loss

Based on the Dice coefficient, which measures the overlap between predicted and ground truth segmentation.

$$L_{Dice} = 1 - \frac{2 \sum_{i=1}^{N} p_{i} g_{i}}{\sum_{i=1}^{N} p_{i}^{2} + \sum_{i=1}^{N} g_{i}^{2}}$$

Where:

$p_{i}$ is the predicted probability for pixel $i$.
$g_{i}$ is the ground truth binary mask value for pixel $i$.
$N$ is the number of pixels.

Applications: Medical image segmentation, semantic segmentation, instance segmentation.

Variants:

Tversky Loss: generalization of Dice loss that weights false positives and false negatives separately, so the trade-off between precision and recall can be tuned.
Combo Loss: combination of Dice loss and weighted cross-entropy.
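
The Dice formula and its Tversky generalization can be sketched as follows, treating the prediction and mask as flat lists of per-pixel values. The names `dice_loss`, `tversky_loss`, `fp_weight`, and `fn_weight`, and the smoothing term `eps`, are assumptions for this example. The Tversky sketch uses the soft counts TP = Σ p·g, FP = Σ p·(1 − g), FN = Σ (1 − p)·g.

```python
def dice_loss(pred, target, eps=1e-7):
    """Dice loss with squared terms in the denominator, as in the formula above."""
    inter = sum(p * g for p, g in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(g * g for g in target)
    # eps (an assumed smoothing term) avoids 0/0 when both masks are empty.
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def tversky_loss(pred, target, fp_weight=0.5, fn_weight=0.5, eps=1e-7):
    """Tversky loss: false positives and false negatives get separate weights."""
    tp = sum(p * g for p, g in zip(pred, target))
    fp = sum(p * (1.0 - g) for p, g in zip(pred, target))
    fn = sum((1.0 - p) * g for p, g in zip(pred, target))
    return 1.0 - (tp + eps) / (tp + fp_weight * fp + fn_weight * fn + eps)

mask = [1, 0, 1, 0]
soft_pred = [0.8, 0.2, 0.6, 0.1]
print(dice_loss(soft_pred, mask))

# A prediction that misses one positive pixel: raising fn_weight penalizes it more.
missed = [1, 0, 0, 0]
print(tversky_loss(missed, mask, fp_weight=0.3, fn_weight=0.7))
print(tversky_loss(missed, mask, fp_weight=0.7, fn_weight=0.3))
```

Setting `fn_weight` above `fp_weight` pushes the model toward higher recall, which is often the reason for choosing Tversky over Dice when missed foreground pixels are costly.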

IoU (Intersection over Union) / Jaccard Loss

Based on the IoU metric; directly optimizes the quality of bounding box predictions.

$$L_{IoU} = 1 - \frac{\text{area of overlap}}{\text{area of union}}$$

Applications: Object detection, instance segmentation, bounding box regression.
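
For bounding boxes, the ratio above can be computed directly from corner coordinates. The sketch below assumes axis-aligned boxes given as `(x1, y1, x2, y2)` tuples with `x1 < x2` and `y1 < y2`; that box format and the function name `iou_loss` are assumptions for this example.

```python
def iou_loss(box_a, box_b):
    """IoU loss for two axis-aligned boxes in (x1, y1, x2, y2) format (assumed)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, zero if the boxes do not meet.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    if union <= 0:
        return 1.0  # degenerate boxes: treat as no overlap
    return 1.0 - inter / union

print(iou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7
print(iou_loss((0, 0, 1, 1), (5, 5, 6, 6)))  # disjoint boxes
```

Note that the loss is the same constant for every pair of disjoint boxes, so on its own it gives no signal about how far apart two non-overlapping boxes are.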

Perceptual Loss

Compares high-level feature representations extracted by a pre-trained CNN instead of pixel-wise differences.

$$L_{perceptual} = \sum_{j} λ_{j} \frac{1}{C_{j} H_{j} W_{j}} \sum_{c, h, w} (Φ_{j}(I)_{c, h, w} - Φ_{j}(\hat{I})_{c, h, w})^{2}$$

Where:

$Φ_{j}$ is the feature map from the $j$-th layer of a pre-trained network.
$λ_{j}$ is the weight given to layer $j$.
$I$ is the ground truth image.
$\hat{I}$ is the generated image.
$C_{j}, H_{j}, W_{j}$ are the channel count, height, and width of the feature map at layer $j$.

Applications: Super-resolution, style transfer, image-to-image translation, image generation.
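
The weighted sum over layers can be sketched without a real network by taking the feature maps as given. In the example below the feature maps are small hand-written nested lists of shape C × H × W standing in for the outputs of a frozen pre-trained CNN; the function names, the toy values, and the layer weights are all assumptions for illustration.

```python
def feature_mse(feat_a, feat_b):
    """Mean squared difference over one C x H x W feature map (nested lists)."""
    total, count = 0.0, 0
    for chan_a, chan_b in zip(feat_a, feat_b):
        for row_a, row_b in zip(chan_a, chan_b):
            for a, b in zip(row_a, row_b):
                total += (a - b) ** 2
                count += 1
    return total / count  # count equals C * H * W

def perceptual_loss(feats_real, feats_fake, layer_weights):
    """Weighted sum of per-layer feature MSEs; one entry per layer in each list."""
    return sum(
        w * feature_mse(fr, ff)
        for w, fr, ff in zip(layer_weights, feats_real, feats_fake)
    )

# Toy stand-ins for features of the ground truth and generated images.
feats_real = [
    [[[1.0, 2.0], [3.0, 4.0]]],   # layer 1: C=1, H=2, W=2
    [[[0.0]], [[1.0]]],           # layer 2: C=2, H=1, W=1
]
feats_fake = [
    [[[1.0, 2.0], [3.0, 2.0]]],
    [[[1.0]], [[1.0]]],
]
loss = perceptual_loss(feats_real, feats_fake, layer_weights=[1.0, 0.5])
print(loss)
```

In a real setup the two feature lists would come from running both images through the same frozen network and collecting the activations of the chosen layers.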

Adversarial Loss

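Comes from Generative Adversarial Networks (GANs), where a generator and a discriminator play a minimax game: the discriminator tries to maximize the objective below by telling real samples from generated ones, while the generator tries to minimize it.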

$$L_{adv} = \mathbb{E}_{x \sim p_{data}(x)} [\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)} [\log(1 - D(G(z)))]$$

Where:

$D$ is the discriminator.
$G$ is the generator.
$p_{data}$ is the real data distribution.
$p_{z}$ is the noise distribution.

Applications: Image generation, image-to-image translation, domain adaptation, text-to-image generation.

Variants:

WGAN Loss: replaces the log-likelihood objective with an estimate of the Wasserstein distance, which provides more stable gradients.
LSGAN Loss: uses least squares instead of log-likelihood for more stable training.
Hinge Loss: margin-based alternative formulation that has shown good results for image generation.
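
To make the two sides of the minimax objective concrete, the sketch below computes the losses from discriminator outputs alone. It assumes `d_real` and `d_fake` are lists of probabilities in (0, 1) that the discriminator assigned to real and generated samples; the function names and the clamp `eps` are assumptions for this example. It also includes the commonly used non-saturating generator loss, which is not part of the formula above.

```python
import math

def discriminator_loss(d_real, d_fake, eps=1e-7):
    """Negative of the minimax objective, averaged over the batch.

    The discriminator maximizes the objective, i.e. minimizes this value.
    """
    real_term = sum(math.log(max(d, eps)) for d in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - d, eps)) for d in d_fake) / len(d_fake)
    return -(real_term + fake_term)

def generator_loss_minimax(d_fake, eps=1e-7):
    """Generator side of the objective: minimize mean log(1 - D(G(z)))."""
    return sum(math.log(max(1.0 - d, eps)) for d in d_fake) / len(d_fake)

def generator_loss_non_saturating(d_fake, eps=1e-7):
    """Alternative generator loss: minimize mean -log(D(G(z)))."""
    return -sum(math.log(max(d, eps)) for d in d_fake) / len(d_fake)

# A discriminator that separates real from fake well has a low loss ...
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))
# ... while one that outputs 0.5 everywhere sits at 2 * log(2).
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
# The generator's loss drops as the discriminator is fooled more often.
print(generator_loss_non_saturating([0.1]), generator_loss_non_saturating([0.9]))
```

The non-saturating form is often preferred early in training, when the discriminator rejects generated samples confidently and `log(1 - D(G(z)))` is nearly flat.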

SSIM (Structural Similarity Index) Loss

Measures the similarity between two images by comparing their luminance, contrast, and structure rather than raw pixel differences.

$$L_{SSIM} = 1 - \mathrm{SSIM}(x, y)$$

$$\mathrm{SSIM}(x, y) = \frac{(2 μ_{x} μ_{y} + C_{1}) (2 σ_{xy} + C_{2})}{(μ_{x}^{2} + μ_{y}^{2} + C_{1}) (σ_{x}^{2} + σ_{y}^{2} + C_{2})}$$

Where:

$μ_{x}$, $μ_{y}$ are the mean pixel values of $x$ and $y$.
$σ_{x}^{2}$, $σ_{y}^{2}$ are their variances.
$σ_{xy}$ is the covariance of $x$ and $y$.
$C_{1}$, $C_{2}$ are small constants that keep the ratio stable when a denominator term is close to zero.

Applications: Image restoration, super-resolution, image compression, image quality assessment.
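
The SSIM formula can be evaluated directly from the means, variances, and covariance. The sketch below is a simplification: it computes a single global SSIM over all pixels, whereas practical implementations usually compute it over local windows and average the result. The images are assumed to be flat lists of values in `[0, data_range]`, and the constants use the common choice $C_{1} = (0.01 L)^{2}$, $C_{2} = (0.03 L)^{2}$ with $L$ the data range; these choices and the function names are assumptions for this example.

```python
def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM over two equally sized flat lists of pixel values."""
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # assumed stabilizing constants
    c2 = (0.03 * data_range) ** 2

    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Population (divide-by-n) variance and covariance, for simplicity.
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

def ssim_loss(x, y, data_range=1.0):
    return 1.0 - ssim_global(x, y, data_range)

image = [0.1, 0.9, 0.1, 0.9]
inverted = [0.9, 0.1, 0.9, 0.1]
print(ssim_loss(image, image))     # identical images
print(ssim_loss(image, inverted))  # same mean and contrast, opposite structure
```

Because SSIM can be negative when the covariance is negative, the loss can exceed 1, as the inverted example shows.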
