Object Detection Metrics

Object detection models predict bounding boxes around objects and classify them.

  1. Intersection over Union (IoU): Measures the overlap between the predicted and ground truth bounding boxes: the area of their intersection divided by the area of their union.
  2. Average Precision (AP): The area under the Precision-Recall curve for a specific class, calculated at a specific IoU threshold.

\[ AP = \int_0^1 p(r)\, dr \]

Where \( p(r) \) is the precision at recall level \( r \).

  3. Mean Average Precision (mAP): The mean of AP values across all object classes, often calculated at multiple IoU thresholds.
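As a concrete illustration, IoU for axis-aligned boxes can be computed directly from corner coordinates. A minimal sketch, assuming the `[x1, y1, x2, y2]` box format (a convention not specified above):

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as [x1, y1, x2, y2]."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(box_iou([0, 0, 2, 2], [1, 1, 3, 3]))  # intersection 1, union 7 -> 1/7 ≈ 0.143
```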

Semantic Segmentation Metrics

In semantic segmentation, each pixel belongs to one class.

  1. Pixel Accuracy: The proportion of correctly classified pixels among all pixels. Can be dominated by large classes (background).
  2. Mean Intersection over Union (mIoU): The average IoU across all classes.

\[ mIoU = \frac{1}{C} \sum_{c=1}^{C} \frac{TP_c}{TP_c + FP_c + FN_c} \]

Where:

  • \( TP_c \) is the number of true positive pixels for class \( c \)
  • \( FP_c \) is the number of false positive pixels for class \( c \)
  • \( FN_c \) is the number of false negative pixels for class \( c \)
  • \( C \) is the number of classes
  3. Frequency Weighted IoU (FWIoU): A weighted version of mIoU that accounts for class imbalance.

\[ FWIoU = \frac{\sum_{c=1}^{C} N_c \cdot \frac{TP_c}{TP_c + FP_c + FN_c}}{\sum_{c=1}^{C} N_c} \]

Where \( N_c \) is the total number of pixels that truly belong to class \( c \).

  4. Dice Coefficient: The F1-score equivalent for segmentation.

\[ Dice = \frac{2\, TP}{2\, TP + FP + FN} \]
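The pixel-level metrics above can all be derived from a single confusion matrix. A minimal NumPy sketch (the function name is an assumption; note that this version scores classes absent from both maps as IoU 0, whereas benchmarks typically ignore such classes):

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """Pixel accuracy, per-class IoU, mIoU, and per-class Dice from flat label maps."""
    pred = np.asarray(pred).ravel()
    gt = np.asarray(gt).ravel()
    # Confusion matrix: rows = ground truth class, cols = predicted class
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (gt, pred), 1)
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp   # predicted as class c but belonging elsewhere
    fn = cm.sum(axis=1) - tp   # belonging to class c but predicted elsewhere
    pixel_acc = tp.sum() / cm.sum()
    iou = tp / np.maximum(tp + fp + fn, 1)        # guard against division by zero
    dice = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    return pixel_acc, iou, iou.mean(), dice

# One of the four pixels is misclassified
acc, iou, miou, dice = segmentation_metrics([0, 0, 1, 1], [0, 1, 1, 1], 2)
```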

Instance Segmentation Metrics

Instance segmentation involves both semantic segmentation and instance differentiation (separating individual objects).

  1. Mask AP: Average Precision calculated based on IoU between predicted and ground truth masks instead of bounding boxes.

  2. Panoptic Quality (PQ): Combines recognition and segmentation quality for panoptic segmentation tasks.

\[ PQ = \frac{\sum_{(p,\, g) \in TP} IoU(p, g)}{|TP| + \frac{1}{2}|FP| + \frac{1}{2}|FN|} \]

Where:

  • \( p \) is a predicted segment
  • \( g \) is a ground truth segment
  • \( TP \), \( FP \), \( FN \) are the sets of true positive (matched), false positive, and false negative segments
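Given a completed segment matching, PQ reduces to simple arithmetic. A minimal sketch, assuming the TP pairs (matches with IoU > 0.5, as in the standard formulation) and the FP/FN counts have already been produced by a matching step not shown here:

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ from the IoUs of matched (TP) segment pairs plus FP/FN counts.

    Matches are assumed to already satisfy the IoU > 0.5 criterion."""
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    return sum(matched_ious) / denom if denom > 0 else 0.0

pq = panoptic_quality([0.9, 0.7], num_fp=1, num_fn=1)  # 1.6 / 3 ≈ 0.533
```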

Image Generation and Synthesis Metrics

These metrics evaluate the quality, diversity, and realism of generated images.

  1. Fréchet Inception Distance (FID): Measures the distance between the distribution of features from generated images and real images, extracted using a pre-trained Inception network. Compares mean and covariance of these feature distributions. Lower values indicate that generated images are more similar to real images in terms of deep features.

\[ FID = \lVert \mu_r - \mu_g \rVert^2 + \mathrm{Tr}\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right) \]

Where:

  • \( \mu_r \) and \( \mu_g \) are the mean feature representations of real and generated images
  • \( \Sigma_r \) and \( \Sigma_g \) are the covariance matrices of the feature representations
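A minimal sketch of the distance computation itself, assuming the feature means and covariances have already been extracted with an Inception network; the matrix square root uses `scipy.linalg.sqrtm`:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu_r, sigma_r, mu_g, sigma_g):
    """FID between two Gaussians fitted to real/generated feature sets."""
    diff = mu_r - mu_g
    # Matrix square root of the product of covariances; numerical noise can
    # introduce a tiny imaginary component, which is discarded.
    covmean = linalg.sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return diff @ diff + np.trace(sigma_r + sigma_g - 2 * covmean)

mu, sigma = np.zeros(2), np.eye(2)
print(frechet_distance(mu, sigma, mu, sigma))  # identical distributions -> ~0
```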
  2. Inception Score (IS): Measures the quality (sharpness, recognizability by a pre-trained Inception network) and diversity of generated images.

\[ IS = \exp\left( \mathbb{E}_x \, D_{KL}\big( p(y \mid x) \,\Vert\, p(y) \big) \right) \]

Where:

  • \( p(y \mid x) \) is the conditional class distribution for image \( x \)
  • \( p(y) \) is the marginal class distribution
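A minimal sketch of the score itself, assuming the per-image class probabilities \( p(y \mid x) \) have already been produced by the Inception network (real implementations additionally average the score over splits of the image set):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS from an (N, K) array of per-image class probabilities p(y|x)."""
    probs = np.asarray(probs, dtype=float)
    marginal = probs.mean(axis=0)            # p(y), averaged over images
    # Per-image KL divergence between p(y|x) and the marginal p(y)
    kl = probs * (np.log(probs + eps) - np.log(marginal + eps))
    return float(np.exp(kl.sum(axis=1).mean()))

# Every image assigned the same distribution -> KL = 0 -> IS = 1
print(inception_score([[0.5, 0.5], [0.5, 0.5]]))  # -> 1.0
```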
  3. Structural Similarity Index (SSIM): Measures the perceptual similarity between two images based on luminance, contrast, and structure. Ranges from -1 to 1, where 1 indicates perfect similarity. Aims to be more consistent with human perception than PSNR/MSE.

\[ SSIM(x, y) = \frac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \]

Where:

  • \( \mu_x \) and \( \mu_y \) are the average pixel values
  • \( \sigma_x^2 \) and \( \sigma_y^2 \) are the variances
  • \( \sigma_{xy} \) is the covariance
  • \( C_1 \) and \( C_2 \) are constants to avoid division by zero
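A simplified sketch that evaluates the SSIM expression on global image statistics; practical implementations instead average the same expression over local sliding (often Gaussian) windows. The constants follow the common \( C_1 = (0.01 \cdot MAX)^2 \), \( C_2 = (0.03 \cdot MAX)^2 \) choice:

```python
import numpy as np

def ssim_global(x, y, max_val=255.0):
    """Single-window SSIM over whole images (a simplification of windowed SSIM)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

img = np.arange(16.0).reshape(4, 4)
print(ssim_global(img, img))  # identical images -> 1.0
```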
  4. Peak Signal-to-Noise Ratio (PSNR): Measures the quality of reconstructed images in tasks like denoising or super-resolution. It is the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects its fidelity, computed from the MSE.

\[ PSNR = 10 \cdot \log_{10}\left( \frac{MAX^2}{MSE} \right) \]

Where:

  • \( MAX \) is the maximum possible pixel value
  • \( MSE \) is the mean squared error between the images
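The formula translates directly to code. A minimal sketch, assuming 8-bit images (`max_val=255`):

```python
import numpy as np

def psnr(x, y, max_val=255.0):
    """PSNR in decibels between two images of the same shape."""
    mse = np.mean((np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.zeros((8, 8))
b = np.full((8, 8), 10.0)
print(psnr(a, b))  # MSE = 100 -> 10 * log10(255^2 / 100) ≈ 28.13 dB
```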
  5. Learned Perceptual Image Patch Similarity (LPIPS): Measures perceptual similarity using deep features from pre-trained networks (VGG, AlexNet). Aims to align better with human perception of similarity than pixel-wise metrics like MSE.

3D Vision Metrics

Metrics for evaluating 3D reconstruction, depth estimation, and point cloud processing.

  1. Depth Estimation Metrics:

    • Mean Absolute Error (MAE): Average absolute difference between predicted and ground truth depths.
    • Root Mean Squared Error (RMSE): Square root of the average squared differences.
    • Threshold Accuracy: Percentage of pixels whose ratio of predicted to ground truth depth, \( \max(\hat{d}/d,\, d/\hat{d}) \), is below a threshold (commonly \( 1.25 \), \( 1.25^2 \), and \( 1.25^3 \)).
  2. Point Cloud Metrics:

    • Chamfer Distance: Measures the average distance from each point in one point cloud to its nearest neighbor in another point cloud.

    • Earth Mover’s Distance (EMD): The minimum “cost” to transform one point cloud into another.
    • F-Score: The harmonic mean of precision and recall at a specific distance threshold.
  3. 3D Reconstruction Metrics:

    • Volumetric IoU: The intersection over union of 3D volumes.
    • Surface-to-Surface Distance: The average distance between reconstructed and ground truth surfaces.
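The Chamfer distance above can be sketched with brute-force nearest-neighbour search. Note that conventions vary: this version sums the mean squared nearest-neighbour distances in both directions, while other formulations use unsquared distances or different normalizations:

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point clouds of shape (N, D) and (M, D)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise squared distances via broadcasting, shape (N, M)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    # Average nearest-neighbour distance in each direction, then sum
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

p = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(chamfer_distance(p, p))  # identical clouds -> 0.0
```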

Human Pose Estimation Metrics

  1. Percentage of Correct Keypoints (PCK): The percentage of predicted keypoints that fall within a distance threshold of the ground truth keypoints.

  2. Mean Per Joint Position Error (MPJPE): The average Euclidean distance between predicted and ground truth joint positions.

  3. Object Keypoint Similarity (OKS): Similar to IoU but for keypoints, accounting for keypoint visibility and scale.

\[ OKS = \frac{\sum_i \exp\left( -\frac{d_i^2}{2 s^2 k_i^2} \right) \delta(v_i > 0)}{\sum_i \delta(v_i > 0)} \]

Where:

  • \( d_i \) is the Euclidean distance between the predicted and ground truth keypoint \( i \)
  • \( s \) is the object scale
  • \( k_i \) is the per-keypoint constant
  • \( v_i \) is the visibility flag for keypoint \( i \)
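A minimal sketch of OKS, assuming the per-keypoint distances, the object scale, and COCO-style per-keypoint constants are given (only keypoints with a positive visibility flag contribute):

```python
import numpy as np

def object_keypoint_similarity(d, s, k, v):
    """OKS from per-keypoint distances d, object scale s, per-keypoint
    constants k, and visibility flags v."""
    d = np.asarray(d, dtype=float)
    k = np.asarray(k, dtype=float)
    visible = np.asarray(v) > 0
    if not visible.any():
        return 0.0
    # Gaussian similarity per visible keypoint, averaged over visible keypoints
    e = np.exp(-d[visible] ** 2 / (2 * s ** 2 * k[visible] ** 2))
    return float(e.mean())

# Perfect predictions on two visible keypoints -> OKS = 1.0
print(object_keypoint_similarity([0.0, 0.0], 1.0, [0.1, 0.1], [1, 1]))
```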