Object Detection Metrics
Object detection models predict bounding boxes around objects and classify them.
- Intersection over Union (IoU): Measures the overlap between predicted and ground truth bounding boxes.
- Average Precision (AP): The area under the precision-recall curve for a specific class, calculated at a specific IoU threshold:

$$\mathrm{AP} = \int_0^1 p(r)\, dr$$

Where $p(r)$ is the precision at recall level $r$.
- Mean Average Precision (mAP): The mean of AP values across all object classes, often calculated at multiple IoU thresholds.
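As an illustration, box IoU reduces to a few lines of arithmetic. The sketch below assumes corner-format boxes `(x1, y1, x2, y2)`; the helper name `box_iou` is ours, not taken from any particular library:

```python
def box_iou(box_a, box_b):
    """IoU between two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, the boxes `(0, 0, 2, 2)` and `(1, 1, 3, 3)` share a 1×1 intersection over a union of 7, giving IoU = 1/7.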
Semantic Segmentation Metrics
In semantic segmentation, each pixel belongs to one class.
- Pixel Accuracy: The proportion of correctly classified pixels among all pixels. Can be dominated by large classes (background).
- Mean Intersection over Union (mIoU): The average IoU across all classes:

$$\mathrm{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \frac{TP_c}{TP_c + FP_c + FN_c}$$

Where:
- $TP_c$ is the number of true positive pixels for class $c$
- $FP_c$ is the number of false positive pixels for class $c$
- $FN_c$ is the number of false negative pixels for class $c$
- $C$ is the number of classes
- Frequency Weighted IoU (FWIoU): A weighted version of mIoU that accounts for class imbalance:

$$\mathrm{FWIoU} = \frac{1}{\sum_{c=1}^{C} t_c} \sum_{c=1}^{C} t_c \cdot \frac{TP_c}{TP_c + FP_c + FN_c}$$

Where $t_c$ is the total number of pixels that truly belong to class $c$.
- Dice Coefficient: The F1-score equivalent for segmentation:

$$\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN} = \frac{2\,|A \cap B|}{|A| + |B|}$$
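The per-class IoU and Dice computations can be sketched directly from integer label maps. This is a minimal NumPy illustration (the function name is ours); classes absent from both prediction and ground truth are skipped rather than counted:

```python
import numpy as np

def miou_and_dice(pred, gt, num_classes):
    """Mean IoU and mean Dice from integer label maps of the same shape."""
    ious, dices = [], []
    for c in range(num_classes):
        p, g = (pred == c), (gt == c)
        tp = np.logical_and(p, g).sum()
        fp = np.logical_and(p, ~g).sum()
        fn = np.logical_and(~p, g).sum()
        denom = tp + fp + fn
        if denom == 0:
            # Class appears in neither map; skip it instead of averaging a 0/0.
            continue
        ious.append(tp / denom)
        dices.append(2 * tp / (2 * tp + fp + fn))
    return float(np.mean(ious)), float(np.mean(dices))
```

Note that Dice is always at least as large as IoU for the same prediction, since $\mathrm{Dice} = 2\,\mathrm{IoU} / (1 + \mathrm{IoU})$.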
Instance Segmentation Metrics
Instance segmentation involves both semantic segmentation and instance differentiation (separating individual objects).
- Mask AP: Average Precision calculated based on IoU between predicted and ground truth masks instead of bounding boxes.
- Panoptic Quality (PQ): Combines recognition and segmentation quality for panoptic segmentation tasks:

$$\mathrm{PQ} = \frac{\sum_{(p, g) \in TP} \mathrm{IoU}(p, g)}{|TP| + \frac{1}{2}|FP| + \frac{1}{2}|FN|}$$

Where:
- $p$ is a predicted segment
- $g$ is a ground truth segment
- $TP$, $FP$, $FN$ are the sets of true positives, false positives, and false negatives
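A minimal sketch of the PQ computation, assuming segment matching has already been done: the caller supplies the IoUs of true-positive matches (which exceed 0.5 by definition in the standard formulation) plus counts of unmatched predicted and ground-truth segments. The function name is ours:

```python
def panoptic_quality(tp_ious, num_fp, num_fn):
    """PQ from the IoUs of matched (TP) segment pairs and unmatched counts.

    tp_ious: IoU values of matched prediction/ground-truth pairs (each > 0.5).
    num_fp:  unmatched predicted segments.
    num_fn:  unmatched ground-truth segments.
    """
    num_tp = len(tp_ious)
    denom = num_tp + 0.5 * num_fp + 0.5 * num_fn
    return sum(tp_ious) / denom if denom > 0 else 0.0
```

For instance, two matches with IoUs 0.8 and 0.6 plus one FP and one FN give PQ = 1.4 / 3 ≈ 0.467.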
Image Generation and Synthesis Metrics
These metrics evaluate the quality, diversity, and realism of generated images.
- Fréchet Inception Distance (FID): Measures the distance between the distribution of features from generated images and real images, extracted using a pre-trained Inception network. Compares mean and covariance of these feature distributions. Lower values indicate that generated images are more similar to real images in terms of deep features.
$$\mathrm{FID} = \|\mu_r - \mu_g\|^2 + \mathrm{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)$$

Where:
- $\mu_r$ and $\mu_g$ are the mean feature representations of real and generated images
- $\Sigma_r$ and $\Sigma_g$ are the covariance matrices of the feature representations
- Inception Score (IS): Measures the quality (sharpness, recognizability by a pre-trained Inception network) and diversity of generated images.
$$\mathrm{IS} = \exp\!\left(\mathbb{E}_{x}\left[D_{\mathrm{KL}}\big(p(y \mid x)\,\|\,p(y)\big)\right]\right)$$

Where:
- $p(y \mid x)$ is the conditional class distribution for image $x$
- $p(y)$ is the marginal class distribution
- Structural Similarity Index (SSIM): Measures the perceptual similarity between two images based on luminance, contrast, and structure. Ranges from -1 to 1 (or 0 to 1), where 1 indicates perfect similarity. Aims to be more consistent with human perception than PSNR/MSE.

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

Where:
- $\mu_x$ and $\mu_y$ are the average pixel values
- $\sigma_x^2$ and $\sigma_y^2$ are the variances
- $\sigma_{xy}$ is the covariance
- $C_1$ and $C_2$ are constants to avoid division by zero
- Peak Signal-to-Noise Ratio (PSNR): Measures the quality of reconstructed images in tasks like denoising or super-resolution. Ratio between the maximum possible power of a signal and the power of corrupting noise that affects its fidelity. Based on MSE.
$$\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$$

Where:
- $\mathrm{MAX}$ is the maximum possible pixel value
- $\mathrm{MSE}$ is the mean squared error between the images
- Learned Perceptual Image Patch Similarity (LPIPS): Measures perceptual similarity using deep features from pre-trained networks (VGG, AlexNet). Aims to align better with human perception of similarity than pixel-wise metrics like MSE.
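As an example of the PSNR formula, here is a minimal NumPy version; the `max_val=255.0` default assumes 8-bit images and would need to be changed for other ranges:

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """PSNR in decibels between two same-shaped images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: noise power is zero
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Identical images give infinite PSNR, while two maximally different 8-bit images (all 0 vs. all 255) give 0 dB.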
3D Vision Metrics
Metrics for evaluating 3D reconstruction, depth estimation, and point cloud processing.
- Depth Estimation Metrics:
- Mean Absolute Error (MAE): Average absolute difference between predicted and ground truth depths.
- Root Mean Squared Error (RMSE): Square root of the average squared differences.
- Threshold Accuracy: Percentage of pixels whose ratio of predicted to ground truth depth satisfies $\max\!\left(\frac{d_i}{d_i^*}, \frac{d_i^*}{d_i}\right) < \delta$ (commonly $\delta = 1.25$, $1.25^2$, $1.25^3$).
- Point Cloud Metrics:
- Chamfer Distance: Measures the average distance from each point in one point cloud to its nearest neighbor in another point cloud.
- Earth Mover’s Distance (EMD): The minimum “cost” to transform one point cloud into another.
- F-Score: The harmonic mean of precision and recall at a specific distance threshold.
- 3D Reconstruction Metrics:
- Volumetric IoU: The intersection over union of 3D volumes.
- Surface-to-Surface Distance: The average distance between reconstructed and ground truth surfaces.
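The Chamfer distance described above can be sketched with brute-force nearest neighbours. Conventions differ in the literature (squared vs. plain Euclidean distances, sum vs. mean over points); this version, with a name of our choosing, uses the mean of squared distances in each direction:

```python
import numpy as np

def chamfer_distance(pc_a, pc_b):
    """Symmetric Chamfer distance between point clouds of shape (N, 3) and (M, 3)."""
    # Pairwise squared distances via broadcasting: (N, M).
    diff = pc_a[:, None, :] - pc_b[None, :, :]
    sq = np.sum(diff ** 2, axis=-1)
    # For each point, squared distance to its nearest neighbour in the other cloud.
    a_to_b = np.min(sq, axis=1).mean()
    b_to_a = np.min(sq, axis=0).mean()
    return float(a_to_b + b_to_a)
```

The O(NM) pairwise computation is fine for small clouds; real pipelines typically use a KD-tree or GPU nearest-neighbour search instead.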
Human Pose Estimation Metrics
- Percentage of Correct Keypoints (PCK): The percentage of predicted keypoints that fall within a distance threshold of the ground truth keypoints.
- Mean Per Joint Position Error (MPJPE): The average Euclidean distance between predicted and ground truth joint positions.
- Object Keypoint Similarity (OKS): Similar to IoU but for keypoints, accounting for keypoint visibility and scale:

$$\mathrm{OKS} = \frac{\sum_i \exp\!\left(-\frac{d_i^2}{2 s^2 k_i^2}\right) \delta(v_i > 0)}{\sum_i \delta(v_i > 0)}$$

Where:
- $d_i$ is the Euclidean distance between the predicted and ground truth keypoint $i$
- $s$ is the object scale
- $k_i$ is the per-keypoint constant
- $v_i$ is the visibility flag for keypoint $i$
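MPJPE and PCK are straightforward to sketch. The helpers below (names are ours) take `(J, D)` arrays of joint coordinates, for J joints in D dimensions:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Euclidean distance over joints; pred and gt are (J, D) arrays."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def pck(pred, gt, threshold):
    """Fraction of predicted keypoints within `threshold` of the ground truth."""
    return float((np.linalg.norm(pred - gt, axis=-1) <= threshold).mean())
```

In practice the PCK threshold is usually normalized, e.g. by head segment length (PCKh) or torso size, rather than given in raw pixels.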