Computer vision metrics evaluate models across detection, segmentation, generation, 3D, and pose estimation tasks. Many reduce to a precision/recall-style overlap measure, tailored to the prediction type (box, mask, keypoint, point cloud, image); others compare feature distributions, pixel statistics, or geometric distances.

When to use which metric

Metric                   When to use
IoU                      Box or mask overlap with ground truth.
AP                       Area under precision-recall for a class at a given IoU.
mAP                      Mean AP across classes (and often across IoU thresholds).
Pixel Accuracy           Fraction of correctly classified pixels; dominated by large classes.
mIoU                     Mean IoU across classes (standard semantic-segmentation metric).
FWIoU                    mIoU weighted by class frequency.
Dice Coefficient         F1 equivalent for segmentation masks.
Mask AP                  AP computed over masks instead of boxes.
Panoptic Quality (PQ)    Panoptic segmentation: segmentation quality × recognition quality.
FID                      Generative image quality/diversity vs real distribution.
Inception Score (IS)     Generative image sharpness and diversity.
SSIM                     Perceptual similarity via luminance/contrast/structure.
PSNR                     Reconstruction quality (denoising, super-resolution).
LPIPS                    Deep-feature perceptual similarity.
Chamfer / EMD            Point-cloud distance.
PCK / MPJPE / OKS        Pose-keypoint accuracy.

Object Detection Metrics

Object detection models predict bounding boxes around objects and classify them.

Intersection over Union (IoU)

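Measures the overlap between predicted and ground truth bounding boxes as the area of their intersection divided by the area of their union:

$$\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

where $A$ is the predicted box and $B$ is the ground truth box.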
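As a minimal sketch of box IoU, assuming axis-aligned boxes in (x1, y1, x2, y2) corner format (the function name `box_iou` and the box format are illustrative choices, not something the text prescribes):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection width/height clamp to 0 when the boxes do not overlap.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2×2 boxes offset by one unit in each direction share an intersection of area 1 and a union of area 7, so their IoU is 1/7.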

Average Precision (AP)

Area under the Precision-Recall curve for a specific class, calculated at a specific IoU threshold.

$$\mathrm{AP} = \int_0^1 p(r)\,dr$$

Where $p(r)$ is the precision at recall level $r$.
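In practice the integral is approximated from a ranked list of detections. The sketch below uses all-point interpolation, which is one common convention among several; it assumes each detection has already been marked as a true or false positive at the chosen IoU threshold. The names `average_precision`, `is_tp`, and `num_gt` are hypothetical.

```python
def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for one class at one IoU threshold.

    scores: confidence per detection.
    is_tp:  whether each detection matched a ground truth box.
    num_gt: number of ground truth boxes for the class.
    """
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing in recall (running max from the right).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum the rectangles under the interpolated curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

Averaging this value over classes (and, where the benchmark asks for it, over IoU thresholds) gives mAP.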

Mean Average Precision (mAP)

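Mean of AP values across all object classes, often also averaged over multiple IoU thresholds (for example, COCO averages over thresholds from 0.50 to 0.95 in steps of 0.05).

$$\mathrm{mAP} = \frac{1}{C}\sum_{c=1}^{C} \mathrm{AP}_c$$

where $C$ is the number of classes and $\mathrm{AP}_c$ is the AP of class $c$.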

Semantic Segmentation Metrics

In semantic segmentation, each pixel belongs to one class.

Pixel Accuracy

Proportion of correctly classified pixels among all pixels. Can be dominated by large classes (e.g., background).

Mean Intersection over Union (mIoU)

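Average IoU across all classes.

$$\mathrm{mIoU} = \frac{1}{C}\sum_{c=1}^{C} \frac{TP_c}{TP_c + FP_c + FN_c}$$

Where:

  • $TP_c$ is the number of true positive pixels for class $c$.
  • $FP_c$ is the number of false positive pixels for class $c$.
  • $FN_c$ is the number of false negative pixels for class $c$.
  • $C$ is the number of classes.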

Frequency Weighted IoU (FWIoU)

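Weighted version of mIoU in which each class's IoU is weighted by that class's pixel frequency, so frequent classes contribute more to the score.

$$\mathrm{FWIoU} = \frac{1}{\sum_{c} t_c}\sum_{c} t_c\,\mathrm{IoU}_c$$

Where $t_c$ is the total number of pixels that truly belong to class $c$ and $\mathrm{IoU}_c$ is the IoU of class $c$.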
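The three semantic-segmentation metrics above can all be derived from per-class true positive, false positive, and false negative pixel counts. The following is a rough sketch over flattened label maps; the function name `segmentation_metrics` is hypothetical, and skipping classes that appear in neither map is one convention among several.

```python
def segmentation_metrics(pred, gt, num_classes):
    """Return (pixel accuracy, mIoU, FWIoU).

    pred, gt: flat sequences of integer class labels, one per pixel.
    """
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, g in zip(pred, gt):
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p where it does not belong
            fn[g] += 1  # missed a pixel of true class g
    total = len(gt)
    ious, weights = [], []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        if denom == 0:
            continue  # class absent from both maps: skipped (assumed convention)
        ious.append(tp[c] / denom)
        weights.append(tp[c] + fn[c])  # pixels that truly belong to class c
    pixel_acc = sum(tp) / total
    miou = sum(ious) / len(ious)
    fwiou = sum(w * iou for w, iou in zip(weights, ious)) / total
    return pixel_acc, miou, fwiou
```

On an 8-pixel toy example with 6 background pixels and 2 foreground pixels, one error in each class gives a pixel accuracy of 0.75 but an mIoU of about 0.52, which illustrates how pixel accuracy is dominated by the large class.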

Dice Coefficient

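F1 equivalent for segmentation masks: twice the overlap between the predicted mask $A$ and the ground truth mask $B$, divided by the sum of their sizes.

$$\mathrm{Dice} = \frac{2\,|A \cap B|}{|A| + |B|} = \frac{2\,TP}{2\,TP + FP + FN}$$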
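A minimal sketch of the Dice coefficient for binary masks, assuming both masks are flattened to equal-length sequences of 0/1 values (the name `dice` and the handling of two empty masks are illustrative assumptions):

```python
def dice(pred, gt):
    """Dice coefficient of two flat binary masks of equal length."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    size = sum(1 for p in pred if p) + sum(1 for g in gt if g)
    # Two empty masks are treated as a perfect match (assumed convention).
    return 2 * inter / size if size else 1.0
```

Dice and IoU are monotonically related by Dice = 2·IoU / (1 + IoU), so they rank predictions on a single mask identically, although averages over many masks can differ.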

Instance Segmentation Metrics

Instance segmentation involves both semantic segmentation and instance differentiation (separating individual objects).

Mask AP

Average Precision calculated based on IoU between predicted and ground truth masks instead of bounding boxes.

Panoptic Quality (PQ)

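Combines recognition and segmentation quality for panoptic segmentation tasks.

$$\mathrm{PQ} = \frac{\sum_{(p, g) \in TP} \mathrm{IoU}(p, g)}{|TP| + \frac{1}{2}|FP| + \frac{1}{2}|FN|}$$

Where:

  • $p$ is a predicted segment.
  • $g$ is a ground truth segment.
  • $TP$, $FP$, $FN$ are the sets of true positives (matched segment pairs), false positives, and false negatives.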
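PQ factors into segmentation quality (mean IoU of matched pairs) times recognition quality (an F1-style ratio over segments). The sketch below assumes matching has already been done, so it takes the IoUs of matched pairs plus the counts of unmatched predictions and unmatched ground truth segments; the name `panoptic_quality` and this input format are hypothetical.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Return (PQ, SQ, RQ).

    matched_ious: IoU of each matched (prediction, ground truth) pair.
    num_fp: predicted segments with no match.
    num_fn: ground truth segments with no match.
    """
    num_tp = len(matched_ious)
    denom = num_tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / num_tp if num_tp else 0.0  # segmentation quality
    rq = num_tp / denom                                  # recognition quality
    return sq * rq, sq, rq
```

With two matches of IoU 0.8 and 0.6, one false positive, and one false negative, SQ is 0.7, RQ is 2/3, and PQ is about 0.47.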

Image Generation and Synthesis Metrics

These metrics evaluate the quality, diversity, and realism of generated images.

Fréchet Inception Distance (FID)

Measures the distance between the distribution of features from generated images and real images, extracted using a pre-trained Inception network. Compares the mean and covariance of these feature distributions. Lower values indicate more realistic generated images.

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)$$

Where:

  • $\mu_r$ and $\mu_g$ are the mean feature representations of real and generated images.
  • $\Sigma_r$ and $\Sigma_g$ are the covariance matrices of the feature representations.
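Given feature vectors that have already been extracted, the distance itself is a few lines of linear algebra. This sketch assumes NumPy and feature arrays with at least two dimensions per sample. It does not run an Inception network, and the name `fid_from_features` is hypothetical. The trace of the matrix square root is computed from eigenvalues, which is one of several ways to evaluate that term.

```python
import numpy as np

def fid_from_features(real, fake):
    """Frechet distance between two feature sets.

    real, fake: arrays of shape (n_samples, n_features), n_features >= 2.
    """
    mu_r, mu_g = real.mean(axis=0), fake.mean(axis=0)
    sigma_r = np.cov(real, rowvar=False)
    sigma_g = np.cov(fake, rowvar=False)
    # Tr((Sigma_r Sigma_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of Sigma_r Sigma_g; clip small negatives from round-off.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)
```

Shifting every feature vector by a constant leaves the covariance unchanged, so the result reduces to the squared length of the shift; identical feature sets give a value of approximately zero.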

Inception Score (IS)

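Measures the quality (sharpness, recognizability by a pre-trained Inception network) and diversity of generated images.

$$\mathrm{IS} = \exp\left(\mathbb{E}_{x}\left[D_{\mathrm{KL}}\big(p(y \mid x)\,\|\,p(y)\big)\right]\right)$$

Where:

  • $p(y \mid x)$ is the conditional class distribution for image $x$.
  • $p(y)$ is the marginal class distribution.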
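To make the formula concrete, the sketch below computes the score from class probabilities that a classifier has already produced for each generated image. It assumes each row sums to 1 and estimates the marginal as the average of the rows; the name `inception_score` and the list-of-lists input are illustrative.

```python
import math

def inception_score(probs):
    """probs: list of per-image class-probability lists p(y|x)."""
    n, k = len(probs), len(probs[0])
    # Marginal p(y): average of the conditional distributions.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    kl_sum = 0.0
    for p in probs:
        # KL(p(y|x) || p(y)); terms with zero probability contribute 0.
        kl_sum += sum(pj * math.log(pj / marginal[j])
                      for j, pj in enumerate(p) if pj > 0)
    return math.exp(kl_sum / n)
```

Confident predictions spread evenly over two classes give a score of 2 (the number of classes), while uniform predictions for every image give the minimum score of 1.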

Structural Similarity Index (SSIM)

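Measures perceptual similarity between two images $x$ and $y$ based on luminance, contrast, and structure. Ranges from −1 to 1, although values for natural image pairs usually fall between 0 and 1; 1 = perfect similarity. More consistent with human perception than PSNR/MSE.

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

Where:

  • $\mu_x$ and $\mu_y$ are the average pixel values.
  • $\sigma_x^2$ and $\sigma_y^2$ are the variances.
  • $\sigma_{xy}$ is the covariance.
  • $C_1$ and $C_2$ are constants to avoid division by zero.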
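The formula can be evaluated directly on two flattened images. Note the simplification: practical SSIM implementations compute these statistics in local windows and average the results, whereas this sketch uses a single global window. The name `global_ssim` is hypothetical, and the constants follow the commonly used choice $C_1 = (0.01 L)^2$, $C_2 = (0.03 L)^2$ for dynamic range $L$.

```python
def global_ssim(x, y, max_val=255.0):
    """Single-window SSIM over two flat pixel sequences of equal length."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * max_val) ** 2  # stabilizes the luminance term
    c2 = (0.03 * max_val) ** 2  # stabilizes the contrast/structure term
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den
```

Identical inputs score 1, and an image compared with its own inversion scores below 0 because the covariance is negative.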

Peak Signal-to-Noise Ratio (PSNR)

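Measures the quality of reconstructed images in tasks like denoising or super-resolution. It is the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects its fidelity, expressed in decibels and computed from the MSE. Higher values indicate a closer reconstruction.

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right)$$

Where:

  • $\mathrm{MAX}_I$ is the maximum possible pixel value.
  • $\mathrm{MSE}$ is the mean squared error between images.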
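A short sketch of PSNR on flattened images, assuming 8-bit data by default (maximum value 255). The name `psnr` is illustrative, and returning infinity for identical images is one possible convention.

```python
import math

def psnr(x, y, max_val=255.0):
    """PSNR in dB between two flat pixel sequences of equal length."""
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    if mse == 0:
        return math.inf  # identical images: no noise power
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Because the scale is logarithmic, halving the MSE raises PSNR by about 3 dB.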

Learned Perceptual Image Patch Similarity (LPIPS)

Measures perceptual similarity using deep features from pre-trained networks (VGG, AlexNet). Aligns better with human perception than pixel-wise metrics like MSE.

3D Vision Metrics

Metrics for evaluating 3D reconstruction, depth estimation, and point cloud processing.

Depth Estimation

  • Mean Absolute Error (MAE): average absolute difference between predicted and ground truth depths.
  • Root Mean Squared Error (RMSE): square root of the average squared differences.
  • Threshold Accuracy: percentage of pixels for which the ratio between predicted and ground truth depth, $\max(d / d^*, d^* / d)$, is below a threshold (commonly $1.25$, $1.25^2$, and $1.25^3$).
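The three depth metrics above can be sketched together on flattened depth maps. This assumes strictly positive depths in the same units and that invalid pixels have already been removed; the name `depth_metrics` is hypothetical.

```python
import math

def depth_metrics(pred, gt, threshold=1.25):
    """Return (MAE, RMSE, threshold accuracy) for flat positive depth lists."""
    n = len(gt)
    mae = sum(abs(p - g) for p, g in zip(pred, gt)) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / n)
    # A pixel is accurate when the larger of the two depth ratios is below
    # the threshold, so over- and under-estimation are treated alike.
    acc = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < threshold) / n
    return mae, rmse, acc
```

Calling it with thresholds 1.25, 1.25**2, and 1.25**3 gives the usual three accuracy figures.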

Point Cloud

  • Chamfer Distance: average distance from each point in one cloud to its nearest neighbor in the other, usually computed in both directions and combined.
  • Earth Mover’s Distance (EMD): minimum “cost” to transform one point cloud into another.
  • F-Score: harmonic mean of precision and recall at a specific distance threshold.
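A brute-force sketch of Chamfer distance and the point-cloud F-Score follows. It assumes small clouds given as lists of coordinate tuples, uses unsquared Euclidean distances, and sums the two directional averages; some definitions square the distances or average the two directions instead. The names `chamfer_distance` and `f_score` are illustrative.

```python
import math

def _nn_dists(src, dst):
    """Distance from each point in src to its nearest neighbor in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: sum of the two directional averages."""
    return sum(_nn_dists(a, b)) / len(a) + sum(_nn_dists(b, a)) / len(b)

def f_score(pred, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    precision = sum(1 for d in _nn_dists(pred, gt) if d <= tau) / len(pred)
    recall = sum(1 for d in _nn_dists(gt, pred) if d <= tau) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

The nearest-neighbor search here is quadratic in the number of points; larger clouds would typically use a spatial index such as a k-d tree.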

3D Reconstruction

  • Volumetric IoU — intersection over union of 3D volumes.
  • Surface-to-Surface Distance — average distance between reconstructed and ground truth surfaces.

Human Pose Estimation Metrics

Percentage of Correct Keypoints (PCK)

Percentage of predicted keypoints that fall within a distance threshold of the ground truth keypoints.

Mean Per Joint Position Error (MPJPE)

Average Euclidean distance between predicted and ground truth joint positions.
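Both PCK and MPJPE follow directly from per-joint Euclidean distances. In this sketch the PCK threshold is a fraction `alpha` of a reference length such as head or torso size; both `alpha` and `ref_length` are assumed parameters, since the text does not fix how the threshold is chosen.

```python
import math

def mpjpe(pred, gt):
    """Mean Euclidean distance between predicted and true joints."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def pck(pred, gt, ref_length, alpha=0.5):
    """Fraction of keypoints within alpha * ref_length of the ground truth."""
    thr = alpha * ref_length
    return sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= thr) / len(gt)
```

MPJPE is reported in the units of the coordinates (often millimeters for 3D pose), while PCK is a unitless fraction.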

Object Keypoint Similarity (OKS)

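Similar to IoU but for keypoints; accounts for keypoint visibility and scale.

$$\mathrm{OKS} = \frac{\sum_i \exp\left(-\frac{d_i^2}{2 s^2 k_i^2}\right)\,\delta(v_i > 0)}{\sum_i \delta(v_i > 0)}$$

Where:

  • $d_i$ is the Euclidean distance between predicted and ground truth keypoint $i$.
  • $s$ is the object scale.
  • $k_i$ is the per-keypoint constant.
  • $v_i$ is the visibility flag for keypoint $i$.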
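A minimal sketch of OKS for a single object, assuming 2D keypoints and that keypoints with a visibility flag of 0 are unlabeled and therefore excluded. The name `oks` and the argument layout are hypothetical. In COCO-style evaluation, $s^2$ is taken as the object's area and the $k_i$ values are published per keypoint type; here they are passed in by the caller.

```python
import math

def oks(pred, gt, visibility, scale, k):
    """Object Keypoint Similarity for one object.

    pred, gt:   lists of (x, y) keypoints.
    visibility: per-keypoint flags v_i (0 means unlabeled).
    scale:      object scale s.
    k:          per-keypoint constants k_i.
    """
    total, count = 0.0, 0
    for p, g, v, ki in zip(pred, gt, visibility, k):
        if v > 0:
            d2 = (p[0] - g[0]) ** 2 + (p[1] - g[1]) ** 2
            total += math.exp(-d2 / (2 * scale ** 2 * ki ** 2))
            count += 1
    return total / count if count else 0.0
```

Each labeled keypoint contributes a value between 0 and 1 that decays with distance, so OKS can be thresholded like IoU when computing keypoint AP.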