LogQ correction is a bias-correction technique used in recommendation systems to account for non-uniform sampling during training. When negative samples are not drawn uniformly at random (e.g., with popularity-based or in-batch sampling), the model learns a biased representation that unduly penalizes frequently sampled items.
LogQ correction adjusts the training objective by subtracting the log probability of sampling each item as a negative:
Original Loss (Contrastive):

$$\mathcal{L} = -\log \frac{\exp(s(u, i))}{\exp(s(u, i)) + \sum_{j \in \mathcal{N}} \exp(s(u, j))}$$

Corrected Loss:

$$\mathcal{L}_{\text{corrected}} = -\log \frac{\exp(s(u, i) - \log Q(i))}{\exp(s(u, i) - \log Q(i)) + \sum_{j \in \mathcal{N}} \exp(s(u, j) - \log Q(j))}$$
Where:
- $s(u, i)$ is the similarity score between user $u$ and item $i$
- $Q(j)$ is the probability of sampling item $j$ as a negative
- $\mathcal{N}$ is the set of sampled negatives
LogQ correction effectively “discounts” the similarity scores of frequently sampled items. If an item $j$ is sampled with high probability $Q(j)$, then $\log Q(j)$ is less negative, making $s(u, j) - \log Q(j)$ smaller and reducing the item’s influence in the loss.
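As a concrete sketch of the corrected loss above, here is a NumPy version; the function name, array layout, and `eps` guard are illustrative assumptions, not taken from any particular library:

```python
import numpy as np

def logq_corrected_loss(scores, q, pos_idx=0, eps=1e-12):
    """Contrastive (softmax) loss over one positive and sampled negatives,
    with logQ correction applied to every logit.

    scores : (n,) similarity scores s(u, j); scores[pos_idx] is s(u, i)
    q      : (n,) sampling probabilities Q(j) for each candidate
    """
    # logQ correction: s(u, j) - log Q(j). Frequently sampled items
    # (large Q(j)) receive the smallest upward shift, so they are
    # discounted relative to rarely sampled items.
    logits = scores - np.log(q + eps)
    # Numerically stable log-softmax at the positive's position.
    m = logits.max()
    log_prob_pos = logits[pos_idx] - m - np.log(np.sum(np.exp(logits - m)))
    return -log_prob_pos
```

With uniform `q` the correction shifts all logits by the same constant, so the result matches the uncorrected loss; with skewed `q` the loss changes, reflecting the discount on popular items.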
For uniform sampling: $Q(j) = 1/N$ for all items, so $\log Q(j)$ is constant and cancels out (the softmax is invariant to shifting every logit by the same amount).
For popularity-based sampling: $Q(j) \propto \text{count}(j)$, so popular items receive a larger discount relative to rarely sampled items.
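A small numerical check of both cases, using made-up scores and sampling probabilities: with a constant $Q$ the corrected softmax matches the uncorrected one, while under skewed $Q$ the frequently sampled item loses probability mass.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

# Two items with identical raw scores; item 0 is sampled 9x more often.
scores = [1.0, 1.0]
q = [0.9, 0.1]

uncorrected = softmax(scores)                                      # [0.5, 0.5]
corrected = softmax([s - math.log(p) for s, p in zip(scores, q)])

# Uniform sampling: the constant log Q shifts both logits equally,
# so it cancels and the distribution is unchanged.
uniform = softmax([s - math.log(0.5) for s in scores])

print(corrected)  # item 0 is discounted: roughly [0.1, 0.9]
print(uniform)    # identical to uncorrected: [0.5, 0.5]
```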
Computing Q(j)
In-batch Sampling: If using other items in the batch as negatives, an item appears in a batch roughly in proportion to how often it occurs in the training data:

$$Q(j) \approx \frac{\text{count}(j)}{\sum_{k} \text{count}(k)}$$
Popularity-based Sampling: If sampling negatives proportional to popularity:

$$Q(j) = \frac{\text{count}(j)}{\sum_{k} \text{count}(k)}$$
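Both cases come down to counting item occurrences. A minimal sketch (the helper name `estimate_q` and the toy item stream are assumptions for illustration, not part of any library):

```python
from collections import Counter

def estimate_q(item_stream):
    """Estimate Q(j) as each item's empirical frequency.

    For in-batch sampling this approximates the chance that item j
    appears in a batch as a negative; for popularity-proportional
    sampling it is the sampling distribution itself.
    """
    counts = Counter(item_stream)
    total = sum(counts.values())
    return {item: c / total for item, c in counts.items()}

q = estimate_q(["a", "a", "a", "b", "b", "c"])
print(q)  # {'a': 0.5, 'b': 0.333..., 'c': 0.166...}
```

In production systems the counts are typically maintained over a stream rather than computed in one pass, but the resulting $Q(j)$ plugs into the corrected loss the same way.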