Self-supervised representation learning by attracting positives, repelling negatives
Contrastive learning (e.g. SimCLR, MoCo) learns representations without labels by pulling together positive pairs (augmented views of the same image) and pushing apart negative pairs (different images). For a positive pair (i, j), the NT-Xent loss is L_{i,j} = −log [ exp(sim(z_i, z_j)/τ) / ∑_{k≠i} exp(sim(z_i, z_k)/τ) ], where sim is cosine similarity, τ is the temperature, and the sum in the denominator runs over every other embedding in the batch. A low temperature sharpens the softmax distribution and forces tighter clustering. The key insight from the alignment-and-uniformity analysis (Wang & Isola, 2020) is that good representations are aligned on positive pairs AND uniformly distributed on the hypersphere. The left panel shows the embedding space evolving: watch same-color points cluster together while different classes repel.
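To make the loss concrete, here is a minimal NumPy sketch of NT-Xent for a batch of two augmented views. The function name `nt_xent` and the batch layout (views stacked as [z1; z2], so row i's positive is row i+N) are choices made for this illustration; reference implementations of SimCLR differ in details.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss for two views z1, z2 of shape (N, d)."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)                   # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)       # unit-normalize: dot product = cosine sim
    sim = (z @ z.T) / tau                                  # (2N, 2N) similarity logits
    np.fill_diagonal(sim, -np.inf)                         # exclude self-similarity (the k != i condition)
    # The positive for row i is its other augmented view.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logits = sim - sim.max(axis=1, keepdims=True)          # stabilize before exponentiating
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy: negative log-probability assigned to the positive pair, averaged over 2N anchors.
    return -log_prob[np.arange(2 * n), pos].mean()
```

When the two views are identical the positive logit dominates and the loss is near zero; for unrelated views it approaches log(2N − 1), the cost of a uniform guess over the negatives.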