A 3.4% accuracy gain on CUB200-2011 with a ResNet-18 is the headline improvement of OSCS-SupCon over CS-SupCon. The paper targets two known weaknesses of supervised contrastive learning and addresses both with a surprisingly clean combination of a sigmoid-based loss and an orthogonality constraint.
The Two Problems That Hold Back Standard SupCon
Standard SupCon uses an InfoNCE-style loss to pull together samples from the same class and push apart those from different classes. That works reasonably well, but the loss dilutes the signal from negatives when many of them are easy to separate, so the model stops learning fine-grained distinctions. Worse, the feature space remains entangled: category-relevant (“common”) and category-irrelevant (“style”) features overlap, which hurts generalization when the style shifts at test time.
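For reference, here is a minimal pure-Python sketch of the standard SupCon loss, in the common form that averages log-probabilities over each anchor's positives. The function names and the default temperature of 0.1 are illustrative choices, not taken from the paper. The structural detail to notice is that, for each anchor, positives and negatives share a single softmax denominator.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def supcon_loss(features, labels, temperature=0.1):
    """Softmax-based SupCon loss, averaged over anchors that have a positive."""
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity from anchor i to every other sample.
        logits = {a: sum(p * q for p, q in zip(z[i], z[a])) / temperature
                  for a in range(n) if a != i}
        # Stable log-sum-exp: positives and negatives share one denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        positives = [a for a in logits if labels[a] == labels[i]]
        if not positives:
            continue  # anchors without a same-class partner are skipped
        # Negative mean log-probability of this anchor's positives.
        total -= sum(logits[a] - log_denom for a in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0
```

With two tight clusters labeled consistently (for example `[[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]` with labels `[0, 0, 1, 1]`) the loss is close to zero; relabeling so that each anchor's positive is the orthogonal vector drives it up sharply.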
OSCS-SupCon addresses both problems head-on. In place of InfoNCE, it uses a sigmoid-based pairwise contrastive loss with two learnable scalars, a temperature and a bias, that adaptively shift the decision boundary. The bias term lets the model down-weight easy negatives and focus on hard ones, which directly counters the dilution problem.
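The pairwise sigmoid idea can be sketched as follows. This is an illustration under assumptions, not the authors' code: it follows a SigLIP-style formulation (label +1 for same-class pairs, -1 otherwise, loss −log σ(label · (t · cos + b))), and the defaults t = 10, b = −10 are SigLIP's initial values rather than values reported for this paper. In training, `t` and `b` would be learnable parameters (commonly with `t` stored as the exponent of a free scalar so it stays positive); here they are plain arguments, and `pair_loss` and `sigmoid_supcon_loss` are hypothetical names.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) for large positive or negative x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def pair_loss(cos_sim, same_class, t=10.0, b=-10.0):
    # Each pair is an independent binary decision: +1 for same class, -1 otherwise.
    # There is no softmax, so a pair's loss does not depend on any other pair.
    sign = 1.0 if same_class else -1.0
    return -log_sigmoid(sign * (t * cos_sim + b))

def sigmoid_supcon_loss(features, labels, t=10.0, b=-10.0):
    # Mean pairwise loss over all ordered pairs (i, j) with i != j.
    z = [l2_normalize(f) for f in features]
    n = len(z)
    losses = [
        pair_loss(sum(p * q for p, q in zip(z[i], z[j])), labels[i] == labels[j], t, b)
        for i in range(n) for j in range(n) if i != j
    ]
    return sum(losses) / len(losses)
```

With these defaults, an easy negative (cosine −1) costs about 2e-9 while a hard negative (cosine 0.9) costs about 0.31. Because each pair is scored on its own, a large number of easy pairs cannot wash out the gradient from the hard one.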
Orthogonality Forces Features to Stay Separated
The second contribution is explicit orthogonality between the common and style feature subspaces. The authors map features into the two subspaces with a linear projection followed by a ReLU nonlinearity and add an orthogonality loss that minimizes the cosine overlap between them. No implicit regularization, no adversarial training: just an explicit geometric objective that forces the model to disentangle what matters for classification from what doesn't.
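A minimal sketch of the orthogonality term, under stated assumptions: two separate linear + ReLU heads (hypothetical weight matrices `W_c` and `W_s`, bias omitted) map a backbone feature `h` to a common vector and a style vector, and the penalty is the mean absolute cosine similarity between each sample's pair. The paper's exact pairing, weighting and normalization may differ.

```python
import math

def linear_relu(weights, x):
    # Linear projection (one row of `weights` per output unit, bias omitted) then ReLU.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

def cosine(u, v, eps=1e-8):
    # eps guards against an all-zero vector, which ReLU can produce.
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv + eps)

def orthogonality_loss(backbone_feats, W_c, W_s):
    # Project each feature into the (hypothetical) common and style subspaces,
    # then penalize the mean absolute cosine overlap between the two projections.
    overlaps = []
    for h in backbone_feats:
        c = linear_relu(W_c, h)
        s = linear_relu(W_s, h)
        overlaps.append(abs(cosine(c, s)))
    return sum(overlaps) / len(overlaps)
```

Because ReLU outputs are non-negative, the cosine here lies in [0, 1] and reaches zero only when no unit is active in both projections, so the `abs()` is a safeguard rather than a necessity.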
The combination is evaluated across six benchmark datasets and multiple backbones. The 3.4% gain on CUB200-2011 is the most striking, but consistent improvements appear elsewhere. Ablation studies confirm that both the sigmoid loss and the orthogonality loss contribute independently; removing either drops performance.
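One way to read the ablation is through a combined objective like the one below. The additive form and the weight λ are assumptions for illustration, since the summary does not say how the two terms are balanced. Under this reading, removing the sigmoid loss means swapping the first term back to the InfoNCE-based SupCon loss, and removing the orthogonality loss means setting λ = 0. Here 𝒫 is the set of ordered pairs (i, j) with i ≠ j, z are normalized embeddings, and c_k, s_k are sample k's common and style projections.

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{sig}} + \lambda\,\mathcal{L}_{\text{orth}},
\qquad
\mathcal{L}_{\text{sig}} = -\frac{1}{|\mathcal{P}|}\sum_{(i,j)\in\mathcal{P}}
  \log\sigma\!\big(y_{ij}\,(t\,\mathbf{z}_i^{\top}\mathbf{z}_j + b)\big),
\qquad
\mathcal{L}_{\text{orth}} = \frac{1}{N}\sum_{k=1}^{N}
  \left|\frac{\mathbf{c}_k^{\top}\mathbf{s}_k}{\lVert\mathbf{c}_k\rVert\,\lVert\mathbf{s}_k\rVert}\right|,
\qquad
y_{ij} = \begin{cases} +1 & \text{if } i, j \text{ share a class} \\ -1 & \text{otherwise} \end{cases}
```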
What This Enables Next
OSCS-SupCon doesn't require fancy architectures or extra data; it slots into any supervised contrastive pipeline with minimal overhead. Expect to see it used as a drop-in replacement for InfoNCE in fine-grained classification and domain-generalization tasks where style variations are rampant. A natural next step is extending it to vision transformers and multi-modal settings.
Source: OSCS-SupCon: Orthogonal Sigmoid-based Common and Style Supervised Contrastive Learning for Robust Feature Disentanglement
Domain: arxiv.org