Tuning the Classifier Alone Recovers 86% of Semi-Supervised Gains on Security Data

Classifier hyperparameter optimization (HPO) alone recovers a median 86% of the gain from a full semi-supervised learning (SSL) pipeline on tabular security data, and does it with a fraction of the complexity. That's the headline from SemiScope, a new analysis framework that systematically tears apart what actually drives performance in SSL-based security classification.

What SemiScope Actually Does

Most security teams treat semi-supervised learning as a black box: default parameters, fixed classifier, no handling of the class imbalance that pseudo-labels create. Recent work has chased gains by jointly optimizing the entire SSL pipeline (data augmentation, confidence thresholds, oversampling, the classifier), which makes it impossible to tell whether an improvement comes from genuine SSL-classifier interaction or just from tuning the downstream classifier.

SemiScope solves that attribution problem with a clean control experiment. The full pipeline uses Bayesian optimization to jointly tune SSL settings, confidence filtering, oversampling, and the classifier. The key control, called Tuned-Clf, fixes SSL to default settings but gives the classifier the same 100-trial hyperparameter optimization (HPO) budget and validation-set threshold tuning as the full pipeline.
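To make the control concrete, here is a minimal sketch of a Tuned-Clf-style loop in Python with scikit-learn. It is an illustration under stated assumptions, not the paper's code: random search stands in for Bayesian optimization, balanced accuracy stands in for g-measure (which this article does not define), and the search space, helper names, and synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import ParameterSampler, train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier

# Hypothetical search space; the article does not list the paper's ranges.
SEARCH_SPACE = {
    "n_estimators": [25, 50, 100],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 2, 5],
}


def tuned_clf_control(X_train, y_partial, X_val, y_val, n_trials=5, seed=0):
    """Search classifier hyperparameters only; self-training stays at defaults.

    y_partial uses -1 for unlabeled rows, as scikit-learn expects.
    Random search is used here as a stand-in for Bayesian optimization.
    """
    best = {"score": -1.0, "params": None, "model": None}
    for params in ParameterSampler(SEARCH_SPACE, n_iter=n_trials, random_state=seed):
        forest = RandomForestClassifier(random_state=seed, **params)
        model = SelfTrainingClassifier(forest)  # SSL settings left untouched
        model.fit(X_train, y_partial)
        # Balanced accuracy is an assumed stand-in for the paper's g-measure.
        score = balanced_accuracy_score(y_val, model.predict(X_val))
        if score > best["score"]:
            best = {"score": score, "params": params, "model": model}
    return best


def make_demo_data(seed=0, labeled_fraction=0.10):
    """Synthetic imbalanced data with roughly 10% of training labels kept."""
    X, y = make_classification(
        n_samples=600, n_features=20, weights=[0.8, 0.2], random_state=seed
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    rng = np.random.default_rng(seed)
    y_partial = np.full_like(y_train, -1)  # -1 marks an unlabeled row
    for cls in np.unique(y_train):
        idx = np.flatnonzero(y_train == cls)
        n_keep = max(1, int(labeled_fraction * idx.size))
        y_partial[rng.choice(idx, size=n_keep, replace=False)] = cls
    return X_train, y_partial, X_val, y_val


if __name__ == "__main__":
    data = make_demo_data()
    result = tuned_clf_control(*data)
    print(result["params"], round(result["score"], 3))
```

The point of the structure is that the self-training wrapper is never part of the search: every trial changes only the Random Forest. Replacing the sampler with a Bayesian optimizer and raising `n_trials` to 100 would bring the sketch closer to the budget the article describes.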

The Numbers That Matter

At 10% labeled data, SemiScope beats every default SSL baseline across all five datasets, improving over the strongest baseline by 0.7–12.7 points in g-measure. But under the equal-budget control, Tuned-Clf is statistically equivalent to the full pipeline on 4 of 5 datasets (Phishing was inconclusive). Equivalence was tested with a paired TOST (two one-sided tests) procedure using a ±1.0 g-measure smallest effect size of interest. That is a positive test for equivalence, not just a failure to reject the null hypothesis of no difference.
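For readers who have not used an equivalence test, the following is a minimal sketch of a paired TOST in Python with SciPy. The ±1.0 margin comes from the article; the significance level of 0.05, the function name `paired_tost`, and the per-run scores are assumptions made up for illustration, and the paper's exact procedure may differ.

```python
import numpy as np
from scipy import stats


def paired_tost(scores_a, scores_b, margin=1.0, alpha=0.05):
    """Paired two one-sided t-tests for equivalence within +/- margin.

    Returns (p_value, equivalent). The TOST p-value is the larger of the
    two one-sided p-values; equivalence is declared when it is below alpha.
    """
    diff = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    n = diff.size
    se = diff.std(ddof=1) / np.sqrt(n)  # standard error of the mean difference
    dof = n - 1

    # H0: mean difference <= -margin. Reject if the mean sits well above -margin.
    p_lower = stats.t.sf((diff.mean() + margin) / se, dof)
    # H0: mean difference >= +margin. Reject if the mean sits well below +margin.
    p_upper = stats.t.cdf((diff.mean() - margin) / se, dof)

    p_value = float(max(p_lower, p_upper))
    return p_value, p_value < alpha


if __name__ == "__main__":
    # Hypothetical g-measure scores over ten paired runs (not from the paper).
    full_pipeline = [80.1, 79.8, 80.3, 80.0, 79.9, 80.2, 80.1, 79.7, 80.0, 80.4]
    tuned_clf = [80.0, 80.0, 80.2, 80.0, 80.0, 80.0, 80.1, 79.8, 79.9, 80.4]
    print(paired_tost(full_pipeline, tuned_clf))
```

Both one-sided nulls must be rejected before the two methods are called equivalent, which is why a noisy or underpowered comparison comes out inconclusive rather than "equivalent".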

Digging deeper: classifier HPO alone recovers a median 86% of SemiScope's gain over Default Self-Training (ST) with Random Forest (RF). The simpler recipe (Self-Training, Bayesian optimization on the classifier, and tuning the decision threshold on validation data) reaches within 1 g-measure point of fully supervised RF at 20–30% labels on four datasets and at 40% on Drebin. On every dataset, that is the same or a lower label rate than Default ST + RF needs.
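One plausible reading of "recovers a median 86% of the gain" is a per-dataset ratio, then a median across datasets. The sketch below shows that arithmetic in Python. The formula is an assumption about how the figure was computed, and every score in it is invented for illustration; none of the numbers come from the paper.

```python
from statistics import median


def recovered_fraction(default_score, tuned_clf_score, full_score):
    """Share of the full pipeline's gain over the default that Tuned-Clf recovers."""
    full_gain = full_score - default_score
    if full_gain <= 0:
        raise ValueError("the full pipeline must beat the default baseline")
    return (tuned_clf_score - default_score) / full_gain


if __name__ == "__main__":
    # Hypothetical g-measure scores: (Default ST + RF, Tuned-Clf, full pipeline).
    scores = {
        "dataset_a": (70.0, 78.0, 80.0),
        "dataset_b": (60.0, 69.0, 70.0),
        "dataset_c": (50.0, 57.0, 60.0),
    }
    fractions = [recovered_fraction(*row) for row in scores.values()]
    print(f"median recovered fraction: {median(fractions):.0%}")
```

A median is less sensitive than a mean to a single dataset where the full pipeline's gain is tiny and the ratio becomes unstable.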

What This Means for Practitioners

If you're building security classifiers with limited labels, stop chasing elaborate SSL pipelines. Tune the classifier. Tune the threshold. That's it. The paper's reusable contribution is the decomposition protocol itself — a template for future work to stop conflating pipeline complexity with actual learning gains.
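The threshold half of that advice can be sketched in a few lines of Python with NumPy. This is a hedged illustration, not the paper's procedure: the article does not define g-measure, so the code assumes the harmonic mean of recall and specificity on a 0-100 scale, and the function names and threshold grid are hypothetical.

```python
import numpy as np


def g_measure(y_true, y_pred):
    """Assumed definition: harmonic mean of recall and specificity, scaled to 0-100."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if recall + specificity == 0:
        return 0.0
    return float(100.0 * 2.0 * recall * specificity / (recall + specificity))


def tune_threshold(y_val, scores_val, grid=None):
    """Pick the decision threshold that maximizes g-measure on validation data."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # hypothetical grid, steps of 0.05
    scores_val = np.asarray(scores_val, dtype=float)
    best_threshold, best_score = 0.5, -1.0
    for threshold in grid:
        score = g_measure(y_val, scores_val >= threshold)
        if score > best_score:  # ties keep the lowest threshold seen first
            best_threshold, best_score = float(threshold), score
    return best_threshold, best_score


if __name__ == "__main__":
    # Toy validation set where positives get low but still separable scores.
    y_val = [0, 0, 0, 0, 1, 1]
    scores_val = [0.02, 0.11, 0.18, 0.27, 0.33, 0.41]
    print(g_measure(y_val, np.asarray(scores_val) >= 0.5))  # default cut-off
    print(tune_threshold(y_val, scores_val))
```

In the toy data the default 0.5 cut-off predicts no positives at all, while a lower threshold separates the classes. The chosen threshold should then be applied unchanged to the test set, so the test data never influences the choice.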

The next time someone claims their fancy SSL pipeline wins, ask them how much of that win is just a well-tuned random forest with a good decision threshold.

Source: SemiScope: Disentangling Classifier Tuning and Joint Optimization in Semi-Supervised Security Classification
Domain: arxiv.org
