Base SSL Algorithms
In addition to the fully-supervised method (as a baseline), USB supports the following 16 popular algorithms:
PiModel: is a simple SSL algorithm that forces the output probabilities of differently perturbed versions of the same unlabeled data to be the same, using Mean Squared Error (MSE) for optimization.
Pseudo-Label: turns the output probability on unlabeled data into a hard ‘one-hot’ label and trains the model on the same unlabeled data against that pseudo ‘one-hot’ label. Pseudo-Label uses cross-entropy (CE) for optimization.
MeanTeacher: takes the exponential moving average (EMA) of the neural model's weights as the teacher model. With Mean Teacher, the neural model is trained to output probabilities similar to those of the EMA teacher. Although later SSL algorithms do not always choose the EMA model as the teacher, they often use the EMA model for validation and testing because it decreases the risk of the neural model falling into a local optimum.
VAT: enhances the robustness of the conditional predicted label distribution around each unlabeled sample against adversarial perturbation. In other words, VAT forces the neural model to give similar predictions on unlabeled data even when facing a strong adversarial perturbation.
MixMatch: is the first to introduce Mixup into SSL, taking a mixture of labeled and unlabeled data as the input and a mixture of labels and model predictions on unlabeled data as the target. Note that MixMatch also uses MSE as the unsupervised loss.
ReMixMatch: can be seen as an upgraded version of MixMatch. ReMixMatch improves on MixMatch by (1) proposing a stronger augmentation (i.e., Control Theory Augmentation (CTAugment)) for unlabeled data; (2) using Augmentation Anchoring to force the model's predictions on strongly augmented unlabeled data to be similar to its predictions on the weakly augmented versions; (3) utilizing Distribution Alignment to encourage the marginal distribution of predictions on unlabeled data to be similar to the marginal distribution of labeled data.
UDA: also introduces strong augmentation (i.e., RandAugment) for unlabeled data. The core idea of UDA is similar to Augmentation Anchoring: it forces the predictions of the neural model on strongly-augmented unlabeled data to be close to those on weakly-augmented unlabeled data. Instead of turning predictions into hard ‘one-hot’ pseudo-labels, UDA sharpens the predictions on unlabeled data. A thresholding technique is used to mask out low-confidence unlabeled samples, which are treated as noise.
FixMatch: is an upgraded version of Pseudo-Label. FixMatch turns the predictions on weakly-augmented unlabeled data into hard ‘one-hot’ pseudo-labels and then uses them as the learning signal for strongly-augmented unlabeled data. FixMatch finds that using a high threshold (e.g., 0.95) to filter out noisy predictions on unlabeled data, and keeping the rest as pseudo-labels, can achieve very good performance.
Dash: improves FixMatch by using a gradually increased threshold instead of a fixed threshold, which allows more unlabeled data to participate in training at the early stage. Moreover, Dash theoretically establishes the convergence rate from the view of non-convex optimization.
CoMatch: is the first to introduce contrastive learning into SSL. In addition to the class probabilities, consistency regularization is also applied to graph-based feature representations, which imposes smoothness constraints on the generated pseudo-labels.
CRMatch: proposes an improved consistency regularization framework that imposes consistency and equivariance on the classification probabilities and at the feature level.
FlexMatch: is the first to introduce class-specific thresholds into SSL, taking into account the different learning difficulties of different classes. Specifically, hard-to-learn classes should have a low threshold to speed up convergence, while easy-to-learn classes should have a high threshold to avoid confirmation bias.
AdaMatch: is proposed mainly for domain adaptation, but can also be adapted to SSL. It is characterized by Relative Threshold and Distribution Alignment, where the relative threshold is adaptively estimated from the EMA of the confidence on labeled data.
SimMatch: extends CoMatch by considering both semantic-level and instance-level consistency regularization. During training, different augmented versions of the same data are encouraged to have similar similarity relationships with respect to other instances. In addition, a memory buffer consisting of predictions on labeled data is adopted to connect the two levels of regularization.
[DeFixMatch](https://arxiv.org/abs/2203.07512)
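The EMA teacher described under MeanTeacher can be illustrated with a minimal sketch. It assumes the model's parameters are a flat list of floats; the function name `ema_update` and the `decay` value are hypothetical choices made for illustration and are not taken from USB's code.

```python
def ema_update(teacher, student, decay=0.999):
    """Return updated teacher parameters.

    Each teacher parameter moves a small step toward the matching
    student parameter: new = decay * teacher + (1 - decay) * student.
    Parameters are plain floats here (an assumption for illustration).
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy usage: call once after every optimizer step on the student.
teacher_params = [0.0, 0.0]
student_params = [1.0, -2.0]
teacher_params = ema_update(teacher_params, student_params, decay=0.9)
```

A `decay` close to 1 makes the teacher change slowly, which is why the averaged model tends to be smoother than the student it tracks.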
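The thresholded pseudo-labeling described under FixMatch might look roughly like the following sketch. It works on plain Python lists of logits rather than tensors; the helper names (`softmax`, `cross_entropy`, `fixmatch_unsup_loss`) are hypothetical, and averaging over the full batch (including masked samples) is an assumption of this sketch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """CE between a hard label and softmax(logits), via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def fixmatch_unsup_loss(weak_logits, strong_logits, threshold=0.95):
    """Unsupervised loss on one batch of unlabeled samples.

    weak_logits / strong_logits: per-sample logits for the weakly and
    strongly augmented views of the same unlabeled samples.
    """
    total = 0.0
    for weak, strong in zip(weak_logits, strong_logits):
        probs = softmax(weak)
        confidence = max(probs)
        if confidence < threshold:
            continue  # low-confidence prediction: masked out
        # Hard 'one-hot' pseudo-label, treated as a fixed target.
        pseudo_label = probs.index(confidence)
        total += cross_entropy(strong, pseudo_label)
    # Assumption: average over the whole batch, masked samples included.
    return total / len(weak_logits)


# Toy usage: only the first sample is confident enough to contribute.
weak = [[10.0, 0.0], [0.1, 0.0]]
strong = [[0.0, 0.0], [5.0, 0.0]]
loss = fixmatch_unsup_loss(weak, strong, threshold=0.95)
```

Setting `threshold=0.0` keeps every sample and recovers plain Pseudo-Label training applied across two augmented views.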
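The sharpening mentioned under UDA, as an alternative to hard ‘one-hot’ pseudo-labels, can be sketched as follows. One common formulation raises each probability to the power 1/T and renormalizes, which is equivalent to dividing the logits by T before the softmax. The function name `sharpen` and the default temperature are assumptions for illustration.

```python
def sharpen(probs, temperature=0.5):
    """Sharpen a probability distribution.

    temperature < 1 pushes mass toward the most likely class;
    temperature = 1 leaves the distribution unchanged; as the
    temperature approaches 0 the result approaches a one-hot vector.
    """
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


# Toy usage: the larger probability grows and the smaller one shrinks.
soft_target = sharpen([0.6, 0.4], temperature=0.5)
```

Unlike a hard pseudo-label, the sharpened target keeps some probability on the less likely classes.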