Reproduction of the TrAdaBoost experiments

The purpose of this example is to reproduce the results obtained in the paper Boosting for Transfer Learning (2007). In this work, the authors developed a transfer algorithm called TrAdaBoost dedicated to supervised domain adaptation. You can find more details about this algorithm here. The goal of this algorithm is to combine a source dataset with many labeled instances with a target dataset with few labels in order to learn a good model on the target domain.
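The core of TrAdaBoost is its asymmetric weight update: at each boosting iteration, misclassified target instances are up-weighted as in AdaBoost, while misclassified source instances are down-weighted, so that source examples which disagree with the target distribution progressively lose influence. Below is a minimal sketch of this update for binary labels, following the formulas of the paper (a hypothetical helper written for illustration, not the adapt implementation; weight normalization is omitted):

[ ]:
import numpy as np

def tradaboost_weight_update(w_src, w_tgt, err_src, err_tgt, n_steps):
    # err_src, err_tgt contain |h(x) - y| in {0, 1} for the current weak
    # learner h on the source and target instances respectively.
    eps = np.sum(w_tgt * err_tgt) / np.sum(w_tgt)   # weighted target error
    beta_t = eps / (1. - eps)                       # AdaBoost-like factor
    beta = 1. / (1. + np.sqrt(2. * np.log(len(w_src)) / n_steps))
    new_w_src = w_src * beta ** err_src        # down-weight bad source points
    new_w_tgt = w_tgt * beta_t ** (-err_tgt)   # up-weight hard target points
    return new_w_src, new_w_tgt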

We try to reproduce the two following experiments:

  • Mushrooms

  • 20newsgroups

Mushrooms

Dataset description: The Mushrooms data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family (pp. 500-525). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom.

Experiment description: For the TrAdaBoost experiment, according to the authors:

The data is split in two sets based on the feature stalk-shape. The diff-distribution data set (the source data set) consists of all the instances whose stalks are enlarging, while the same-distribution data set (the target data set) consists of the instances about tapering mushrooms. Then, the two sets contain examples from different types of mushrooms, which makes the distributions different. – Boosting for Transfer Learning (2007)

[53]:
from IPython.display import display
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import OneHotEncoder
[54]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data"
columns = ["target", "cap-shape","cap-surface","cap-color","bruises?","odor","gill-attachment","gill-spacing",
           "gill-size","gill-color","stalk-shape","stalk-root","stalk-surface-above-ring","stalk-surface-below-ring",
           "stalk-color-above-ring","stalk-color-below-ring","veil-type","veil-color","ring-number","ring-type",
           "spore-print-color","population","habitat"]
data = pd.read_csv(url, header=None)
data.columns = columns
X = data.drop(["target"], axis=1)
y = data[["target"]]
display(X.head())
cap-shape cap-surface cap-color bruises? odor gill-attachment gill-spacing gill-size gill-color stalk-shape ... stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population habitat
0 x s n t p f c n k e ... s w w p w o p k s u
1 x s y t a f c b k e ... s w w p w o p n n g
2 b s w t l f c b n e ... s w w p w o p n n m
3 x y w t p f c n n e ... s w w p w o p k s u
4 x s g f n f w b k t ... s w w p w o e n a g

5 rows × 22 columns

[55]:
X["stalk-shape"].value_counts()
[55]:
t    4608
e    3516
Name: stalk-shape, dtype: int64

Note: When looking at the number of instances in each category of the stalk-shape attribute, it seems that the authors swapped the source data set with the target one in the text above. Indeed, according to Table 1 in the paper, the number of source instances should be 4608, which corresponds to the tapering class and not the enlarging one.

For the first experiment, the number of target instances is set to 1% of the length of the source data set:

Each same-distribution (understand “target”) data set is split into two sets: a same-distribution training set Ts and a test set S. Table 3 presents the experimental results of SVM, SVMt, AUX and TrAdaBoost(SVM) when the ratio between same-distribution and diff-distribution (understand “source”) training data is 0.01. The performance in error rate was the average of 10 repeats by random. The number of iterations (of TrAdaBoost) is set to 100. – Boosting for Transfer Learning (2007)

Here SVM refers to a linear SVM classifier fitted only with source data, SVMt to the same classifier fitted with both source and labeled target data with uniform weights, AUX refers to the BalancedWeighting method and TrAdaBoost(SVM) to TrAdaBoost used with a linear SVM classifier as base learner.

[56]:
def split_source_target(X, y, ratio_of_target_labels=0.01):

    # Source domain: tapering stalks ("t"); target domain: enlarging
    # stalks ("e") (see the note above about Table 1 of the paper).
    Xs = X.loc[X["stalk-shape"]=="t"]
    ys = y.loc[Xs.index]
    Xt = X.loc[X["stalk-shape"]=="e"]
    yt = y.loc[Xt.index]

    # Draw the few labeled target instances (1% of the source size by default).
    Xt_lab = Xt.sample(int(ratio_of_target_labels*len(Xs)))
    yt_lab = yt.loc[Xt_lab.index]

    # Remove the labeled instances from the unlabeled target set.
    Xt = Xt.drop(Xt_lab.index, axis=0)
    yt = yt.drop(yt_lab.index, axis=0)

    # One-hot encode the categorical features (with scikit-learn < 1.2,
    # use OneHotEncoder(sparse=False) instead).
    ohe = OneHotEncoder(sparse_output=False).fit(X)
    Xs = ohe.transform(Xs)
    Xt = ohe.transform(Xt)
    Xt_lab = ohe.transform(Xt_lab)

    return Xs, ys["target"], Xt, yt["target"], Xt_lab, yt_lab["target"]
[57]:
from adapt.base import BaseAdaptEstimator
from scipy.sparse import vstack, issparse

# We create here the AUX model which consists of a balanced weighting
# between instances from the source and target domains.
class BalancedWeighting(BaseAdaptEstimator):

    def __init__(self, estimator=None, alpha=1., Xt=None, yt=None):
        super().__init__(estimator=estimator, alpha=alpha, Xt=Xt, yt=yt)

    def fit(self, Xs, ys, Xt=None, yt=None, **kwargs):
        Xt, yt = self._get_target_data(Xt, yt)
        if issparse(Xs):
            X = vstack((Xs, Xt))
        else:
            X = np.concatenate((Xs, Xt))
        y = np.concatenate((ys, yt))
        # Give each target instance the weight alpha * n_source / n_target so
        # that the total target weight is alpha times the total source weight.
        sample_weight = np.ones(X.shape[0])
        sample_weight[Xs.shape[0]:] *= (Xs.shape[0] / Xt.shape[0]) * self.alpha

        self.fit_estimator(X, y, sample_weight=sample_weight)
        return self
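A quick numeric check of this weighting scheme (toy numbers, not taken from the experiment): with 100 source instances, 5 labeled target instances and alpha=4, the labeled target set ends up carrying four times the total weight of the source set.

[ ]:
ns, nt, alpha = 100, 5, 4.
sample_weight = np.ones(ns + nt)
sample_weight[ns:] *= (ns / nt) * alpha
print(sample_weight[:ns].sum())  # 100.0 : total source weight
print(sample_weight[ns:].sum())  # 400.0 = alpha * 100 : total target weight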

We repeat the experiment 10 times with different random seeds; the trade-off parameter alpha for the Balanced Weighting technique (AUX) is set to 4, as the authors did:

Besides the baselines, we also compare TrAdaBoost with the method developed for learning with auxiliary data proposed by Wu and Dietterich (2004), which is denoted as AUX. The parameter Cp/Ca (as used in (Wu & Dietterich, 2004)) is set to 4 after tuning. – Boosting for Transfer Learning (2007)

In addition, we balance the weights between positive and negative instances:

Furthermore, we also added some constraints to the basic learners to avoid the case of training weights being unbalanced. When training SVM, we always balance the overall training weights between positive and negative examples. – Boosting for Transfer Learning (2007)
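In our reproduction, this constraint is handled by class_weight="balanced" in LinearSVC, which weights each class inversely to its frequency (n_samples / (n_classes * n_class_samples)). A small illustration with a toy label vector:

[ ]:
from sklearn.utils.class_weight import compute_class_weight

# 6 edible ("e") vs 2 poisonous ("p") examples: the minority class gets
# weight 8 / (2 * 2) = 2.0 and the majority class 8 / (2 * 6) = 0.667,
# so both classes carry the same total weight.
y_toy = np.array(["e"] * 6 + ["p"] * 2)
weights = compute_class_weight("balanced", classes=np.array(["e", "p"]), y=y_toy)
print(dict(zip(["e", "p"], weights)))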

[58]:
from adapt.instance_based import TrAdaBoost
from sklearn.svm import LinearSVC

names = ["SVM", "SVMt", "AUX", "TrAdaBoost"]

scores = {k: [] for k in names}

for state in range(10):

    np.random.seed(state)

    Xs, ys, Xt, yt, Xt_lab, yt_lab = split_source_target(X, y, ratio_of_target_labels=0.01)

    if state == 0:
        print("Xs shape: %s, Xt shape: %s"%(str(Xs.shape), str(Xt.shape)))

    models = [
        LinearSVC(class_weight="balanced"),
        LinearSVC(class_weight="balanced"),
        BalancedWeighting(LinearSVC(class_weight="balanced"), alpha=4., Xt=Xt_lab, yt=yt_lab),
        TrAdaBoost(LinearSVC(class_weight="balanced"), n_estimators=100, verbose=0, Xt=Xt_lab, yt=yt_lab)
    ]

    for model, name in zip(models, names):

        if name == "SVMt":
            model.fit(np.concatenate((Xs, Xt_lab)), np.concatenate((ys, yt_lab)))
        else:
            model.fit(Xs, ys)
        scores[name].append(1-model.score(Xt, yt))

    print("Round %i : %s"%(state, str({k: np.round(v[-1], 3) for k, v in scores.items()})))
Xs shape: (4608, 117), Xt shape: (3470, 117)
Round 0 : {'SVM': 0.262, 'SVMt': 0.069, 'AUX': 0.067, 'TrAdaBoost': 0.067}
Round 1 : {'SVM': 0.263, 'SVMt': 0.06, 'AUX': 0.062, 'TrAdaBoost': 0.061}
Round 2 : {'SVM': 0.262, 'SVMt': 0.045, 'AUX': 0.046, 'TrAdaBoost': 0.048}
Round 3 : {'SVM': 0.261, 'SVMt': 0.021, 'AUX': 0.017, 'TrAdaBoost': 0.028}
Round 4 : {'SVM': 0.262, 'SVMt': 0.049, 'AUX': 0.048, 'TrAdaBoost': 0.052}
Round 5 : {'SVM': 0.261, 'SVMt': 0.052, 'AUX': 0.052, 'TrAdaBoost': 0.052}
Round 6 : {'SVM': 0.261, 'SVMt': 0.08, 'AUX': 0.08, 'TrAdaBoost': 0.063}
Round 7 : {'SVM': 0.262, 'SVMt': 0.086, 'AUX': 0.086, 'TrAdaBoost': 0.082}
Round 8 : {'SVM': 0.263, 'SVMt': 0.048, 'AUX': 0.049, 'TrAdaBoost': 0.044}
Round 9 : {'SVM': 0.261, 'SVMt': 0.042, 'AUX': 0.042, 'TrAdaBoost': 0.031}

Results Summary

[59]:
error_mu = np.round(pd.DataFrame(pd.DataFrame(scores).mean(0), columns=["Error"]), 3).transpose().astype(str)
error_std = np.round(pd.DataFrame(pd.DataFrame(scores).std(0), columns=["Error"]), 3).transpose().astype(str)
display(error_mu + " (" + error_std + ")")
SVM SVMt AUX TrAdaBoost
Error 0.261 (0.001) 0.055 (0.019) 0.055 (0.02) 0.053 (0.016)

The results that we obtain differ slightly from the ones reported by the authors in Table 3. Here, the errors of SVMt, AUX and TrAdaBoost are smaller, while the error of SVM is higher. Moreover, the error of SVMt is much lower than the corresponding error computed by the authors.

20 Newsgroups

Dataset description: The 20 Newsgroups data set comprises around 18000 newsgroup posts on 20 main topics, with some topics divided into subcategories.

Experiment description: For the TrAdaBoost experiment, according to the authors:

We define the tasks as top-category classification problems. When we split the data to generate diff-distribution (source) and same-distribution (target) sets, the data are split based on subcategories instead of based on random splitting. Then, the two data sets contain data in different subcategories. Their distributions also differ as a result. – Boosting for Transfer Learning (2007)

The authors do not specify which categories have been selected within each domain. We try to infer them from the number of instances in each domain given by the authors in Table 1.

[6]:
from sklearn.datasets import fetch_20newsgroups

# Set download_if_missing to True if not downloaded yet
data = fetch_20newsgroups(download_if_missing=False, subset="all")

source_rec = ['rec.autos', 'rec.motorcycles']
target_rec = ['rec.sport.baseball', 'rec.sport.hockey']
source_sci = ['sci.crypt', 'sci.electronics']
target_sci = ['sci.med', 'sci.space']
source_talk = ['talk.politics.guns', 'talk.politics.mideast']
target_talk = ['talk.politics.misc', 'talk.religion.misc']
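
One way to check this imputation is to count the documents in each candidate split and compare the counts with those reported in Table 1 of the paper (the snippet below only computes our counts; it does not reproduce the paper's):

[ ]:
for name, cats in [("source_rec", source_rec), ("target_rec", target_rec),
                   ("source_sci", source_sci), ("target_sci", target_sci),
                   ("source_talk", source_talk), ("target_talk", target_talk)]:
    idx = [data.target_names.index(c) for c in cats]
    print("%s: %i documents" % (name, np.isin(data.target, idx).sum()))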

The authors do not specify which preprocessing is applied to the data, so we use the default preprocessing of scikit-learn: TfidfVectorizer.

[17]:
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(stop_words="english",
                             analyzer="word",
                             min_df=5,
                             max_df=0.1)

X = vectorizer.fit_transform(data.data)

def split_source_target(source_index, target_index, positive_index, ratio_of_target_labels=0.01):

    Xs = X[source_index]
    Xt = X[target_index]

    # Binary labels: 1 for the positive top-category, 0 otherwise.
    ys = np.isin(data.target[source_index], positive_index).astype(float)
    yt = np.isin(data.target[target_index], positive_index).astype(float)

    # Draw the labeled target instances (ratio_of_target_labels of the source size).
    lab_index = np.random.choice(Xt.shape[0], int(ratio_of_target_labels*Xs.shape[0]), replace=False)
    unlab_index = np.array(list(set(np.arange(Xt.shape[0]))-set(lab_index)))

    Xt_lab = Xt[lab_index]
    yt_lab = yt[lab_index]

    Xt = Xt[unlab_index]
    yt = yt[unlab_index]

    return Xs, ys, Xt, yt, Xt_lab, yt_lab

We conduct the three proposed experiments: “rec vs talk”, “rec vs sci” and “sci vs talk”. We set the number of TrAdaBoost estimators to 10 instead of 100, as we found that using 100 estimators gives poor results for TrAdaBoost.
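Rather than fixing this choice by hand, one could validate n_estimators on the few labeled target instances. A minimal sketch, assuming Xs, ys, Xt_lab and yt_lab have been built with split_source_target as above:

[ ]:
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from adapt.instance_based import TrAdaBoost

# Hold out half of the labeled target data to compare boosting lengths.
X_tr, X_val, y_tr, y_val = train_test_split(Xt_lab, yt_lab, test_size=0.5,
                                            random_state=0)
for n in [10, 50, 100]:
    model = TrAdaBoost(LinearSVC(class_weight="balanced"), n_estimators=n,
                       verbose=0, Xt=X_tr, yt=y_tr)
    model.fit(Xs, ys)
    print("n_estimators=%i, validation accuracy: %.3f"
          % (n, model.score(X_val, y_val)))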

Rec vs Talk

[38]:
source_rec = ['rec.autos', 'rec.motorcycles']
target_rec = ['rec.sport.baseball', 'rec.sport.hockey']
source_talk = ['talk.politics.guns', 'talk.politics.misc']
target_talk = ['talk.religion.misc', 'talk.politics.mideast']

source_index = np.isin(data.target, [data.target_names.index(s) for s in source_rec+source_talk])
target_index = np.isin(data.target, [data.target_names.index(s) for s in target_rec+target_talk])
positive_index = [data.target_names.index(s) for s in target_rec+source_rec]
[39]:
from adapt.instance_based import TrAdaBoost
from sklearn.svm import LinearSVC
from scipy.sparse import vstack

names = ["SVM", "SVMt", "AUX", "TrAdaBoost"]

scores = {k: [] for k in names}

for state in range(10):

    np.random.seed(state)

    Xs, ys, Xt, yt, Xt_lab, yt_lab = split_source_target(source_index,
                                                         target_index,
                                                         positive_index,
                                                         ratio_of_target_labels=0.01)
    if state == 0:
        print("Xs shape: %s, Xt shape: %s"%(str(Xs.shape), str(Xt.shape)))

    models = [
        LinearSVC(class_weight="balanced"),
        LinearSVC(class_weight="balanced"),
        BalancedWeighting(LinearSVC(class_weight="balanced"), alpha=4., Xt=Xt_lab, yt=yt_lab),
        TrAdaBoost(LinearSVC(class_weight="balanced"), n_estimators=10, verbose=0, Xt=Xt_lab, yt=yt_lab)
    ]

    for model, name in zip(models, names):

        if name == "SVMt":
            model.fit(vstack((Xs, Xt_lab)), np.concatenate((ys, yt_lab)))
        else:
            model.fit(Xs, ys)
        scores[name].append(1-model.score(Xt, yt))

    print("Round %i : %s"%(state, str({k: np.round(v[-1], 3) for k, v in scores.items()})))
Xs shape: (3671, 34814), Xt shape: (3525, 34814)
Round 0 : {'SVM': 0.206, 'SVMt': 0.112, 'AUX': 0.099, 'TrAdaBoost': 0.091}
Round 1 : {'SVM': 0.207, 'SVMt': 0.106, 'AUX': 0.085, 'TrAdaBoost': 0.076}
Round 2 : {'SVM': 0.206, 'SVMt': 0.107, 'AUX': 0.089, 'TrAdaBoost': 0.076}
Round 3 : {'SVM': 0.205, 'SVMt': 0.119, 'AUX': 0.1, 'TrAdaBoost': 0.084}
Round 4 : {'SVM': 0.205, 'SVMt': 0.092, 'AUX': 0.08, 'TrAdaBoost': 0.078}
Round 5 : {'SVM': 0.205, 'SVMt': 0.107, 'AUX': 0.089, 'TrAdaBoost': 0.081}
Round 6 : {'SVM': 0.205, 'SVMt': 0.106, 'AUX': 0.087, 'TrAdaBoost': 0.076}
Round 7 : {'SVM': 0.207, 'SVMt': 0.104, 'AUX': 0.089, 'TrAdaBoost': 0.081}
Round 8 : {'SVM': 0.207, 'SVMt': 0.104, 'AUX': 0.092, 'TrAdaBoost': 0.091}
Round 9 : {'SVM': 0.206, 'SVMt': 0.104, 'AUX': 0.088, 'TrAdaBoost': 0.073}

Results Summary

[40]:
error_mu = np.round(pd.DataFrame(pd.DataFrame(scores).mean(0), columns=["Error"]), 3).transpose().astype(str)
error_std = np.round(pd.DataFrame(pd.DataFrame(scores).std(0), columns=["Error"]), 3).transpose().astype(str)
display(error_mu + " (" + error_std + ")")
SVM SVMt AUX TrAdaBoost
Error 0.206 (0.001) 0.106 (0.007) 0.09 (0.006) 0.081 (0.006)

Rec vs Sci

[41]:
source_rec = ['rec.autos', 'rec.motorcycles']
target_rec = ['rec.sport.baseball', 'rec.sport.hockey']
source_sci = ['sci.crypt', 'sci.electronics']
target_sci = ['sci.med', 'sci.space']

source_index = np.isin(data.target, [data.target_names.index(s) for s in source_sci+source_rec])
target_index = np.isin(data.target, [data.target_names.index(s) for s in target_sci+target_rec])
positive_index = [data.target_names.index(s) for s in target_rec+source_rec]
[42]:
from adapt.instance_based import TrAdaBoost
from sklearn.svm import LinearSVC
from scipy.sparse import vstack

names = ["SVM", "SVMt", "AUX", "TrAdaBoost"]

scores = {k: [] for k in names}

for state in range(10):

    np.random.seed(state)

    Xs, ys, Xt, yt, Xt_lab, yt_lab = split_source_target(source_index,
                                                         target_index,
                                                         positive_index,
                                                         ratio_of_target_labels=0.01)
    if state == 0:
        print("Xs shape: %s, Xt shape: %s"%(str(Xs.shape), str(Xt.shape)))

    models = [
        LinearSVC(class_weight="balanced"),
        LinearSVC(class_weight="balanced"),
        BalancedWeighting(LinearSVC(class_weight="balanced"), alpha=4., Xt=Xt_lab, yt=yt_lab),
        TrAdaBoost(LinearSVC(class_weight="balanced"), n_estimators=10, verbose=0, Xt=Xt_lab, yt=yt_lab)
    ]

    for model, name in zip(models, names):

        if name == "SVMt":
            model.fit(vstack((Xs, Xt_lab)), np.concatenate((ys, yt_lab)))
        else:
            model.fit(Xs, ys)
        scores[name].append(1-model.score(Xt, yt))

    print("Round %i : %s"%(state, str({k: np.round(v[-1], 3) for k, v in scores.items()})))
Xs shape: (3961, 34814), Xt shape: (3931, 34814)
Round 0 : {'SVM': 0.347, 'SVMt': 0.194, 'AUX': 0.16, 'TrAdaBoost': 0.131}
Round 1 : {'SVM': 0.347, 'SVMt': 0.17, 'AUX': 0.14, 'TrAdaBoost': 0.116}
Round 2 : {'SVM': 0.349, 'SVMt': 0.208, 'AUX': 0.177, 'TrAdaBoost': 0.144}
Round 3 : {'SVM': 0.347, 'SVMt': 0.163, 'AUX': 0.139, 'TrAdaBoost': 0.119}
Round 4 : {'SVM': 0.346, 'SVMt': 0.165, 'AUX': 0.137, 'TrAdaBoost': 0.115}
Round 5 : {'SVM': 0.349, 'SVMt': 0.205, 'AUX': 0.163, 'TrAdaBoost': 0.138}
Round 6 : {'SVM': 0.347, 'SVMt': 0.166, 'AUX': 0.14, 'TrAdaBoost': 0.121}
Round 7 : {'SVM': 0.349, 'SVMt': 0.22, 'AUX': 0.182, 'TrAdaBoost': 0.15}
Round 8 : {'SVM': 0.35, 'SVMt': 0.185, 'AUX': 0.153, 'TrAdaBoost': 0.115}
Round 9 : {'SVM': 0.349, 'SVMt': 0.205, 'AUX': 0.169, 'TrAdaBoost': 0.139}

Results Summary

[43]:
error_mu = np.round(pd.DataFrame(pd.DataFrame(scores).mean(0), columns=["Error"]), 3).transpose().astype(str)
error_std = np.round(pd.DataFrame(pd.DataFrame(scores).std(0), columns=["Error"]), 3).transpose().astype(str)
display(error_mu + " (" + error_std + ")")
SVM SVMt AUX TrAdaBoost
Error 0.348 (0.001) 0.188 (0.021) 0.156 (0.017) 0.129 (0.013)

Talk vs Sci

[44]:
source_sci = ['sci.crypt', 'sci.electronics']
target_sci = ['sci.med', 'sci.space']
source_talk = ['talk.politics.misc', 'talk.religion.misc']
target_talk = ['talk.politics.guns', 'talk.politics.mideast']

source_index = np.isin(data.target, [data.target_names.index(s) for s in source_sci+source_talk])
target_index = np.isin(data.target, [data.target_names.index(s) for s in target_sci+target_talk])
positive_index = [data.target_names.index(s) for s in target_sci+source_sci]
[52]:
from adapt.instance_based import TrAdaBoost
from sklearn.svm import LinearSVC
from scipy.sparse import vstack

names = ["SVM", "SVMt", "AUX", "TrAdaBoost"]

scores = {k: [] for k in names}

for state in range(10):

    np.random.seed(state)

    Xs, ys, Xt, yt, Xt_lab, yt_lab = split_source_target(source_index,
                                                         target_index,
                                                         positive_index,
                                                         ratio_of_target_labels=0.01)
    if state == 0:
        print("Xs shape: %s, Xt shape: %s"%(str(Xs.shape), str(Xt.shape)))

    models = [
        LinearSVC(class_weight="balanced"),
        LinearSVC(class_weight="balanced"),
        BalancedWeighting(LinearSVC(class_weight="balanced"), alpha=4., Xt=Xt_lab, yt=yt_lab),
        TrAdaBoost(LinearSVC(class_weight="balanced"), n_estimators=10, verbose=0, Xt=Xt_lab, yt=yt_lab)
    ]

    for model, name in zip(models, names):

        if name == "SVMt":
            model.fit(vstack((Xs, Xt_lab)), np.concatenate((ys, yt_lab)))
        else:
            model.fit(Xs, ys)
        scores[name].append(1-model.score(Xt, yt))

    print("Round %i : %s"%(state, str({k: np.round(v[-1], 3) for k, v in scores.items()})))
Xs shape: (3378, 34814), Xt shape: (3794, 34814)
Round 0 : {'SVM': 0.261, 'SVMt': 0.209, 'AUX': 0.185, 'TrAdaBoost': 0.159}
Round 1 : {'SVM': 0.26, 'SVMt': 0.218, 'AUX': 0.202, 'TrAdaBoost': 0.184}
Round 2 : {'SVM': 0.26, 'SVMt': 0.203, 'AUX': 0.182, 'TrAdaBoost': 0.166}
Round 3 : {'SVM': 0.26, 'SVMt': 0.214, 'AUX': 0.199, 'TrAdaBoost': 0.177}
Round 4 : {'SVM': 0.26, 'SVMt': 0.197, 'AUX': 0.176, 'TrAdaBoost': 0.154}
Round 5 : {'SVM': 0.261, 'SVMt': 0.214, 'AUX': 0.199, 'TrAdaBoost': 0.18}
Round 6 : {'SVM': 0.26, 'SVMt': 0.202, 'AUX': 0.182, 'TrAdaBoost': 0.16}
Round 7 : {'SVM': 0.26, 'SVMt': 0.202, 'AUX': 0.179, 'TrAdaBoost': 0.158}
Round 8 : {'SVM': 0.259, 'SVMt': 0.186, 'AUX': 0.165, 'TrAdaBoost': 0.142}
Round 9 : {'SVM': 0.26, 'SVMt': 0.192, 'AUX': 0.167, 'TrAdaBoost': 0.145}

Results Summary

[46]:
error_mu = np.round(pd.DataFrame(pd.DataFrame(scores).mean(0), columns=["Error"]), 3).transpose().astype(str)
error_std = np.round(pd.DataFrame(pd.DataFrame(scores).std(0), columns=["Error"]), 3).transpose().astype(str)
display(error_mu + " (" + error_std + ")")
SVM SVMt AUX TrAdaBoost
Error 0.26 (0.0) 0.204 (0.01) 0.184 (0.013) 0.162 (0.014)

We can see that the results are not very similar to the ones that the authors obtained, but we recover the same ordering of error levels: SVM > SVMt > AUX > TrAdaBoost.