CompStats
=========
Collaborative competitions have gained popularity in the scientific and technological fields. These competitions involve defining tasks, selecting evaluation scores, and devising result verification methods. In the standard scenario, participants receive a training set and are expected to provide a solution for a held-out dataset kept by the organizers. An essential challenge for organizers arises when comparing the algorithms' performance, assessing multiple participants, and ranking them. Statistical tools are often used for this purpose; however, traditional statistical methods often fail to capture decisive differences between systems' performance. CompStats implements an evaluation methodology for the statistical analysis of competition results. CompStats offers several advantages, including off-the-shelf comparisons with correction mechanisms and the inclusion of confidence intervals.
To illustrate the use of `CompStats`, the following snippets show an example. The instructions load the necessary libraries, including the one used to obtain the problem (e.g., digits), four different classifiers, and, in the last line, the score used to measure the performance and compare the algorithms.
>>> from sklearn.svm import LinearSVC
>>> from sklearn.naive_bayes import GaussianNB
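This excerpt shows only two of the imports; the rest of the setup (obtaining the digits problem, splitting it, and training the first classifier, reported below as `alg-1`) is not reproduced here. Assuming the standard scikit-learn names, it might look like the following sketch; the split proportion and the random seed are assumptions, not values from the README.

```python
# Hedged sketch of the setup elided from this excerpt; the split size
# and random_state are assumptions, not values from the README.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

alg_1 = LinearSVC().fit(X_train, y_train)
hy = alg_1.predict(X_val)  # predictions later handed to the score object
```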
Once the predictions are available, it is time to measure the algorithm's performance.
The previous code shows the macro-f1 score and its standard error. The actual values are stored in the attributes `statistic` and `se`.
>>> score.statistic, score.se
(0.9521479775366307, 0.009717884979482313)
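As a rough, library-free illustration of where a standard error like `se` can come from, the sketch below estimates it by resampling a validation set with replacement and recomputing the macro-f1 on each resample. This is only a sketch of the general bootstrap idea, not CompStats' implementation, and the toy labels are invented for the example.

```python
import random
import statistics

def macro_f1(y_true, y_pred):
    """Macro-f1: the unweighted mean of the per-class f1 scores."""
    f1s = []
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return statistics.mean(f1s)

def bootstrap_se(y_true, y_pred, n_boot=500, seed=0):
    """Standard error: the standard deviation of the score over
    bootstrap resamples of the validation set."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(macro_f1([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    return statistics.stdev(scores)

# Toy gold labels and predictions, invented for the illustration.
y_true = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
y_pred = [0, 1, 1, 1, 2, 0, 0, 1, 2, 0]
print(round(macro_f1(y_true, y_pred), 4), round(bootstrap_se(y_true, y_pred), 4))
```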
Continuing with the example, let us assume that one wants to test another classifier on the same problem, in this case a random forest, as can be seen in the following two lines. The second line predicts the validation set and adds the predictions to the analysis.
<Perf(score_func=f1_score)>
Statistic with its standard error (se)
statistic (se)
0.9720 (0.0076) <= Random Forest
0.9521 (0.0097) <= alg-1
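The two lines the paragraph above refers to are not reproduced in this excerpt. Assuming scikit-learn, the random-forest step might look like the sketch below; the setup is repeated so the snippet is self-contained, and the call that would feed the predictions to the `score` object is left as a comment since this sketch does not use CompStats.

```python
# Hedged sketch of the elided random-forest lines; the setup is repeated
# so the snippet runs standalone. Split size and seeds are assumptions.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(random_state=0).fit(X_train, y_train)
hy_forest = forest.predict(X_val)
# In the README, the second line would add the predictions with:
# score(hy_forest, name='Random Forest')
```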
Let us incorporate two more predictions, one with the Naive Bayes classifier and the other with Histogram Gradient Boosting, as seen below.
>>> nb = GaussianNB().fit(X_train, y_train)
>>> score(nb.predict(X_val), name='Naive Bayes')
<Perf(score_func=f1_score)>
Statistic with its standard error (se)
statistic (se)
0.9759 (0.0068) <= Hist. Grad. Boost. Tree
0.9720 (0.0076) <= Random Forest
0.9521 (0.0097) <= alg-1
0.8266 (0.0159) <= Naive Bayes
The performance, its confidence interval (5%), and a statistical comparison (at the 5% significance level) between the best-performing system and the rest of the algorithms are depicted in the following figure.