- This project employs spectral biclustering, an unsupervised machine learning approach, to investigate bloc judging in figure skating competitions.
- I first estimate a fixed-effects regression model and apply graph-based filtering to construct a skater-country–by–judge-country matrix that captures cross-country scoring tendencies among a subset of countries with sufficient judging interactions.
- Applying spectral biclustering to this matrix reveals clear bloc judging behavior.
The data used in this project is obtained by web-scraping the International Skating Union's official website. The data collection code and documentation are available in this repository.
Bloc judging in figure skating refers to the pattern in which judges from a group of countries (a "bloc") assign relatively higher scores to athletes from the same group and relatively lower scores to athletes from outside the group, beyond what can be explained by performance quality or other factors. Although bloc judging is widely discussed in the figure skating community, empirical evidence remains limited because judging blocs are unobservable and the data structure imposes constraints. First, because bloc judging is prohibited, the bloc membership of each country is not publicly observable, which makes it difficult to quantify bloc judging or to conduct any downstream analysis. Second, the composition of skaters and judges differs across competitions, and judging outcomes are influenced by many factors (such as skaters’ ability, judges’ idiosyncratic preferences, and political climate), many of which are unobservable. This makes the identification of judging blocs statistically challenging.
To the best of my knowledge, only two papers currently investigate bloc judging in figure skating empirically. [1] uses fixed-effects regression models to analyze out-group bias among judges from NATO and Warsaw Pact countries and finds that, during the Cold War era, judges assigned significantly lower scores to skaters from out-group countries at the Olympic Games. However, this approach presupposes the bloc membership of each country, an assumption whose validity cannot be verified. Furthermore, it is unclear whether contemporary judging blocs, if any, continue to reflect Cold War geopolitical alignments. [2] takes a more exploratory approach and uses a maximum likelihood model to identify judging blocs among 10 countries. However, a comprehensive understanding requires examining all major countries in the sport, and a maximum likelihood model becomes very computationally expensive as the number of countries $n$ increases, with the computational cost growing as $O(3^n)$.[^1]
In this project, I advance this line of research by formulating the problem as a biclustering task, an unsupervised machine learning approach that simultaneously clusters both rows and columns of a matrix. Specifically, I construct a skater-country–by–judge-country matrix that captures cross-country scoring tendencies after controlling for other factors. A graph-based filter is applied to retain only countries with sufficient cross-country judging interactions. I then identify judging blocs by applying spectral biclustering to this matrix, allowing both skater nations and judge nations to be grouped endogenously based on their observed interaction patterns. This approach makes no assumptions about which countries participate in bloc judging or about the bloc membership of each country. Instead, the degree and composition of bloc judging are inferred from the clustering outcome. Computationally, it significantly outperforms the maximum likelihood model, especially when the number of countries is large.
The data used in this project comes from the International Skating Union's (ISU) official website. After each competition, the ISU publishes the detailed competition results, including judges' identities (e.g., names and nations) and the scores given by each judge for each element.[^2] In a separate project (click here for the repository link), I developed a pipeline to web-scrape all the competition results from ISU Championships, the Grand Prix (GP) Series, the Junior Grand Prix (JGP) Series, and the Olympic Games from the 2004-2005 season to the 2024-2025 season, and organized them into structured tabular datasets.
In each competition, there are (typically) 9 judges. From my scraped datasets, I can trace the score each judge gave to each element of each skater's performance. Figure 1 shows an illustrative example of the information available from the published competition results. The cleaned dataset is at the competition-skater-judge-element level. The highlighted purple boxes in Figure 1 illustrate the definitions of a competition, a skater, a judge, and an element.[^3]
Figure 1: Competition Results from GP Canada 2017
For the bloc judging investigation, I restrict my sample to competitions held between the 2016-2017 season and the 2021-2022 season because (1) before the 2016-2017 season, judging was anonymous, so it is impossible to link the judge-level scores with judges' identities; and (2) Russian skaters and judges, who played a central role in figure skating judging dynamics (as indicated by Figure 2 below), have been banned from the 2022-2023 season onwards. This analysis focuses on bloc judging under the normal competitive structure of the sport. How Russia's ban affects bloc judging dynamics is a separate research question.
My final sample consists of 767 competitions, 12,501 performances (competition-skater combinations), and 1,476,241 competition-skater-judge-element combinations, covering skaters and judges from 73 and 53 countries, respectively. Figure 2 plots the number of performances skated by skaters from each country against the number of performances judged by judges from each country. Countries with greater skater participation also exhibit greater judge presence in the ISU. During the sample period, the top five countries in terms of both skater and judge presence are Russia, the United States, Canada, Japan, and France (annotated in Figure 2).
Figure 2: Number of Performances Skated by Country vs. Number of Performances Judged by Country
Note: Each point represents a country. A performance is defined as a unique competition–skater combination: each performance corresponds to one skater’s appearance in one specific competition. The number of performances judged is higher than the number of performances skated because there are (typically) 9 judges judging a single performance.
I construct a cross-country scoring tendency matrix that captures the extent of relative favoritism exhibited by judges from each country toward skaters from each country. The preprocessing procedure controls for major confounding factors, including skater ability, differences in overall scoring leniency and dispersion across competitions and elements, and country-specific judge leniency (e.g., judges from some countries consistently assign higher scores regardless of skater nationality).
Let the score given to skater
where
where
So, the residual is
It measures the judges' relative favoritism toward the skater plus some idiosyncratic noise. The (relative) scoring tendency of judges from country
From there, I construct a matrix of size
Not every cell has a value in the cross-country judging tendency matrix, as skaters from certain countries are never evaluated by judges from certain other countries. Judging blocs can only be identified among a subset of countries that have all evaluated one another with sufficient frequency.
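The preprocessing and matrix construction described above can be sketched roughly as follows. This is a simplified stand-in rather than the regression used in this project: it assumes that z-scoring within each competition-skater-element panel is an acceptable proxy for the skater-ability and competition/element controls, that judge-country leniency can be removed by demeaning, and that the scoring tendency is the mean residual per country pair. All column names (`competition`, `skater`, `element`, `score`, `skater_country`, `judge_country`) are hypothetical.

```python
import pandas as pd


def scoring_residuals(df: pd.DataFrame) -> pd.DataFrame:
    """Add a 'resid' column (assumed simplification of the fixed-effects step)."""
    out = df.copy()
    # All judges scoring the same element of the same performance form a panel.
    panel = out.groupby(["competition", "skater", "element"])["score"]
    # Z-score within the panel: removes skater ability and the level and
    # spread of scores specific to that competition and element.
    spread = panel.transform("std").replace(0.0, float("nan"))
    out["z"] = (out["score"] - panel.transform("mean")) / spread
    # Subtract each judge country's average leniency.
    leniency = out.groupby("judge_country")["z"].transform("mean")
    out["resid"] = out["z"] - leniency
    return out


def tendency_matrix(resid: pd.DataFrame) -> pd.DataFrame:
    """Rows: skater country; columns: judge country; values: mean residual."""
    # Country pairs that never meet in the data are left as NaN.
    return resid.pivot_table(
        index="skater_country",
        columns="judge_country",
        values="resid",
        aggfunc="mean",
    )
```

Cells for country pairs with no observations come out as `NaN`, which is why the filtering step that follows is needed before clustering.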
I identify such a subset using graph theory. Formally, I create an undirected graph in which the nodes represent countries, and I create an edge between country A and country B if (1) skaters from country A are judged by judges from country B at least 110 times, and (2) skaters from country B are judged by judges from country A at least 110 times.[^4] An edge between two countries indicates that there is sufficient mutual judging between them.
Then, I search the graph for the largest clique, that is, the largest set of countries in which every pair is connected by an edge. This gives me the largest subset of countries that have all evaluated one another with sufficient frequency. This subset contains 26 countries, covering 70% of competition-skater-judge-element combinations during my sample period. Judging blocs are identified on the scoring tendency matrix restricted to these 26 fully connected countries.
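A minimal sketch of this filtering step is shown below, assuming a hypothetical `counts` table in which `counts.loc[a, b]` is the number of times skaters from country `a` were judged by judges from country `b`. It uses NetworkX's `find_cliques`, which enumerates maximal cliques; the function name and table layout are illustrative, not taken from the project code.

```python
import networkx as nx
import pandas as pd


def largest_mutual_clique(counts: pd.DataFrame, threshold: int = 110) -> set:
    """Largest set of countries that all judged one another >= threshold times."""
    # Only countries that appear both as skater nations and as judge nations.
    countries = sorted(set(counts.index) & set(counts.columns))
    graph = nx.Graph()
    graph.add_nodes_from(countries)
    for i, a in enumerate(countries):
        for b in countries[i + 1:]:
            # An edge requires sufficient judging in BOTH directions.
            if counts.loc[a, b] >= threshold and counts.loc[b, a] >= threshold:
                graph.add_edge(a, b)
    # find_cliques yields every maximal clique; keep the biggest one.
    return set(max(nx.find_cliques(graph), key=len))
```

Enumerating maximal cliques is exponential in the worst case, but it should be manageable for a graph with only a few dozen country nodes. If several cliques tie for the largest size, this sketch returns whichever one is encountered first.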
I use spectral biclustering, an unsupervised machine learning model, to identify judging blocs. This algorithm simultaneously groups the rows and columns of a matrix and discovers coherent submatrices, that is, blocks whose entries have similar values. One advantage of this method is that it is fully exploratory: it makes no assumptions about which countries participate in bloc judging or about the bloc membership of each country. Instead, the degree and composition of bloc judging are inferred from the clustering outcome.
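As an illustration, the clustering step could look like the sketch below, which uses scikit-learn's `SpectralBiclustering` with two row clusters and two column clusters. The positive shift is my own assumption, added because the normalization inside scikit-learn takes square roots of row and column sums and the scoring tendencies are signed; the settings used in the project notebook may differ.

```python
import numpy as np
from sklearn.cluster import SpectralBiclustering


def bicluster_two_by_two(matrix: np.ndarray, seed: int = 0):
    """Return (row_labels, column_labels) from a 2 x 2 checkerboard fit."""
    # Assumption: shift the signed tendencies so every entry is positive.
    # The offset of 1.0 is arbitrary and only keeps the normalization defined.
    shifted = matrix - np.nanmin(matrix) + 1.0
    model = SpectralBiclustering(
        n_clusters=(2, 2),        # two skater-country and two judge-country groups
        method="bistochastic",
        random_state=seed,
    )
    model.fit(shifted)
    return model.row_labels_, model.column_labels_
```

Row and column labels come from separate k-means runs, so label 0 for rows need not correspond to label 0 for columns. Matching skater clusters to judge clusters requires comparing their country memberships.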
Applying two clusters to both rows (skater's country) and columns (judge's country) on the scoring tendency matrix for the 26 fully connected countries yields two judging blocs:
- Bloc 1 countries are Belarus, Estonia, Georgia, Israel, Kazakhstan, Latvia, Poland, Russia, Ukraine, with Russia being the leading country (defined by the skater and judge participation; see Figure 2)
- Bloc 2 countries are Australia, Austria, Canada, China, the Czech Republic, Spain, France, the United Kingdom, Germany, Italy, Japan, South Korea, Switzerland, Sweden, and the United States, with the United States being the leading country (defined by the skater and judge participation; see Figure 2)
The substantial overlap between the corresponding skater and judge clusters indicates that a large share of countries participate in bloc judging. In fact, only two of the 26 countries do not belong to any judging bloc: skaters from Hungary and Turkey receive high scores from judges in bloc 1, but judges from these two countries give high scores to skaters from bloc 2. In the subsequent analysis, I exclude these two countries and investigate only the 24 countries that belong to a judging bloc.
Figure 3: Biclustering Results High-Level Summary
Figure 4 shows the scoring tendency matrix after grouping skaters and judges from the same bloc together, excluding Hungary and Turkey. Red indicates scores that are higher than a judge’s baseline leniency, while blue indicates scores that are lower than a judge’s baseline leniency. The high values in the two on-diagonal submatrices and the low values in the two off-diagonal submatrices indicate clear bloc judging behavior. In addition, the higher values along the main diagonal suggest strong nationalistic bias: judges tend to assign substantially higher scores to skaters from their own country.
Figure 4: Biclustering Results
Figure 5 presents the results in a more aggregated manner. For each bloc, it reports judges’ average scoring tendencies toward three groups of skaters: skaters from the opposite bloc, skaters from the same bloc but a different country, and skaters from the judge’s home country. The vertical line on each bar shows the 95% confidence interval. On average, judges assign scores that are 0.10 standard deviations (S.D.) lower to skaters from the opposite bloc and 0.35 S.D. higher to skaters from their home country. Judges in bloc 1 exhibit a significant in-bloc bias of 0.10 S.D. even toward skaters from other countries within the same bloc. In contrast, judges in bloc 2 do not show significant favoritism toward in-bloc skaters from outside their home country. The fact that the confidence intervals of the bars do not overlap indicates that the differences between the groups are statistically significant. I verify that the results are not driven by the leading country in each bloc (Russia and the United States, respectively); see the robustness checks in bloc_judging_investigation.ipynb.
Figure 5: In-bloc, Out-bloc and In-country Bias
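The aggregation behind a chart like Figure 5 can be sketched as follows. The normal-approximation interval is an assumption on my part, since the method used to compute the confidence intervals is not stated here, and the `bloc` mapping and function names are hypothetical.

```python
import math
import pandas as pd


def relation(skater_country: str, judge_country: str, bloc: dict) -> str:
    """Classify a skater-judge pair; `bloc` maps country -> bloc id."""
    if skater_country == judge_country:
        return "home country"
    if bloc[skater_country] == bloc[judge_country]:
        return "same bloc"
    return "opposite bloc"


def mean_with_ci(values: pd.Series, z: float = 1.96):
    """Mean with a normal-approximation 95% interval (assumed CI method)."""
    mean = values.mean()
    # Standard error of the mean from the sample standard deviation.
    half_width = z * values.std(ddof=1) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width
```

Grouping the scoring tendencies by judge bloc and by the `relation` label, then applying `mean_with_ci` to each group, gives one bar and one interval per group.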
Finally, for those who are interested in knowing the cross-country scoring tendency at a more granular level, I conduct a focused analysis of the leading countries in each bloc—Russia in bloc 1 and the United States in bloc 2. Detailed results are provided in bloc_judging_investigation.ipynb.
This project reveals clear bloc judging behavior in figure skating competitions held between the 2016–2017 and 2021–2022 seasons. Two distinct judging blocs are identified, led by Russia and the United States, respectively. Because Russia is the top country in both skater participation and judge presence, its ban from the 2022–2023 season onward could alter bloc judging dynamics in the sport. Future work could investigate how bloc judging evolves after Russia’s ban and examine how the ban affects skaters from both Russia’s own bloc and the opposing bloc.
[1] Sala, B.R., Scott, J.T. and Spriggs, J.F., 2007. The Cold War on ice: Constructivism and the politics of Olympic figure skating judging. Perspectives on Politics, 5(1), pp.17-29.
[2] Zitzewitz, E., 2006. Nationalism in winter sports judging and its lessons for organizational decision making. Journal of Economics & Management Strategy, 15(1), pp.67-99.
Footnotes

1. The computational complexity of $O(3^n)$ assumes a model in which the number of judging blocs is fixed at two and some countries do not belong to either bloc.
2. Not all competition results disclose a judge’s country. When a judge’s country information is missing, I infer the judge’s country based on the country information reported for the same judge in other competitions.
3. An element here refers to either a technical element, in which judges assign Grade of Execution (GOE) scores, or a program component element, in which judges assign component scores reflecting broader aspects of performance quality such as skating skills, transitions, performance, composition, and interpretation of the music.
4. Here, one time means one competition-skater-judge-element observation. That is, one judge evaluating one element performed by one skater in one competition is counted as one time. I verify that the result is robust to alternative thresholds.