Investigating Bloc Judging in Figure Skating Competitions Using Biclustering

mayupei/figure-skating-bloc-judging

Bloc Judging in Figure Skating Competitions

Project Summary

  • This project employs spectral biclustering, an unsupervised machine learning approach, to investigate bloc judging in figure skating competitions.
  • I first estimate a fixed-effects regression model and apply graph-based filtering to construct a skater-country–by–judge-country matrix that captures cross-country scoring tendencies among a subset of countries with sufficient judging interactions.
  • Applying spectral biclustering to this matrix reveals clear bloc judging behavior.

The data used in this project is obtained by web-scraping the International Skating Union's official website. The data collection code and documentation are available in this repository.

Introduction

Bloc judging in figure skating refers to the pattern in which judges from a group of countries (a "bloc") assign relatively higher scores to athletes from the same group and relatively lower scores to athletes from outside the group, beyond what can be explained by performance quality or other factors. Despite being discussed widely in the figure skating community, the empirical evidence on bloc judging remains limited due to the unobservability of judging blocs and constraints in the data structure. First, because bloc judging is prohibited, the bloc membership of each country is not publicly observable, making the quantification of bloc judging or any downstream analysis difficult. Second, the composition of skaters and judges differs across competitions, and judging outcomes are influenced by many factors—such as skaters’ ability, judges’ idiosyncratic preferences, and political climate—many of which are unobservable, making the identification of judging blocs statistically challenging.

To the best of my knowledge, only two papers currently investigate bloc judging in figure skating empirically. Sala et al. [1] use fixed-effects regression models to analyze out-group bias among judges from NATO and Warsaw Pact countries and find that, during the Cold War era, judges assigned significantly lower scores to skaters from out-group countries at the Olympic Games. However, this approach presupposes the bloc membership of each country, an assumption whose validity cannot be verified. Furthermore, it is unclear whether contemporary judging blocs, if any, continue to reflect Cold War geopolitical alignments. Zitzewitz [2] takes a more exploratory approach and uses a maximum likelihood model to identify judging blocs among 10 countries. However, a comprehensive understanding requires examining all major countries in the sport, and a maximum likelihood model becomes very computationally expensive as the number of countries increases, with the computational cost growing on the order of $O(3^n)$.<sup>1</sup>

In this project, I advance this line of research by formulating the problem as a biclustering task. Biclustering is an unsupervised machine learning approach that simultaneously clusters the rows and the columns of a matrix. Specifically, I construct a skater-country–by–judge-country matrix that captures cross-country scoring tendencies after controlling for other factors. A graph-based filter is then applied to retain only countries with sufficient cross-country judging interactions. I identify judging blocs by applying spectral biclustering to this matrix, allowing both skater nations and judge nations to be grouped endogenously based on their observed interaction patterns. This approach makes no assumptions about which countries participate in bloc judging or about the bloc membership of each country. Instead, the degree and composition of bloc judging are inferred from the clustering outcome. Computationally, it scales far better than the maximum likelihood model, especially when the number of countries is large ($O(n^3)$ versus $O(3^n)$).
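To make the gap between the two growth rates concrete, the sketch below enumerates the assignments that the $O(3^n)$ figure counts, under the assumption that each country is placed in bloc 1, bloc 2, or no bloc. The function name `count_bloc_assignments` is hypothetical, and the comparison ignores constant factors, so it illustrates orders of growth rather than measured runtimes.

```python
from itertools import product


def count_bloc_assignments(n_countries: int) -> int:
    """Count the ways to place each country in bloc 1, bloc 2, or no bloc."""
    labels = ("bloc1", "bloc2", "none")
    return sum(1 for _ in product(labels, repeat=n_countries))


# Brute-force enumeration agrees with the closed form 3**n for a small n.
assert count_bloc_assignments(5) == 3 ** 5

# Compare the two growth rates for 10 countries and for 26 countries.
for n in (10, 26):
    print(f"n={n}: 3^n={3 ** n:,}  n^3={n ** 3:,}")
```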

Data and Descriptive Statistics

The data used in this project comes from the International Skating Union's (ISU) official website. After each competition, the ISU publishes the detailed competition results, including judges' identities (e.g., names and nations) and the scores given by each judge for each element.<sup>2</sup> In a separate project (see the linked repository), I developed a pipeline to web-scrape all the competition results from ISU championships, the GP Series, the JGP Series, and the Olympic Games from the 2004-2005 season to the 2024-2025 season, and organized them into structured tabular datasets.

In each competition, there are (typically) 9 judges. From my scraped datasets, I can trace each judge's scores given to each element for each skater. Figure 1 shows an illustrative example of information available from the published competition results. The cleaned dataset is at the competition-skater-judge-element level. The highlighted purple boxes in Figure 1 illustrate the definitions of a competition, a skater, a judge, and an element.<sup>3</sup>

Figure 1: Competition Results from GP Canada 2017

For the bloc judging investigation, I restrict my sample to competitions held between the 2016-2017 season and the 2021-2022 season because (1) before the 2016-2017 season, judging was anonymous, so it is impossible to link the judge-level scores with judges' identities; and (2) Russian skaters and judges, who played a central role in figure skating judging dynamics (as indicated by Figure 2 below), have been banned from the 2022-2023 season onwards. This analysis focuses on bloc judging under the normal competitive structure of the sport. How Russia's ban affects bloc judging dynamics is a separate research question.

My final sample consists of 767 competitions, 12,501 performances (competition-skater combinations), and 1,476,241 competition-skater-judge-element combinations, covering skaters and judges from 73 and 53 countries, respectively. Figure 2 plots the number of performances skated by skaters from each country against the number of performances judged by judges from each country. Countries with greater skater participation also exhibit greater judge presence in the ISU. During the sample period, the top five countries in terms of both skater and judge presence are Russia, the United States, Canada, Japan, and France (annotated in Figure 2).

Figure 2: Number of Performances Skated by Country vs. Number of Performances Judged by Country
Note: Each point represents a country. A performance is defined as a unique competition–skater combination; each performance corresponds to one skater’s appearance in one specific competition. The number of performances judged is higher than the number of performances skated because there are (typically) 9 judges judging a single performance.

Construction of Cross-country Scoring Tendency Matrix

I construct a cross-country scoring tendency matrix that captures the extent of relative favoritism exhibited by judges from each country toward skaters from each country. The preprocessing procedure controls for major confounding factors, including skater ability, differences in overall scoring leniency and dispersion across competitions and elements, and country-specific judge leniency (e.g., judges from some countries consistently assign higher scores regardless of skater nationality).

Let the score given to skater $i$ by judge $j$ in competition $c$ for element $k$ be $score_{ijkc}$. First, I standardize scores within each competition–skater–element cell by computing z-scores, which removes variation in skater ability and in overall scoring leniency and dispersion across competitions, skaters, and elements. Formally,

$$zscore_{ijkc} = \frac{score_{ijkc} - \mu_{ikc}}{\sigma_{ikc}},$$

where $\mu_{ikc} = \frac{1}{|J_{ikc}|} \sum_{j \in J_{ikc}} score_{ijkc}$ and $\sigma_{ikc} = \sqrt{ \frac{1}{|J_{ikc}|} \sum_{j \in J_{ikc}} \left(score_{ijkc} - \mu_{ikc} \right)^2}$. $J_{ikc}$ denotes the set of judges that evaluates skater $i$ for element $k$ for competition $c$. Note that $J_{ikc}$ does not vary within a competition.
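The standardization step can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the repository's code: the record layout, the function name `add_zscores`, and the choice to drop cells with zero dispersion (where the z-score is undefined) are all assumptions. `statistics.pstdev` is used because the formula above divides by $|J_{ikc}|$, i.e. the population standard deviation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def add_zscores(rows):
    """Attach a within-cell z-score to each record.

    rows: list of dicts with keys competition, skater, element, judge, score
    (a hypothetical layout for the competition-skater-judge-element data).
    """
    # Collect every judge's score for each competition-skater-element cell.
    cells = defaultdict(list)
    for r in rows:
        cells[(r["competition"], r["skater"], r["element"])].append(r["score"])

    out = []
    for r in rows:
        scores = cells[(r["competition"], r["skater"], r["element"])]
        mu, sigma = mean(scores), pstdev(scores)
        if sigma == 0:
            continue  # assumption: skip cells in which all judges agree
        out.append({**r, "zscore": (r["score"] - mu) / sigma})
    return out


# Toy cell: three judges score the same element of the same skater.
toy = [
    {"competition": "c1", "skater": "s1", "element": "e1", "judge": j, "score": s}
    for j, s in (("j1", 1.0), ("j2", 2.0), ("j3", 3.0))
]
for rec in add_zscores(toy):
    print(rec["judge"], round(rec["zscore"], 3))
```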

$zscore_{ijkc}$ reflects a judge's leniency toward the skater relative to the other judges on the same panel (the judges $j \in J_{ikc}$). It is possible that judges from certain countries are systematically more (or less) lenient and have higher (or lower) $zscore_{ijkc}$ regardless of the skater's country. To measure relative favoritism, I remove the judge-country-specific baseline leniency by estimating the following fixed-effects model and taking the residual:

$$zscore_{ijkc} = \lambda_{country(j)} + \varepsilon_{ijkc},$$

where $\lambda_{country(j)}$ is the judge-country fixed effect.

So, the residual is

$$\widehat{\varepsilon}_{ijkc} = zscore_{ijkc} - \widehat{\lambda}_{country(j)}.$$

It measures the judge's relative favoritism toward the skater plus some idiosyncratic noise. The (relative) scoring tendency of judges from country $b$ toward skaters from country $a$ is measured as the average residual over all scores given by judges from country $b$ to skaters from country $a$:

$$tendency_{ab} = \frac{1}{\sum\mathbb{1}(country(i) = a, country(j) = b)} \sum_{country(i) = a, country(j) = b} \widehat{\varepsilon}_{ijkc}.$$

From there, I construct a matrix of size $73 \times 53$ that stores values of $tendency_{ab}$ for each skater nation and each judge nation.
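The two steps above (removing the judge-country baseline, then averaging residuals by country pair) can be sketched as follows. Because the model contains only judge-country fixed effects, its least-squares estimate of $\lambda_{country(j)}$ is the mean z-score within each judge country, so no regression library is needed here. The function name `tendency_matrix` and the tuple layout are assumptions for illustration, and the result is stored as a dictionary keyed by (skater country, judge country) rather than as a dense matrix.

```python
from collections import defaultdict
from statistics import mean


def tendency_matrix(obs):
    """Average fixed-effect residual for each (skater country, judge country).

    obs: iterable of (skater_country, judge_country, zscore) tuples
    (a hypothetical layout; one tuple per scored element).
    """
    obs = list(obs)

    # Judge-country fixed effects: the mean z-score given by each judge country.
    by_judge_country = defaultdict(list)
    for _, judge_country, z in obs:
        by_judge_country[judge_country].append(z)
    lam = {jc: mean(zs) for jc, zs in by_judge_country.items()}

    # Residuals grouped by country pair, then averaged.
    cells = defaultdict(list)
    for skater_country, judge_country, z in obs:
        cells[(skater_country, judge_country)].append(z - lam[judge_country])
    return {pair: mean(res) for pair, res in cells.items()}


# Toy data: judges from A are more lenient overall and favor skaters from A.
toy = [("A", "A", 1.0), ("B", "A", 0.0), ("A", "B", -0.5), ("B", "B", 0.5)]
for (a, b), value in sorted(tendency_matrix(toy).items()):
    print(f"skater {a}, judge {b}: {value:+.2f}")
```

Country pairs that never occur are simply absent from the dictionary, which corresponds to the empty cells discussed in the next section.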

Graph-Based Country Filtering

Not every cell has a value in the cross-country judging tendency matrix, as skaters from certain countries are never evaluated by judges from certain other countries. Judging blocs can only be identified among a subset of countries that have all evaluated one another with sufficient frequency.

I identify such a subset using graph theory. Formally, I create an undirected graph where the nodes represent countries, and I create an edge between country A and country B if (1) skaters from country A are judged by judges from country B at least 110 times; and (2) skaters from country B are judged by judges from country A at least 110 times.<sup>4</sup> An edge between two countries indicates that there is sufficient mutual judging between the two countries.

Then, I search the graph for the largest clique, that is, the largest set of countries in which every pair is connected by an edge. This gives me the largest subset of countries that have all evaluated one another with sufficient frequency. This subset contains 26 countries, covering 70% of the competition-skater-judge-element combinations during my sample period. Judging bloc identification is then conducted on the scoring tendency matrix restricted to these 26 fully connected countries.
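The filtering step can be sketched with a small pure-Python clique search. This is an illustration under stated assumptions, not the repository's implementation: the function names, the `counts` dictionary layout, and the use of the basic Bron–Kerbosch recursion (without pivoting) are all choices made for this example. If several cliques tie for the largest size, this sketch returns the first one it finds.

```python
def build_graph(counts, threshold=110):
    """Build adjacency sets from mutual judging counts.

    counts[(a, b)]: number of observations in which skaters from country a
    were scored by judges from country b (hypothetical layout).
    """
    countries = {c for pair in counts for c in pair}
    adj = {c: set() for c in countries}
    for a in countries:
        for b in countries:
            mutual = (counts.get((a, b), 0) >= threshold
                      and counts.get((b, a), 0) >= threshold)
            if a != b and mutual:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def largest_clique(adj):
    """Return one maximum clique via Bron-Kerbosch enumeration."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:  # r is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


# Toy example with a threshold of 2: A, B, C judge one another often enough,
# while D is judged by A but rarely judges A's skaters in return.
toy_counts = {
    ("A", "B"): 3, ("B", "A"): 2, ("A", "C"): 2, ("C", "A"): 5,
    ("B", "C"): 2, ("C", "B"): 2, ("A", "D"): 4, ("D", "A"): 1,
}
print(sorted(largest_clique(build_graph(toy_counts, threshold=2))))
```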

Judging Bloc Identification via Biclustering

I use spectral biclustering, an unsupervised machine learning model, to identify judging blocs. This algorithm simultaneously groups the rows and columns of a matrix and discovers coherent submatrices, i.e., blocks whose entries have similar values. One advantage of this method is that it is fully exploratory: it makes no assumptions about which countries participate in bloc judging or about the bloc membership of each country. Instead, the degree and composition of bloc judging are inferred from the clustering outcome.
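To give some intuition for how a spectral method can recover a checkerboard pattern, the sketch below uses a deliberately simplified stand-in: it double-centers the matrix, takes the leading singular vectors, and splits rows and columns by sign. This is not the full spectral biclustering algorithm (which normalizes the matrix and clusters several singular vectors), and the function name `two_way_split` is hypothetical. scikit-learn ships a complete implementation as `sklearn.cluster.SpectralBiclustering`.

```python
import numpy as np


def two_way_split(matrix):
    """Split rows and columns into two groups using the leading singular vectors.

    A simplified illustration of the spectral idea, not full spectral
    biclustering. Cluster labels (0 or 1) are arbitrary up to a joint swap.
    """
    m = np.asarray(matrix, dtype=float)
    # Double-centering removes row effects, column effects, and the overall level.
    centered = (m - m.mean(axis=1, keepdims=True)
                  - m.mean(axis=0, keepdims=True) + m.mean())
    u, _, vt = np.linalg.svd(centered, full_matrices=False)
    row_labels = (u[:, 0] > 0).astype(int)
    col_labels = (vt[0, :] > 0).astype(int)
    return row_labels, col_labels


# Toy checkerboard: rows 0-2 pair with columns 0-2, rows 3-4 with columns 3-4.
high, low = 0.3, -0.1
toy = np.full((5, 5), low)
toy[:3, :3] = high
toy[3:, 3:] = high

rows, cols = two_way_split(toy)
print("row clusters:", rows)
print("column clusters:", cols)
```

On an exact checkerboard like the toy matrix, rows and columns that belong to the same high-valued block receive the same label, which is the property the bloc definition relies on.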

I set the number of clusters to two for both the rows (skaters' countries) and the columns (judges' countries) of the $26 \times 26$ cross-country scoring tendency matrix. The algorithm therefore groups both skaters' and judges' countries into two clusters, yielding 4 submatrices. Figure 3 presents a high-level summary of the biclustering results. It shows that skaters from skater cluster 1 receive relatively high scores from judges in judge cluster 1 and relatively low scores from judges in judge cluster 2. Conversely, skaters in skater cluster 2 receive higher scores from judges in judge cluster 2 and lower scores from judges in judge cluster 1. A judging bloc is defined as the set of countries that are clustered together both as skater nations and as judge nations. Accordingly, the intersection of skater cluster 1 and judge cluster 1 forms one judging bloc, and the intersection of skater cluster 2 and judge cluster 2 forms the other. This construction ensures that countries within each bloc exhibit consistent behavior in both how they score others and how they are scored by others.

  • Bloc 1 countries are Belarus, Estonia, Georgia, Israel, Kazakhstan, Latvia, Poland, Russia, Ukraine, with Russia being the leading country (defined by the skater and judge participation; see Figure 2)
  • Bloc 2 countries are Australia, Austria, Canada, China, the Czech Republic, Spain, France, the United Kingdom, Germany, Italy, Japan, South Korea, Switzerland, Sweden, and the United States, with the United States being the leading country (defined by the skater and judge participation; see Figure 2)

The substantial overlap between the corresponding skater and judge clusters indicates that a large share of countries participate in bloc judging. In fact, only two of the 26 countries do not belong to any judging bloc: skaters from Hungary and Turkey receive high scores from judges in bloc 1, but judges from these two countries give high scores to skaters from bloc 2. In the subsequent analysis, I exclude these two countries and investigate only the 24 countries that belong to a judging bloc.
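The bloc definition (a country belongs to a bloc only if its skater cluster and its judge cluster coincide) reduces to a simple intersection rule, sketched below. The function name `judging_blocs` is hypothetical, and the sketch assumes the cluster labels have already been aligned so that skater cluster 1 pairs with judge cluster 1. The example uses a handful of the countries named above, with Hungary placed in skater cluster 1 and judge cluster 2 as described.

```python
def judging_blocs(skater_cluster, judge_cluster):
    """Assign each country to a bloc when its two cluster labels agree.

    Both arguments map country -> cluster label; labels are assumed to be
    aligned across the two mappings.
    """
    blocs, unassigned = {}, []
    for country in sorted(set(skater_cluster) & set(judge_cluster)):
        s, j = skater_cluster[country], judge_cluster[country]
        if s == j:
            blocs.setdefault(s, []).append(country)
        else:
            unassigned.append(country)
    return blocs, unassigned


# Illustrative subset of the 26 countries.
skater_cluster = {"Russia": 1, "Ukraine": 1, "Hungary": 1,
                  "Canada": 2, "United States": 2}
judge_cluster = {"Russia": 1, "Ukraine": 1, "Hungary": 2,
                 "Canada": 2, "United States": 2}

blocs, unassigned = judging_blocs(skater_cluster, judge_cluster)
print(blocs)
print(unassigned)
```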

Figure 3: Biclustering Results High-Level Summary

Figure 4 shows the scoring tendency matrix after grouping skaters and judges from the same bloc together, excluding Hungary and Turkey. Red indicates scores that are higher than a judge’s baseline leniency, while blue indicates scores that are lower than a judge’s baseline leniency. The high values in the two on-diagonal submatrices and the low values in the two off-diagonal submatrices indicate clear bloc judging behavior. In addition, the higher values along the main diagonal suggest strong nationalistic bias: judges tend to assign substantially higher scores to skaters from their own country.

Figure 4: Biclustering Results

Figure 5 presents the results in a more aggregated manner. For each bloc, it reports judges’ average scoring tendencies toward three groups of skaters: skaters from the opposite bloc, skaters from the same bloc but a different country, and skaters from the judge’s home country. The vertical lines on each bar are the 95% confidence intervals. On average, judges assign scores that are 0.10 standard deviations (S.D.) lower to skaters from the opposite bloc and 0.35 S.D. higher to skaters from their home country. Judges in bloc 1 exhibit a significant in-bloc bias of 0.10 S.D. even toward skaters from other countries within the same bloc. In contrast, judges in bloc 2 do not show significant favoritism toward in-bloc skaters from outside their home country. The fact that the confidence intervals for the bars do not overlap indicates that the differences between the groups are statistically significant. I verify that the results are not driven by the leading countries (Russia and the United States) in each bloc (see robustness checks in bloc_judging_investigation.ipynb).

Figure 5: In-bloc, Out-bloc and In-country Bias

Finally, for readers interested in the cross-country scoring tendency at a more granular level, I conduct a focused analysis of the leading country in each bloc: Russia in bloc 1 and the United States in bloc 2. Detailed results are provided in bloc_judging_investigation.ipynb.

Future Work

This project reveals clear bloc judging behavior in figure skating competitions held between the 2016–2017 and 2021–2022 seasons. Two distinct judging blocs are identified, led by Russia and the United States, respectively. Because Russia is the top country in both skater participation and judge presence, its ban from the 2022–2023 season onward could alter bloc judging dynamics in the sport. Future work could investigate how bloc judging evolves after Russia’s ban and examine how the ban affects skaters from both Russia’s own bloc and the opposing bloc.

References

[1] Sala, B.R., Scott, J.T. and Spriggs, J.F., 2007. The Cold War on ice: Constructivism and the politics of Olympic figure skating judging. Perspectives on Politics, 5(1), pp.17-29.

[2] Zitzewitz, E., 2006. Nationalism in winter sports judging and its lessons for organizational decision making. Journal of Economics & Management Strategy, 15(1), pp.67-99.

Footnotes

  1. The computational complexity of $O(3^n)$ assumes a model in which the number of judging blocs is fixed at two and some countries do not belong to either bloc.

  2. Not all competition results disclose a judge’s country. When a judge’s country information is missing, I infer the judge’s country based on the country information reported for the same judge in other competitions.

  3. An element here refers to either a technical element, in which judges assign Grade of Execution (GOE) scores, or a program component element, in which judges assign component scores reflecting broader aspects of performance quality such as skating skills, transitions, performance, composition, and interpretation of the music.

  4. Here, one time means one competition-skater-judge-element observation. That is, one judge evaluating one element performed by one skater in one competition is counted as one time. I verify that the result is robust to alternative thresholds.
