
Collaborating to improve population dynamics models through benchmark dataset validation

lias-laboratory/yellowhammer-benchmark

The Yellowhammer benchmark

PLEASE NOTE: this benchmark is currently under development. Download the data again from time to time to get the latest version.

Abstract: We believe that system identification has strong potential to support decision-making tools in the fight against biodiversity loss. To stimulate collaboration between the System Identification, Ecological Modelling, and Biostatistics communities, we propose to provide ready-to-use data with ‘long’ series and dynamics in time and space, and to mobilise a group of researchers from these communities to discuss our tools, methodologies, barriers and issues. As a foundation for this collaboration, we are making available a benchmark dataset focused on a bird species whose population is affected by agricultural practices and climate change. It is a real dataset provided by the national French Breeding Bird Survey (Jiguet et al., 2012). To understand and predict the impacts on the population dynamics of this bird, the Yellowhammer, scientists employ phenomenological models (e.g., GLM, GAM, Neural Networks or Random Forest) or deterministic models (difference equation, integro-differential equation, ODE, PDE). If you are a biostatistician, an ecologist or a specialist in data-driven modelling working on such applications, we invite you to test your models and methodologies on this dataset and to contribute to a better understanding of the influence of global change on wildlife populations. Based on discussions around this benchmark, the medium-term objectives are to understand the difficulties associated with ecological modelling, share tools and methodologies, compare models and algorithms on the same dataset, and provide new tools to decision-makers so that they can take biodiversity into account in public policy.

Jiguet, F., Devictor, V., Julliard, R., Couvet, D., October 2012. French citizens monitoring ordinary birds provide tools for conservation and ecological sciences. Acta Oecologica 44, 58–66.

Complete dataset

You have access to a CSV file, Yellowhammer_Clim_Bioclim_CLC_2002_2024.csv, containing 22,641 count records, each with the year of observation, the location, and climatic, bioclimatic and habitat variables.

The complete dataset is used to produce the final estimated model.
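
As a first check after downloading, the following Python sketch reads a CSV file with the standard library and returns its column names and number of data rows. The helper name `summarize_csv` is hypothetical, and the comma delimiter and UTF-8 encoding are assumptions; adjust them if the file uses a different convention.

```python
import csv


def summarize_csv(path, delimiter=","):
    """Return (column_names, row_count) for a CSV file with a header row.

    The delimiter and encoding are assumptions, not documented properties
    of the benchmark files.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)               # first row: column names
        row_count = sum(1 for _ in reader)  # remaining rows: data entries
    return header, row_count


# Example call, assuming the file sits in the working directory:
# columns, n_rows = summarize_csv("Yellowhammer_Clim_Bioclim_CLC_2002_2024.csv")
```

Comparing the returned row count with the number of entries stated above is a quick way to confirm that the download is complete.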

70/30 test dataset

The 70/30 test dataset consists of 70% of the data for estimation and 30% for validation. You have access to two CSV files, Data_Estimation_70.csv and Data_Validation_30.csv, containing rows from 100 random 70/30 draws from the complete data table.
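
For readers who want to see what a repeated random 70/30 split looks like, here is a minimal Python sketch. It is not the procedure used to build Data_Estimation_70.csv and Data_Validation_30.csv, which is not described here; the function names, the seeding scheme and the rounding of the cut point are all assumptions. To compare results with other participants, use the provided files.

```python
import random


def random_70_30_split(n_rows, seed):
    """Return (estimation_idx, validation_idx) for one random 70/30 draw."""
    rng = random.Random(seed)      # seeded generator for reproducibility
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = round(0.7 * n_rows)      # assumed rounding rule for the 70% share
    return sorted(indices[:cut]), sorted(indices[cut:])


def repeated_splits(n_rows, n_draws=100, base_seed=0):
    """Return a list of n_draws independent 70/30 index splits."""
    return [random_70_30_split(n_rows, base_seed + k) for k in range(n_draws)]
```

Each draw partitions the row indices into two disjoint sets, so a model estimated on the first set is never validated on a row it has already seen.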

Prediction test dataset

The prediction test dataset splits the data into two parts, with the first 19 years for estimation and the last 5 years for validation. You have access to a CSV file, Data_Estimation_2002_2019.csv, containing the estimation data. Validation performance is computed from the complete dataset, using only the last five years.
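
The validation subset can be extracted from the complete dataset along the following lines. This is a sketch only: the column name `year` and the function name are assumptions, and records are taken to be dictionaries such as those produced by `csv.DictReader`.

```python
def split_last_years(records, n_validation_years=5, year_key="year"):
    """Split records into (estimation, validation) by observation year.

    The last n_validation_years distinct years found in the data go to
    validation; all earlier years go to estimation. The key name "year"
    is an assumed column name.
    """
    years = sorted({int(r[year_key]) for r in records})
    validation_years = set(years[-n_validation_years:])
    estimation = [r for r in records if int(r[year_key]) not in validation_years]
    validation = [r for r in records if int(r[year_key]) in validation_years]
    return estimation, validation
```

Splitting on the years actually present in the data, instead of hard-coding a cut-off year, keeps the sketch valid if later years are added to the benchmark.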

Explanatory variables

The climatic variables prec(x, y, t), tmax(x, y, t) and tmin(x, y, t), the bioclimatic variables BIO..(x, y, t) and the habitat variables CLC...(x, y, t) are provided in two formats for easy use:

  • MATLAB data files
  • Shapefiles

These files provide all the explanatory variables over the map of France, as needed for ecological niche modelling.

Link to download files: explicative_variables.zip

Some programs

The Yellowhammer benchmark is described in a paper submitted to Control Engineering Practice:

A. Alassani, R. Ouvrard, T. Poinot, O. Martin, A. Besnard, O. Gimenez, W. Thuiller, J. Garnier, F. Jiguet & B. Fontaine, 2026. Bridging System Identification, Ecological Modelling and Biostatistics communities: Collaborating to improve population dynamics models through benchmark dataset validation.

An application for estimating GLM, GAM and Random Forest models is provided to produce baseline and preliminary results. The R routines used to estimate these models can be downloaded HERE.

Under development! Some programs in MATLAB, Python and R are provided to facilitate the use and processing of the datasets.

License

The Yellowhammer benchmark is released under the MIT License. See LICENSE for more information.