Releases: leomaurodesenv/qasports-dataset-scripts
Release v2.0.0
A new collection of scripts for building the dataset named "QASports 2.0", the first large sports question answering dataset for open questions. QASports 2.0 contains real data about players, teams, and matches from popular sports around the globe (soccer, American football, basketball, rugby, cricket, etc.). It comprises over a million questions and answers, built from cleaned and organized documents from Wikipedia-like sources. This release contains:
- Crawler for Fandom wiki pages
- Fetching the list of useful links in a Fandom wiki
- Processing techniques to clean and transform the text
- Question-answering context extracting script
- Question-answering automatic dataset generation
- Data selection algorithms for representative questions
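The link-fetching and text-cleaning steps listed above can be pictured with a minimal sketch. It is not the repository's code: the names `LinkCollector`, `useful_links`, and `clean_text`, the `/wiki/` path prefix, the namespace filter, and the `example.fandom.com` base URL are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects article-like links from a wiki page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: articles live under /wiki/ and namespaced pages
        # (File:, Category:, ...) contain a colon and are not useful.
        if href and href.startswith("/wiki/") and ":" not in href:
            self.links.append(urljoin(self.base_url, href))


def useful_links(html, base_url):
    """Return the sorted, de-duplicated article links found in `html`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return sorted(set(collector.links))


def clean_text(raw):
    """Drop footnote markers such as [1] and collapse whitespace."""
    text = re.sub(r"\[\d+\]", "", raw)
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

In this sketch, `useful_links(page_html, "https://example.fandom.com")` would feed a crawler queue, and `clean_text` would run on each page's extracted text before contexts and questions are generated.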
Release v1.1.0
A collection of scripts for building the dataset named "QASports", the first large sports question answering dataset for open questions. QASports contains real data about players, teams, and matches from soccer, basketball, and American football. It comprises over 1.5 million questions and answers drawn from about 54k preprocessed, cleaned, and organized documents from Wikipedia-like sources. This release contains:
- Crawler for Fandom wiki pages
- Fetching the list of useful links in a Fandom wiki
- Processing techniques to clean and transform the text
- Question-answering context extracting script
- Question-answering automatic dataset generation
Paper: Pedro Calciolari Jardim, Leonardo Mauro Pereira Moraes, and Cristina Dutra Aguiar. QASports: A Question Answering Dataset about Sports. In Proceedings of the Brazilian Symposium on Databases: Dataset Showcase Workshop, pages 1-12, Belo Horizonte, Minas Gerais, Brazil, 2023.
@inproceedings{jardim:2023:qasports-dataset,
author = {Pedro Calciolari Jardim and Leonardo Mauro Pereira Moraes and Cristina Dutra Aguiar},
title = {{QASports}: A Question Answering Dataset about Sports},
booktitle = {Proceedings of the Brazilian Symposium on Databases: Dataset Showcase Workshop},
address = {Belo Horizonte, MG, Brazil},
url = {https://github.com/leomaurodesenv/qasports-dataset-scripts},
publisher = {Brazilian Computer Society},
pages = {1--12},
year = {2023}
}