Anton Grabolle / Better Images of AI / AI Architecture / Licenced by CC-BY 4.0
BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics.
[ArXiv] [OpenReview] [NeurIPS] [Slides] [Poster]
The Building TimeSeries (BTS) dataset covers three buildings over a three-year period, comprising more than ten thousand timeseries data points with hundreds of unique ontologies. Moreover, the metadata is standardised in the form of a knowledge graph using the Brick schema. Scroll down to the Quick Start to get started.
This is the official repository of our NeurIPS 2024 DB Track paper that presents this dataset.
Acknowledgement: This is part of the NSW, Australia Digital Infrastructure Energy Flexibility (DIEF) project.
Buildings play a crucial role in human well-being, influencing occupant comfort, health, and safety. Additionally, they contribute significantly to global energy consumption, accounting for one-third of total energy usage, and carbon emissions. Optimizing building performance presents a vital opportunity to combat climate change and promote human flourishing. However, research in building analytics has been hampered by the lack of accessible, available, and comprehensive real-world datasets on multiple building operations. In this paper, we introduce the Building TimeSeries (BTS) dataset. Our dataset covers three buildings over a three-year period, comprising more than ten thousand timeseries data points with hundreds of unique ontologies. Moreover, the metadata is standardized using the Brick schema. To demonstrate the utility of this dataset, we performed benchmarks on two tasks: timeseries ontology classification and zero-shot forecasting. These tasks represent an essential initial step in addressing challenges related to interoperability in building analytics. Access to the dataset and the code used for benchmarking are available here: https://github.com/cruiseresearchgroup/DIEF_BTS
- Timeseries snippet: Download the timeseries snippet DIEF_B_Snippet50_3weeks.pkl.zip and run DIEF_inspect_Snippet.ipynb. More info in the Snippet section below.
- Building metadata knowledge graph in Brick schema: Download Site_B.ttl and Brick_v1.2.1.ttl and run DIEF_inspect_brick.ipynb. Learn more about the Brick schema here: https://brickschema.org/. More info in the Snippet section below.
- Full Timeseries dataset: Download the raw dataset on FigShare and run DIEF_inspect_raw.ipynb. More info in the Access section below.
The exact buildings are kept anonymous, but here are their general locations:
- BTS A: Canberra, ACT, Australia
- BTS B: Newcastle, NSW, Australia
- BTS C: Melbourne, VIC, Australia
For convenience, we provide a very small snippet of the dataset: DIEF_B_Snippet50_3weeks.pkl.zip.
It contains only 50 timeseries, all from Site B, haphazardly selected.
The file is less than 10 MB, small enough to be sent as an email attachment in most systems.
It is accompanied by a short notebook to extract and visualize the dataset: DIEF_inspect_Snippet.ipynb.
We also provide the building metadata Site_B.ttl in the form of a Brick Turtle file.
It is accompanied by a Brick definition file Brick_v1.2.1.ttl and a short notebook to extract the statistics: DIEF_inspect_brick.ipynb.
If you have not already done so, you will need to install the rdflib Python package.
Access the raw dataset on FigShare; DOI: 10.6084/m9.figshare.28705559.
A train-only partition of the dataset was made available earlier for the peer-review submission. Access it on FigShare.
List of files available now:
- Site_A_metadata.csv, Site_B_metadata.csv, and Site_C_metadata.csv are the timeseries metadata files containing basic statistics as well as the Brick class. Treat the StreamID column as the primary key. Use spreadsheet software or the Python pandas library to inspect the files.
- Site_Aaa.zip, Site_Baa.zip, and Site_Caa.zip are the raw timeseries data. Each is a zip of a folder of pickle files. Inside each pickle file is a list containing: a string of StreamID, a 1D NumPy array of timestamps, and another 1D NumPy array of values. Use DIEF_inspect_raw.ipynb to inspect and see how the data is structured.
- Site_A.ttl, Site_B.ttl, and Site_C.ttl are the Turtle files that contain the metadata of each building using the Brick schema. Use the StreamID to match the nodes in this graph with the timeseries. You can use DIEF_inspect_brick.ipynb to inspect these files.
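The pickle layout described above (a list of StreamID, timestamps, and values) can be read with a few lines of Python. The following is a minimal sketch under assumptions, not the code from DIEF_inspect_raw.ipynb: the helper load_stream, the demo file name, the stream identifier, and the datetime64 timestamp dtype are all hypothetical, and a synthetic file is written first so that the example runs without the dataset. As with any pickle, only load files from a source you trust.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def load_stream(path):
    """Load one pickle laid out as [StreamID, timestamps, values] (hypothetical helper)."""
    with open(path, "rb") as f:
        stream_id, timestamps, values = pickle.load(f)
    timestamps = np.asarray(timestamps)
    values = np.asarray(values)
    # Both arrays are expected to be 1D and aligned element by element.
    if timestamps.ndim != 1 or timestamps.shape != values.shape:
        raise ValueError(f"unexpected array shapes in {path}")
    return stream_id, timestamps, values


# Write a synthetic file in the same layout. The timestamp dtype is an assumption.
demo_path = Path(tempfile.mkdtemp()) / "demo.pickle"
demo_record = [
    "hypothetical-stream-id",
    np.array(["2021-01-01T00:00", "2021-01-01T00:10"], dtype="datetime64[m]"),
    np.array([21.5, 21.7]),
]
with open(demo_path, "wb") as f:
    pickle.dump(demo_record, f)

stream_id, timestamps, values = load_stream(demo_path)
print(stream_id, timestamps.shape, values.mean())
```

The returned StreamID is the key for looking up the matching row in the site metadata CSV and the matching node in the Brick graph.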
The following files are intentionally missing:
- Site_Aaa/5765.pickle
- Site_Aaa/7558.pickle
- Site_Aaa/3973.pickle
- Site_Aaa/4445.pickle
Not every StreamID has a corresponding file.
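One way to find out which StreamIDs lack a file is to collect the identifiers stored inside the pickles and compare them with the identifiers listed in the metadata. The sketch below assumes each pickle holds a list whose first element is the StreamID, as described in the file list. The helper name, the identifiers, and the demo folder are invented, and plain Python lists stand in for the NumPy arrays so that the example needs only the standard library.

```python
import pickle
import tempfile
from pathlib import Path


def stream_ids_with_files(folder):
    """Return the set of StreamIDs found inside the pickle files of a folder."""
    found = set()
    for path in sorted(Path(folder).glob("*.pickle")):
        with open(path, "rb") as f:
            record = pickle.load(f)
        found.add(record[0])  # first element is the StreamID string
    return found


# Build a small synthetic folder with two streams (identifiers are invented).
demo_dir = Path(tempfile.mkdtemp())
for index, sid in enumerate(["id-a", "id-b"]):
    with open(demo_dir / f"{index}.pickle", "wb") as f:
        pickle.dump([sid, [0, 1], [1.0, 2.0]], f)

# Stand-in for the StreamID column of a site metadata CSV.
metadata_ids = ["id-a", "id-b", "id-c"]

found = stream_ids_with_files(demo_dir)
without_file = [sid for sid in metadata_ids if sid not in found]
print(without_file)  # ['id-c']
```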
The 20240530_class_code folder contains the code to reproduce the classification results.
Here is a short description of each file:
- requirement.txt is coming, but all the information is already available in e12_pbs_py.sh.o116050675.
- xySplit.py: Use this to split train.zip into train_X.zip and train_Y.csv.
- e05_naieve_02.ipynb: Run this notebook to get the naive results.
- e07_02_LR.ipynb: Run this notebook to get the Logistic Regression results.
- e08_02_RF.ipynb: Run this notebook to get the Random Forest results.
- e09_01_XGBoost.ipynb: Run this notebook to get the XGBoost results.
- thuml_tslib_dief: This folder contains a modified version of the Tsinghua University Machine Learning Group's library: https://github.com/thuml/Time-Series-Library
- e12_pbs_py.sh: This is the setup to run the Transformer model. Consider this as the main function.
- e12_pbs_py.sh.o116050675: This is the output of the above setup. It contains detailed information about the installed Python packages and their versions, as well as the hardware specifications.
- e13_pbs_py.sh: DLinear
- e15_pbs_py.sh: PatchTST
- e17_pbs_py.sh: Informer
The zeroshot_preprocess folder contains the code to preprocess the zero-shot forecasting data. Run zeroshot_preprocess/SiteA_preprocessing.ipynb to generate the processed BTS_A data.
- Note that, as of now, only the training data for the classification benchmark are made publicly available as we are planning to host a competition using this dataset. None of the data for the zero-shot forecasting benchmark are made available as of now. (Last update 2024 06 12)
The 20240612_zeroshot_code folder contains the code to reproduce the zero-shot forecasting results. This folder is a modified version of the Tsinghua University Machine Learning Group's library: https://github.com/thuml/Time-Series-Library
- The zero-shot forecasting task shares the same requirement.txt as the classification task.
- ./scripts/sample_DLinear.sh is a sample script that trains a DLinear model on BTS_A and tests on BTS_B and BTS_C. To run the code, replace the argument data_path with your own data path.
- To train alternative models, replace model_name with the name of a target model implemented in the ./model directory and change the corresponding configurations.
The e20_longTail folder contains the code to reproduce the figures in Appendix B.
This dataset is used as a part of the following competition: https://www.aicrowd.com/challenges/brick-by-brick-2024
A global challenge to automate building data classification, unlocking more intelligent, energy-efficient buildings for a sustainable future.
Buildings are one of the biggest energy consumers in the modern world, making energy efficiency essential. However, managing building systems data across different buildings is time-intensive and costly due to inconsistent data formats. This challenge invites you to transform building management by creating a solution that classifies building data automatically, promoting standardised, energy-efficient management for a more sustainable world.
The official archival repository for this competition is available here: /competition1
This competition has concluded. However, the sequel competition has just commenced. Check it out here: https://www.aicrowd.com/challenges/flextrack-challenge-2025 (last update, 2025 08 25).
Arian Prabowo, Xiachong Lin, Imran Razzak, Hao Xue, Emily W. Yap, Matthew Amos, Flora D. Salim. BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics, 2024. arXiv:2406.08990. https://doi.org/10.48550/arXiv.2406.08990.
BibTeX: The [NeurIPS 2024] paper
@inproceedings{prabowo2024bts,
title={BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics},
author={Arian Prabowo and Xiachong Lin and Imran Razzak and Hao Xue and Emily W. Yap and Matthew Amos and Flora D. Salim},
year={2024},
booktitle={The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
url={https://openreview.net/forum?id=6cCFK69vJI}
}
The dataset
@article{prabowo2024btsTrain,
author = "Arian Prabowo and Xiachong Lin and Imran Razzak and Hao Xue and Emily Yap and Matthew Amos and Flora Salim",
title = "{BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics (TRAIN ONLY)}",
year = "2024",
month = "6",
url = "https://figshare.com/articles/dataset/BTS_Building_Timeseries_Dataset_Empowering_Large-Scale_Building_Analytics_TRAIN_ONLY_/25912180",
doi = "10.6084/m9.figshare.25912180.v1"
}
[WWW 2025]: Brick-by-Brick: Cyber-Physical Building Data Classification Challenge
@inproceedings{prabowo2025brick,
author = {Prabowo, Arian and Lin, Xiachong and Razzak, Imran and Xue, Hao and Amos, Matthew and White, Stephen D. and Salim, Flora D.},
title = {Brick-by-Brick: Cyber-Physical Building Data Classification Challenge},
year = {2025},
isbn = {9798400713316},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3701716.3718483},
doi = {10.1145/3701716.3718483},
booktitle = {Companion Proceedings of the ACM on Web Conference 2025},
pages = {3021–3025},
numpages = {5},
keywords = {building, classification, machine learning, ontology, timeseries},
location = {Sydney NSW, Australia},
series = {WWW '25}
}
[PerCom 2025]: A Gap in Time: The Challenge of Processing Heterogeneous IoT Data in Digitalized Buildings
@article{lin2025gap,
title={A Gap in Time: The Challenge of Processing Heterogeneous IoT Data in Digitalized Buildings},
author={Lin, Xiachong and Prabowo, Arian and Razzak, Imran and Xue, Hao and Amos, Matthew and Behrens, Sam and Salim, Flora D.},
journal={IEEE Pervasive Computing},
year={2025},
pages={1-13},
doi={10.1109/MPRV.2025.3542061}
}
[ICDMW 2024]: BiTSA: Leveraging Time Series Foundation Model for Building Energy Analytics
@inproceedings{lin2024bitsa,
title={BiTSA: Leveraging Time Series Foundation Model for Building Energy Analytics},
author = { Lin, Xiachong and Prabowo, Arian and Razzak, Imran and Xue, Hao and Amos, Matthew and Behrens, Sam and Salim, Flora D. },
booktitle = { 2024 IEEE International Conference on Data Mining Workshops (ICDMW) },
year={2024},
pages = {891-894},
url = {https://doi.ieeecomputersociety.org/10.1109/ICDMW65004.2024.00122},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = Dec
}
- Contact for this code repo: https://www.arianprabowo.com/
- Similar dataset: LBNL Building 59 A three-year building operational performance dataset for informing energy efficiency
- Similar dataset: Mortar (Access is currently restricted. Last update:2024 08 19)
- Other building-related datasets: Building Data Genome Directory
- Learn more about building analytics and data-driven smart buildings from IEA EBC Annex 81.



