To find a research data repository for your data, you can search on the Registry of Research Data Repositories (re3data) platform and filter by country, content type, discipline, etc.
International:
- Zenodo: A general-purpose open access repository created by OpenAIRE and CERN. It integrates with GitHub and allows researchers to upload files up to 50 GB.
- Figshare: Online digital repository where researchers can preserve and share their research outputs (figures, datasets, images and videos). Users can make all of their research outputs available in a citable, shareable and discoverable manner.
- EUDAT: European platform for researchers and practitioners from any research discipline to preserve, find, access, and process data in a trusted environment.
- Dryad: A general-purpose home for a wide diversity of data types, governed by a nonprofit membership organization. A curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable.
- The Open Science Framework: Offers free accounts for collaboration around files and other research artifacts. Each account can hold up to 5 GB of files, and a project remains private until you make it public.
Sweden:
Norway:
- NIRD archive, widely used by some communities
- NSD - Norwegian Centre for Research Data, for any kind of data
- Dataverse.no - Dataverse network, based at the University of Tromsø but open to other institutions
- ELIXIR Norway for life science, sequence, and omics data
Denmark:
Finland:
Portugal:
- The EU has a database directive which restricts data mining on databases.
- Its effect is somewhat similar to copyright, which by itself would not apply to data mining.
- A good license also grants the right to mine the data, so this is not a major concern.
When you can use datasets:
- The license allows it
- Your country has exceptions for research
- The data doesn't come from the EU
License text, slides, images, and supporting information under a Creative Commons license, and get a DOI using Zenodo, Figshare, OSF, or other services.
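As a rough illustration of what "get a DOI using Zenodo" can look like in practice, here is a minimal sketch that prepares the metadata for a new Zenodo deposition with a Creative Commons license. The endpoint URL and field names (`upload_type`, `license`, `creators`) reflect our understanding of Zenodo's deposit REST API and should be checked against the current Zenodo documentation; the helper names `build_zenodo_metadata` and `create_deposition` are hypothetical and only used here for illustration.

```python
import json
import urllib.request

# Assumption: Zenodo's deposit REST endpoint. The sandbox is used here so that
# experiments do not create real records.
ZENODO_API = "https://sandbox.zenodo.org/api/deposit/depositions"


def build_zenodo_metadata(title, description, creators,
                          upload_type="presentation", license_id="cc-by-4.0"):
    """Return the JSON payload describing a new deposition (no network access)."""
    if not creators:
        raise ValueError("at least one creator is required")
    return {
        "metadata": {
            "title": title,
            "description": description,
            "upload_type": upload_type,   # e.g. "presentation", "dataset", "software"
            "license": license_id,        # assumed identifier for CC BY 4.0
            "creators": [{"name": name} for name in creators],
        }
    }


def create_deposition(token, payload, api_url=ZENODO_API):
    """POST the payload and return the parsed JSON response.

    Needs network access and a personal access token.
    """
    request = urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Only `build_zenodo_metadata` runs offline. Creating the deposition is just the first step: files still have to be uploaded and the record published before the DOI becomes active.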
Is it data? Is it software? We need to consider the AI solution, the training data, the production data, the AI output, and AI evolutions.
How about ethics? How about liability?
- EU AI Act
- Models can be reverse-engineered and training data can be extracted
- What if the model generates an outcome that is dangerous? .cite[Thanks to E. Glerean for pointing these issues out to us]
Some resources
- RAIL initiative: "Responsible AI licenses"
- The Turing Way: Machine Learning Model Licenses
- "Expert Q&A on Artificial Intelligence (AI) Licensing"
- The Turing Way
- Illustrations from the Turing Way book dashes
- Reproducibility syllabus
- The reproducible research data analysis platform
- Good talks on open reproducible research can be found here.
- "FAIR is not fair enough"
- "A FAIRer future"
- "Top 10 FAIR Data & Software Things" are brief guides that can be used by the research community to understand how they can make their research (data and software) more FAIR.
- Five recommendations for FAIR software
- Publishing research software: an MIT Libraries webpage on why to publish software, where to publish software, and how to make software citable.
- Software Quality Checklist
- MolSSI Best Practice Guides
- Awesome Research Software Registries