Releases: vincentnam/docker_datalake

V0.1.0

08 Oct 14:28

Pre-release

This first version is the baseline for automatic deployment. It includes the following features:

  • MongoDB container
    • MongoDB collection creation
  • OpenStack Swift container (the container exists but still needs to be adapted to the install)
  • Apache Airflow container
  • Processed Data zone container (Apache Airflow and Apache Spark)
    • Apache Airflow (from the Apache Docker Hub image)
  • Ansible playbooks
    • Base playbooks
      • Docker install
      • Python install
    • Raw data zone playbooks
    • Metadata management zone playbooks
    • Process zone playbooks
  • Ansible playbook launcher (see global/tasks/install_datalake)
  • Airflow Python script updated for the Airflow 2.0 major release

Some tasks remain to be done:

  • Import onto a Kubernetes cluster
  • Apache Spark container
  • Service zone container (Web GUI and REST API (Flask))
  • Security, authentication and monitoring container (OpenStack Keystone, Kerberos (?), ...?)
  • Ansible Playbooks
    • Processed data zone playbooks
    • Service zone playbooks
    • Security, authentication and monitoring zone playbooks

At this point, only the three main zones have been implemented. This is not a major problem because:

  • Process_data_zone can be any database already in production
  • The Service zone is only a user-facing layer that makes the data lake easier to use for all users; the architecture can still be used without it
  • The Security, authentication and monitoring zone has not been developed yet

V0.0.5 : First modis_master merge

21 Jul 15:47

Pre-release

V0.0.5: first modis_master merge. The first projects cover the Process zone, the Service zone and stream data.

Process zone

  • First Apache Spark integration
  • History tracking of the processing applied to each data item
  • Data processing (Airflow workflow: time series data in CSV format, e.g. weather readings, sensor readings, etc.)
  • Data processing (Airflow workflow: image data, e.g. surveillance camera captures)
  • Data processing (Apache Spark workflow: MQTT stream data, e.g. humidity sensors)
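
As a rough illustration of the time series CSV processing listed above, here is a minimal sketch of turning CSV sensor readings into timestamped points before they are written to a time series database. It uses only the Python standard library. The function name `parse_timeseries_csv`, the `timestamp` column name, the ISO 8601 timestamp format and the sample data are all assumptions made for this example; none of them are taken from the project's actual Airflow workflows.

```python
import csv
import io
from datetime import datetime


def parse_timeseries_csv(text, time_column="timestamp"):
    """Turn CSV text into a list of (datetime, {field: float}) points.

    Hypothetical helper: the column name and the ISO 8601 timestamp
    format are assumptions, not the project's actual schema.
    """
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        # Remove the time column from the row and parse it.
        ts = datetime.fromisoformat(row.pop(time_column))
        fields = {}
        for name, value in row.items():
            try:
                fields[name] = float(value)
            except (TypeError, ValueError):
                # Skip empty or non-numeric readings instead of failing.
                continue
        points.append((ts, fields))
    return points


# Made-up sample readings, with one missing humidity value.
SAMPLE = (
    "timestamp,temperature,humidity\n"
    "2021-07-21T15:00:00,24.5,41\n"
    "2021-07-21T15:10:00,24.9,\n"
)

points = parse_timeseries_csv(SAMPLE)
for ts, fields in points:
    print(ts.isoformat(), fields)
```

In an Airflow workflow, a function of this kind would typically be called from a task, with a later task writing the resulting points to the target database.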

Processed data zone

  • [Consumption] Deploy a time series database
  • [Consumption] Deploy a database for highly connected data
  • [Consumption] Deploy a document-oriented database

Service zone

  • [GUI] Data insertion page
  • [GUI] Data download page
  • [GUI] CORS policy handling
  • [REST API] Flask REST API for insertion and download

Automatic deployment

  • Folder restructuring for Ansible
  • Docker containers for MongoDB Enterprise, Airflow 2.0 and InfluxDB

Repository

  • Some code cleanup
  • Documentation improvements
  • License

Project

  • First scientific paper (IDEAS 2021: 25th anniversary conference)

(See #81: Modis master merge + code review)

V0.0.3 : IRIT Gitlab transfer

09 Jun 16:08

Pre-release

Documentation updates and changes for the transfer of the repository to the IRIT GitLab only.

V0.0.2 : wait for merge

09 Jun 16:06
026531c

Pre-release

[GUI] Home page
[GUI] Data storage page
[GUI] Raw data download page
[GUI] Filtering of the raw data display

[Service] REST API web service - Reading metadata from the MongoDB database to display raw data
[Service] REST web service - Raw data download
[Service] REST web service - Raw data storage
[Service] Flask API upgrade
[Service] Raw data filtering

[Management] CSV data storage (historical real-time data)
[Management] Integration and configuration of the trigger in the Storage zone
[Management] MongoDB database creation

V0.0.1

06 Apr 16:02
a7464b7

Pre-release

Sets a fixed version as a working base, with:

  • OpenStack Swift
  • Airflow
  • Metadata database (MongoDB)
  • First version of the frontend
  • First version of the RESTful API
  • Start of dockerization and Ansible deployment
  • First version of the documentation
  • First use cases (neocampus MongoDB to InfluxDB workflow, datanoos workflow, POC for some file types (JSON, JPEG, etc.))