
Re-written, server based Spline version, powered by ArangoDB


@wajda wajda released this 16 Dec 17:31
· 553 commits to develop since this release

New vision

In this release we have completely revised the vision and architecture of Spline.
Starting with the 0.4 release, Spline begins its journey from a simple Spark-only lineage tracking tool towards a more generic concept: a cross-framework data lineage tracking solution. The new vision covers much broader aspects of lineage tracking, including (to a certain extent) real-time monitoring, error tracking, impact analysis and more. Spline 0.4.0 is the first version of that "new" Spline. Compared to Spline 0.3.9 it does not contain any brand-new features yet; instead, it provides a brand-new foundation and architecture.

New Architecture

Spline core is now split into two main parts: a Spline server and a Spline agent.
The Spline server is implemented in the form of the Spline REST Gateway, which exposes two independent REST APIs: the Producer API (used by the agent to send metadata to the server) and the Consumer API (used by the Spline UI or other parties to retrieve the collected lineage data). Although both APIs will evolve in future versions, we'll try our best to maintain backward compatibility for the Producer API.
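
To illustrate the split between the two APIs, here is a toy in-memory sketch in Python. It is not Spline code and does not reflect the real REST endpoints or payloads: the class names, method names and the shape of the `plan` dictionary are all hypothetical and chosen only to show a write-only producer side and a read-only consumer side sharing one store.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LineageStore:
    """Hypothetical stand-in for the server-side storage."""
    plans: Dict[str, dict] = field(default_factory=dict)


class ProducerApi:
    """Write side (hypothetical): an agent pushes captured metadata here."""

    def __init__(self, store: LineageStore) -> None:
        self._store = store

    def post_plan(self, plan_id: str, plan: dict) -> None:
        # Store the metadata under its identifier, replacing any older copy.
        self._store.plans[plan_id] = plan


class ConsumerApi:
    """Read side (hypothetical): a UI or another client queries lineage here."""

    def __init__(self, store: LineageStore) -> None:
        self._store = store

    def list_plan_ids(self) -> List[str]:
        # Return identifiers in a stable, sorted order.
        return sorted(self._store.plans)

    def get_plan(self, plan_id: str) -> dict:
        return self._store.plans[plan_id]


store = LineageStore()
producer = ProducerApi(store)
consumer = ConsumerApi(store)

# An agent would send something along these lines after observing a job
# (the payload shape is made up for this example).
producer.post_plan("job-1", {"inputs": ["a.csv"], "output": "b.parquet"})
```

Keeping the two sides as separate interfaces is what would allow them to evolve independently, which matches the stated goal of keeping the Producer API backward compatible while the Consumer API changes.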

Migration from Spline 0.3

Spline 0.4 comes with a command-line migration tool that can be used to migrate old Spline 0.3 data from MongoDB to the new Spline 0.4 storage, which is now based on ArangoDB.

Atlas support

Atlas integration has been removed (again) from Spline 0.4.0 and will most likely be re-introduced in some other form in one of the future Spline versions (see #279).

Improvements in Spark Agent

There were also a number of improvements in Spark operations support. For example, we added support for Delta, Kafka and JDBC (as both source and target for batch jobs), as well as support for some Spark SQL commands (e.g. create table as, drop table and others).
See https://github.com/AbsaOSS/spline/blob/release/0.4.0/spark/agent/README.md

Containerization

Spline is now much easier to try out and to use in the cloud, as all its moving parts are available as Docker containers: ArangoDB, Spline REST Gateway and Spline UI.
The Spline Agent for Spark is now much smaller (it has fewer dependencies) and, just like in Spline 0.3.9, is shipped as a pre-built bundle for three major Spark versions: 2.2, 2.3 and 2.4. See https://search.maven.org/search?q=spark-agent-bundle