Releases: AbsaOSS/spline
Maintenance release - 0.5.3
In this release we addressed some issues and made a few enhancements:
UI
- More readable and user-friendly visualization of expressions and generic operation properties
- Write and Read operations now display more metadata
- Display arrows on graph edges
- Ability to expand lineage overview graphs of depth greater than 10 jobs (minor ArangoDB upgrade is required, see https://absaoss.github.io/spline/0.5.html)
Admin CLI
- #696 Admin CLI: Option '-s' is not working
Gateway
- #697 Added informative index page to monitor the server status
Scala 2.12, Attribute lineage highlighting, MongoDB, ElasticSearch, Cassandra and more
Spline 0.5 comes with a few major improvements and new features:
Spline Agent for Spark (aka Spline Harvester)
- Agent is now a separate project, located in its own repo and loosely coupled with the Spline core via the Producer REST API
- [#212] Scala 2.12 is supported (for Spark 2.4)
- Support for some alternative data source types (special thanks to @radford1)
  - [#606] ElasticSearch connector support
  - [#605] MongoDB connector support
  - [#604] Cassandra connector support
Spline Core and UI
- [#112][#559][#451] Attribute lineage and impact highlighting
- [#442] Search by attribute name
- [#616] Support conventional upper snake notation for environment variables
- [#620] ArangoDB SSL connection support
- [#617] ArangoDB fallback config support
- [#615] Connection Performance Improvements (Timeout config, gzip)
- [#561] Docker composite layer for Demo (thanks to @rtyler)
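As an illustration of the upper snake notation mentioned in #616, here is a minimal sketch of how a dotted property name such as spline.consumer.url could map to an environment variable name. The exact rule Spline applies is not stated in these notes, so the mapping below (non-alphanumeric runs become underscores, then upper-case) and the helper names to_env_name and lookup are assumptions for illustration only.

```python
import os
import re


def to_env_name(prop: str) -> str:
    """Convert a dotted property name to upper snake case (assumed rule)."""
    # Replace every run of non-alphanumeric characters with one underscore.
    return re.sub(r"[^A-Za-z0-9]+", "_", prop).upper()


def lookup(prop: str, env=os.environ):
    """Return the value stored under the exact name, else under the upper snake name."""
    if prop in env:
        return env[prop]
    return env.get(to_env_name(prop))


if __name__ == "__main__":
    # Hypothetical value, shown only to demonstrate the lookup order.
    sample_env = {"SPLINE_CONSUMER_URL": "http://localhost:8080/consumer"}
    print(to_env_name("spline.consumer.url"))
    print(lookup("spline.consumer.url", sample_env))
```

Checking the exact name first keeps existing dotted-name setups working, while the fallback covers shells and container platforms that do not allow dots in variable names.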
Other improvements and bug fixes
- [#572] Cross-framework end-to-end lineage example
- [#534][#554] Improve logging
- [#635] Client UI WebJar should not have any bytecode dependencies
- [#612] Lineage Timestamp format is misleading
- [#565] Admin CLI doesn't exit normally
- [#535] spark-shell / pyspark --packages not working
- [#629] Missing configuration property spline.consumer.url
...and more
Support for Excel data source
Bugfix release
#527 - Arrays of primitives should not show possibility of being unfolded
#522 - Cannot send lineage data
#518 - WebJar isn't published to Maven automatically
#515 - [spark-agent]0.4.0 HttpLineageDispatcher.ensureProducerReady is missing the Success case
#425 - Migrator: check that Producer is accessible before start migration
Re-written, server based Spline version, powered by ArangoDB
New vision
In this release we have completely revised the vision and architecture of Spline.
Starting with the 0.4 release, Spline has begun its journey from being a simple Spark-only lineage tracking tool towards a more generic concept - a cross-framework data lineage tracking solution. The new vision covers much broader aspects of lineage tracking, including (to a certain extent) real-time monitoring, error tracking, impact analysis and more. Spline 0.4.0 is the first version of that "new" Spline. Compared to Spline 0.3.9 it does not contain any brand-new features yet, but rather provides a brand-new foundation and architecture.
New Architecture
Spline core is now split into two main parts - a Spline server and a Spline agent.
The Spline server is implemented in the form of the Spline REST Gateway, which exposes two independent REST APIs - the Producer API (used by the Agent to send metadata to the server) and the Consumer API (used by the Spline UI or other parties to get the collected lineage data). Although both APIs will evolve in future versions, we'll try our best to maintain backward compatibility for the Producer one.
Migration from Spline 0.3
Spline 0.4 comes with a command-line migration tool that can be used to migrate old Spline 0.3 data from MongoDB to the new Spline 0.4 storage, which is now based on ArangoDB.
Atlas support
Atlas integration has been removed (again) from Spline 0.4.0 and will most likely be re-introduced in some other shape in one of the future Spline versions (see #279).
Improvements in Spark Agent
There were also a number of improvements in Spark operation support. For example, we added support for Delta, Kafka and JDBC (as both source and target for batch jobs), as well as for some Spark SQL commands (e.g. create table as, drop table and others).
See https://github.com/AbsaOSS/spline/blob/release/0.4.0/spark/agent/README.md
Containerization
Spline is now much easier to try out and use in the cloud, as all its moving parts are implemented as Docker containers - ArangoDB, Spline REST Gateway and Spline UI.
The Spline Agent for Spark is now much smaller (due to fewer dependencies) and, just like in the previous Spline 0.3.9, is shipped in the form of a pre-built bundle for three major Spark versions - 2.2, 2.3 and 2.4 - https://search.maven.org/search?q=spark-agent-bundle
Bugfix release
release/0.3.9 [maven-release-plugin] copy for tag release/0.3.9
Bugfix release
release/0.3.8 [maven-release-plugin] copy for tag release/0.3.8
PySpark, Codeless init, Uber JAR, 'saveAsTable' etc, bugfixes
In addition to bugfixes and performance improvements this release introduces the following features:
- PySpark support
- Codeless initialization (via spark.sql.queryExecutionListeners property)
- Support for saveAsTable, insertInto commands and JDBC datasource.
Also Spline is now available as an uber-JAR.
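To illustrate codeless initialization, here is a minimal sketch that builds the spark-submit arguments registering a listener through the spark.sql.queryExecutionListeners property, so the job code itself needs no Spline-specific calls. The listener class name below is an assumption (it differs between Spline versions) and should be taken from the agent README for the version in use. The helper codeless_init_args and the script name my_job.py are hypothetical.

```python
# Assumption: fully qualified name of the Spline listener class.
# Verify it against the agent README of the Spline version you use.
LISTENER_CLASS = "za.co.absa.spline.core.listener.SplineQueryExecutionListener"


def codeless_init_args(listener_class: str = LISTENER_CLASS) -> list:
    """Return the spark-submit arguments that register the listener via config."""
    return ["--conf", f"spark.sql.queryExecutionListeners={listener_class}"]


if __name__ == "__main__":
    # my_job.py is a placeholder for an unmodified Spark job.
    command = ["spark-submit", *codeless_init_args(), "my_job.py"]
    print(" ".join(command))
```

The same property can be passed to spark-shell or pyspark. The Spline JARs (for example the uber-JAR mentioned above) still have to be on the Spark classpath for the listener to load.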
Spark 2.4 + Atlas support
This release adds Spark 2.4 support and brings back Apache Atlas integration (which was removed in previous releases). It also fixes a few bugs.