6.3.2 #14728
DevinTDHa
announced in
Announcement
📢 Spark NLP 6.3.2: Scala 2.13 Support, Layout-Aware Images, and Enhanced LightPipeline Tracking
Spark NLP 6.3.2 is a foundational release that introduces official support for Scala 2.13, alongside important improvements in document layout understanding and lightweight inference workflows.
This release improves long-term model portability through JSON-based serialization, enriches document image extraction with spatial metadata, and enhances `LightPipeline` with document ID tracking and output filtering.

🔥 Highlights
- Layout-aware image metadata in `Reader2Image` for HTML, DOCX, and PPTX documents.
- Enhanced `LightPipeline` with document ID propagation and output column filtering for better batch inference workflows.

🚀 New Features & Enhancements
Scala 2.13 Support
Spark NLP now supports Scala 2.13 with this release! This enables you to run your Spark NLP pipelines on Spark versions built for Scala 2.13, such as those used by Databricks and Dataproc. See our Installation Instructions for Scala 2.13 on how to use it with our project.
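As a minimal sketch of what switching Scala versions means for dependency resolution, the snippet below builds the Maven coordinate you would pass to `--packages` or `spark.jars.packages`. The helper name `spark_nlp_coordinate` is hypothetical, and the group ID `com.johnsnowlabs.nlp` is the one Spark NLP has been published under rather than something stated in these notes, so check the Installation Instructions before relying on it.

```python
def spark_nlp_coordinate(scala_binary: str = "2.13", version: str = "6.3.2") -> str:
    """Build a Maven coordinate for the CPU `spark-nlp` artifact.

    Hypothetical helper for illustration only; it is not part of Spark NLP.
    """
    # Assumption: only the 2.12 and 2.13 Scala binary versions are published.
    if scala_binary not in ("2.12", "2.13"):
        raise ValueError(f"unsupported Scala binary version: {scala_binary}")
    # The Scala binary version is appended to the artifact name as a suffix,
    # e.g. spark-nlp_2.12 or spark-nlp_2.13.
    return f"com.johnsnowlabs.nlp:spark-nlp_{scala_binary}:{version}"


coordinate = spark_nlp_coordinate()
# Show how the coordinate would be passed on the command line.
print(f"pyspark --packages {coordinate}")
```

The same coordinate string can be used wherever Spark accepts package coordinates; only the `_2.12` or `_2.13` suffix changes between the two builds.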
There are some things you have to consider when using the Scala 2.13 version:
- Change the artifact suffix from `spark-nlp_2.12` to `spark-nlp_2.13`.
- Set the `SPARK_HOME` environment variable to a Spark Scala 2.13 installation, or install PySpark from the official Spark archives.
- To load a `DependencyParserModel` or `TextMatcherModel` from Scala 2.12 into Scala 2.13, you will need to manually export them again with the latest version. See the notebook

Layout-Aware Image Metadata in `Reader2Image`
The `Reader2Image` annotator now extracts spatial image coordinates from rich document formats, adding layout awareness to image annotations.
- Coordinates: `x`, `y`, `width`, `height`

This enables:
Document ID Support in `LightPipeline`
`LightPipeline` now supports passing document IDs together with text inputs, improving traceability in batch and production inference scenarios.

Key capabilities:
- `fullAnnotate(ids, texts)`
- `annotate(ids, texts)`
- Document IDs included in the results (`doc_id`)
- `output_cols` parameter to restrict returned annotation types

Benefits:

Existing `LightPipeline` usage remains unchanged and backward compatible.

🐛 Bug Fixes
❤️ Community Support
💻 Installation
Python
Spark Packages
CPU
GPU
Apple Silicon
AArch64
Maven
Supported on Apache Spark 3.x.
spark-nlp
spark-nlp-gpu
spark-nlp-silicon
spark-nlp-aarch64
FAT JARs
What's Changed
Full Changelog: 6.3.1...6.3.2
This discussion was created from the release 6.3.2.