Size of fat jars #340

@aalexandrov

This is a general discussion question regarding the size of the fat jars produced by the emma-examples-spark and emma-examples-flink modules.

Running

find . -name '*.jar' | grep -v original | grep -v nexus | xargs du -hs

in the project root shows the following output

65M	./emma-examples/emma-examples-spark/target/emma-examples-spark-0.2-SNAPSHOT.jar
64M	./emma-examples/emma-examples-flink/target/emma-examples-flink-0.2-SNAPSHOT.jar
440K	./emma-examples/emma-examples-library/target/emma-examples-library-0.2-SNAPSHOT.jar
420K	./emma-examples/emma-examples-library/target/emma-examples-library-0.2-SNAPSHOT-tests.jar
148K	./emma-spark/target/emma-spark-0.2-SNAPSHOT.jar
148K	./emma-flink/target/emma-flink-0.2-SNAPSHOT.jar
20K	./emma-gui/target/emma-gui-0.2-SNAPSHOT.jar
56K	./emma-quickstart/target/emma-quickstart-0.2-SNAPSHOT.jar
3,7M	./emma-language/target/emma-language-0.2-SNAPSHOT.jar
3,9M	./emma-language/target/emma-language-0.2-SNAPSHOT-tests.jar

The emma-examples-flink and emma-examples-spark jars are ~65M each, which also indicates the size to expect for any future client jar that bundles emma-language with either emma-flink or emma-spark.

A closer look at emma-examples-spark reveals the root causes (the output for emma-examples-flink is similar).

mvn dependency:list -DincludeScope=runtime -DoutputAbsoluteArtifactFilename=true \
  | grep "$HOME/.m2/repository" \
  | awk -F":compile:" '{print $2}' \
  | xargs du -hs \
  | sort -r -h \
  | sed "s|$HOME/.m2/repository/||"

The list looks as follows.

14M	org/scalanlp/breeze_2.11/0.12/breeze_2.11-0.12.jar
12M	org/scalaz/scalaz-core_2.11/7.2.7/scalaz-core_2.11-7.2.7.jar
7,0M	org/spire-math/spire_2.11/0.7.4/spire_2.11-0.7.4.jar
4,4M	org/typelevel/cats-kernel_2.11/0.9.0/cats-kernel_2.11-0.9.0.jar
3,7M	org/emmalanguage/emma-language/0.2-SNAPSHOT/emma-language-0.2-SNAPSHOT.jar
3,4M	com/chuusai/shapeless_2.11/2.3.2/shapeless_2.11-2.3.2.jar
3,3M	org/typelevel/cats-core_2.11/0.9.0/cats-core_2.11-0.9.0.jar
3,0M	org/scalacheck/scalacheck_2.11/1.13.4/scalacheck_2.11-1.13.4.jar
2,0M	org/apache/commons/commons-math3/3.4.1/commons-math3-3.4.1.jar
1,2M	org/typelevel/cats-laws_2.11/0.9.0/cats-laws_2.11-0.9.0.jar
1,2M	net/sourceforge/f2j/arpack_combined_all/0.1/arpack_combined_all-0.1.jar
1,1M	org/xerial/snappy/snappy-java/1.1.2.6/snappy-java-1.1.2.6.jar
1,0M	org/apache/parquet/parquet-jackson/1.9.0/parquet-jackson-1.9.0.jar
944K	org/apache/parquet/parquet-column/1.9.0/parquet-column-1.9.0.jar
780K	org/apache/parquet/parquet-encoding/1.9.0/parquet-encoding-1.9.0.jar
764K	org/codehaus/jackson/jackson-mapper-asl/1.9.11/jackson-mapper-asl-1.9.11.jar
748K	com/github/rwl/jtransforms/2.4.0/jtransforms-2.4.0.jar
724K	org/scalactic/scalactic_2.11/3.0.3/scalactic_2.11-3.0.3.jar
480K	log4j/log4j/1.2.17/log4j-1.2.17.jar
440K	org/emmalanguage/emma-examples-library/0.2-SNAPSHOT/emma-examples-library-0.2-SNAPSHOT.jar
384K	org/apache/parquet/parquet-format/2.3.1/parquet-format-2.3.1.jar
344K	com/univocity/univocity-parsers/2.4.1/univocity-parsers-2.4.1.jar
288K	io/spray/spray-json_2.11/1.3.3/spray-json_2.11-1.3.3.jar
280K	org/typelevel/cats-free_2.11/0.9.0/cats-free_2.11-0.9.0.jar
276K	com/typesafe/config/1.3.1/config-1.3.1.jar
268K	org/apache/parquet/parquet-hadoop/1.9.0/parquet-hadoop-1.9.0.jar
244K	io/verizon/quiver/core_2.11/5.5.14-scalaz-7.2/core_2.11-5.5.14-scalaz-7.2.jar
228K	org/codehaus/jackson/jackson-core-asl/1.9.11/jackson-core-asl-1.9.11.jar
208K	org/typelevel/cats-kernel-laws_2.11/0.9.0/cats-kernel-laws_2.11-0.9.0.jar
180K	org/scalanlp/breeze-macros_2.11/0.12/breeze-macros_2.11-0.12.jar
164K	com/github/mpilquist/simulacrum_2.11/0.10.0/simulacrum_2.11-0.10.0.jar
164K	com/github/fommil/netlib/core/1.1.2/core-1.1.2.jar
148K	org/emmalanguage/emma-spark/0.2-SNAPSHOT/emma-spark-0.2-SNAPSHOT.jar
144K	com/github/scopt/scopt_2.11/3.5.0/scopt_2.11-3.5.0.jar
108K	com/jsuereth/scala-arm_2.11/2.0/scala-arm_2.11-2.0.jar
96K	commons-pool/commons-pool/1.5.4/commons-pool-1.5.4.jar
88K	org/spire-math/spire-macros_2.11/0.7.4/spire-macros_2.11-0.7.4.jar
72K	commons-codec/commons-codec/1.5/commons-codec-1.5.jar
44K	org/typelevel/discipline_2.11/0.7.2/discipline_2.11-0.7.2.jar
44K	org/slf4j/slf4j-api/1.7.25/slf4j-api-1.7.25.jar
44K	org/apache/parquet/parquet-common/1.9.0/parquet-common-1.9.0.jar
36K	org/typelevel/machinist_2.11/0.6.1/machinist_2.11-0.6.1.jar
24K	com/typesafe/scala-logging/scala-logging-slf4j_2.11/2.1.2/scala-logging-slf4j_2.11-2.1.2.jar
20K	net/sf/opencsv/opencsv/2.3/opencsv-2.3.jar
16K	org/scala-sbt/test-interface/1.0/test-interface-1.0.jar
12K	org/typelevel/catalysts-macros_2.11/0.0.5/catalysts-macros_2.11-0.0.5.jar
12K	org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar
8,0K	org/typelevel/cats-macros_2.11/0.9.0/cats-macros_2.11-0.9.0.jar
8,0K	com/typesafe/scala-logging/scala-logging-api_2.11/2.1.2/scala-logging-api_2.11-2.1.2.jar
4,0K	org/typelevel/macro-compat_2.11/1.1.1/macro-compat_2.11-1.1.1.jar
4,0K	org/typelevel/cats-jvm_2.11/0.9.0/cats-jvm_2.11-0.9.0.jar
4,0K	org/typelevel/cats_2.11/0.9.0/cats_2.11-0.9.0.jar
4,0K	org/typelevel/catalysts-platform_2.11/0.0.5/catalysts-platform_2.11-0.0.5.jar
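
To put a number on the largest contributors, the sizes in the listing above can be summed with a small helper. This is only a sketch: `sum_sizes` is a hypothetical name, it assumes `du -h` style sizes with a K, M, or G suffix and either `,` or `.` as the decimal mark, and `du` reports the on-disk size of the individual jars, so the total only approximates their share of the fat jar.

```shell
# Sum human-readable du sizes read from stdin and print the total in M.
# Assumes every line starts with a size such as 14M, 7,0M or 944K.
sum_sizes() {
  LC_ALL=C awk '
    {
      size = $1
      gsub(",", ".", size)                          # normalise a decimal comma
      unit = substr(size, length(size))             # trailing K, M or G
      value = substr(size, 1, length(size) - 1) + 0 # numeric part
      if (unit == "K") value /= 1024
      else if (unit == "G") value *= 1024
      total += value
    }
    END { printf "%.1fM\n", total }
  '
}

# The four largest entries, copied from the listing above.
top4=$(sum_sizes <<'EOF'
14M	org/scalanlp/breeze_2.11/0.12/breeze_2.11-0.12.jar
12M	org/scalaz/scalaz-core_2.11/7.2.7/scalaz-core_2.11-7.2.7.jar
7,0M	org/spire-math/spire_2.11/0.7.4/spire_2.11-0.7.4.jar
4,4M	org/typelevel/cats-kernel_2.11/0.9.0/cats-kernel_2.11-0.9.0.jar
EOF
)
echo "$top4"
```

The same function can be fed the full pipeline output instead of the here-document to total every runtime dependency.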

It might be better to rely on the breeze version shipped with the dataflow engine rather than bundling our own. @ParkL could you check the versions bundled with Spark 2.1.0 and Flink 1.2.1?
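
One way to do that check against a local installation is to list the matching jars in the engine's library directory. The helper below is a rough sketch: `bundled_jars` is a hypothetical name, and the `$SPARK_HOME/jars` location is an assumption about the layout of a Spark 2.x binary distribution. For Flink the directory would need to be adjusted, and breeze may not be part of the core distribution at all.

```shell
# Print the file names of all jars under DIR whose name starts with
# ARTIFACT followed by the Scala-version suffix separator "_".
# Usage: bundled_jars DIR ARTIFACT
bundled_jars() {
  dir=$1
  artifact=$2
  find "$dir" -name "${artifact}_*.jar" | sed 's|.*/||' | sort
}

# Only runs when SPARK_HOME is set; the jars/ subdirectory is an assumption.
if [ -n "${SPARK_HOME:-}" ]; then
  bundled_jars "$SPARK_HOME/jars" breeze
fi
```

If the bundled version matches the one Emma compiles against, breeze could be left out of the fat jar and resolved from the engine's classpath at runtime.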

I am not sure what to do with scalaz. It seems that we only depend on it because of quiver, and I am not aware of any alternative that has a smaller footprint or, say, relies on cats.

I am open to suggestions.
