This is the Scala repository of sparkmobility. The Python user interface can be found in the companion sparkmobility repository.
Whether you are modifying the Scala core of the project or simply running it, you must compile the Scala code and package it into a JAR file. Follow these steps:
- **Download and install sbt:**
  Install sbt on your machine. You can follow the instructions in the official sbt documentation.
- **Package the Scala code:**
  Update dependencies, compile, and package the Scala code using sbt:

  ```bash
  sbt update
  sbt compile
  sbt assembly
  ```

  This process creates a `.jar` file that should be submitted to the Spark cluster. Ideally, place it in the root directory of this project. You will need the path to this jar in the next steps.
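  For example (the jar name below is a placeholder; check `target/scala-2.13/` for the actual file produced by `sbt assembly`):

  ```bash
  # Copy the assembled jar to the project root
  cp target/scala-2.13/<your-project>-assembly-<version>.jar .
  ```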
Sometimes, the default Scala version of a Spark distribution is 2.12. Since this project is compatible only with Scala 2.13 and above, you must make sure your Spark build uses Scala 2.13 before setting up the project. To do this, follow the instructions below:
- **Download Spark:**
  - Go to the Apache Spark download page.
  - Select the following options:
    - Spark release: 3.3.x or later
    - Package type: Pre-built for Apache Hadoop
    - Scala version: 2.13
  - Download the `.tgz` file.
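  Alternatively, you can fetch the build used in the examples below directly (the URL is assumed to follow the standard Apache archive layout; adjust the version to match your selection):

  ```bash
  # Download the Spark 3.5.4 build prepackaged with Scala 2.13
  wget https://archive.apache.org/dist/spark/spark-3.5.4/spark-3.5.4-bin-hadoop3-scala2.13.tgz
  ```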
- **Extract the Spark distribution:**
  Extract the downloaded file to a directory (e.g., `/opt/spark`):

  ```bash
  # Create the target directory first; tar's -C option requires it to exist
  mkdir -p /opt/spark
  tar -xzf spark-3.5.4-bin-hadoop3-scala2.13.tgz -C /opt/spark
  ```
- **Set environment variables:**
  Add the following environment variables to your shell configuration file (e.g., `.bashrc` or `.zshrc`):

  ```bash
  export SPARK_HOME=/opt/spark/spark-3.5.4-bin-hadoop3-scala2.13
  export PATH=$SPARK_HOME/bin:$PATH
  export PYSPARK_PYTHON=python3
  export PYSPARK_DRIVER_PYTHON=python3
  ```
- **Reload the shell configuration:**

  ```bash
  source ~/.bashrc  # or: source ~/.zshrc
  ```
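  You can confirm the variables took effect (a quick sanity check):

  ```bash
  echo $SPARK_HOME    # should print the Spark installation path
  which spark-submit  # should resolve to a binary under $SPARK_HOME/bin
  ```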
- **Verify the Scala version:**
  Run the following command:

  ```bash
  spark-shell --version
  ```

  This displays version information, including the Scala version. Ensure it shows `Scala version 2.13.x`.
This is a standard sbt project. You can use the following commands:

- Update dependencies: `sbt update`
- Compile the code: `sbt compile`
- Run the project: `sbt run`
- Start a Scala REPL: `sbt console`
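Several tasks can also be chained in a single sbt invocation, which avoids restarting the JVM for each one (standard sbt behavior):

```bash
# Run update, compile, and assembly in one sbt session
sbt update compile assembly
```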
The Scala code is located in the `src/main/scala` directory; you can modify the code there. Once edited, you must compile and package the code again:

```bash
sbt compile
sbt assembly
```

This process creates a `.jar` file (usually under the `target/scala-2.13/` directory) that should be submitted to the Spark cluster.
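As a quick sanity check, you can confirm the assembled jar exists (the exact file name depends on the project name and version in `build.sbt`):

```bash
# List jars produced under the Scala 2.13 build directory
ls target/scala-2.13/*.jar
```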
To run the project, you can use the following command (for `--master`, use e.g. `local[*]` or `yarn`):

```bash
spark-submit \
  --class com.timegeo.Main \
  --master <your_master_url> \
  --driver-memory <MEMORY>g \
  --executor-memory <MEMORY>g \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.rootLogger=WARN,console" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.rootLogger=WARN,console" \
  /path/to/your/application.jar
```

For more information on sbt, visit the official sbt documentation.
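As a concrete example, a small local run of the template above might look like this (the master URL, memory sizes, and jar name are illustrative assumptions):

```bash
spark-submit \
  --class com.timegeo.Main \
  --master "local[*]" \
  --driver-memory 8g \
  --executor-memory 8g \
  sparkmobility-assembly.jar
```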
This project is licensed under the MIT License. See the LICENSE file for details.
- Albert Cao (@caoalbert)
- Christopher Chávez (@Vanchristoph3r)
- Mingyi He (@Hemy17)
- Wolin Jiang (@LowenJiang)
- Giuseppe Perona (@g-perona)
- Jiaman Wu (@charmainewu)