azure-kafka-spark-adls

Deploy to Azure

This ARM template deploys two HDInsight clusters (Spark and Kafka) in the same Virtual Network. Spark's primary storage is Azure Data Lake Store (ADLS), while Kafka uses Blob Storage.

Since ADLS on HDInsight requires a Service Principal with a certificate, we've created a Bash script to automate the entire deployment. The script creates a self-signed certificate and converts it to PKCS12 format.
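
The certificate step can be sketched with OpenSSL as follows. This is a minimal illustration, not the contents of deploy.sh: the use of `openssl`, the file names, the `CERT_PASSWORD` variable and the default cluster name are all assumptions made for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical names; the real deploy.sh may use different ones.
CLUSTER_NAME="${1:-mycluster}"
CERT_PASSWORD="${CERT_PASSWORD:-changeit}"
WORK_DIR="$(mktemp -d)"

# 1. Create a self-signed certificate and an unencrypted private key (PEM),
#    valid for one year.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout "$WORK_DIR/key.pem" \
  -out "$WORK_DIR/cert.pem" \
  -days 365 \
  -subj "/CN=$CLUSTER_NAME"

# 2. Bundle the key and certificate into a password-protected PKCS12 file.
openssl pkcs12 -export \
  -inkey "$WORK_DIR/key.pem" \
  -in "$WORK_DIR/cert.pem" \
  -out "$WORK_DIR/$CLUSTER_NAME.pfx" \
  -passout "pass:$CERT_PASSWORD"

echo "PKCS12 written to $WORK_DIR/$CLUSTER_NAME.pfx"
```

The resulting `.pfx` file is the kind of artifact a Service Principal credential and an HDInsight ADLS configuration would consume; how deploy.sh passes it to the template is not shown here.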

Caveats

  • For simplicity, as many resources as possible are named $CLUSTER_NAME.
  • The VNet address space, VM sizes, and number of Head/Worker/Zookeeper nodes are hardcoded in the template.
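
Because most resources share $CLUSTER_NAME, the name has to satisfy the strictest naming rule among them. Assuming a storage account is one of the resources named this way (the template may differ), Azure storage account names allow only 3 to 24 lowercase letters and digits. A pre-flight check could look like the sketch below; `is_valid_resource_name` is a hypothetical helper, not part of deploy.sh.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds (exit status 0) only if the name consists of
# 3 to 24 lowercase letters and digits, the Azure storage account naming rule.
is_valid_resource_name() {
  [[ "$1" =~ ^[a-z0-9]{3,24}$ ]]
}

# Example usage with an assumed name.
if is_valid_resource_name "kafkaspark01"; then
  echo "name is acceptable"
else
  echo "name must be 3-24 lowercase letters or digits" >&2
fi
```

Running such a check before the deployment starts would surface a bad name immediately rather than partway through a roughly 20-minute deployment.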

Prerequisites

Deploy

./deploy.sh <CLUSTER_NAME>

Provide a password when prompted. It is used for accessing all dashboards and for SSH. Deploying all resources takes about 20 minutes.

Limitations

  • It's not possible to create a Service Principal inside an ARM template, since it exists outside any resource group.
  • As of now, ADLS is only available in these regions.
  • Kafka doesn't support ADLS as primary storage.
  • HDInsight doesn't allow direct connections to Kafka over the public internet.
  • Once an HDInsight cluster is provisioned, only the number of worker nodes can be scaled, not the VM size.
  • An existing HDInsight cluster cannot join a new VNet.

Resources
