This ARM template deploys multiple HDInsight clusters (Spark + Kafka) in the same Virtual Network. Spark's primary storage is Azure Data Lake Store (ADLS), while Kafka uses Blob Storage.
Because ADLS on HDInsight requires a Service Principal that authenticates with a certificate, we've created a Bash script to automate the entire deployment. The script creates a self-signed certificate and converts it to PKCS12 format.
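The certificate step could look roughly like the following sketch. It assumes OpenSSL is installed. The file names, the `CERT_PASSWORD` variable, the 2048-bit RSA key and the one-year validity are illustrative choices, not necessarily what `deploy.sh` does.

```bash
#!/usr/bin/env bash
# Minimal sketch (assumption): create a self-signed certificate for the
# Service Principal and convert it to PKCS12. File names and CERT_PASSWORD
# are hypothetical, not taken from deploy.sh.
set -euo pipefail

CLUSTER_NAME="${1:-mycluster}"              # hypothetical default for illustration
CERT_PASSWORD="${CERT_PASSWORD:-changeit}"  # hypothetical; prompt for it in real use

# 1. Self-signed X.509 certificate plus an unencrypted private key, valid for 1 year.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout "${CLUSTER_NAME}.key" -out "${CLUSTER_NAME}.crt" \
  -days 365 -subj "/CN=${CLUSTER_NAME}"

# 2. Bundle certificate and key into a password-protected PKCS12 (.pfx) file.
openssl pkcs12 -export \
  -in "${CLUSTER_NAME}.crt" -inkey "${CLUSTER_NAME}.key" \
  -out "${CLUSTER_NAME}.pfx" -passout "pass:${CERT_PASSWORD}"

# 3. Base64-encode the .pfx on a single line so it can be passed around as a string.
CERT_B64="$(base64 < "${CLUSTER_NAME}.pfx" | tr -d '\n')"
echo "PKCS12 bundle: ${CLUSTER_NAME}.pfx (${#CERT_B64} base64 chars)"
```

The base64 string is one convenient way to hand the PKCS12 bundle to a template parameter; whether this template expects it in exactly that form is an assumption.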
- For simplicity, we've kept as many resource names as possible identical to $CLUSTER_NAME. The VNet address space, VM sizes, and number of head/worker/ZooKeeper nodes are hardcoded inside the template.
To deploy, run:

./deploy.sh <CLUSTER_NAME>

Provide a password when prompted; it will be used for accessing all dashboards and for SSH. Deploying all resources takes about 20 minutes.
- It's not possible to create a Service Principal inside an ARM template, since it is an Azure Active Directory object that resides outside resource groups.
- As of now, ADLS is only available in these regions.
- Kafka doesn't support ADLS as primary storage.
- HDInsight doesn't allow direct connections to Kafka over the public internet.
- Once an HDInsight cluster is provisioned, only the number of worker nodes can be scaled, not the size of the VMs.
- An existing HDInsight cluster cannot join a new VNet.
