Using Hive

Cluster Provisioning

Cluster provisioning begins when a caller creates a ClusterDeployment CR, which is the core Hive resource used to control the lifecycle of a cluster and the Hive API entrypoint.

Hive comes with an optional hiveutil binary to assist in creating the ClusterDeployment and its dependencies. See the hiveutil documentation for more information.

DNS

Native

For clouds where there is support for automated IP allocation and DNS configuration (AWS, Azure, IBM Cloud, and GCP), an OpenShift installation requires a live and functioning DNS zone in the cloud account into which you will be installing the new cluster(s). For example, if you own example.com, you could create a hive.example.com subdomain in Route53 and ensure that you have made the appropriate NS entries under example.com to delegate to the Route53 zone. When creating a new cluster, the installer will make DNS entries under hive.example.com as needed for the cluster(s).

Managed DNS

In addition to the default OpenShift DNS support, Hive offers a DNS feature called Managed DNS. With Managed DNS, Hive can automatically create delegated zones for approved base domains. For example, if hive.example.com exists and is specified as your managed domain, you can specify a base domain of cluster1.hive.example.com on your ClusterDeployment, and Hive will create this zone for you, add forwarding records in the base domain, wait for it to resolve, and then proceed with installation. Read here for more details.

Non-native

For other platforms/clouds (OpenStack and vSphere), there is presently no native DNS auto-configuration available. This requires some up-front DNS configuration before a cluster can be installed. It will typically be necessary to reserve virtual IPs (VIPs) that will be used for the cluster's management (e.g. api.mycluster.hive.example.com) and for the cluster's default ingress routes (e.g. \*.apps.mycluster.hive.example.com). Each platform/cloud's configuration will have its own system for allocating or reserving these IPs. Once the IPs are reserved, DNS entries must be published as A records (or local host entries must be made to manage the DNS-to-IP translations on the host(s) running Hive) so that the cluster's API endpoint will be accessible to Hive.
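
As a rough illustration of the local-host-entry option, the sketch below prints a hosts-file line for the API endpoint. The function name api_hosts_entry and the sample VIP 192.168.1.10 are hypothetical and chosen only for this example; the cluster name and base domain reuse the placeholders from the text.

```shell
# Hypothetical helper: print a hosts-file line that maps a reserved API VIP
# to the cluster's API hostname (api.<cluster name>.<base domain>).
api_hosts_entry() {
  vip="$1"; cluster="$2"; base_domain="$3"
  printf '%s api.%s.%s\n' "$vip" "$cluster" "$base_domain"
}

# Example values only; substitute the VIP you actually reserved.
api_hosts_entry 192.168.1.10 mycluster hive.example.com
```

Hosts files do not support wildcard names, so the default ingress routes still need a published DNS record.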

Pull Secret

OpenShift installation requires a pull secret obtained from try.openshift.com. You can specify an individual pull secret for each cluster Hive creates, or you can use a global pull secret that will be used by all of the clusters Hive creates.

oc create secret generic mycluster-pull-secret --from-file=.dockerconfigjson=/path/to/pull-secret --type=kubernetes.io/dockerconfigjson --namespace mynamespace

apiVersion: v1
data:
  .dockerconfigjson: REDACTED
kind: Secret
metadata:
  name: mycluster-pull-secret
  namespace: mynamespace
type: kubernetes.io/dockerconfigjson

When a global pull secret is defined in the hive namespace and a ClusterDeployment-specific pull secret is specified, the registry authentication in both secrets will be merged and used by the new OpenShift cluster. When a registry exists in both pull secrets, precedence will be given to the contents of the cluster-specific pull secret.

The global pull secret must live in the hive namespace and is referenced in the HiveConfig.

oc create secret generic global-pull-secret --from-file=.dockerconfigjson=/path/to/pull-secret --type=kubernetes.io/dockerconfigjson --namespace hive

apiVersion: v1
data:
  .dockerconfigjson: REDACTED
kind: Secret
metadata:
  name: global-pull-secret
  namespace: hive
type: kubernetes.io/dockerconfigjson

oc patch hiveconfig hive --type=merge --patch '{"spec": {"globalPullSecretRef": {"name": "global-pull-secret"}}}'

spec:
  globalPullSecretRef:
    name: global-pull-secret

OpenShift Version

Hive needs to know what version of OpenShift to install. A Hive cluster represents available versions via the ClusterImageSet resource, and there can be multiple ClusterImageSets available. Each ClusterImageSet references an OpenShift release image. A ClusterDeployment references a ClusterImageSet via the spec.provisioning.imageSetRef property.

Alternatively, you can specify an individual OpenShift release image in the ClusterDeployment spec.provisioning.releaseImage property.

An example ClusterImageSet:

apiVersion: hive.openshift.io/v1
kind: ClusterImageSet
metadata:
  name: openshift-v4.3.0
spec:
  releaseImage: quay.io/openshift-release-dev/ocp-release:4.3.0-x86_64

Cloud credentials

Hive requires credentials to the cloud account into which it will install OpenShift clusters. Refer to the installer documentation for required level of permissions for each cloud.

AWS

Create a secret containing your AWS access key and secret access key:

oc create secret generic <mycluster>-aws-creds -n hive --from-literal=aws_access_key_id=<AWS_ACCESS_KEY_ID> --from-literal=aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>

Take care when using the YAML below: the values under data must be base64-encoded.

apiVersion: v1
data:
  aws_access_key_id: REDACTED
  aws_secret_access_key: REDACTED
kind: Secret
metadata:
  name: mycluster-aws-creds
  namespace: mynamespace
type: Opaque
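
One possible way to produce the base64-encoded values is sketched below. The helper name b64 and the credential values are placeholders for illustration only; the manifest fields mirror the example above.

```shell
# Encode a value the way Kubernetes expects under a Secret's data key.
# printf avoids the trailing newline that echo would add; tr strips any
# line wrapping that base64 may insert on long inputs.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Placeholder credentials for illustration only.
access_key_id='EXAMPLEKEYID'
secret_access_key='exampleSecretAccessKey'

# Print the Secret manifest with encoded values.
cat <<EOF
apiVersion: v1
data:
  aws_access_key_id: $(b64 "$access_key_id")
  aws_secret_access_key: $(b64 "$secret_access_key")
kind: Secret
metadata:
  name: mycluster-aws-creds
  namespace: mynamespace
type: Opaque
EOF
```

The printed manifest could then be piped to oc apply -f - or saved to a file first.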

Azure

Create a secret containing your Azure service principal:

apiVersion: v1
data:
  osServicePrincipal.json: REDACTED
kind: Secret
metadata:
  name: mycluster-azure-creds
  namespace: mynamespace
type: Opaque

GCP

Create a secret containing your GCP service account key:

apiVersion: v1
data:
  osServiceAccount.json: REDACTED
kind: Secret
metadata:
  name: mycluster-gcp-creds
  namespace: mynamespace
type: Opaque

IBM Cloud

Create a secret containing your IBM Cloud API key:

apiVersion: v1
stringData:
  ibmcloud_api_key: IBMCLOUDAPIKEY
kind: Secret
metadata:
  name: mycluster-ibm-creds
  namespace: mynamespace
type: Opaque

IBM Cloud Credential Manifests

IBM Cloud credential secrets must be provided as manifests for installation. Follow instructions for using ccoctl to generate IBM Cloud service IDs and place manifests generated from running ccoctl ibmcloud create-service-id within a secret that will be referenced by the ClusterDeployment.

Create a manifests secret containing secrets generated by ccoctl:

oc create secret generic mycluster-manifests -n mynamespace --from-file=<manifests directory>

vSphere

Create a secret containing your vSphere credentials information:

apiVersion: v1
stringData:
  password: secretpassword
  username: vsphereuser
kind: Secret
metadata:
  name: mycluster-vsphere-creds
  namespace: mynamespace
type: Opaque

Create a secret containing your vSphere CA certificate.

From the vCenter home page, download the vCenter’s root CA certificates. Click Download trusted root CA certificates in the vSphere Web Services SDK section. Download, wget or curl the /certs/download.zip file.

wget https://<vCenter>/certs/download.zip

Extract the compressed file that contains the vCenter root CA certificates. The contents of the compressed file resemble the following file structure:

certs
├── lin
│   ├── 108f4d17.0
│   ├── 108f4d17.r1
│   ├── 7e757f6a.0
│   ├── 8e4f8471.0
│   └── 8e4f8471.r0
├── mac
│   ├── 108f4d17.0
│   ├── 108f4d17.r1
│   ├── 7e757f6a.0
│   ├── 8e4f8471.0
│   └── 8e4f8471.r0
└── win
    ├── 108f4d17.0.crt
    ├── 108f4d17.r1.crl
    ├── 7e757f6a.0.crt
    ├── 8e4f8471.0.crt
    └── 8e4f8471.r0.crl

3 directories, 15 files

Create a single file by concatenating all the files in certs/lin. Save the file somewhere permanent -- you'll need it for each vSphere cluster you want to create.

cat certs/lin/* > /home/me/vsphere/ca.cert

Create a secret containing the combined CA bundle data within a .cacert key:

oc create secret generic mycluster-vsphere-certs --from-file=.cacert=/home/me/vsphere/ca.cert

apiVersion: v1
stringData:
  .cacert: |
    -----BEGIN CERTIFICATE-----
    CA BUNDLE DATA HERE
    -----END CERTIFICATE-----
kind: Secret
metadata:
  name: mycluster-vsphere-certs
  namespace: mynamespace
type: Opaque

OpenStack

Create a secret containing your OpenStack clouds.yaml file:

apiVersion: v1
data:
  clouds.yaml: REDACTED
kind: Secret
metadata:
  name: mycluster-openstack-creds
  namespace: mynamespace
type: Opaque

Nutanix

To provision an OpenShift cluster on Nutanix using Hive, you must provide the necessary cloud credentials. These credentials are used by Hive to interact with the Nutanix environment and perform cluster provisioning operations.

Required Credentials

Hive requires the following credentials for Nutanix:

Prism Central Username: The username with sufficient privileges to create and manage virtual machines.
Prism Central Password: The password associated with the provided username.

Creating a Secret for Credentials

The Nutanix credentials must be stored as a Kubernetes secret in the namespace where Hive operates. Create a secret with the following format:

apiVersion: v1
kind: Secret
metadata:
  name: nutanix-cloud-credentials
  namespace: hive
type: Opaque
data:
  username: <base64-encoded-username>
  password: <base64-encoded-password>

To create the secret using oc, first encode the values in Base64:

echo -n "<value>" | base64

Then, apply the secret using:

oc apply -f nutanix-cloud-credentials.yaml

Additional Required Secrets

In addition to the Hive credentials, OpenShift requires additional secrets in specific namespaces for authentication with Nutanix. These credentials can be created manually or by using the CCO utility (ccoctl) to generate the credential Secret manifests for the OpenShift installer. (See the following link for more details)

Secret for OpenShift Machine API

This secret is required by the OpenShift Machine API to manage machines on Nutanix:

apiVersion: v1
kind: Secret
metadata:
  name: nutanix-credentials
  namespace: openshift-machine-api
type: Opaque
stringData:
  credentials: |
    [{"type":"basic_auth","data":{"prismCentral":{"username":"${NUTANIX_USERNAME}","password":"${NUTANIX_PASSWORD}"}}}]

Secret for OpenShift Cloud Controller Manager

This secret is required by the OpenShift Cloud Controller Manager to integrate OpenShift with Nutanix infrastructure:

apiVersion: v1
kind: Secret
metadata:
  name: nutanix-credentials
  namespace: openshift-cloud-controller-manager
type: Opaque
stringData:
  credentials: |
    [{"type":"basic_auth","data":{"prismCentral":{"username":"${NUTANIX_USERNAME}","password":"${NUTANIX_PASSWORD}"}}}]
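
The ${NUTANIX_USERNAME} and ${NUTANIX_PASSWORD} placeholders in the two manifests above need to be filled in before the secrets are applied. A minimal sketch of rendering the credentials payload follows; the function name nutanix_credentials_json and the sample values are hypothetical, and the sketch assumes neither value contains characters that need JSON escaping.

```shell
# Hypothetical helper: render the JSON payload for the credentials key.
# Assumes the username and password contain no characters that need JSON
# escaping (such as double quotes or backslashes).
nutanix_credentials_json() {
  printf '[{"type":"basic_auth","data":{"prismCentral":{"username":"%s","password":"%s"}}}]\n' "$1" "$2"
}

# Placeholder values for illustration only.
NUTANIX_USERNAME='admin'
NUTANIX_PASSWORD='changeme'
nutanix_credentials_json "$NUTANIX_USERNAME" "$NUTANIX_PASSWORD"
```

The same payload is used in both the openshift-machine-api and openshift-cloud-controller-manager secrets.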

Why These Secrets Are Required

Hive Secret: Used by Hive for provisioning clusters and managing resources.
Machine API Secret: Required for the OpenShift Machine API to manage and create worker nodes on Nutanix.
Cloud Controller Manager Secret: Enables OpenShift to interact with Nutanix for networking, load balancing, and other infrastructure-related tasks.

Each of these secrets plays a critical role in ensuring a seamless integration between OpenShift and Nutanix, allowing for automated cluster deployment and lifecycle management.

Using the Secret in Hive

Once the secret is created, reference it in the ClusterDeployment or ClusterPool configuration:

spec:
  platform:
    nutanix:
      credentialsSecretRef:
        name: nutanix-cloud-credentials

This ensures that Hive can retrieve the necessary credentials to interact with Nutanix for cluster provisioning.

Additional Considerations

Ensure that the Nutanix user has appropriate permissions to create and manage virtual machines, networks, and other resources required by OpenShift installation and management.
Verify that network connectivity exists between the OpenShift cluster nodes and the Nutanix infrastructure endpoints (Prism Central and Prism Elements).
If the Nutanix Prism Central uses certificates that are not trusted by default (such as those signed by a private certificate authority), additional TLS configuration may be required.
During installation (day 0), Prism Central certificates can be trusted by specifying an additionalTrustBundle in the install-config.yaml. After installation (day 2), ongoing communication by Hive requires configuring a certificatesSecretRef in the ClusterDeployment platform configuration.
If both additionalTrustBundle and certificatesSecretRef are provided, they can reference different certificate bundles if needed. Otherwise, the certificates from certificatesSecretRef will be used for both installation and day 2 operations.

By setting up these credentials correctly, Hive will be able to deploy OpenShift clusters on Nutanix efficiently.

TLS and Certificate Trust Configuration

When using certificates to establish trust with Nutanix Prism Central, Hive handles certificates in the following ways depending on the configuration:

Case 1: additionalTrustBundle set in install-config.yaml, no certificatesSecretRef in Hive
The installer will use the provided trust bundle. Hive will not inject any additional certificates.
Important: If the cluster nodes (or installer pod) environment does not have SSL_CERT_DIR configured properly, the installation might fail due to untrusted Prism Central certificates.
Case 2: certificatesSecretRef set in Hive, no additionalTrustBundle in install-config.yaml
Hive will automatically inject the certificates from certificatesSecretRef into the install-config's additionalTrustBundle before installation.
It will also set additionalTrustBundlePolicy: Always to ensure the certificates are trusted both during installation and runtime.
Case 3: both additionalTrustBundle and certificatesSecretRef are set
Hive will not modify the install-config. The installer will use the additionalTrustBundle exactly as provided.
No certificate injection will occur, and the install-config's bundle will be used for establishing trust.

By carefully choosing where to specify certificates, you can control whether the trust setup is handled at install time, during day 2 operations, or both.
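
The three cases can be restated as a small lookup. This is only a summary sketch of the behavior described above, not Hive's implementation; the function name trust_bundle_case and its yes/no arguments are hypothetical.

```shell
# Summarize which trust bundle applies, following the three cases above.
# $1: "yes" if additionalTrustBundle is set in install-config.yaml, else "no"
# $2: "yes" if certificatesSecretRef is set in Hive, else "no"
trust_bundle_case() {
  case "$1:$2" in
    yes:no)  echo "case 1: installer uses the install-config bundle; Hive injects nothing" ;;
    no:yes)  echo "case 2: Hive injects certificatesSecretRef and sets additionalTrustBundlePolicy: Always" ;;
    yes:yes) echo "case 3: install-config is left unmodified; its bundle is used as provided" ;;
    *)       echo "not covered by the cases above" ;;
  esac
}

trust_bundle_case no yes
```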

No Need to Specify Credentials in Install Config

Hive automatically injects the necessary Nutanix credentials during the provisioning process. Therefore, there is no need to manually specify the Prism Central username and password in the install configuration. By referencing the created secrets, Hive ensures secure and seamless authentication with Nutanix.

SSH Key Pair

(Optional) Hive uses the provided ssh key pair to ssh into the machines in the remote cluster. Hive connects via ssh to gather logs in the event of an installation failure. The ssh key pair is optional, but neither the user nor Hive will be able to ssh into the machines if it is not supplied.

Create a Kubernetes secret containing an SSH key pair in PEM format (typically generated with ssh-keygen -m PEM):

apiVersion: v1
data:
  ssh-privatekey: REDACTED
  ssh-publickey: REDACTED
kind: Secret
metadata:
  name: mycluster-ssh-key
  namespace: mynamespace
type: Opaque

InstallConfig

The OpenShift installer InstallConfig must be stored in a secret and referenced in the ClusterDeployment. This allows Hive to more easily support installing multiple versions of OpenShift.

First, retrieve the public key for the SSH key pair you created earlier, if you created one:

ssh_public_key=$(oc extract secret/mycluster-ssh-key --keys=ssh-publickey --to=-)

Then create a file called install-config.yaml that will contain your InstallConfig. The example below provides an InstallConfig for AWS.

cat >./install-config.yaml <<-EOF
apiVersion: v1
baseDomain: hive.example.com
compute:
- name: worker
  platform:
    aws:
      rootVolume:
        iops: 100
        size: 120
        type: gp3
      type: m5.xlarge
  replicas: 3
controlPlane:
  name: master
  platform:
    aws:
      rootVolume:
        iops: 100
        size: 120
        type: gp3
      type: m5.xlarge
metadata:
  name: mycluster
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: us-east-1
pullSecret: mycluster-pull-secret
# Remove the line below if you did not create an SSH key.
sshKey: $ssh_public_key
EOF

Finally, create a generic Kubernetes secret from the InstallConfig you just created:

oc create secret generic mycluster-install-config --from-file=install-config.yaml=./install-config.yaml

For Azure, replace the contents of compute.platform and controlPlane.platform with:

    azure:
      osDisk:
        diskSizeGB: 128
      type: Standard_D2s_v3

and replace the contents of platform with:

  azure:
    cloudName: AzurePublicCloud
    baseDomainResourceGroupName: my-bdrgn
    region: centralus

Note: cloudName specifies the Azure Cloud in which to create the cluster e.g. AzurePublicCloud or AzureUSGovernmentCloud.

For GCP, replace the contents of compute.platform and controlPlane.platform with:

    gcp:
      type: n1-standard-4

and replace the contents of platform with:

  gcp:
    projectID: myproject
    region: us-east1

For IBM Cloud, replace the contents of compute.platform and controlPlane.platform. Note that type is any valid IBM Cloud instance type. type may be omitted to use OpenShift installation defaults.

  ibmcloud:
    type: bx2-4x16

and populate the top-level platform fields with the appropriate information:

platform:
  ibmcloud:
    region: us-east

and ensure that the top-level credentialsMode field has been set to Manual.

credentialsMode: Manual

For vSphere, ensure the compute and controlPlane fields are empty.

controlPlane:
compute:

and populate the top-level platform fields with the appropriate information:

platform:
  vsphere:
    apiVIP: 192.168.1.10
    cluster: devel
    datacenter: dc1
    defaultDatastore: ds1
    folder: /dc1/vm/CLUSTER_NAME
    ingressVIP: 192.168.1.11
    network: "VM Network"
    password: secretpassword
    username: vsphereuser
    vCenter: vcenter.example.com

For OpenStack, replace the contents of compute.platform with:

  openstack:
    type: m1.large

Note: Use an instance type that meets the minimum requirement for the version of OpenShift being installed.

and replace the contents of controlPlane.platform with:

  openstack:
    type: ci.m4.xlarge

Note: Use an instance type that meets the minimum requirement for the version of OpenShift being installed.

and replace the contents of platform with:

  openstack:
    cloud: mycloud
    computeFlavor: m1.large
    externalNetwork: openstack_network_name
    lbFloatingIP: 10.0.111.158

For Nutanix, you need to specify the Nutanix platform configuration in your install-config.yaml. Below is the required platform section:

platform:
  nutanix:
    apiVIPs:
      - 10.0.2.12
    ingressVIPs:
      - 10.0.2.11
    prismCentral:
      endpoint:
        address: "prism-central.example.com"
        port: 9440
    prismElements:
      - endpoint:
          address: "prism-element-1.example.com"
          port: 9440
        uuid: "prism-elements-uuid-1234"
        name: "Prism-Element-1"
    subnetUUIDs:
      - "subnet-uuid-1234"
    failureDomains:
      - name: "Local_AZ"
        subnetUUIDs:
          - "subnet-uuid-1234"
        prismElement:
          endpoint:
            address: "prism-element-1.example.com"
            port: 9440
          uuid: "prism-elements-uuid-1234"
          name: "Prism-Element-1"

Note: The failureDomains section is optional and can be omitted if not required.

Required Secrets

nutanix-creds: A secret containing the credentials for Prism Central.
install-config: A secret holding the OpenShift install configuration.
ssh-private-key: A secret containing the SSH private key for cluster access.

Additional Considerations

Ensure that the prismCentral.endpoint and prismElements.endpoint addresses specified in the install-config are reachable from the environment where Hive runs.
The subnetUUIDs must correspond to existing Nutanix subnets where the cluster nodes will be deployed.

ClusterDeployment

Cluster provisioning begins when a ClusterDeployment is created.

Note that some parts are duplicated with the InstallConfig.

An example ClusterDeployment for AWS:

apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  name: mycluster
  namespace: mynamespace
spec:
  baseDomain: hive.example.com
  clusterName: mycluster
  platform:
    aws:
      credentialsSecretRef:
        name: mycluster-aws-creds
      region: us-east-1
  provisioning:
    imageSetRef:
      name: openshift-v4.3.0
    installConfigSecretRef:
      name: mycluster-install-config
    sshPrivateKeySecretRef:
      name: mycluster-ssh-key
  pullSecretRef:
    name: mycluster-pull-secret

For Azure, replace the contents of spec.platform with:

azure:
  baseDomainResourceGroupName: my-bdrgn
  credentialsSecretRef:
    name: mycluster-azure-creds
  cloudName: AzurePublicCloud
  region: centralus

Note: cloudName specifies the Azure Cloud in which to create the cluster e.g. AzurePublicCloud or AzureUSGovernmentCloud.

For GCP, replace the contents of spec.platform with:

gcp:
  credentialsSecretRef:
    name: mycluster-gcp-creds
  region: us-east1

For IBM Cloud, replace the contents of spec.platform with:

ibmcloud:
  credentialsSecretRef:
    name: mycluster-ibm-creds
  region: us-east

and add a manifests secret reference to spec.provisioning:

provisioning:
  manifestsSecretRef:
    name: mycluster-manifests

For vSphere, replace the contents of spec.platform with:

vsphere:
  certificatesSecretRef:
    name: mycluster-vsphere-certs
  cluster: devel
  credentialsSecretRef:
    name: mycluster-vsphere-creds
  datacenter: dc1
  defaultDatastore: ds1
  folder: /dc1/vm/CLUSTER_NAME
  network: "VM Network"
  vCenter: vsphere.example.com

For OpenStack, replace the contents of spec.platform with:

openstack:
  cloud: mycloud
  credentialsSecretRef:
    name: mycluster-openstack-creds

Machine Pools

MachinePool is a YAML configuration by which you can create and scale worker nodes on a deployed cluster. A MachinePool will create MachineSet resources on the deployed cluster. If supported on your cloud, those MachineSets will automatically span all AZs, or you can specify an explicit list.

A MachinePool for the worker MachineSets is not required. If the user creates a MachinePool for the worker MachineSets, then Hive will manage the worker MachineSets.

MachinePool reconciliation is limited to updating MachineSet replicas to match the replicas configured for the MachinePool. Additionally, any existing Labels or Taints on the MachineSets will be overridden if they clash with those on the MachinePool. In case of duplicate taints, the taint encountered first will be preserved and the rest collapsed on the MachineSets.

MachinePool platform is immutable and any changes made to MachinePool.spec.platform are blocked by a validating webhook. The Machine Config Operator does not support updating existing machines when platform details are changed in a MachineSet, and consequently Hive does not support making such changes to MachinePool platform; see HIVE-2024.

The recommended workaround when platform details must be changed is to replace the MachinePool by creating an adjacent MachinePool with the desired configuration.

Create replacement MachinePool with desired configuration and MachinePool.spec.replicas = 0.
Scale down the old MachinePool while scaling up the replacement MachinePool.
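
The first step of the workaround might look like the sketch below, which prints a replacement MachinePool that starts at zero replicas. The pool name worker2 and the m5.2xlarge instance type are hypothetical; the remaining fields mirror the AWS MachinePool example in this section.

```shell
# Print a replacement MachinePool that starts at zero replicas.
# "worker2" and the m5.2xlarge instance type are hypothetical; the other
# fields mirror the AWS MachinePool example in this section.
replacement_machinepool() {
  cat <<EOF
apiVersion: hive.openshift.io/v1
kind: MachinePool
metadata:
  name: mycluster-worker2
  namespace: mynamespace
spec:
  clusterDeploymentRef:
    name: mycluster
  name: worker2
  platform:
    aws:
      rootVolume:
        iops: 100
        size: 120
        type: gp3
      type: m5.2xlarge
  replicas: 0
EOF
}

replacement_machinepool
```

The output could be piped to oc apply -f -. For the second step, the replicas of each pool could then be adjusted, for example with oc patch machinepool using --type=merge and a spec.replicas patch.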

InstallConfig is limited to the one worker pool, but Hive can sync additional MachinePools Day 2.

apiVersion: hive.openshift.io/v1
kind: MachinePool
metadata:
  name: mycluster-worker
  namespace: mynamespace
spec:
  clusterDeploymentRef:
    name: mycluster
  name: worker
  platform:
    aws:
      rootVolume:
        iops: 100
        size: 120
        type: gp3
      type: m5.xlarge
  replicas: 3

For Azure, replace the contents of spec.platform with:

azure:
  osDisk:
    diskSizeGB: 128
  type: Standard_D2s_v3

For GCP, replace the contents of spec.platform with:

gcp:
  type: n1-standard-4

WARNING: Due to some naming restrictions on various components in GCP, Hive will restrict you to a max of 35 MachinePools (including the original worker pool created by default). We are left with only a single character to differentiate the machines and nodes from a pool, and 'm' is already reserved for the master hosts, leaving us with a-z (minus m) and 0-9 for a total of 35. Hive will automatically create a MachinePoolNameLease for GCP MachinePools to grab one of the available characters until none are left, at which point your MachinePool will not be provisioned.
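
The limit of 35 follows directly from the character set described above, as this short sketch of the arithmetic shows:

```shell
# Candidate single-character suffixes: a-z and 0-9, minus 'm',
# which is reserved for the master hosts.
available=$(printf '%s' 'abcdefghijklmnopqrstuvwxyz0123456789' | tr -d 'm')

# ${#available} is the string length: 26 letters - 1 + 10 digits = 35.
echo "${#available}"
```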

For IBM Cloud, replace the contents of spec.platform. Note that type is any valid IBM Cloud instance type. type may be omitted to use OpenShift installation defaults.

ibmcloud:
  type: bx2-4x16

For vSphere, replace the contents of spec.platform with the settings you want for the instances. Note that static IPs are not supported.

vsphere:
  coresPerSocket: 1
  cpus: 2
  memoryMB: 8192
  osDisk:
    diskSizeGB: 120

For OpenStack, replace the contents of spec.platform with the settings you want for the instances:

openstack:
  rootVolume:
    size: 10
    type: ceph
  flavor: m1.large

For Nutanix, replace the contents of spec.platform with the settings you want for the instances:

nutanix:
  prismCentral:
    address: prism-central.example.com
    port: 9440
  credentialsSecretRef:
    name: nutanix-creds
  certificatesSecretRef:
    name: prism-central-cert
  failureDomains:
    - name: "Local_AZ"
      subnetUUIDs:
        - "subnet-uuid-1234"
      prismElement:
        endpoint:
          address: "prism-element-1.example.com"
          port: 9440
        uuid: "prism-elements-uuid-1234"
        name: "Prism-Element-1"

Configuring Availability Zones

The desired Availability Zones (AZ) to create new worker nodes in can be specified in the MachinePool YAML (spec.platform.<provider>.zones), for example:

apiVersion: hive.openshift.io/v1
kind: MachinePool
metadata:
  name: mycluster-worker
  namespace: mynamespace
spec:
  clusterDeploymentRef:
    name: mycluster
  name: worker
  platform:
    aws:
      rootVolume:
        iops: 100
        size: 120
        type: gp3
      type: m5.xlarge
      zones:
        - us-east-1a
        - us-east-1b
  replicas: 3

If the Availability Zones are not configured in the MachinePool, then all of the AZs in the region will be used and a MachineSet resource will be created for each AZ (only relevant for public cloud providers).

Auto-scaling

MachinePools can be configured to auto-scale the number of worker nodes as needed based on resource utilization of the deployed cluster (this feature creates a ClusterAutoscaler resource in the deployed cluster).

apiVersion: hive.openshift.io/v1
kind: MachinePool
metadata:
  name: mycluster-worker
  namespace: mynamespace
spec:
  clusterDeploymentRef:
    name: mycluster
  name: worker
  platform:
    aws:
      rootVolume:
        iops: 100
        size: 120
        type: gp3
      type: m5.xlarge
  autoscaling:
    minReplicas: 3
    maxReplicas: 6

The number of minimum replicas must be equivalent to the number of configured Availability Zones.

The spec.replicas and spec.autoscaling configurations cannot be configured simultaneously.

The spec.autoscaling.maxReplicas is an optional field. If it is not configured, then nodes will be auto-scaled without restriction based on resource utilization needs.

Integration with Horizontal Pod Autoscalers

A MachinePool configured for auto-scaling creates a ClusterAutoscaler on the deployed cluster. ClusterAutoscalers can co-exist and work with Horizontal Pod Autoscalers to ensure that there are enough available nodes to meet the auto-scaled pod replica count requirements. See this excerpt from the OpenShift documentation:

The horizontal pod autoscaler (HPA) and the cluster autoscaler modify cluster resources in different ways. The HPA changes the deployment’s or replica set’s number of replicas based on the current CPU load. If the load increases, the HPA creates new replicas, regardless of the amount of resources available to the cluster. If there are not enough resources, the cluster autoscaler adds resources so that the HPA-created pods can run. If the load decreases, the HPA stops some replicas. If this action causes some nodes to be underutilized or completely empty, the cluster autoscaler deletes the unnecessary nodes.

Create Cluster on Bare Metal

Hive supports bare metal provisioning as provided by openshift-install.

At present this feature requires a separate pre-existing libvirt provisioning host to run the bootstrap node. This host will require very specific network configuration that far exceeds the scope of Hive documentation. See Bare Metal Platform Customization for more information.

To provision bare metal clusters with Hive:

Create a Secret containing a bare metal enabled InstallConfig. This InstallConfig must contain a libvirtURI property pointing to the provisioning host.

Create a Secret containing the SSH private key that can connect to your libvirt provisioning host, without a passphrase.

apiVersion: v1
kind: Secret
metadata:
  name: provisioning-host-ssh-private-key
  namespace: mynamespace
stringData:
  ssh-privatekey: |-
    -----BEGIN RSA PRIVATE KEY-----
    REDACTED
    -----END RSA PRIVATE KEY-----
type: Opaque

Create a ConfigMap for manifests to inject into the installer, containing a nested ConfigMap for metal3 config.

NOTE: This will no longer be required as of OpenShift 4.4+.

kind: ConfigMap
apiVersion: v1
metadata:
  name: my-baremetal-cluster-install-manifests
  namespace: mynamespace
data:
  99_metal3-config.yaml: |
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: metal3-config
      namespace: openshift-machine-api
    data:
      http_port: "6180"
      provisioning_interface: "enp1s0"
      provisioning_ip: "172.22.0.3/24"
      dhcp_range: "172.22.0.10,172.22.0.100"
      deploy_kernel_url: "http://172.22.0.3:6180/images/ironic-python-agent.kernel"
      deploy_ramdisk_url: "http://172.22.0.3:6180/images/ironic-python-agent.initramfs"
      ironic_endpoint: "http://172.22.0.3:6385/v1/"
      ironic_inspector_endpoint: "http://172.22.0.3:5050/v1/"
      cache_url: "http://192.168.111.1/images"
      rhcos_image_url: "https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.3/43.81.201911192044.0/x86_64/rhcos-43.81.201911192044.0-openstack.x86_64.qcow2.gz"

Create a ClusterDeployment. Note the libvirtSSHPrivateKeySecretRef and sshKnownHosts for bare metal:

apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  name: my-baremetal-cluster
  namespace: mynamespace
  annotations:
    hive.openshift.io/try-install-once: "true"
spec:
  baseDomain: test.example.com
  clusterName: my-baremetal-cluster
  controlPlaneConfig:
    servingCertificates: {}
  platform:
    baremetal:
      libvirtSSHPrivateKeySecretRef:
        name: provisioning-host-ssh-private-key
  provisioning:
    installConfigSecretRef:
      name: my-baremetal-cluster-install-config
    sshPrivateKeySecretRef:
      name: my-baremetal-hosts-ssh-private-key
    manifestsSecretRef:
      name: my-baremetal-cluster-install-manifests
    imageSetRef:
      name: my-clusterimageset
    sshKnownHosts:
    # SSH known host info for the libvirt provisioning server to avoid a prompt during non-interactive install:
    - "10.1.8.90 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBKWjJRzeUVuZs4yxSy4eu45xiANFIIbwE3e1aPzGD58x/NX7Yf+S8eFKq4RrsfSaK2hVJyJjvVIhUsU9z2sBJP8="
  pullSecretRef:
    name: my-baremetal-cluster-pull-secret

There is not presently support for MachinePool management on bare metal clusters. The pools defined in your InstallConfig are authoritative.

There is not presently support for "deprovisioning" a bare metal cluster. As such, deleting a bare metal ClusterDeployment has no impact on the running cluster: it is simply removed from Hive and the systems remain running. This may change in the future.

Monitor the Install Job

Get the namespace in which your cluster deployment was created

Get the install pod name

oc get pods -l "hive.openshift.io/job-type=provision,hive.openshift.io/cluster-deployment-name=${CLUSTER_NAME}" -o jsonpath='{.items[0].metadata.name}'

Run the following command to watch the cluster deployment:
```
oc logs -f <install-pod-name> -c hive
```
Alternatively, you can watch the summarized output of the installer using:
```
oc exec -c hive <install-pod-name> -- tail -f /tmp/openshift-install-console.log
```
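The two steps above (find the install pod, then follow its logs) can be combined into one helper. The following is a minimal sketch that reuses only the label selector and container name shown above; the function names `install_pod_name` and `follow_install_logs` are hypothetical and not part of Hive.

```shell
# Sketch: look up the provision pod for a ClusterDeployment and follow its logs.
# Both function names are illustrative helpers, not Hive tooling.

install_pod_name() {
  local cluster_name="$1"
  local namespace="$2"
  # Same label selector as in the command above, scoped to the given namespace.
  oc get pods -n "${namespace}" \
    -l "hive.openshift.io/job-type=provision,hive.openshift.io/cluster-deployment-name=${cluster_name}" \
    -o jsonpath='{.items[0].metadata.name}'
}

follow_install_logs() {
  local cluster_name="$1"
  local namespace="$2"
  local pod
  pod="$(install_pod_name "${cluster_name}" "${namespace}")"
  if [ -z "${pod}" ]; then
    echo "no provision pod found for ${cluster_name} in ${namespace}" >&2
    return 1
  fi
  # Follow the hive container, as in the manual command.
  oc logs -f "${pod}" -n "${namespace}" -c hive
}
```

For example, `follow_install_logs "${CLUSTER_NAME}" mynamespace` would follow the current provision attempt, assuming the ClusterDeployment lives in `mynamespace`.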

In the event of installation failures, please see Troubleshooting.

Saving Logs for Failed Provisions

Hive can be configured as follows to upload logs to an AWS S3 bucket when provisioning fails.

Create an S3 bucket. The bucket must be accessible from the environment from which your cluster will be provisioned, using credentials you will specify (below). Take note of the name of the bucket and the region in which you created it.
Create a credentials secret. This secret will need to exist in the target namespace of your hive deployment (HiveConfig.spec.targetNamespace, default hive), and contain AWS credentials sufficient to write to your bucket. The secret's data should contain base64-encoded values for "aws_access_key_id" and "aws_secret_access_key". (You may wish to reuse the secret from your cluster deployment.)
Create an SSH private key secret. The secret data must contain a key called ssh-privatekey whose value is the base64-encoded contents of the private key file corresponding to the public key in your install config. Create this secret in the namespace of your ClusterDeployment.

Tell HiveConfig where to find your bucket. Under .spec.failedProvisionConfig.aws, add the bucket name, the reference to the AWS credentials secret, and the region. For example:

spec:
  failedProvisionConfig:
    aws:
      bucket: failed-provision-logs
      credentialsSecretRef:
        name: test-retry-aws-creds
      region: us-east-1

Ensure your ClusterDeployment is configured with your SSH private key secret. Reference the SSH private key secret in your ClusterDeployment's `.spec.provisioning.sshPrivateKeySecretRef`. For example:
```
spec:
  provisioning:
    sshPrivateKeySecretRef:
      name: mycluster-ssh-key
```
(If using hiveutil, you can provide the key pair from your file system via --ssh-private-key-file and --ssh-public-key-file.)

The troubleshooting doc provides more information about extracting and processing the logs.
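The two secrets described in the steps above could be created from the command line roughly as follows. This is a sketch, not a prescribed procedure: the function names are hypothetical, the credentials are assumed to be available in the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, and `oc create secret generic` takes care of the base64 encoding.

```shell
# Sketch: create the secrets needed for uploading failed-provision logs.
# Function names are illustrative; secret names and namespaces are up to you.

create_log_upload_secret() {
  local name="$1"
  local namespace="$2"   # the Hive target namespace, "hive" by default
  oc create secret generic "${name}" -n "${namespace}" \
    --from-literal=aws_access_key_id="${AWS_ACCESS_KEY_ID:?set AWS_ACCESS_KEY_ID}" \
    --from-literal=aws_secret_access_key="${AWS_SECRET_ACCESS_KEY:?set AWS_SECRET_ACCESS_KEY}"
}

create_ssh_key_secret() {
  local name="$1"
  local namespace="$2"   # the namespace of the ClusterDeployment
  local key_file="$3"    # private key matching the public key in the install config
  oc create secret generic "${name}" -n "${namespace}" \
    --from-file=ssh-privatekey="${key_file}"
}
```

With the names used in the examples above, this would be `create_log_upload_secret test-retry-aws-creds hive` followed by `create_ssh_key_secret mycluster-ssh-key <namespace> <path-to-private-key>`.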

Cluster Admin Kubeconfig

Once the cluster is provisioned, the admin kubeconfig will be stored in a secret. You can use this with:

./hack/get-kubeconfig.sh ${CLUSTER_NAME} > ${CLUSTER_NAME}.kubeconfig
export KUBECONFIG=${CLUSTER_NAME}.kubeconfig
oc get nodes
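If the helper script is not at hand, the kubeconfig can also be read directly from the secret. The sketch below assumes the secret is referenced from `.spec.clusterMetadata.adminKubeconfigSecretRef.name` and stores the file under the `kubeconfig` key, as in the adoption example later in this document; `get_admin_kubeconfig` is a hypothetical name and is not claimed to match what `hack/get-kubeconfig.sh` does internally.

```shell
# Sketch: print the admin kubeconfig of a provisioned cluster to stdout.
# get_admin_kubeconfig is an illustrative helper, not part of Hive.

get_admin_kubeconfig() {
  local cluster_name="$1"
  local namespace="$2"
  local secret_name
  # Resolve the name of the secret holding the admin kubeconfig.
  secret_name="$(oc get cd "${cluster_name}" -n "${namespace}" \
    -o jsonpath='{.spec.clusterMetadata.adminKubeconfigSecretRef.name}')"
  if [ -z "${secret_name}" ]; then
    echo "no admin kubeconfig secret recorded for ${cluster_name}" >&2
    return 1
  fi
  # Secret data is base64-encoded, so decode it before printing.
  oc get secret "${secret_name}" -n "${namespace}" \
    -o jsonpath='{.data.kubeconfig}' | base64 --decode
}
```

Usage would mirror the script: `get_admin_kubeconfig "${CLUSTER_NAME}" mynamespace > "${CLUSTER_NAME}.kubeconfig"`.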

Access the Web Console

Get the web console URL:

oc get cd ${CLUSTER_NAME} -o jsonpath='{ .status.webConsoleURL }'

Retrieve the password for the kubeadmin user:

oc extract secret/$(oc get cd ${CLUSTER_NAME} -o jsonpath='{.spec.clusterMetadata.adminPasswordSecretRef.name}') --to=-

Managed DNS

Hive can optionally create delegated DNS zones for each cluster.

NOTE: This feature only works for provisioning to AWS, GCP, and Azure.

To use this feature:

Manually create a DNS zone for your "root" domain (i.e. hive.example.com in the example below) and ensure your DNS is operational.

Create a secret in the "hive" namespace containing cloud credentials that have permission to manage the root zone.

AWS

apiVersion: v1
data:
  aws_access_key_id: REDACTED
  aws_secret_access_key: REDACTED
kind: Secret
metadata:
  name: route53-aws-creds
type: Opaque

The following AWS IAM permissions should be associated with these credentials:

route53:ChangeResourceRecordSets
route53:ChangeTagsForResource
route53:CreateHostedZone
route53:DeleteHostedZone
route53:GetHostedZone
route53:ListHostedZonesByName
route53:ListResourceRecordSets
route53:ListTagsForResource
tag:GetResources

GCP

apiVersion: v1
data:
  osServiceAccount.json: REDACTED
kind: Secret
metadata:
  name: gcp-creds
type: Opaque

Azure

The service principal needs the DNS Zone Contributor role on the DNS zone resource.

apiVersion: v1
data:
  osServicePrincipal.json: REDACTED
kind: Secret
metadata:
  name: azure-creds
type: Opaque

Update your HiveConfig to enable externalDNS and set the list of managed domains:

AWS

apiVersion: hive.openshift.io/v1
kind: HiveConfig
metadata:
  name: hive
spec:
  managedDomains:
  - aws:
      credentialsSecretRef:
        name: route53-aws-creds
    domains:
    - hive.example.com

GCP

apiVersion: hive.openshift.io/v1
kind: HiveConfig
metadata:
  name: hive
spec:
  managedDomains:
  - gcp:
      credentialsSecretRef:
        name: gcp-creds
    domains:
    - hive.example.com

Azure

apiVersion: hive.openshift.io/v1
kind: HiveConfig
metadata:
  name: hive
spec:
  managedDomains:
  - azure:
      credentialsSecretRef:
        name: azure-creds
    domains:
    - hive.example.com

Specify which domains Hive is allowed to manage by adding them to the .spec.managedDomains[].domains list. When specifying manageDNS: true in a ClusterDeployment, the ClusterDeployment's baseDomain must be a direct child of one of these domains; otherwise, creating the ClusterDeployment will fail with a validation error. The baseDomain must also be unique to that cluster and must not be used in any other ClusterDeployment, including on separate Hive instances.

As such, a domain may exist in the .spec.managedDomains[].domains list in multiple Hive instances. Note that the specified credentials must be valid to add and remove NS record entries for all domains listed in .spec.managedDomains[].domains.
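The "direct child" rule can be checked locally before a ClusterDeployment is submitted. The following sketch uses plain string handling; `is_direct_child` is a hypothetical helper, and Hive's own admission validation remains the authority on what is accepted.

```shell
# Sketch: succeed when BASE_DOMAIN sits exactly one label below MANAGED_DOMAIN.
# is_direct_child is an illustrative helper, not part of Hive.

is_direct_child() {
  local base_domain="$1"
  local managed_domain="$2"
  local label
  # The base domain must end with ".<managed domain>".
  case "${base_domain}" in
    *".${managed_domain}") ;;
    *) return 1 ;;
  esac
  # Strip that suffix; what is left must be one non-empty label without dots.
  label=${base_domain%".${managed_domain}"}
  case "${label}" in
    ""|*.*) return 1 ;;
  esac
  return 0
}
```

Under this check, `is_direct_child mydomain.hive.example.com hive.example.com` succeeds, while `a.b.hive.example.com` (two levels down) and `hive.example.com` itself are rejected.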

You can now create clusters with manageDNS enabled and a baseDomain of mydomain.hive.example.com.

bin/hiveutil create-cluster --base-domain=mydomain.hive.example.com mycluster --manage-dns

Hive will then:

Create a mydomain.hive.example.com DNS zone.
Create NS records in the hive.example.com zone to delegate DNS to the new mydomain.hive.example.com DNS zone.
Wait for the SOA record for the new domain to be resolvable, indicating that DNS is functioning.
Launch the install, which will create DNS entries for the new cluster ("*.apps.mycluster.mydomain.hive.example.com", "api.mycluster.mydomain.hive.example.com", etc) in the new mydomain.hive.example.com DNS zone.
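To observe the third step from outside Hive, the SOA record of the new zone can be polled by hand. This is a sketch under the assumption that `dig` is installed; `wait_for_soa` and its default attempt count and delay are hypothetical, and this is not a description of how Hive itself performs the check.

```shell
# Sketch: poll until the SOA record for a domain resolves, or give up.
# wait_for_soa DOMAIN [ATTEMPTS] [DELAY_SECONDS]; an illustrative helper only.

wait_for_soa() {
  local domain="$1"
  local attempts="${2:-30}"
  local delay="${3:-10}"
  local i=1
  while [ "${i}" -le "${attempts}" ]; do
    # dig +short prints nothing when the record cannot be resolved.
    if [ -n "$(dig +short SOA "${domain}")" ]; then
      echo "SOA for ${domain} resolved after ${i} attempt(s)"
      return 0
    fi
    sleep "${delay}"
    i=$((i + 1))
  done
  echo "SOA for ${domain} did not resolve after ${attempts} attempts" >&2
  return 1
}
```

For the example above this would be called as `wait_for_soa mydomain.hive.example.com`.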

Cluster Adoption

It is possible to adopt cluster deployments into Hive. This will allow you to manage the cluster as if it had been provisioned by Hive.

To do so you will need to create a ClusterDeployment with Spec.Installed set to True, no Spec.Provisioning section, and include the following:

cluster INFRAID (obtained from oc get infrastructure cluster -o json | jq .status.infrastructureName)
cluster ID (obtained from oc get clusterversion version -o json | jq .spec.clusterID)
reference to a properly formatted admin kubeconfig Secret: oc create secret generic mycluster-admin-kubeconfig --from-file=kubeconfig=/tmp/admin.kubeconfig
Spec.Platform.YourCloudProvider for your cluster, most importantly region and a properly formatted credentials Secret

Use Spec.PreserveOnDelete = true if you do not want Hive to deprovision resources when the ClusterDeployment is deleted.
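The identifiers listed above can be collected in one pass and printed as a clusterMetadata fragment. The sketch below is meant to run with `KUBECONFIG` pointing at the cluster being adopted; it uses `-o jsonpath` in place of `jq` to avoid the extra dependency, and `print_cluster_metadata` is a hypothetical helper name.

```shell
# Sketch: read infraID and clusterID from the cluster being adopted and print
# a clusterMetadata fragment for the ClusterDeployment spec.
# print_cluster_metadata is an illustrative helper, not part of Hive.

print_cluster_metadata() {
  local kubeconfig_secret="$1"   # name of the admin kubeconfig Secret you created
  local infra_id
  local cluster_id
  infra_id="$(oc get infrastructure cluster -o jsonpath='{.status.infrastructureName}')"
  cluster_id="$(oc get clusterversion version -o jsonpath='{.spec.clusterID}')"
  cat <<EOF
clusterMetadata:
  adminKubeconfigSecretRef:
    name: ${kubeconfig_secret}
  clusterID: ${cluster_id}
  infraID: ${infra_id}
EOF
}
```

The output still has to be indented under `spec` and combined with the platform section for your cloud provider.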

Example Adoption ClusterDeployment

apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  name: my-gcp-cluster
  namespace: hive
spec:
  baseDomain: gcp.example.com
  clusterMetadata:
    adminKubeconfigSecretRef:
      name: my-gcp-cluster-admin-kubeconfig
    clusterID: 61010205-c91d-44c9-8394-3e1790bd76f3
    infraID: my-gcp-cluster-wsvdn
  clusterName: my-gcp-cluster
  installed: true
  platform:
    gcp:
      credentialsSecretRef:
        name: my-gcp-creds
      region: us-east1
  pullSecretRef:
    name: pull-secret

If the cluster you are looking to adopt is on AWS and uses PrivateLink, you'll also need to include that setting under spec.platform.aws to ensure the VPC Endpoint Service for the cluster is tracked in the ClusterDeployment.

  platform:
    aws:
      credentialsSecretRef:
        name: my-aws-cluster-creds
      privateLink:
        enabled: true
      region: us-east-1

If the cluster you are looking to adopt is on AWS and uses a shared VPC, you will also need to include the name of the hosted zone role in spec.clusterMetadata.platform.aws.hostedZoneRole.

  clusterMetadata:
    adminKubeconfigSecretRef:
      name: my-aws-cluster-admin-kubeconfig
    clusterID: 61010205-c91d-44c9-8394-3e1790bd76f3
    infraID: my-aws-cluster-wsvdn
    platform:
      aws:
        hostedZoneRole: account-b-zone-role

If the cluster you are looking to adopt is on GCP and uses a shared VPC, you will also need to include the network project ID in spec.clusterMetadata.platform.gcp.networkProjectID.

  clusterMetadata:
    adminKubeconfigSecretRef:
      name: my-gcp-cluster-admin-kubeconfig
    clusterID: 61010205-c91d-44c9-8394-3e1790bd76f3
    infraID: my-gcp-cluster-wsvdn
    platform:
      gcp:
        networkProjectID: some@project.id

Adopting with hiveutil

hiveutil is a development-focused CLI tool which can be built from the hive repo. To adopt a cluster, specify the following flags:

bin/hiveutil create-cluster --namespace=namespace-to-adopt-into --base-domain=example.com mycluster --adopt --adopt-admin-kubeconfig=/path/to/cluster/admin/kubeconfig --adopt-infra-id=[INFRAID] --adopt-cluster-id=[CLUSTERID]

Transferring ownership

If you wish to transfer ownership of a cluster which is already managed by hive, and have access to the ClusterDeployment, there is no need to create a new ClusterDeployment using hiveutil. Instead, simply do the following:

Save the current ClusterDeployment and relevant creds and certs manifests locally.

oc get cd <clusterdeployment_name> -n <namespace> -o yaml > clusterdeployment.yaml
oc get secrets <clusterdeployment_name_creds> -n <namespace> -o yaml > clusterdeployment_creds.yaml

Edit the ClusterDeployment, setting spec.preserveOnDelete to true. This ensures that the next step will only release the hive resources without destroying the cluster in the cloud infrastructure.
Delete the ClusterDeployment
From the hive instance that will adopt the cluster, oc apply the ClusterDeployment, creds and certs manifests you saved in the first step.
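The release half of this procedure (set preserveOnDelete, then delete) can be scripted. The following is a sketch that assumes the manifests have already been saved as in the first step; `release_cluster` is a hypothetical helper name, and it should only be run once you have verified the saved files.

```shell
# Sketch: release a ClusterDeployment from the current Hive instance without
# destroying the cluster. release_cluster is an illustrative helper only.

release_cluster() {
  local name="$1"
  local namespace="$2"
  # Mark the cluster as preserved so that deletion skips deprovisioning.
  oc patch cd "${name}" -n "${namespace}" --type=merge \
    -p '{"spec":{"preserveOnDelete":true}}' || return 1
  # Remove the ClusterDeployment; the cloud resources stay in place.
  oc delete cd "${name}" -n "${namespace}"
}
```

On the adopting Hive instance, the saved manifests would then be applied with `oc apply -f clusterdeployment_creds.yaml -f clusterdeployment.yaml`.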

Configuration Management

Vertical Scaling

hive-operator deploys each component (the hive-controllers and hiveadmission Deployments; and the hive-clustersync and hive-machinepool StatefulSets) with default resource requests. If you need to scale any of these components vertically, you may add one or more deploymentConfig sections to HiveConfig's spec. For example:

  deploymentConfig:
  - deploymentName: hive-controllers
    resources:
      requests:
        memory: 256Mi
  - deploymentName: hive-clustersync
    resources:
      requests:
        cpu: 30m
        memory: 257Mi
      limits:
        cpu: 50m
  - deploymentName: hiveadmission
    resources:
      requests:
        cpu: 20m

For each entry, the deploymentName must match the metadata.name of the Deployment/StatefulSet. The resources field is a standard corev1.ResourceRequirements.

See below for information on horizontally scaling the clustersync or machinepool controller.

Note: The hive-operator itself must be scaled by directly editing its Deployment.

SyncSet

Hive offers two CRDs for applying configuration in a cluster once it is installed: SyncSet for config destined for specific clusters in a specific namespace, and SelectorSyncSet for config destined for any cluster matching a label selector.

For more information please see the SyncSet documentation.

Scaling ClusterSync and MachinePool

The clustersync and machinepool controllers are designed to scale horizontally, so increasing the number of controller replicas will scale the number of pods running, thereby increasing the number of simultaneous clusters getting syncsets or machinepools applied to them.

In order to scale these controllers, a section like the following should be added to HiveConfig:

spec:
  controllersConfig:
    controllers:
    - config:
        replicas: 3
      name: clustersync

The above example scales the clustersync controller. Use a (separate) section with name: machinepool to scale the machinepool controller.

Identity Provider Management

Hive offers explicit API support for configuring identity providers in the OpenShift clusters it provisions. This is technically powered by the above SyncSet mechanism, but is provided directly in the API to support configuring per-cluster identity providers, merged with global identity providers, all of which must land in the same object in the cluster.

For more information please see the SyncIdentityProvider documentation.

Cluster Deprovisioning

oc delete clusterdeployment ${CLUSTER_NAME} --wait=false

Deleting a ClusterDeployment will create a ClusterDeprovision resource, which in turn will launch a pod to attempt to delete all cloud resources created for and by the cluster. This is done by scanning the cloud provider for resources tagged with the cluster's generated InfraID (e.g. kubernetes.io/cluster/mycluster-fcp4z=owned or sigs.k8s.io/cluster-api-provider-aws/cluster/mycluster-fcp4z=owned). Once all resources have been deleted, the pod will terminate, finalizers will be removed, and the ClusterDeployment and dependent objects will be removed. The deprovision process is powered by vendoring the same code from the OpenShift installer used for openshift-install destroy cluster.
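Before deleting, it can be useful to know which tags the deprovision scan will look for. The sketch below derives the two tag forms mentioned above from a cluster's InfraID; both function names are hypothetical, and the InfraID is read from `.spec.clusterMetadata.infraID` as in the adoption example.

```shell
# Sketch: print the ownership tags associated with a cluster's InfraID.
# cluster_infra_id and cluster_owned_tags are illustrative helpers only.

cluster_infra_id() {
  local cluster_name="$1"
  local namespace="$2"
  oc get cd "${cluster_name}" -n "${namespace}" \
    -o jsonpath='{.spec.clusterMetadata.infraID}'
}

cluster_owned_tags() {
  local infra_id="$1"
  # The two tag forms named in the paragraph above.
  printf '%s\n' \
    "kubernetes.io/cluster/${infra_id}=owned" \
    "sigs.k8s.io/cluster-api-provider-aws/cluster/${infra_id}=owned"
}
```

For instance, `cluster_owned_tags "$(cluster_infra_id "${CLUSTER_NAME}" mynamespace)"` lists the tags for a ClusterDeployment in `mynamespace`.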
