
[DEFECT] Missing POD in multi-master context for first master node #245

@ghost

Description


No pod is scheduled on the "FIRST" master node because the DaemonSet lacks the corresponding toleration.

Detailed Description

I am experiencing an issue that seems to occur specifically in a multi-master environment.

I deployed a multi-master cluster with 3 master nodes and 4 worker nodes.
2 VMs running haproxy+keepalived share a VIP address that redirects all Kubernetes API traffic to one of the 3 master nodes behind the haproxy nodes.

It's important to note that the Kubernetes cluster was deployed in 3 phases:

  • The first "master" node, on which the "kubeadm init" command was run
  • The 2 other "master" nodes, on which the "kubeadm join --control-plane" command was run
  • The 4 worker nodes, on which the "kubeadm join" command was run

During the installation of Trousseau, when applying the DaemonSet manifest, I noticed that only 2 pods were deployed instead of 3.

After some investigation, it appears the FIRST master (the one on which "kubeadm init" was run) has an additional taint that the 2 other control-plane nodes (those on which "kubeadm join --control-plane" was run) don't have. Because of this additional taint, the FIRST master doesn't match the DaemonSet's toleration criteria, and no pod is scheduled on this node.
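The taint mismatch can be confirmed by listing each node alongside its taints; a sketch (any node names in the output are specific to your cluster):

```shell
# List every node with the keys of its taints; the first ("kubeadm init")
# master should show an extra taint key that the other control-plane
# nodes joined with "kubeadm join --control-plane" do not carry.
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
```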

There are 3 problems with this, ordered by severity:

  • the "FIRST" master node, without any kms-vault pod running on it, is not able to encrypt secrets using the KMS ==> only 2/3 masters can encrypt secrets: this gives a false sense of security.
  • the "FIRST" master node, without any kms-vault pod running on it, is not able to decrypt secrets encrypted by the other nodes ==> errors are thrown whenever an API call is handled by the node without a kms-vault pod running.
  • the kube-apiserver eventually crashes because it cannot communicate with the local KMS pod over the socket file ==> the kube-apiserver crashes after a flood of errors (see below).

Expected Behavior

A pod is deployed on every master/control-plane node.

Current Behavior

No pod is deployed on the "FIRST" master/control-plane node.

Steps to Reproduce

  1. Deploy the first master node using the "kubeadm init" command
  2. Deploy additional master nodes using the "kubeadm join" command with the "--control-plane" flag
  3. Deploy worker nodes using "kubeadm join"
  4. Deploy Vault and Trousseau using the method described in https://github.com/ondat/trousseau/wiki/Trousseau-Deployment#

Context (Environment)

Possible Solution/Implementation

Add an additional toleration:

sysadmin@nisdevkubeadmin01:~/trousseau$ diff trousseau-daemonset.yaml trousseau-daemonset.yaml.ORIGINAL
112,114d111
<         - key: node-role.kubernetes.io/master
<           operator: Exists
<           effect: NoSchedule
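In the manifest, the resulting tolerations block would look roughly like this. A sketch: the diff above only shows the added entry, so the pre-existing toleration shown first is an assumption; note also that on Kubernetes 1.24+ the control-plane taint key is `node-role.kubernetes.io/control-plane`, so tolerating both keys covers both naming schemes:

```yaml
tolerations:
  # Assumed pre-existing toleration from the original manifest (not shown in the diff)
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
  # Added: tolerate the additional taint carried by the "kubeadm init" master
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
```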

Apply it:

sysadmin@nisdevkubeadmin01:~/trousseau$ kubectl apply -f trousseau-daemonset.yaml
daemonset.apps/vault-kms-provider created

The DaemonSet now reports 3 desired and 3 scheduled pods:

sysadmin@nisdevkubeadmin01:~/trousseau$ kubectl describe daemonset.apps/vault-kms-provider -n kube-system
Name:           vault-kms-provider
Selector:       name=vault-kms-provider
Node-Selector:  <none>
Labels:         app=vault-kms-provider
                tier=control-plane
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 3
Number of Nodes Scheduled with Up-to-date Pods: 3
Number of Nodes Scheduled with Available Pods: 3
Number of Nodes Misscheduled: 0
Pods Status:  3 Running / 0 Waiting / 0 Succeeded / 0 Failed
[...etc...]
Events:
  Type    Reason            Age        From                  Message
  ----    ------            ----       ----                  -------
  Normal  SuccessfulCreate  <invalid>  daemonset-controller  Created pod: vault-kms-provider-qnkbc
  Normal  SuccessfulCreate  <invalid>  daemonset-controller  Created pod: vault-kms-provider-klsmf
  Normal  SuccessfulCreate  <invalid>  daemonset-controller  Created pod: vault-kms-provider-rwl8s

sysadmin@nisdevkubeadmin01:~/trousseau$ kubectl get daemonset/vault-kms-provider -n kube-system
NAME                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
vault-kms-provider   3         3         3       3            3           <none>          19s
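To double-check that a pod landed on each control-plane node, the pods can also be listed with their node assignments (the label selector is taken from the DaemonSet output above):

```shell
# With "-o wide", the NODE column shows where each pod is scheduled;
# one vault-kms-provider pod should now appear on each of the 3 masters.
kubectl get pods -n kube-system -l name=vault-kms-provider -o wide
```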

Possible PR

