There is no POD on the "FIRST" master node due to lack of toleration specification
Detailed Description
I am experiencing an issue that seems to occur specifically in a multi-master environment.
I deployed a multi-master cluster with 3 master nodes and 4 worker nodes.
2 VMs running haproxy+keepalived share a VIP address and redirect all Kubernetes API traffic to one of the 3 master nodes behind the haproxy nodes.
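For context, a minimal haproxy sketch of that VIP setup follows; all hostnames and addresses here are hypothetical placeholders, not taken from this report:

frontend kubernetes-api
    bind *:6443
    mode tcp
    default_backend kube-masters

backend kube-masters
    mode tcp
    balance roundrobin
    option tcp-check
    server master1 10.0.0.11:6443 check
    server master2 10.0.0.12:6443 check
    server master3 10.0.0.13:6443 check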
During the deployment of the Kubernetes cluster, it is important to note that it was done in 3 phases:
- The first "master" node on which the "kubeadm init" command is typed
- The 2 other "master" nodes on which "kubeadm join --control-plane" command is typed
- The 4 worker nodes on which "kubeadm join" command is typed
During the installation of Trousseau, when applying the daemonset manifest, I noticed that only 2 pods were deployed instead of 3.
After some investigation, it appears the FIRST master (the one on which "kubeadm init" was typed) has an additional taint (node-role.kubernetes.io/master:NoSchedule, which matches the toleration added below) that the 2 other control-plane nodes (those on which "kubeadm join --control-plane" was typed) don't have. With this additional taint, the "FIRST" master doesn't match the daemonset's toleration criteria and no POD is deployed on this node.
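One quick way to confirm the difference is to list each node's taint keys (output will vary per cluster):

sysadmin@nisdevkubeadmin01:~/trousseau$ kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'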
There are 3 problems with this, ordered by severity:
- the "FIRST" master node without any kms-vault pod running on it is not able to encrypt secrets using the KMS ==> only 2/3 nodes can encrypt secrets: this gives a false "secured" feeling.
- the "FIRST" master node without any kms-vault pod running on it is not able to decrypt secrets encrypted by other nodes ==> Throws errors if api calls is grabbed by the node without kms-vault pod running
- the kube-api server crash after a certain time due to no possible communication with the local kms pod using the socket file ==> The kube-api crash after a lots of errors (see below).
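For context, the kube-apiserver reaches the local KMS plugin through a Unix socket declared in its EncryptionConfiguration. A minimal sketch follows; the provider name and socket path are assumptions based on the Trousseau deployment guide, not taken from this report:

apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # KMS provider served by the local vault-kms-provider pod (name/path assumed)
      - kms:
          name: vaultprovider
          endpoint: unix:///opt/vault-kms/vaultkms.socket
          cachesize: 1000
          timeout: 3s
      # fallback so secrets stored in plaintext can still be read
      - identity: {}

With this wiring, an apiserver whose local socket has no listener cannot encrypt or decrypt any secret, which is exactly the failure mode described above.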
Expected Behavior
A POD is deployed on every master/control-plane node.
Current Behavior
A POD is not deployed on the "FIRST" master/control-plane node.
Steps to Reproduce
- Deploy a first master node using the "kubeadm init" command
- Deploy additional master nodes using the "kubeadm join" command with the "--control-plane" flag
- Deploy worker nodes using "kubeadm join" (the commands for these three steps are sketched after this list)
- Deploy Vault and Trousseau using the method described in https://github.com/ondat/trousseau/wiki/Trousseau-Deployment#
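For reference, a minimal sketch of the three kubeadm phases; the VIP, token, CA hash, and certificate key are all hypothetical placeholders:

# Phase 1: first master, pointing the API endpoint at the shared VIP
kubeadm init --control-plane-endpoint "<VIP>:6443" --upload-certs

# Phase 2: the 2 other masters join as control-plane nodes
kubeadm join <VIP>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <key>

# Phase 3: the 4 workers join as plain nodes
kubeadm join <VIP>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>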
Context (Environment)
Possible Solution/Implementation
Add an additional toleration:
sysadmin@nisdevkubeadmin01:~/trousseau$ diff trousseau-daemonset.yaml trousseau-daemonset.yaml.ORIGINAL
112,114d111
< - key: node-role.kubernetes.io/master
< operator: Exists
< effect: NoSchedule
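For clarity, the tolerations block of the patched daemonset should end up looking roughly like this; the control-plane entry is an assumption about what the original manifest already contains:

tolerations:
  # assumed already present in the original manifest (covers the joined masters)
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
  # the addition from the diff above (covers the "FIRST" master)
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule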
Apply it:
sysadmin@nisdevkubeadmin01:~/trousseau$ kubectl apply -f trousseau-daemonset.yaml
daemonset.apps/vault-kms-provider created
The daemonset now schedules pods on all 3 master nodes:
sysadmin@nisdevkubeadmin01:~/trousseau$ kubectl describe daemonset.apps/vault-kms-provider -n kube-system
Name:           vault-kms-provider
Selector:       name=vault-kms-provider
Node-Selector:  <none>
Labels:         app=vault-kms-provider
                tier=control-plane
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 3
Number of Nodes Scheduled with Up-to-date Pods: 3
Number of Nodes Scheduled with Available Pods: 3
Number of Nodes Misscheduled: 0
Pods Status: 3 Running / 0 Waiting / 0 Succeeded / 0 Failed
[...etc...]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate <invalid> daemonset-controller Created pod: vault-kms-provider-qnkbc
Normal SuccessfulCreate <invalid> daemonset-controller Created pod: vault-kms-provider-klsmf
Normal SuccessfulCreate <invalid> daemonset-controller Created pod: vault-kms-provider-rwl8s
sysadmin@nisdevkubeadmin01:~/trousseau$ kubectl get daemonset/vault-kms-provider -n kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
vault-kms-provider 3 3 3 3 3 <none> 19s
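To double-check that one pod landed on each of the 3 masters, the pods can be listed with the node they run on (the label comes from the daemonset selector shown above):

sysadmin@nisdevkubeadmin01:~/trousseau$ kubectl get pods -n kube-system -l name=vault-kms-provider -o wide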