Description of problem: In the on-prem static pod templates, we have a conditional to prevent those services from being deployed on UPI, where they are not used. However, the file itself is still written, so UPI deployments end up with empty static pod files and errors about them in their kubelet logs. While this does not cause problems in and of itself, it is misleading and has resulted in wasted time debugging the "problem". We need to find some way to avoid creating these empty files on UPI deployments.
Version-Release number of selected component (if applicable): 4.5
How reproducible: Always
Steps to Reproduce:
1. Deploy a UPI on-prem platform. vSphere seems to be where most people are running into this.
Actual results: Empty static pod files in /etc/kubernetes/manifests and errors about that in the logs.
Expected results: No errors in the logs.
Additional info: I don't believe we can just move the conditional to the entire file in MCO. As I recall, MCO didn't like it when templates were completely empty. We might be able to conditionally move the templates somewhere harmless (like /dev/null) for platforms that don't need them though.
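For anyone trying to confirm they are hitting this, here is a minimal sketch of a check that could be run on a node (for example from `oc debug node/<name>` after `chroot /host`). The function name `list_blank_manifests` is made up for illustration, and it assumes the rendered files are either zero bytes or contain only whitespace; it is not part of MCO or any shipped tooling.

```shell
# Hypothetical helper (not part of MCO): print every regular file in a
# directory that is empty or contains only whitespace.
# Assumption: the default directory is the static pod path from this report.
list_blank_manifests() {
    dir="${1:-/etc/kubernetes/manifests}"
    for f in "$dir"/*; do
        # Skip subdirectories and the unexpanded glob when the dir is empty.
        [ -f "$f" ] || continue
        # grep succeeds only if the file has at least one non-whitespace
        # character, so a failure means the file is effectively empty.
        if ! grep -q '[^[:space:]]' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Running `list_blank_manifests` with no argument checks /etc/kubernetes/manifests; any path it prints is a candidate for the kubelet errors described above.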
I usually don't verify these but I had access to a UPI vSphere cluster. Can you confirm this is what is supposed to happen? All of the static pod templates were moved under /etc/kubernetes/disabled-manifests/. I will close as verified when you confirm.
$ oc get nodes
NAME STATUS ROLES AGE VERSION
compute-0 Ready worker 66m v1.21.0-rc.0+c5e3b15
compute-1 Ready worker 66m v1.21.0-rc.0+c5e3b15
control-plane-0 Ready master 72m v1.21.0-rc.0+c5e3b15
control-plane-1 Ready master 72m v1.21.0-rc.0+c5e3b15
control-plane-2 Ready master 72m v1.21.0-rc.0+c5e3b15
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.8.0-0.nightly-2021-06-10-071057 True False 57m Cluster version is 4.8.0-0.nightly-2021-06-10-071057
$ oc debug node/control-plane-0
Starting pod/control-plane-0-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# cd /etc/kubernetes/
apiserver-url.env cloud.conf disabled-manifests kubelet-ca.crt kubelet.conf static-pod-resources
ca.crt cni kubeconfig kubelet-plugins manifests
sh-4.4# cd disabled-manifests/
coredns.yaml haproxy.yaml keepalived.yaml
Removing debug pod ...
Yep, that's what we expect to happen after the fix.
Closing as verified on 4.8.0-0.nightly-2021-06-10-071057.
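The manual check above could be scripted roughly as follows. This is only a sketch: the function name `check_onprem_pods_disabled` is hypothetical, and the three file names are taken from the control-plane node listing in this report, so other node roles or platforms may legitimately differ.

```shell
# Hypothetical verification helper (not shipped anywhere): on a fixed UPI node,
# the on-prem static pod files should be absent from manifests/ and present
# under disabled-manifests/.
# Assumption: file names match the control-plane listing in this report.
check_onprem_pods_disabled() {
    base="${1:-/etc/kubernetes}"
    rc=0
    for name in coredns haproxy keepalived; do
        if [ -e "$base/manifests/$name.yaml" ]; then
            echo "unexpected: $base/manifests/$name.yaml"
            rc=1
        fi
        if [ ! -e "$base/disabled-manifests/$name.yaml" ]; then
            echo "missing: $base/disabled-manifests/$name.yaml"
            rc=1
        fi
    done
    return "$rc"
}
```

It returns 0 when the layout matches what was observed here and non-zero (with a line per mismatch) otherwise.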
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.