Bug 1927042

Summary: Empty static pod files on UPI deployments are confusing
Product: OpenShift Container Platform
Reporter: Ben Nemec <bnemec>
Component: Machine Config Operator
Assignee: Ben Nemec <bnemec>
Status: CLOSED ERRATA
QA Contact: Michael Nguyen <mnguyen>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.5
CC: rioliu
Target Milestone: ---
Keywords: Triaged
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Empty manifests were written to /etc/kubernetes/manifests.
Consequence: Errors appeared in the kubelet log. These errors were harmless but confusing.
Fix: Instead of writing empty manifests, the manifests are moved to a different location when they are not needed.
Result: No errors in the kubelet log.
Story Points: ---
Clone Of:
: 2008692
Environment:
Last Closed: 2021-07-27 22:43:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2008692

Description Ben Nemec 2021-02-09 22:12:43 UTC
Description of problem: In the on-prem static pod templates, we have a conditional that prevents those services from being deployed on UPI, where they are not used. However, the conditional leaves an empty static pod file behind, so UPI deployments have errors in their kubelet logs. While this does not cause problems in and of itself, it is misleading and has resulted in wasted time debugging the "problem". We need to find some way to avoid creating these empty files on UPI deployments.


Version-Release number of selected component (if applicable): 4.5


How reproducible: Always


Steps to Reproduce:
1. Deploy a UPI on-prem platform. vSphere seems to be where most people are running into this.

Actual results: Empty static pod files in /etc/kubernetes/manifests and errors about that in the logs.


Expected results: No errors in the logs.
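
To check a node for the condition described under "Actual results", something like the following could be run from a host shell. This is only a sketch: the function name list_empty_manifests is hypothetical, and it assumes "empty" means zero bytes, so a manifest that contains only whitespace would not be reported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print zero-byte regular files that sit directly in a
# static pod manifest directory. The default directory is the one named in
# this report; pass a different directory as the first argument to override.
list_empty_manifests() {
    local dir="${1:-/etc/kubernetes/manifests}"
    # -maxdepth 1 limits the search to the directory itself.
    # -size 0 matches files whose length is zero bytes.
    find "$dir" -maxdepth 1 -type f -size 0 | sort
}
```

On an affected node, calling list_empty_manifests with no arguments should print the empty files; on a fixed node it should print nothing.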


Additional info: I don't believe we can just extend the conditional to cover the entire file in MCO. As I recall, MCO didn't like it when templates were completely empty. We might be able to conditionally move the templates somewhere harmless (like /dev/null) for platforms that don't need them, though.
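
The idea of conditionally redirecting a manifest can be illustrated roughly as below. This is not MCO's template code; manifest_destination and its "true"/"false" argument are hypothetical, and the /etc/kubernetes/disabled-manifests path is taken from the verification output later in this bug rather than from the proposal above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide where a static pod manifest should be written.
#   $1  "true" if the platform actually runs the on-prem services
#   $2  manifest file name, for example coredns.yaml
manifest_destination() {
    local needed="$1" name="$2"
    if [ "$needed" = "true" ]; then
        # The kubelet reads static pods from this directory.
        printf '%s\n' "/etc/kubernetes/manifests/$name"
    else
        # Assumed to be outside the kubelet's static pod path, so the file
        # is ignored instead of being reported as an error.
        printf '%s\n' "/etc/kubernetes/disabled-manifests/$name"
    fi
}
```

The point of the design is that the file always has non-empty content; only its location changes, which sidesteps the problem with completely empty templates.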

Comment 2 Michael Nguyen 2021-06-10 18:07:48 UTC
@bnemec

I usually don't verify these, but I had access to a UPI vSphere cluster. Can you confirm this is what is supposed to happen? All of the static pod templates were moved under /etc/kubernetes/disabled-manifests/. I will close as verified when you confirm.


$ oc get nodes
NAME              STATUS   ROLES    AGE   VERSION
compute-0         Ready    worker   66m   v1.21.0-rc.0+c5e3b15
compute-1         Ready    worker   66m   v1.21.0-rc.0+c5e3b15
control-plane-0   Ready    master   72m   v1.21.0-rc.0+c5e3b15
control-plane-1   Ready    master   72m   v1.21.0-rc.0+c5e3b15
control-plane-2   Ready    master   72m   v1.21.0-rc.0+c5e3b15
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-10-071057   True        False         57m     Cluster version is 4.8.0-0.nightly-2021-06-10-071057
$ oc debug node/control-plane-0
Starting pod/control-plane-0-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# cd /etc/kubernetes/
sh-4.4# ls
apiserver-url.env  cloud.conf  disabled-manifests  kubelet-ca.crt   kubelet.conf  static-pod-resources
ca.crt		   cni	       kubeconfig	   kubelet-plugins  manifests
sh-4.4# cd disabled-manifests/
sh-4.4# ls
coredns.yaml  haproxy.yaml  keepalived.yaml
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...
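
The manual check above could be scripted along these lines. This is a sketch only: manifests_present is a hypothetical name, and the file names passed to it are the three shown in the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if every named manifest exists as a
# regular file in the given directory.
#   $1       directory to inspect
#   $2 ...   manifest file names expected in that directory
manifests_present() {
    local dir="$1" name
    shift
    for name in "$@"; do
        if [ ! -f "$dir/$name" ]; then
            # Report the first missing file on stderr and fail.
            printf 'missing: %s\n' "$dir/$name" >&2
            return 1
        fi
    done
    return 0
}
```

From a host shell on a UPI node, the call would look like: manifests_present /etc/kubernetes/disabled-manifests coredns.yaml haproxy.yaml keepalived.yaml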

Comment 3 Ben Nemec 2021-06-10 19:30:05 UTC
Yep, that's what we expect to happen after the fix.

Comment 4 Michael Nguyen 2021-06-10 19:35:47 UTC
Closing as verified on 4.8.0-0.nightly-2021-06-10-071057.

Comment 7 errata-xmlrpc 2021-07-27 22:43:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438