Bug 1927042 - Empty static pod files on UPI deployments are confusing
Summary: Empty static pod files on UPI deployments are confusing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Ben Nemec
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 2008692
Reported: 2021-02-09 22:12 UTC by Ben Nemec
Modified: 2021-09-28 21:06 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Empty manifests were written to /etc/kubernetes/manifests.
Consequence: Errors appeared in the kubelet log. These errors were harmless, but confusing.
Fix: Instead of writing empty manifests, move the manifests to a different location when they are not needed.
Result: No errors in the kubelet log.
Clone Of:
Clones: 2008692
Environment:
Last Closed: 2021-07-27 22:43:10 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 2413 | 0 | None | open | Bug 1927042: [baremetal & friends] Don't write empty static pod manifests | 2021-02-16 20:15:19 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:43:32 UTC

Description Ben Nemec 2021-02-09 22:12:43 UTC
Description of problem: In the on-prem static pod templates, we have a conditional to prevent those services from being deployed on UPI, where they are not used. However, this means that UPI deployments will have errors in their kubelet logs because of the empty static pod files. While this does not cause problems in and of itself, it is misleading and has resulted in wasted time debugging the "problem". We need to find some way to avoid creating these empty files on UPI deployments.
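
For illustration only, here is a minimal Go sketch of how a template whose whole body sits inside a conditional renders to nothing when the condition is false. The template text, the renderConfig type, and the UsesOnPremServices field are hypothetical stand-ins, not the actual MCO templates or render config.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// manifestTemplate is a hypothetical stand-in for an on-prem static pod
// template: the entire body is wrapped in a conditional.
const manifestTemplate = `{{ if .UsesOnPremServices -}}
kind: Pod
apiVersion: v1
metadata:
  name: coredns
{{- end }}`

// renderConfig is a hypothetical type; the real MCO render config differs.
type renderConfig struct {
	UsesOnPremServices bool
}

// renderManifest executes the template and returns the output with
// surrounding whitespace trimmed.
func renderManifest(cfg renderConfig) (string, error) {
	tmpl, err := template.New("manifest").Parse(manifestTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, cfg); err != nil {
		return "", err
	}
	return strings.TrimSpace(buf.String()), nil
}

func main() {
	for _, cfg := range []renderConfig{{UsesOnPremServices: true}, {UsesOnPremServices: false}} {
		out, err := renderManifest(cfg)
		if err != nil {
			panic(err)
		}
		// When the condition is false the rendered manifest has no content,
		// yet a file would still be written for it.
		fmt.Printf("on-prem services=%v, rendered %d bytes\n", cfg.UsesOnPremServices, len(out))
	}
}
```

If the rendered result is still written to /etc/kubernetes/manifests, the false case leaves the kubelet with an empty file to parse, which matches the log errors described above.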


Version-Release number of selected component (if applicable): 4.5


How reproducible: Always


Steps to Reproduce:
1. Deploy a UPI on-prem platform. vSphere seems to be where most are running into this.

Actual results: Empty static pod files in /etc/kubernetes/manifests and errors about that in the logs.


Expected results: No errors in the logs.


Additional info: I don't believe we can just move the conditional to the entire file in MCO. As I recall, MCO didn't like it when templates were completely empty. We might be able to conditionally move the templates somewhere harmless (like /dev/null) for platforms that don't need them though.
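
A rough Go sketch of the "conditionally move the templates somewhere harmless" idea follows. It assumes the destination is chosen per platform; the function name manifestPath and the needed flag are hypothetical, and the /etc/kubernetes/disabled-manifests directory is taken from the verification comment later in this bug rather than from the MCO source.

```go
package main

import (
	"fmt"
	"path"
)

const (
	// Directory the kubelet watches for static pod manifests.
	activeManifestDir = "/etc/kubernetes/manifests"
	// Directory seen in the verification comment; assumed here to be
	// ignored by the kubelet.
	disabledManifestDir = "/etc/kubernetes/disabled-manifests"
)

// manifestPath is a hypothetical helper: it returns where a static pod
// manifest would be written, depending on whether the platform needs it.
func manifestPath(name string, needed bool) string {
	if needed {
		return path.Join(activeManifestDir, name)
	}
	return path.Join(disabledManifestDir, name)
}

func main() {
	// On UPI the on-prem services are not used, so needed is false.
	for _, name := range []string{"coredns.yaml", "haproxy.yaml", "keepalived.yaml"} {
		fmt.Println(manifestPath(name, false))
	}
}
```

Choosing the path rather than emptying the file keeps the template non-empty, which sidesteps the concern that MCO did not like completely empty templates.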

Comment 2 Michael Nguyen 2021-06-10 18:07:48 UTC
@bnemec@redhat.com

I usually don't verify these, but I had access to a UPI vSphere cluster. Can you confirm this is what is supposed to happen? All of the static pod templates were moved under /etc/kubernetes/disabled-manifests/. I will close as verified when you confirm.


$ oc get nodes
NAME              STATUS   ROLES    AGE   VERSION
compute-0         Ready    worker   66m   v1.21.0-rc.0+c5e3b15
compute-1         Ready    worker   66m   v1.21.0-rc.0+c5e3b15
control-plane-0   Ready    master   72m   v1.21.0-rc.0+c5e3b15
control-plane-1   Ready    master   72m   v1.21.0-rc.0+c5e3b15
control-plane-2   Ready    master   72m   v1.21.0-rc.0+c5e3b15
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-10-071057   True        False         57m     Cluster version is 4.8.0-0.nightly-2021-06-10-071057
$ oc debug node/control-plane-0
Starting pod/control-plane-0-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# cd /etc/kubernetes/
sh-4.4# ls
apiserver-url.env  cloud.conf  disabled-manifests  kubelet-ca.crt   kubelet.conf  static-pod-resources
ca.crt		   cni	       kubeconfig	   kubelet-plugins  manifests
sh-4.4# cd disabled-manifests/
sh-4.4# ls
coredns.yaml  haproxy.yaml  keepalived.yaml
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...

Comment 3 Ben Nemec 2021-06-10 19:30:05 UTC
Yep, that's what we expect to happen after the fix.

Comment 4 Michael Nguyen 2021-06-10 19:35:47 UTC
Closing as verified on 4.8.0-0.nightly-2021-06-10-071057.

Comment 7 errata-xmlrpc 2021-07-27 22:43:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

