Bug 1896226

Summary: recycler-pod template should not be in kubelet static manifests directory
Product: OpenShift Container Platform Reporter: Seth Jennings <sjenning>
Component: Storage Assignee: Seth Jennings <sjenning>
Storage sub component: Operators QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: aos-bugs, fbertina, jerzhang, jsafrane, pehunt, rphillips
Version: 4.6   
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Removes a misplaced recycler pod template from the kubelet static pod manifests directory. The misplaced template resulted in kubelet log messages indicating failure to start the recycler static pod.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-07-27 22:34:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1932860    

Description Seth Jennings 2020-11-10 04:38:33 UTC
Description of problem:

https://github.com/openshift/machine-config-operator/pull/1687

This PR introduced a recycler-pod template but placed it in the kubelet's static manifests directory.

The kubelet is trying to run it once per second:

Nov 09 09:55:44 master-0.ocp.variantweb.net hyperkube[1944]: I1109 09:55:44.815976    1944 kubelet.go:1891] SyncLoop (SYNC): 2 pods; recyler-pod-master-0.ocp.variantweb.net_openshift-infra(1974bdf306c9e1ace3cc502fe8a2f041), sdn-q9qqv_openshift-sdn(45f227b2-1dc4-4547-8198-a40a8b8ca516)
Nov 09 09:55:44 master-0.ocp.variantweb.net hyperkube[1944]: I1109 09:55:44.816022    1944 kubelet.go:1936] Pod "recyler-pod-master-0.ocp.variantweb.net_openshift-infra(1974bdf306c9e1ace3cc502fe8a2f041)" has completed, ignoring remaining sync work: sync
Nov 09 09:55:45 master-0.ocp.variantweb.net hyperkube[1944]: I1109 09:55:45.815114    1944 kubelet.go:1891] SyncLoop (SYNC): 2 pods; ovs-ktg7f_openshift-sdn(6e2cd68b-b8d0-49ca-ae20-340c0578407c), recyler-pod-master-0.ocp.variantweb.net_openshift-infra(1974bdf306c9e1ace3cc502fe8a2f041)
Nov 09 09:55:45 master-0.ocp.variantweb.net hyperkube[1944]: I1109 09:55:45.815171    1944 kubelet.go:1936] Pod "recyler-pod-master-0.ocp.variantweb.net_openshift-infra(1974bdf306c9e1ace3cc502fe8a2f041)" has completed, ignoring remaining sync work: sync

# journalctl -u kubelet --since="1 hour ago" | grep recyler-pod | wc -l
7266
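
As a rough cross-check of the once-per-second figure: in the excerpt above, each sync produces two log lines that mention the pod, so 7266 lines in one hour works out to about one sync per second. A minimal Python sketch of that arithmetic follows. The function name is hypothetical, and the two-lines-per-sync value is an assumption read off the excerpt, not something taken from kubelet documentation.

```python
def syncs_per_second(matching_lines: int, window_seconds: int, lines_per_sync: int = 2) -> float:
    """Estimate sync frequency from a count of matching kubelet log lines.

    lines_per_sync=2 is an assumption based on the excerpt above, where each
    sync logs one "SyncLoop (SYNC)" line and one "has completed" line.
    """
    if window_seconds <= 0 or lines_per_sync <= 0:
        raise ValueError("window_seconds and lines_per_sync must be positive")
    return matching_lines / lines_per_sync / window_seconds


# 7266 matching lines over the last hour, as counted by the journalctl command above.
rate = syncs_per_second(7266, 3600)
print(f"{rate:.2f} syncs per second")  # prints "1.01 syncs per second"
```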

Version-Release number of selected component (if applicable):
4.6.3

How reproducible:
Always on masters

Steps to Reproduce:
1. Install a cluster
2.
3.

Actual results:
Master kubelet logs fill with syncs for the recycler pod

Expected results:
The recycler-pod template is not in the kubelet static manifests directory
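
One way to check the expected result on a master node is to list the static manifests directory and look for a recycler entry. The sketch below is only illustrative: the directory path `/etc/kubernetes/manifests` and the `recycler` substring are assumptions not stated in this report, and the function name is hypothetical. Adjust both to match the node being inspected.

```python
from pathlib import Path
from typing import List

# Assumed location of the kubelet static manifests directory; the report
# does not name the path explicitly.
DEFAULT_MANIFEST_DIR = "/etc/kubernetes/manifests"


def find_recycler_manifests(manifest_dir: str = DEFAULT_MANIFEST_DIR,
                            needle: str = "recycler") -> List[str]:
    """Return sorted names of regular files in manifest_dir whose name contains needle."""
    directory = Path(manifest_dir)
    if not directory.is_dir():
        # Nothing to report if the directory does not exist on this host.
        return []
    return sorted(p.name for p in directory.iterdir()
                  if p.is_file() and needle in p.name)


if __name__ == "__main__":
    leftovers = find_recycler_manifests()
    if leftovers:
        print("recycler template still present:", ", ".join(leftovers))
    else:
        print("no recycler template in the static manifests directory")
```

The check matches on file name only, so it does not inspect file contents or size.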

Additional info:

Comment 1 Seth Jennings 2020-11-18 19:52:21 UTC
The way I see it, this will be a 3 step fix:

1. Move location of the recycler pod in MCO
2. Change KCM to use new location
3. Project empty file at old location (is there a way to remove a previously projected file?) so the kubelet doesn't try to start it all the time

Comment 2 Peter Hunt 2020-11-18 19:55:58 UTC
It's possible it's also continuously running because it's never actually being created, due to https://github.com/openshift/machine-config-operator/pull/2215

Comment 3 Peter Hunt 2020-11-18 19:56:26 UTC
which is more reason to not have it as a static pod

Comment 4 Seth Jennings 2020-11-18 19:59:09 UTC
first step PR
https://github.com/openshift/machine-config-operator/pull/2238

Comment 6 Yu Qi Zhang 2020-12-08 17:24:54 UTC
Assigning to Seth as he is working on the PR. Also moving this over to the storage board, as that was the original component for the recycler pod per bug 1805908.

Note also that the original bug was cherry-picked to 4.4 (but not 4.5?), so a backport may be needed.

Comment 7 Seth Jennings 2020-12-08 17:34:32 UTC
I'm still trying to figure out how this can be done in a backward compatible way.

Comment 8 Fabio Bertinatto 2020-12-11 14:21:55 UTC
(In reply to Seth Jennings from comment #7)
> I'm still trying to figure out how this can be done in a backward compatible
> way.

The idea I have to solve this is to move the rendering of the template to the KCM operator instead. Currently that's done in MCO, which can be problematic because the KCM operator can start before the template is rendered.

Comment 11 Wei Duan 2021-03-05 10:08:01 UTC
Verified on 4.8.0-0.nightly-2021-03-04-203700.
NFS recycler works well and I changed the status to Verified.

Comment 14 errata-xmlrpc 2021-07-27 22:34:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438