Bug 2043721

Summary: Installer bootstrap hosts using outdated kubelet containing bugs
Product: OpenShift Container Platform
Component: RHCOS
Version: 4.10
Target Release: 4.10.0
Status: CLOSED ERRATA
Severity: high
Priority: medium
Hardware: Unspecified
OS: Unspecified
Type: Bug
Reporter: Devan Goodwin <dgoodwin>
Assignee: Micah Abbott <miabbott>
QA Contact: Michael Nguyen <mnguyen>
CC: bgilbert, dornelas, jligon, miabbott, mrussell, mstaeble, nstielau, wking
Last Closed: 2022-03-10 16:41:47 UTC
Bug Depends On: 2043297

Description Devan Goodwin 2022-01-21 20:24:21 UTC
In roughly 4-8% of CI runs, we believe etcd is failing to come up. The problem appears to trace back to a race condition that was thought to be fixed, but bootstrap appears to be using an RHCOS image with an outdated kubelet in which the bug is still present.

Discussion mostly in this thread: https://coreos.slack.com/archives/C01CQA76KMX/p1642786244483900

But the key conclusion was:

The 4.10 installer is using the following image for bootstrap:
Red Hat Enterprise Linux CoreOS 410.84.202112040202-0
which has the wrong kubelet:
[core@test1-k59fj-bootstrap ~]$ sudo kubelet --version
Kubernetes v1.22.1+6859754

Suspected out-of-date metadata in https://github.com/openshift/installer/blob/release-4.10/data/data/coreos/rhcos.json?
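
To judge how stale a bootimage is, the RHCOS build string can be decoded. The following is a minimal sketch, not installer code. It assumes the version follows the layout seen in this report: OCP release digits, RHEL release digits, a 12-digit build timestamp read as UTC, and a respin counter. The field interpretation and the helper name `parse_rhcos_version` are illustrative assumptions.

```python
import re
from datetime import datetime, timezone

# Assumed layout: "<major><minor>.<rhel>.<YYYYMMDDHHMM>-<respin>",
# for example "410.84.202112040202-0". This is inferred from the
# version strings in this report, not taken from a specification.
_RHCOS_RE = re.compile(r"^(\d)(\d+)\.(\d+)\.(\d{12})-(\d+)$")


def parse_rhcos_version(version):
    """Split an RHCOS build string into its assumed components."""
    match = _RHCOS_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized RHCOS version: {version!r}")
    major, minor, rhel, stamp, respin = match.groups()
    # Treat the embedded timestamp as UTC (an assumption).
    built = datetime.strptime(stamp, "%Y%m%d%H%M").replace(tzinfo=timezone.utc)
    return {
        "ocp": f"{major}.{minor}",
        "rhel": rhel,
        "built": built,
        "respin": int(respin),
    }


if __name__ == "__main__":
    info = parse_rhcos_version("410.84.202112040202-0")
    reported = datetime(2022, 1, 21, 20, 24, 21, tzinfo=timezone.utc)
    age_days = (reported - info["built"]).days
    print(f"OCP {info['ocp']} bootimage built {info['built']:%Y-%m-%d}, {age_days} days before the report")
```

Under these assumptions, the bootstrap image named above was built on 2021-12-04, about seven weeks before this bug was filed.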

Comment 2 RHCOS Bug Bot 2022-01-22 01:32:40 UTC
This bug has been reported fixed in a new RHCOS build and is ready for QE verification.  To mark the bug verified, set the Verified field to Tested.  This bug will automatically move to MODIFIED once the fix has landed in a new bootimage.

Comment 3 Michael Nguyen 2022-01-24 18:09:57 UTC
Preverified on RHCOS 410.84.202201241447-0

[core@cosa-devsh ~]$ rpm-ostree status
State: idle
Deployments:
● ostree://064c92e49da0e5dd9dbc5ca8be7495ffb3f703e0f2c55c5b7a59d17d19d35a2b
                   Version: 410.84.202201241447-0 (2022-01-24T14:51:10Z)
[core@cosa-devsh ~]$ rpm -qa | grep kube
openshift-hyperkube-4.10.0-202201230027.p0.g06791f6.assembly.stream.el8.x86_64
[core@cosa-devsh ~]$ kubelet --version
Kubernetes v1.23.0+06791f6

Comment 4 RHCOS Bug Bot 2022-01-28 16:09:27 UTC
The fix for this bug has landed in a bootimage bump, as tracked in bug 2043297 (now in status MODIFIED).  Moving this bug to MODIFIED.

Comment 6 Michael Nguyen 2022-02-02 15:39:35 UTC
Verified on 4.10.0-0.nightly-2022-02-02-000921

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-02-02-000921   True        False         3m50s   Cluster version is 4.10.0-0.nightly-2022-02-02-000921
$ oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-b7fit7k-72292-4wkqh-master-0         Ready    master   23m   v1.23.3+b63be7f
ci-ln-b7fit7k-72292-4wkqh-master-1         Ready    master   23m   v1.23.3+b63be7f
ci-ln-b7fit7k-72292-4wkqh-master-2         Ready    master   23m   v1.23.3+b63be7f
ci-ln-b7fit7k-72292-4wkqh-worker-a-82gp9   Ready    worker   16m   v1.23.3+b63be7f
ci-ln-b7fit7k-72292-4wkqh-worker-b-fvvls   Ready    worker   14m   v1.23.3+b63be7f
ci-ln-b7fit7k-72292-4wkqh-worker-c-jnrnq   Ready    worker   14m   v1.23.3+b63be7f
$ oc debug node/ci-ln-b7fit7k-72292-4wkqh-worker-a-82gp9
Starting pod/ci-ln-b7fit7k-72292-4wkqh-worker-a-82gp9-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# kubelet --version
Kubernetes v1.23.3+b63be7f
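
The manual `kubelet --version` checks in this report could be automated along these lines. This is a minimal sketch, assuming a 4.10 host should report Kubernetes 1.23 as the verified builds here do. The helper names `kubelet_minor` and `is_expected_kubelet` are hypothetical and not part of any OpenShift tooling.

```python
import re

# Matches strings such as "Kubernetes v1.23.3+b63be7f" or "v1.23.3+b63be7f".
_KUBELET_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)(?:\+([0-9a-f]+))?")


def kubelet_minor(version_output):
    """Return (major, minor) parsed from `kubelet --version` style output."""
    match = _KUBELET_RE.search(version_output)
    if match is None:
        raise ValueError(f"no kubelet version found in {version_output!r}")
    return int(match.group(1)), int(match.group(2))


def is_expected_kubelet(version_output, expected=(1, 23)):
    """Check the reported kubelet against the expected (major, minor).

    The default of (1, 23) is an assumption based on the verified
    4.10 builds quoted in this report.
    """
    return kubelet_minor(version_output) == expected


if __name__ == "__main__":
    for line in (
        "Kubernetes v1.22.1+6859754",  # outdated bootstrap kubelet from the description
        "Kubernetes v1.23.0+06791f6",  # preverified RHCOS build
        "Kubernetes v1.23.3+b63be7f",  # verified nightly
    ):
        status = "ok" if is_expected_kubelet(line) else "OUTDATED"
        print(f"{line}: {status}")
```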

Comment 9 errata-xmlrpc 2022-03-10 16:41:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056