In roughly 4-8% of CI runs, etcd fails to come up. The problem was traced to a race condition that was thought to be fixed, but bootstrap is using an RHCOS image with an outdated kubelet, so the bug is still present. Discussion is mostly in this thread: https://coreos.slack.com/archives/C01CQA76KMX/p1642786244483900

The key conclusion: the 4.10 installer is using Red Hat Enterprise Linux CoreOS 410.84.202112040202-0 for bootstrap, which has the wrong kubelet:

[core@test1-k59fj-bootstrap ~]$ sudo kubelet --version
Kubernetes v1.22.1+6859754

The suspected cause is out-of-date metadata in https://github.com/openshift/installer/blob/release-4.10/data/data/coreos/rhcos.json.
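For anyone wanting to check which build the installer branch currently pins, a minimal sketch follows. It assumes the 4.10 rhcos.json uses the CoreOS stream-metadata layout; the recursive-descent jq query avoids hardcoding exact paths, which may differ between installer releases.

# Dump every "release" field from the pinned bootimage metadata;
# a current pin should show a single, up-to-date RHCOS build ID.
curl -s https://raw.githubusercontent.com/openshift/installer/release-4.10/data/data/coreos/rhcos.json \
  | jq -r '.. | .release? // empty' | sort -u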
This bug has been reported fixed in a new RHCOS build and is ready for QE verification. To mark the bug verified, set the Verified field to Tested. This bug will automatically move to MODIFIED once the fix has landed in a new bootimage.
Preverified on RHCOS 410.84.202201241447-0:

[core@cosa-devsh ~]$ rpm-ostree status
State: idle
Deployments:
● ostree://064c92e49da0e5dd9dbc5ca8be7495ffb3f703e0f2c55c5b7a59d17d19d35a2b
    Version: 410.84.202201241447-0 (2022-01-24T14:51:10Z)

[core@cosa-devsh ~]$ rpm -qa | grep kube
openshift-hyperkube-4.10.0-202201230027.p0.g06791f6.assembly.stream.el8.x86_64

[core@cosa-devsh ~]$ kubelet --version
Kubernetes v1.23.0+06791f6
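A small sketch like the following could automate this spot check on future builds. EXPECTED is a placeholder for illustration, not a value from this bug; set it to the kubelet version the build under test is supposed to ship.

# Hedged sketch: fail fast if the node's kubelet does not match the
# version expected for this RHCOS build. EXPECTED is illustrative.
EXPECTED="v1.23.0"
ACTUAL="$(kubelet --version | awk '{print $2}')"
case "$ACTUAL" in
  "$EXPECTED"*) echo "ok: kubelet $ACTUAL" ;;
  *) echo "mismatch: got $ACTUAL, want ${EXPECTED}*" >&2; exit 1 ;;
esac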
The fix for this bug has landed in a bootimage bump, as tracked in bug 2043297 (now in status MODIFIED). Moving this bug to MODIFIED.
Verified on 4.10.0-0.nightly-2022-02-02-000921:

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-02-02-000921   True        False         3m50s   Cluster version is 4.10.0-0.nightly-2022-02-02-000921

$ oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-b7fit7k-72292-4wkqh-master-0         Ready    master   23m   v1.23.3+b63be7f
ci-ln-b7fit7k-72292-4wkqh-master-1         Ready    master   23m   v1.23.3+b63be7f
ci-ln-b7fit7k-72292-4wkqh-master-2         Ready    master   23m   v1.23.3+b63be7f
ci-ln-b7fit7k-72292-4wkqh-worker-a-82gp9   Ready    worker   16m   v1.23.3+b63be7f
ci-ln-b7fit7k-72292-4wkqh-worker-b-fvvls   Ready    worker   14m   v1.23.3+b63be7f
ci-ln-b7fit7k-72292-4wkqh-worker-c-jnrnq   Ready    worker   14m   v1.23.3+b63be7f

$ oc debug node/ci-ln-b7fit7k-72292-4wkqh-worker-a-82gp9
Starting pod/ci-ln-b7fit7k-72292-4wkqh-worker-a-82gp9-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# kubelet --version
Kubernetes v1.23.3+b63be7f
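As a quicker alternative to debugging individual nodes, the kubelet version each node registered with can be read straight from node status. The sketch below prints the distinct versions across the cluster; a single line of output means every node runs the same kubelet.

# Print the set of kubelet versions reported in .status.nodeInfo;
# one line of output means all nodes agree.
oc get nodes -o jsonpath='{.items[*].status.nodeInfo.kubeletVersion}' \
  | tr ' ' '\n' | sort -u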
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056