Bug 2004777

Summary: [4.6.z] Inexplicably slow kubelet on bootstrap makes installation fail
Product: OpenShift Container Platform Reporter: Scott Dodson <sdodson>
Component: Installer Assignee: aos-install
Installer sub component: openshift-installer QA Contact: Jianli Wei <jiwei>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: aos-bugs, aos-install, bgilbert, esimard, gpei, jiwei, joboyer, ltitov, openshift-bugs-escalate, padillon, palonsor, walters, wking
Version: 4.6   
Target Milestone: ---   
Target Release: 4.6.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2004717 Environment:
Last Closed: 2021-10-13 07:30:45 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1964751, 2004717    
Bug Blocks:    

Comment 1 Benjamin Gilbert 2021-09-17 18:36:16 UTC
Moving to POST because bug 1983129 has landed in a build, and we're just waiting for the bootimage bump.

Comment 2 RHCOS Bug Bot 2021-10-04 15:59:41 UTC
The fix for this bug has landed in a bootimage bump, as tracked in bug 1964751 (now in status MODIFIED).  Moving this bug to MODIFIED.

Comment 5 Jianli Wei 2021-10-09 08:29:00 UTC
In payload registry.ci.openshift.org/ocp/release:4.6.0-0.nightly-2021-10-08-000856, RHCOS-46.82.202110072157-0 was used as boot image.

[fedora@preserve-jiwei ocp46]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-10-08-000856   True        False         95s     Cluster version is 4.6.0-0.nightly-2021-10-08-000856
[fedora@preserve-jiwei ocp46]$ oc get nodes -o wide
NAME                                                    STATUS   ROLES    AGE   VERSION            INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
jiwei-cc-bxj7b-master-0.c.openshift-qe.internal         Ready    master   30m   v1.19.14+fcff70a   10.0.0.3      <none>        Red Hat Enterprise Linux CoreOS 46.82.202110072157-0 (Ootpa)   4.18.0-193.65.2.el8_2.x86_64   cri-o://1.19.4-3.rhaos4.6.git7d25e5d.el8
jiwei-cc-bxj7b-master-1.c.openshift-qe.internal         Ready    master   30m   v1.19.14+fcff70a   10.0.0.5      <none>        Red Hat Enterprise Linux CoreOS 46.82.202110072157-0 (Ootpa)   4.18.0-193.65.2.el8_2.x86_64   cri-o://1.19.4-3.rhaos4.6.git7d25e5d.el8
jiwei-cc-bxj7b-master-2.c.openshift-qe.internal         Ready    master   30m   v1.19.14+fcff70a   10.0.0.4      <none>        Red Hat Enterprise Linux CoreOS 46.82.202110072157-0 (Ootpa)   4.18.0-193.65.2.el8_2.x86_64   cri-o://1.19.4-3.rhaos4.6.git7d25e5d.el8
jiwei-cc-bxj7b-worker-a-rm6w7.c.openshift-qe.internal   Ready    worker   21m   v1.19.14+fcff70a   10.0.32.2     <none>        Red Hat Enterprise Linux CoreOS 46.82.202110072157-0 (Ootpa)   4.18.0-193.65.2.el8_2.x86_64   cri-o://1.19.4-3.rhaos4.6.git7d25e5d.el8
jiwei-cc-bxj7b-worker-b-tt6td.c.openshift-qe.internal   Ready    worker   21m   v1.19.14+fcff70a   10.0.32.3     <none>        Red Hat Enterprise Linux CoreOS 46.82.202110072157-0 (Ootpa)   4.18.0-193.65.2.el8_2.x86_64   cri-o://1.19.4-3.rhaos4.6.git7d25e5d.el8
[fedora@preserve-jiwei ocp46]$ 
[fedora@preserve-jiwei ocp46]$ oc debug node/jiwei-cc-bxj7b-master-0.c.openshift-qe.internal
Starting pod/jiwei-cc-bxj7b-master-0copenshift-qeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.3
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1c2c76ba9466e732d363b5df1135ee568538f16aff9e3faa83e0e1540e569b75
              CustomOrigin: Managed by machine-config-operator
                   Version: 46.82.202110072157-0 (2021-10-07T22:01:06Z)

  ostree://f85d8febca4775c5a0a59defdac14b06e80dcdb2e5bfe03d30b4d1f2bd35805e
                   Version: 46.82.202109242004-0 (2021-09-24T20:07:59Z)
sh-4.4# 

In QE's CI tests, we did not see any bootstrap failure with 4.6.0-0.nightly-2021-10-..., so moving this bug to VERIFIED.
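The per-node check in the transcript above (every node reporting the expected RHCOS version in its OS-IMAGE column) could be scripted roughly as follows. This is an illustrative sketch, not part of the original verification: the function name `nodes_not_on_version` is hypothetical, and it assumes the input is the JSON printed by `oc get nodes -o json`, where each node carries its OS image string under `status.nodeInfo.osImage`.

```python
import json
from typing import List


def nodes_not_on_version(nodes_json: str, expected_version: str) -> List[str]:
    """Return the names of nodes whose osImage does not contain expected_version.

    nodes_json is assumed to be the output of `oc get nodes -o json`.
    The helper name and the substring match are illustrative choices.
    """
    items = json.loads(nodes_json).get("items", [])
    mismatched = []
    for node in items:
        name = node["metadata"]["name"]
        # A missing osImage is treated as a mismatch rather than an error.
        os_image = node.get("status", {}).get("nodeInfo", {}).get("osImage", "")
        if expected_version not in os_image:
            mismatched.append(name)
    return mismatched
```

One possible use is to save the output of `oc get nodes -o json` to a file, read it into a string, and call `nodes_not_on_version(text, "46.82.202110072157-0")`. An empty list would mean every node reports that version.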

Comment 7 errata-xmlrpc 2021-10-13 07:30:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.47 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3737