Bug 2002374

Summary: Inexplicably slow kubelet on bootstrap makes installation fail
Product: OpenShift Container Platform
Reporter: Pablo Alonso Rodriguez <palonsor>
Component: Installer
Assignee: aos-install
Installer sub component: openshift-installer
QA Contact: Jianli Wei <jiwei>
Status: CLOSED ERRATA
Docs Contact:
Severity: urgent
Priority: urgent
CC: aos-bugs, bgilbert, esimard, joboyer, ltitov, openshift-bugs-escalate, padillon, scuppett, vwalek, walters, wking
Version: 4.6
Keywords: Reopened
Target Milestone: ---
Target Release: 4.9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2004716 2027414
Environment:
Last Closed: 2021-11-29 15:05:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1981999
Bug Blocks: 2004716, 2027414

Description Pablo Alonso Rodriguez 2021-09-08 16:14:29 UTC
Description of problem:

At one customer site, every installation attempt hits an inexplicably slow kubelet on the bootstrap node: it does not start the kube-apiserver even after hours of waiting.

According to the CRI-O logs, the kubelet does not even seem to try to start it, but I cannot point to any failure in the logs.

sar metrics were also collected and showed no apparent resource exhaustion (CPU, RAM, storage, or network) and no high load.

I need help from the kubelet team to understand where the slowness comes from and whether it could be due to a kubelet bug.
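
One way to narrow down where the time goes is to look for long silences in the kubelet journal. This is only a minimal sketch, not something that was run for this bug: `log_gaps` is a hypothetical helper name, and it assumes the input comes from `journalctl -o short-unix`, where the first field of each line is the timestamp in epoch seconds.

```shell
#!/bin/sh
# Hypothetical helper: read journal lines in `journalctl -o short-unix`
# format (first field is epoch seconds) on stdin and print every line
# that follows a silence longer than $1 seconds (default 60).
log_gaps() {
    awk -v max="${1:-60}" '
        # Compare each timestamp with the previous one.
        NR > 1 && $1 - prev > max {
            printf "gap of %d s before: %s\n", $1 - prev, $0
        }
        { prev = $1 }
    '
}
```

On the bootstrap node it could be used as `journalctl -b -u kubelet.service -o short-unix | log_gaps 30`. The lines just before and after each reported gap are the ones worth reading first.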

Version-Release number of selected component (if applicable):

4.6 (several errata releases)

How reproducible:

Only in one specific environment.

Steps to Reproduce:
1. Install a cluster


Actual results:

Bootstrap kube-apiserver pod never starts due to apparent kubelet slowness

Expected results:

The bootstrap kube-apiserver pod starts.

Additional info:

Comment 14 Benjamin Gilbert 2021-09-17 18:41:56 UTC
Moving to POST because bug 1978268 has landed in a build, and we're just waiting for the bootimage bump.

Comment 15 Benjamin Gilbert 2021-09-22 21:47:43 UTC
The bootimage bump in bug 1981999 has landed.  Moving to MODIFIED.

Comment 16 Scott Dodson 2021-09-23 13:48:38 UTC
This made it into 4.9.0-rc.3, moving ON_QA

Comment 17 Gaoyun Pei 2021-09-25 09:21:16 UTC
In payload quay.io/openshift-release-dev/ocp-release:4.9.0-rc.3-x86_64, RHCOS-49.84.202109172039-0 was used as boot image.

[root@ip-10-0-13-79 ~]# rpm-ostree status
State: idle
Deployments:
* ostree://67a210b2d0d1c3787f813061995783c3528d132cfb97bd44b3eb003fb8dacde8
                   Version: 49.84.202109172039-0 (2021-09-17T20:43:24Z)

In QE's CI tests, we did not see any bootstrap failures with 4.9.0-rc.3-x86_64, so moving this bug to VERIFIED.
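
The boot image check above can be scripted. This is a minimal sketch, assuming `rpm-ostree status` prints the same layout as shown in this comment (the booted deployment marked with `*`, followed by a `Version:` line). `booted_version` and `version_at_least` are hypothetical helper names, and treating the verified build as a lower bound is an assumption, not something stated in this bug.

```shell
#!/bin/sh
# Hypothetical helper: print the Version of the booted deployment
# (the entry marked with "*") from `rpm-ostree status` text on stdin.
booted_version() {
    awk '
        /^[[:space:]]*\*/ { booted = 1; next }   # header of the booted deployment
        booted && $1 == "Version:" { print $2; exit }
    '
}

# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on GNU sort -V (version sort).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_at_least "$(rpm-ostree status | booted_version)" 49.84.202109172039-0` would succeed on the node shown above.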

Comment 20 errata-xmlrpc 2021-10-18 17:51:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759