Bug 1868773

Summary: [libvirt]: OCP 4.6 installation fails due to OOM in the bootstrap node
Product: OpenShift Container Platform
Reporter: Prashanth Sundararaman <psundara>
Component: Multi-Arch
Assignee: Prashanth Sundararaman <psundara>
Status: CLOSED ERRATA
QA Contact: Jeremy Poulin <jpoulin>
Severity: high
Priority: unspecified
Version: 4.6
CC: adahiya, aos-bugs, danili, mfojtik, xxia
Target Milestone: ---
Target Release: 4.6.0
Hardware: All
OS: Linux
Last Closed: 2020-10-27 16:28:34 UTC
Type: Bug
Attachments:
  Description     Flags
  journal logs    none
  top output      none

Description Prashanth Sundararaman 2020-08-13 19:09:24 UTC
Description of problem:
With 4.6, libvirt installations consistently fail because bootkube never finishes: the OOM killer is triggered on the bootstrap node while bootkube is running. The top output shows the kube-apiserver process using about 65% of the node's memory.

The bootstrap node has 2G of memory, which has been sufficient until now. Even 4.5 deployments come close to the limit but stay just under it. In 4.6, the memory consumption of kube-apiserver appears to have increased just enough to push it over the edge.
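A minimal Python sketch for checking which processes the OOM killer terminated in a saved copy of the journal, such as the attached journal logs (exported with something like `journalctl -k`). It assumes the standard kernel "Killed process PID (name)" message format. `oom_victims` is a hypothetical helper for illustration, not part of the installer or any OpenShift tooling.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed kernel message format. Older kernels log "Out of memory: Kill process ..."
# followed by "Killed process PID (name) total-vm:...", while newer kernels log
# "Out of memory: Killed process PID (name) ...". Matching only on "Killed process"
# counts each kill once in both formats.
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_victims(lines: Iterable[str]) -> Counter:
    """Count OOM-killer victims by process name across journal/dmesg lines."""
    victims: Counter = Counter()
    for line in lines:
        match = OOM_KILL_RE.search(line)
        if match:
            # group(2) is the process name inside the parentheses
            victims[match.group(2)] += 1
    return victims
```

For example, running `oom_victims(open("journal.txt"))` against a saved journal (the file name is illustrative) would show whether kube-apiserver is the process being killed, and how often.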

This affects the e2e-libvirt CI job and multi-arch CI jobs which run on libvirt.

It would be simple to increase the memory for the bootstrap node, but we first want to confirm that the higher kube-apiserver memory usage is not itself a bug.

Version-Release number of selected component (if applicable):
4.6

How reproducible:
Always on libvirt deploys

Comment 1 Prashanth Sundararaman 2020-08-13 19:33:06 UTC
Created attachment 1711375
journal logs

Comment 2 Prashanth Sundararaman 2020-08-13 19:33:32 UTC
Created attachment 1711376
top output

Comment 7 errata-xmlrpc 2020-10-27 16:28:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images) and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196