Bug 1820219

Summary: libvirt: Bootstrap node seeing OOMs during OCP installation process
Product: OpenShift Container Platform Reporter: Prashanth Sundararaman <psundara>
Component: Installer Assignee: Prashanth Sundararaman <psundara>
Installer sub component: openshift-installer QA Contact: Barry Donahue <bdonahue>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium    
Version: 4.4   
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: ppc64le   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1820222 1821788 (view as bug list) Environment:
Last Closed: 2020-05-04 11:48:02 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1821788    
Bug Blocks: 1820222    

Description Prashanth Sundararaman 2020-04-02 14:14:47 UTC

On a ppc64le libvirt installation, the bootstrap node hits OOM errors when bootkube.sh runs, and the node is running low on memory at that point. Increasing the bootstrap node's memory fixes the issue. The libvirt team agreed to raise the bootstrap memory to 5G, which is the current default for worker nodes.

Reproducible: Always

Steps to Reproduce:
1. Build the installer for ppc64le on libvirt.
2. Execute a create cluster.
3. Monitor the bootstrap node.
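
One way to carry out step 3 is to scan the bootstrap node's kernel journal for oom-killer invocations. The following is a minimal sketch, not part of the installer: the function name `find_oom_events` and the regular expression are illustrative assumptions, and the sample lines are taken from the journal output reported for this bug.

```python
import re

# Matches the kernel line logged when the OOM killer is invoked, e.g.
# "... kernel: hyperkube invoked oom-killer: gfp_mask=..., order=0, oom_score_adj=-999"
# The pattern is an assumption based on the log lines in this report.
OOM_RE = re.compile(
    r"kernel: (?P<comm>\S+) invoked oom-killer: .*?oom_score_adj=(?P<adj>-?\d+)"
)


def find_oom_events(lines):
    """Return (process name, oom_score_adj) for each oom-killer invocation."""
    events = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            events.append((match.group("comm"), int(match.group("adj"))))
    return events


# Sample journal lines copied from this bug report.
sample = [
    "Apr 02 14:50:36 test1-24bm8-bootstrap kernel: hyperkube invoked oom-killer: "
    "gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=-999",
    "Apr 02 14:50:36 test1-24bm8-bootstrap kernel: hyperkube cpuset=/ mems_allowed=0",
]

print(find_oom_events(sample))  # prints [('hyperkube', -999)]
```

The input lines could come from the kernel journal on the bootstrap node (for example the output of `journalctl -k`), read line by line and passed to `find_oom_events`.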

Comment 1 Prashanth Sundararaman 2020-04-02 14:59:57 UTC
journalctl shows this during the bootstrap process:

Apr 02 14:50:35 test1-24bm8-bootstrap podman[7535]: 2020-04-02 14:50:35.255535222 +0000 UTC m=+3.765924003 container attach ba877f6ca9624bba0440a551705ae57c36f5623c6ed11a61b1a7b121cf5fd682 (image=registry.svc.ci.openshift.org/ocp-ppc64le/release-ppc64le@sha256:46b598ee4a6405d187610160c6874bea93bee36161f466b29c44d374f1745992, name=angry_meitner)
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: hyperkube invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=-999
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: hyperkube cpuset=/ mems_allowed=0
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: CPU: 1 PID: 1888 Comm: hyperkube Not tainted 4.18.0-147.5.1.el8_1.ppc64le #1
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: Call Trace:
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7680] [c000000000d1e21c] dump_stack+0xb0/0xf4 (unreliable)
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a76c0] [c00000000039e6c0] dump_header+0x80/0x390
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7780] [c00000000039eb0c] oom_kill_process+0x13c/0x210
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a77c0] [c00000000039fce0] out_of_memory+0x200/0x7f0
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7860] [c0000000003af03c] __alloc_pages_nodemask+0x10ac/0x1150
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7a50] [c00000000045c858] alloc_pages_current+0xb8/0x220
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7a90] [c000000000399fac] filemap_fault+0x62c/0x990
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7b10] [c008000001860598] __xfs_filemap_fault+0xf0/0x330 [xfs]
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7b70] [c00000000040c294] __do_fault+0x44/0x1c0
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7bb0] [c0000000004149d8] do_fault+0x238/0x950
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7c00] [c000000000417e64] __handle_mm_fault+0x344/0xd70
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7ce0] [c0000000004189d0] handle_mm_fault+0x140/0x340
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7d20] [c00000000007aa54] __do_page_fault+0x244/0xc20
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7df0] [c00000000007b468] do_page_fault+0x38/0xd0
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7e30] [c00000000000a704] handle_page_fault+0x18/0x38
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: Mem-Info:
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: active_anon:23667 inactive_anon:134 isolated_anon:0
                                               active_file:8 inactive_file:0 isolated_file:0
                                               unevictable:0 dirty:0 writeback:0 unstable:0
                                               slab_reclaimable:1242 slab_unreclaimable:4611
                                               mapped:23 shmem:329 pagetables:62 bounce:0
                                               free:1542 free_pcp:0 free_cma:0
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: Node 0 active_anon:1514688kB inactive_anon:8576kB active_file:512kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1472kB dirty:0kB writeback:0kB shmem:21056kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
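
The Mem-Info counters are reported in pages, while the Node 0 line reports kB. Comparing the two active_anon figures gives the page size, which in turn shows how little memory was free when the OOM killer ran. The sketch below only does that arithmetic on values copied from the dump above. It assumes the page counts and the Node 0 line describe the same memory, which is plausible here because only node 0 is present (mems_allowed=0).

```python
# Values copied from the Mem-Info dump above.
active_anon_pages = 23667   # "active_anon:23667" (pages)
active_anon_kb = 1514688    # "Node 0 active_anon:1514688kB"
free_pages = 1542           # "free:1542" (pages)

# Page size implied by the two active_anon figures.
page_kb, remainder = divmod(active_anon_kb, active_anon_pages)
assert remainder == 0, "page count and kB figure should divide evenly"

free_mib = free_pages * page_kb / 1024
active_anon_mib = active_anon_kb / 1024

print(f"page size: {page_kb} kB")                   # page size: 64 kB
print(f"free: {free_mib:.1f} MiB")                  # free: 96.4 MiB
print(f"active_anon: {active_anon_mib:.1f} MiB")    # active_anon: 1479.2 MiB
```

With 64 kB pages, roughly 1.4 GiB was held in anonymous memory and under 100 MiB was free, which is consistent with the report that raising the bootstrap node's memory resolves the OOMs.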

Comment 2 Scott Dodson 2020-04-06 19:46:19 UTC
No linked PR, back to NEW.

Comment 5 errata-xmlrpc 2020-05-04 11:48:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581