Bug 1820219 - libvirt: Bootstrap node seeing OOMs during OCP installation process
Summary: libvirt: Bootstrap node seeing OOMs during OCP installation process
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.4
Hardware: ppc64le
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.4.0
Assignee: Prashanth Sundararaman
QA Contact: Barry Donahue
URL:
Whiteboard:
Depends On: 1821788
Blocks: 1820222
Reported: 2020-04-02 14:14 UTC by Prashanth Sundararaman
Modified: 2020-05-04 11:48 UTC
CC List: 0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1820222 1821788
Environment:
Last Closed: 2020-05-04 11:48:02 UTC
Target Upstream Version:




Links
System | ID | Priority | Status | Summary | Last Updated
Github | openshift/installer pull 3426 | None | closed | Bug 1820219: libvirt: Bump bootstrap memory to 5G for ppc64le | 2020-04-24 20:33:31 UTC
Red Hat Product Errata | RHBA-2020:0581 | None | None | None | 2020-05-04 11:48:21 UTC

Description Prashanth Sundararaman 2020-04-02 14:14:47 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36
Build Identifier: 

On ppc64le libvirt installations, the bootstrap node hits OOM errors while bootkube.sh runs, and the node is running low on memory. Increasing the bootstrap memory fixes the issue. After discussing it with the libvirt team, they agreed to bump the bootstrap memory to 5G, which matches the current default for worker nodes.
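A minimal Go sketch of the proposed rule, assuming the bootstrap VM memory is chosen per architecture when the install is configured. The helper name bootstrapMemoryMiB and the 4096 MiB default used in the example calls are hypothetical; only the 5G (5120 MiB) ppc64le value comes from this report, and this is not the code from the linked installer PR.

```go
package main

import "fmt"

// ppc64leBootstrapMemoryMiB is 5G in MiB, the value proposed in this report
// for ppc64le bootstrap nodes (the same as the libvirt worker default).
const ppc64leBootstrapMemoryMiB = 5 * 1024

// bootstrapMemoryMiB is a hypothetical helper, not the installer's API. It
// returns the bootstrap VM memory for arch, raising the caller-supplied
// default to 5G on ppc64le and leaving every other architecture unchanged.
func bootstrapMemoryMiB(arch string, defaultMiB int) int {
	if arch == "ppc64le" && defaultMiB < ppc64leBootstrapMemoryMiB {
		return ppc64leBootstrapMemoryMiB
	}
	return defaultMiB
}

func main() {
	// 4096 is an illustrative default, not the installer's actual value.
	fmt.Println(bootstrapMemoryMiB("ppc64le", 4096))
	fmt.Println(bootstrapMemoryMiB("amd64", 4096))
}
```

Taking the default as a parameter keeps the sketch from assuming the installer's real default, and the "only raise, never lower" check means a ppc64le default that is already larger than 5G is left alone.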

Reproducible: Always

Steps to Reproduce:
1. Build the installer for ppc64le with libvirt support.
2. Run openshift-install create cluster.
3. Monitor the bootstrap node.

Comment 1 Prashanth Sundararaman 2020-04-02 14:59:57 UTC
journalctl shows this during the bootstrap process:

Apr 02 14:50:35 test1-24bm8-bootstrap podman[7535]: 2020-04-02 14:50:35.255535222 +0000 UTC m=+3.765924003 container attach ba877f6ca9624bba0440a551705ae57c36f5623c6ed11a61b1a7b121cf5fd682 (image=registry.svc.ci.openshift.org/ocp-ppc64le/release-ppc64le@sha256:46b598ee4a6405d187610160c6874bea93bee36161f466b29c44d374f1745992, name=angry_meitner)
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: hyperkube invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=-999
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: hyperkube cpuset=/ mems_allowed=0
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: CPU: 1 PID: 1888 Comm: hyperkube Not tainted 4.18.0-147.5.1.el8_1.ppc64le #1
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: Call Trace:
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7680] [c000000000d1e21c] dump_stack+0xb0/0xf4 (unreliable)
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a76c0] [c00000000039e6c0] dump_header+0x80/0x390
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7780] [c00000000039eb0c] oom_kill_process+0x13c/0x210
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a77c0] [c00000000039fce0] out_of_memory+0x200/0x7f0
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7860] [c0000000003af03c] __alloc_pages_nodemask+0x10ac/0x1150
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7a50] [c00000000045c858] alloc_pages_current+0xb8/0x220
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7a90] [c000000000399fac] filemap_fault+0x62c/0x990
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7b10] [c008000001860598] __xfs_filemap_fault+0xf0/0x330 [xfs]
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7b70] [c00000000040c294] __do_fault+0x44/0x1c0
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7bb0] [c0000000004149d8] do_fault+0x238/0x950
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7c00] [c000000000417e64] __handle_mm_fault+0x344/0xd70
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7ce0] [c0000000004189d0] handle_mm_fault+0x140/0x340
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7d20] [c00000000007aa54] __do_page_fault+0x244/0xc20
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7df0] [c00000000007b468] do_page_fault+0x38/0xd0
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: [c0000000556a7e30] [c00000000000a704] handle_page_fault+0x18/0x38
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: Mem-Info:
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: active_anon:23667 inactive_anon:134 isolated_anon:0
                                               active_file:8 inactive_file:0 isolated_file:0
                                               unevictable:0 dirty:0 writeback:0 unstable:0
                                               slab_reclaimable:1242 slab_unreclaimable:4611
                                               mapped:23 shmem:329 pagetables:62 bounce:0
                                               free:1542 free_pcp:0 free_cma:0
Apr 02 14:50:36 test1-24bm8-bootstrap kernel: Node 0 active_anon:1514688kB inactive_anon:8576kB active_file:512kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1472kB dirty:0kB writeback:0kB shmem:21056kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
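The Mem-Info summary above counts pages, while the Node 0 line reports the same counters in kB. A short Go check, using only numbers copied from this log, shows every counter converts with a factor of 64, i.e. 64 KiB pages, which is consistent with the 64K page size commonly used by ppc64le kernels. Applying that factor to the free:1542 page count (the excerpt's Node 0 line omits free memory) gives 98688 kB, roughly 96 MiB free at the moment hyperkube invoked the OOM killer.

```go
package main

import "fmt"

// sample pairs a counter's page count from the "Mem-Info:" summary with the
// kB value printed for the same counter on the "Node 0" line.
type sample struct {
	name  string
	pages int
	kB    int
}

// pageSizeKiB infers the kernel page size (in KiB) from one counter.
func pageSizeKiB(s sample) int {
	return s.kB / s.pages
}

func main() {
	samples := []sample{
		{"active_anon", 23667, 1514688},
		{"inactive_anon", 134, 8576},
		{"active_file", 8, 512},
		{"mapped", 23, 1472},
		{"shmem", 329, 21056},
	}
	page := pageSizeKiB(samples[0])

	// Cross-check: every counter must convert with the same factor.
	for _, s := range samples {
		if s.pages*page != s.kB {
			fmt.Printf("%s does not match a %d KiB page size\n", s.name, page)
		}
	}
	fmt.Printf("page size: %d KiB\n", page)

	// The Node 0 excerpt has no free figure, so convert the summary's
	// "free:" page count using the inferred page size.
	const freePages = 1542
	freeKiB := freePages * page
	fmt.Printf("free: %d kB (%.1f MiB)\n", freeKiB, float64(freeKiB)/1024)
}
```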

Comment 2 Scott Dodson 2020-04-06 19:46:19 UTC
No linked PR, back to NEW.

Comment 5 errata-xmlrpc 2020-05-04 11:48:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

