Bug 1856588
| Summary: | Guest crashed and hung when hot unplugging vCPUs | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Xujun Ma <xuma> |
| Component: | kernel | Assignee: | David Gibson <dgibson> |
| kernel sub component: | ppc64 | QA Contact: | Xujun Ma <xuma> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | bugproxy, dgibson, fnovak, gkurz, hannsj_uhl, jinzhao, juzhang, lmiksik, lvivier, mdeng, mtessun, pasik, pvlasin, qzhang, virt-maint |
| Version: | 8.3 | Keywords: | Patch, Triaged |
| Target Milestone: | rc | | |
| Target Release: | 8.3 | | |
| Hardware: | ppc64le | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-4.18.0-236.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-11-04 01:24:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1733467 | | |
| Bug Blocks: | 1776265 | | |
Description
Xujun Ma
2020-07-14 02:33:28 UTC
Hi Xujun,

Can you confirm which version the regression starts from?

Thanks,
Qunfang

I'm unable to reproduce this with kernel-4.18.0-227.el8.ppc64le in both host and guest, and qemu-kvm 5.0, though I haven't identified any related fixes that have gone in.

I tried the original workload ~15 times, and then a sped-up workload which I ran for an hour, which was ~30 iterations of the original test loop.

I also tried it with echo 600000 >/proc/sys/vm/percpu_pagelist_fraction in the guest, in the hope that it would trigger the free_pcppages_bulk() call more frequently.

Can we confirm the 1/5 reproducibility of this, or whether it is still reproducible? Are there other factors worth noting that might make a difference with reproducibility?

Also, what machine configuration was this run on? I did the tests on a Boston system with 128 CPUs and 64GB of memory.

(In reply to Michael Roth from comment #2)
> I'm unable to reproduce this with kernel-4.18.0-227.el8.ppc64le in both host
> and guest, and qemu-kvm 5.0, though I haven't identified any related fixes
> that have gone in.
> 
> I tried the original workload ~15 times, and then a sped up workload which I
> ran for an hour which was ~30 iterations of the original test loop.
> 
> I also tried it with echo 600000 >/proc/sys/vm/percpu_pagelist_fraction in
> the guest to in hopes that it would trigger the free_pcppages_bulk() call
> more frequently
> 
> Can we confirm the 1/5 reproducible of this, or whether it is still
> reproducible? Are there other factors worth noting that might make a
> difference with reproducibility?

I'm not sure. From my side, it's easy to reproduce. Did you use threads=1?

> 
> Also, what machine configuration was this run on? I did the tests on a
> Boston system with 128 CPUs and 64GB of memory

Power9 host.

(In reply to Qunfang Zhang from comment #1)
> Hi Xujun,
> 
> Can you confirm the regression happens from which version?
> 
> Thanks,
> Qunfang

I tested the first fast train build and the latest slow train build. Both can reproduce this problem.

qemu-kvm-5.0.0-0.scrmod+el8.3.0+6312+1f7d6182
qemu-kvm-4.2.0-30.module+el8.3.0+7298+c26a06b8

The host lockups here look like bug 1733467. I think we'll have to sort that out before seeing if there's any remaining bug here.

Xujun,

Roughly how long did this take to reproduce for you?

I'm wondering if we might have a better way of triggering bug 1733467 here.

(In reply to David Gibson from comment #7)
> Xujun,
> 
> Roughly how long did this take to reproduce for you?
> 
> I'm wondering if we might have a better way of triggering bug 1733467 here.

It was easy to reproduce before on Boston and Witherspoon machines; I just ran several loops. But I can't reproduce it now on a Boston machine. I'm trying to work out what the difference might be.

(In reply to Xujun Ma from comment #4)
> (In reply to Qunfang Zhang from comment #1)
> > Hi Xujun,
> > 
> > Can you confirm the regression happens from which version?
> > 
> > Thanks,
> > Qunfang
> 
> I tested the first fast train build and latest slow train build.
> All can reproduce this problem.
> 
> qemu-kvm-5.0.0-0.scrmod+el8.3.0+6312+1f7d6182
> qemu-kvm-4.2.0-30.module+el8.3.0+7298+c26a06b8

Can you confirm what kernel was used in this newer case? It would help in determining whether this might have been fixed downstream or not.

I reproduced this bug again with an in-kernel irqchip in the guest on a Witherspoon host. I think it might be related to the irqchip type.

I have also reproduced the guest crash on Witherspoon, and the host lockups did *not* appear.
So I think the two problems are unrelated, and that host happened to have also hit bug 1733467, triggered by something else on the system.

Mike, it sounds like this only happens with the in-kernel irqchip (which requires Witherspoon). We now have a Witherspoon machine (ibm-p9wr-09.pnr.lab.eng.bos.redhat.com), so can you continue investigating there please.

FWIW I ran the workload on 4 guests overnight without in-kernel irqchip, using the original 4.18.0-224.el8.ppc64le kernel on a Boston system, and did not encounter the issue, so in-kernel irqchip does seem likely to be a requirement. Will attempt to reproduce on a Witherspoon.

I reproduced a related trace on a mihawk system using kernel 221. Will attempt to reproduce with 224 as in the original report, but assuming the code hasn't changed much, I think the above traces are from the CPU tearing itself down via pseries_mach_cpu_die, whereas the trace below is from a device_offline(cpu) call handled by the hotplug worker thread. The below case calls free_pcppages_bulk() via the generic page_alloc_cpu_dead() callback instead of via xive_spapr_cleanup_queue->free_unref_page->free_pcppages_bulk.

The presence of the "cpu 312 (hwid 312) Ready to die" suggests the dying CPU has already finished cleanup, but perhaps it executed free_pcppages_bulk while the hotplug worker was still in the middle of free_pcppages_bulk, and that's what resulted in the double add? I'm not sure what the locking protocol is there. This doesn't really explain why we don't see it without in-kernel irqchip, but if it's racy then it could just be a timing thing.

[ 853.755706] pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 10000137
[ 853.907647] IRQ 19: no longer affine to CPU311
[ 853.907723] cpu 311 (hwid 311) Ready to die...
[ 853.965435] pseries-hotplug-cpu: Successfully removed CPU, drc index: 10000137
[ 854.771062] pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 10000138
[ 854.879878] Querying DEAD? cpu 312 (312) shows 2
[ 854.879971] list_add double add: new=c00a00000245ee48, prev=c0000009feb5ca80, next=c00a00000245ee48.
[ 854.879974] cpu 312 (hwid 312) Ready to die...
[ 854.880040] ------------[ cut here ]------------
[ 854.880796] kernel BUG at lib/list_debug.c:31!
[ 854.880829] Oops: Exception in kernel mode, sig: 5 [#1]
[ 854.880860] LE SMP NR_CPUS=2048 NUMA pSeries
[ 854.880892] Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nft_counter nf_nat_tftp nft_objref nf_conntrack_tftp tun bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 nft_chain_route_ipv6 nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack nft_chain_route_ipv4 ip6_tables nft_compat ip_set nf_tables nfnetlink sunrpc xts vmx_crypto ip_tables xfs libcrc32c sd_mod sg virtio_net net_failover virtio_scsi failover dm_mirror dm_region_hash dm_log dm_mod
[ 854.881254] CPU: 334 PID: 7 Comm: kworker/u768:0 Kdump: loaded Not tainted 4.18.0-193.10.el8.bz1730194.ppc64le #1
[ 854.881318] Workqueue: pseries hotplug workque pseries_hp_work_fn
[ 854.881357] NIP: c000000000784b20 LR: c000000000784b1c CTR: c000000007dfb280
[ 854.881402] REGS: c0000009e584b3d0 TRAP: 0700 Not tainted (4.18.0-193.10.el8.bz1730194.ppc64le)
[ 854.881455] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 48082228 XER: 20040000
[ 854.881510] CFAR: c0000000001f9c24 IRQMASK: 1
[ 854.881510] GPR00: c000000000784b1c c0000009e584b650 c000000001954a00 0000000000000058
[ 854.881510] GPR04: 0000000071000001 0000000000000009 3435656534382e0d 0a61303030303032
[ 854.881510] GPR08: 0000000000000007 0000000000000007 0000000000000001 3562656639303030
[ 854.881510] GPR12: 0000000000002000 c000000007dfb280 c00a00000245ee48 0000000000000000
[ 854.881510] GPR16: c00a00000245ee48 c00a00000245ee40 00000000000917b8 c0000009feb5c980
[ 854.881510] GPR20: 0000000000000000 5deadbeef0000100 c0000009feb5ca80 c0000000019935d0
[ 854.881510] GPR24: c0000009e584b708 c000000001b02f08 fffffffff0000000 0000000000000001
[ 854.881510] GPR28: c00a000002469b00 0000000000000000 c00a00000245ee40 0000000000000100
[ 854.881887] NIP [c000000000784b20] __list_add_valid+0x90/0xc0
[ 854.881926] LR [c000000000784b1c] __list_add_valid+0x8c/0xc0
[ 854.881963] Call Trace:
[ 854.881980] [c0000009e584b650] [c000000000784b1c] __list_add_valid+0x8c/0xc0 (unreliable)
[ 854.882027] [c0000009e584b6b0] [c00000000045d3c0] free_pcppages_bulk+0x460/0x940
[ 854.882073] [c0000009e584b7b0] [c00000000045dc98] page_alloc_cpu_dead+0x118/0x120
[ 854.882119] [c0000009e584b800] [c000000000154fa8] cpuhp_invoke_callback.constprop.4+0xb8/0x760
[ 854.882173] [c0000009e584b870] [c0000000001585d8] _cpu_down+0x188/0x380
[ 854.882211] [c0000009e584b8e0] [c000000000156e1c] cpu_down+0x5c/0xa0
[ 854.882251] [c0000009e584b910] [c0000000008ff7c4] cpu_subsys_offline+0x24/0x40
[ 854.882296] [c0000009e584b930] [c0000000008f44e0] device_offline+0xf0/0x130
[ 854.882335] [c0000009e584b970] [c0000000001080a4] dlpar_offline_cpu+0x1c4/0x2a0
[ 854.882382] [c0000009e584ba30] [c0000000001084b8] dlpar_cpu_remove+0xb8/0x190
[ 854.882428] [c0000009e584bab0] [c0000000001086bc] dlpar_cpu_remove_by_index+0x12c/0x150
[ 854.882474] [c0000009e584bb40] [c000000000109a74] dlpar_cpu+0x94/0x600
[ 854.882512] [c0000009e584bc00] [c0000000000fffe8] pseries_hp_work_fn+0x128/0x1e0
[ 854.882559] [c0000009e584bc70] [c000000000183ed4] process_one_work+0x304/0x5d0
[ 854.882605] [c0000009e584bd10] [c000000000184a1c] worker_thread+0xcc/0x7a0
[ 854.882644] [c0000009e584bdc0] [c00000000018e9bc] kthread+0x1ac/0x1c0
[ 854.882683] [c0000009e584be30] [c00000000000b7dc] ret_from_kernel_thread+0x5c/0x80
[ 854.882728] Instruction dump:
[ 854.882752] 4ba750ed 60000000 0fe00000 7c0802a6 7c641b78 3c62ff74 7d465378 7d054378
[ 854.882799] 3863f528 f8010070 4ba750c5 60000000 <0fe00000> 7c0802a6 3c62ff74 7d264b78
[ 854.882849] ---[ end trace 8fef7a65bb8c7def ]---
[ 854.888093]
[ 854.888226] Sending IPI to other CPUs

A slightly different trace on the above mihawk system with the originally reported 4.18.0-224.el8 kernel:

[ 1537.419315] pseries-hotplug-cpu: Successfully removed CPU, drc index: 10000139
[ 1538.253044] pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
[ 1538.360259] Querying DEAD? cpu 314 (314) shows 2
[ 1538.360736] BUG: Bad page state in process kworker/u768:3 pfn:95de1
[ 1538.360746] cpu 314 (hwid 314) Ready to die...
[ 1538.360784] page:c00a000002577840 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0
[ 1538.361881] flags: 0x5ffffc00000000()
[ 1538.361908] raw: 005ffffc00000000 5deadbeef0000100 5deadbeef0000200 0000000000000000
[ 1538.361955] raw: 0000000000000000 0000000000000000 00000000ffffff7f 0000000000000000
[ 1538.362002] page dumped because: nonzero mapcount
[ 1538.362033] Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nft_counter nf_nat_tftp nft_objref nf_conntrack_tftp tun bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6_tables nft_compat ip_set nf_tables nfnetlink sunrpc xts vmx_crypto ip_tables xfs libcrc32c sd_mod sg virtio_net net_failover virtio_scsi failover dm_mirror dm_region_hash dm_log dm_mod
[ 1538.362613] CPU: 0 PID: 548 Comm: kworker/u768:3 Kdump: loaded Not tainted 4.18.0-224.el8.bz1856588.ppc64le #1
[ 1538.362687] Workqueue: pseries hotplug workque pseries_hp_work_fn
[ 1538.362725] Call Trace:
[ 1538.362743] [c0000009d4adf590] [c000000000e0e0fc] dump_stack+0xb0/0xf4 (unreliable)
[ 1538.362789] [c0000009d4adf5d0] [c000000000475dfc] bad_page+0x12c/0x1b0
[ 1538.362827] [c0000009d4adf660] [c0000000004784bc] free_pcppages_bulk+0x5bc/0x940
[ 1538.362871] [c0000009d4adf760] [c000000000478c38] page_alloc_cpu_dead+0x118/0x120
[ 1538.362918] [c0000009d4adf7b0] [c00000000015b898] cpuhp_invoke_callback.constprop.5+0xb8/0x760
[ 1538.362969] [c0000009d4adf820] [c00000000015eee8] _cpu_down+0x188/0x340
[ 1538.363007] [c0000009d4adf890] [c00000000015d75c] cpu_down+0x5c/0xa0
[ 1538.363045] [c0000009d4adf8c0] [c00000000092c544] cpu_subsys_offline+0x24/0x40
[ 1538.363091] [c0000009d4adf8e0] [c0000000009212f0] device_offline+0xf0/0x130
[ 1538.363129] [c0000009d4adf920] [c00000000010aee4] dlpar_offline_cpu+0x1c4/0x2a0
[ 1538.363174] [c0000009d4adf9e0] [c00000000010b2f8] dlpar_cpu_remove+0xb8/0x190
[ 1538.363219] [c0000009d4adfa60] [c00000000010b4fc] dlpar_cpu_remove_by_index+0x12c/0x150
[ 1538.363264] [c0000009d4adfaf0] [c00000000010ca24] dlpar_cpu+0x94/0x800
[ 1538.363302] [c0000009d4adfc00] [c000000000102cc8] pseries_hp_work_fn+0x128/0x1e0
[ 1538.363347] [c0000009d4adfc70] [c00000000018aa84] process_one_work+0x304/0x5d0
[ 1538.363394] [c0000009d4adfd10] [c00000000018b5cc] worker_thread+0xcc/0x7a0
[ 1538.363433] [c0000009d4adfdc0] [c00000000019567c] kthread+0x1ac/0x1c0
[ 1538.363469] [c0000009d4adfe30] [c00000000000b7dc] ret_from_kernel_thread+0x5c/0x80

I'm trying to reproduce on a witherspoon system with a fedora rawhide guest. No crash yet but I got this:

[ 183.796986] cpu 34 (hwid 34) Ready to die...
[ 184.677349]
[ 184.677384] =============================
[ 184.677385] WARNING: suspicious RCU usage
[ 184.677388] 5.8.0-0.rc5.20200715gite9919e11e219.1.fc33.ppc64le #1 Not tainted
[ 184.677389] -----------------------------
[ 184.677391] kernel/sched/core.c:567 suspicious rcu_dereference_check() usage!
[ 184.677392]
[ 184.677392] other info that might help us debug this:
[ 184.677392]
[ 184.677394]
[ 184.677394] RCU used illegally from offline CPU!
[ 184.677394] rcu_scheduler_active = 2, debug_locks = 1
[ 184.677397] 2 locks held by swapper/35/0:
[ 184.677398] #0: c0000009c4d0e518 (&base->lock){-.-.}-{2:2}, at: __mod_timer+0x1f4/0x7e0
[ 184.677408] #1: c00000000240ba78 (rcu_read_lock){....}-{1:2}, at: get_nohz_timer_target+0xc8/0x3e0
[ 184.677413]
[ 184.677413] stack backtrace:
[ 184.677416] CPU: 35 PID: 0 Comm: swapper/35 Not tainted 5.8.0-0.rc5.20200715gite9919e11e219.1.fc33.ppc64le #1
[ 184.677419] Call Trace:
[ 184.677422] [c0000009bda47890] [c000000000b85d7c] dump_stack+0x100/0x174 (unreliable)
[ 184.677426] [c0000009bda478f0] [c000000000246204] lockdep_rcu_suspicious+0x13c/0x15c
[ 184.677428] [c0000009bda47980] [c0000000001ddc8c] get_nohz_timer_target+0x34c/0x3e0
[ 184.677431] [c0000009bda479f0] [c0000000002a1684] __mod_timer+0x514/0x7e0
[ 184.677435] [c0000009bda47a80] [c0000000001b8a50] queue_delayed_work_on+0x180/0x210
[ 184.677437] [c0000009bda47ad0] [c000000000b16620] debug_check_no_obj_freed+0x270/0x320
[ 184.677440] [c0000009bda47b90] [c000000000543584] free_pcp_prepare+0x124/0x470
[ 184.677443] [c0000009bda47bd0] [c000000000549208] free_unref_page+0x38/0x150
[ 184.677446] [c0000009bda47c10] [c0000000000dc8b8] xive_spapr_cleanup_queue+0x128/0x1e0
[ 184.677449] [c0000009bda47ca0] [c0000000000d9e14] xive_teardown_cpu+0x1a4/0x220
[ 184.677452] [c0000009bda47ce0] [c00000000012f0b4] pseries_mach_cpu_die+0x84/0x350
[ 184.677454] [c0000009bda47d90] [c000000000062b84] cpu_die+0x44/0x60
[ 184.677457] [c0000009bda47db0] [c00000000001f2d0] arch_cpu_idle_dead+0x30/0x50
[ 184.677460] [c0000009bda47dd0] [c0000000001ed524] do_idle+0x314/0x540
[ 184.677462] [c0000009bda47ea0] [c0000000001edb1c] cpu_startup_entry+0x3c/0x40
[ 184.677465] [c0000009bda47ed0] [c00000000006237c] start_secondary+0x7ac/0xb70
[ 184.677467] [c0000009bda47f90] [c00000000000c654] start_secondary_prolog+0x10/0x14

(In reply to Greg Kurz from comment #15)
> I'm trying to reproduce on a witherspoon system with a fedora rawhide guest.
> No crash yet but I got this:
> 
> [ 183.796986] cpu 34 (hwid 34) Ready to die...
> [ 184.677349]
> [ 184.677384] =============================
> [ 184.677385] WARNING: suspicious RCU usage
> [ 184.677388] 5.8.0-0.rc5.20200715gite9919e11e219.1.fc33.ppc64le #1 Not
> tainted
> [ 184.677389] -----------------------------
> [ 184.677391] kernel/sched/core.c:567 suspicious rcu_dereference_check()
> usage!
> [ 184.677392]
> [ 184.677392] other info that might help us debug this:
> [ 184.677392]
> [ 184.677394]
> [ 184.677394] RCU used illegally from offline CPU!
> [ 184.677394] rcu_scheduler_active = 2, debug_locks = 1
> [ 184.677397] 2 locks held by swapper/35/0:
> [ 184.677398] #0: c0000009c4d0e518 (&base->lock){-.-.}-{2:2}, at:
> __mod_timer+0x1f4/0x7e0
> [ 184.677408] #1: c00000000240ba78 (rcu_read_lock){....}-{1:2}, at:
> get_nohz_timer_target+0xc8/0x3e0
> [ 184.677413]
> [ 184.677413] stack backtrace:
> [ 184.677416] CPU: 35 PID: 0 Comm: swapper/35 Not tainted
> 5.8.0-0.rc5.20200715gite9919e11e219.1.fc33.ppc64le #1
> [ 184.677419] Call Trace:
> [ 184.677422] [c0000009bda47890] [c000000000b85d7c] dump_stack+0x100/0x174
> (unreliable)
> [ 184.677426] [c0000009bda478f0] [c000000000246204]
> lockdep_rcu_suspicious+0x13c/0x15c
> [ 184.677428] [c0000009bda47980] [c0000000001ddc8c]
> get_nohz_timer_target+0x34c/0x3e0
> [ 184.677431] [c0000009bda479f0] [c0000000002a1684] __mod_timer+0x514/0x7e0
> [ 184.677435] [c0000009bda47a80] [c0000000001b8a50]
> queue_delayed_work_on+0x180/0x210
> [ 184.677437] [c0000009bda47ad0] [c000000000b16620]
> debug_check_no_obj_freed+0x270/0x320
> [ 184.677440] [c0000009bda47b90] [c000000000543584]
> free_pcp_prepare+0x124/0x470
> [ 184.677443] [c0000009bda47bd0] [c000000000549208]
> free_unref_page+0x38/0x150
> [ 184.677446] [c0000009bda47c10] [c0000000000dc8b8]
> xive_spapr_cleanup_queue+0x128/0x1e0
> [ 184.677449] [c0000009bda47ca0] [c0000000000d9e14]
> xive_teardown_cpu+0x1a4/0x220
> [ 184.677452] [c0000009bda47ce0] [c00000000012f0b4]
> pseries_mach_cpu_die+0x84/0x350
> [ 184.677454] [c0000009bda47d90] [c000000000062b84] cpu_die+0x44/0x60
> [ 184.677457] [c0000009bda47db0] [c00000000001f2d0]
> arch_cpu_idle_dead+0x30/0x50
> [ 184.677460] [c0000009bda47dd0] [c0000000001ed524] do_idle+0x314/0x540
> [ 184.677462] [c0000009bda47ea0] [c0000000001edb1c]
> cpu_startup_entry+0x3c/0x40
> [ 184.677465] [c0000009bda47ed0] [c00000000006237c]
> start_secondary+0x7ac/0xb70
> [ 184.677467] [c0000009bda47f90] [c00000000000c654]
> start_secondary_prolog+0x10/0x14

Did you see "Querying DEAD? cpu 34 (34) shows 2" earlier in the trace?

I reproduced again on mihawk (trace below); this time I got the trace on the unplugged CPU rather than the hotplug worker thread. It seems like what's happening is the hotplug worker calls device_offline() on the target CPU, which at some point calls takedown_cpu()->__cpu_die()->pseries_cpu_die():

static void pseries_cpu_die(unsigned int cpu)
{
	int tries;
	int cpu_status = 1;
	unsigned int pcpu = get_hard_smp_processor_id(cpu);

	if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
		cpu_status = 1;
		for (tries = 0; tries < 5000; tries++) {
			if (get_cpu_current_state(cpu) == CPU_STATE_INACTIVE) {
				cpu_status = 0;
				break;
			}
			msleep(1);
		}
	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {

		for (tries = 0; tries < 25; tries++) {
			cpu_status = smp_query_cpu_stopped(pcpu);
			if (cpu_status == QCSS_STOPPED ||
			    cpu_status == QCSS_HARDWARE_ERROR)
				break;
			cpu_relax();
		}
	}

	if (cpu_status != 0) {
		printk("Querying DEAD? cpu %i (%i) shows %i\n",
		       cpu, pcpu, cpu_status);
	}

	/* Isolation and deallocation are definitely done by
	 * drslot_chrp_cpu. If they were not they would be
	 * done here. Change isolate state to Isolate and
	 * change allocation-state to Unusable.
	 */
	paca_ptrs[cpu]->cpu_start = 0;
}

The function is supposed to wait for the CPU to finish pseries_mach_cpu_die, but it only waits for some limited amount of time before it gives up and just prints, e.g.:

Querying DEAD? cpu 314 (314) shows 2

2 in this case meaning QCSS_NOT_STOPPED.
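For reference, a sketch of the status codes involved: the names and values below are reconstructed from my reading of the pseries platform code and should be treated as an assumption, except for 2 == QCSS_NOT_STOPPED, which the messages above confirm.

/* Return codes of smp_query_cpu_stopped(), which wraps the RTAS
 * "query-cpu-stopped-state" call. Values assumed from the pseries
 * platform headers; the trace above reports status 2, i.e. the vCPU
 * was still running when pseries_cpu_die() gave up waiting. */
#define QCSS_STOPPED		0	/* vCPU has stopped; safe to tear down */
#define QCSS_STOPPING		1	/* vCPU is in the process of stopping */
#define QCSS_NOT_STOPPED	2	/* vCPU is still running */
#define QCSS_HARDWARE_ERROR	-1	/* the query itself failed */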
So back in the worker thread we continue on with other cleanup, including the call to page_alloc_cpu_dead()->drain_pages()->free_pcppages_bulk(), which generates some of the traces I posted earlier.

drain_pages() has the following assumptions:

/*
 * Drain pcplists of all zones on the indicated processor.
 *
 * The processor must either be the current processor and the
 * thread pinned to the current processor or a processor that
 * is not online.
 */
static void drain_pages(unsigned int cpu)
{
	struct zone *zone;

	for_each_populated_zone(zone) {
		drain_pages_zone(cpu, zone);
	}
}

So if the unplugged CPU at some point enters pseries_mach_cpu_die() (or is already in the middle of it), the per-cpu page cleanup might race with the worker thread calling drain_pages().

I'm not sure if we can expect to see this behavior if we don't see the "Querying DEAD?" messages though, since that suggests the unplugged CPU already called set_cpu_current_state(cpu, CPU_STATE_OFFLINE) in pseries_mach_cpu_die(), so it had already finished the page cleanup. Maybe there's some other path though.

If that's what's going on, though, the question I guess is why it is okay for pseries_cpu_die() to give up on waiting for the CPU to go offline. If we can't risk hanging the worker thread with an indefinite wait, maybe we need a longer wait time at least.
[ 4463.745739] pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000005a
[ 4464.318516] Querying DEAD? cpu 90 (90) shows 2
[ 4464.318935] list_del corruption. prev->next should be c00a00000251da48, but was c0000009f167bb10
[ 4464.318955] ------------[ cut here ]------------
[ 4464.318956] kernel BUG at lib/list_debug.c:53!
[ 4464.318959] Oops: Exception in kernel mode, sig: 5 [#1]
[ 4464.318960] LE SMP NR_CPUS=2048 NUMA pSeries
[ 4464.318962] Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nft_counter nf_nat_tftp nft_objref nf_conntrack_tftp tun bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6_tables nft_compat ip_set nf_tables nfnetlink sunrpc xts vmx_crypto ip_tables xfs libcrc32c sd_mod sg virtio_net net_failover virtio_scsi failover dm_mirror dm_region_hash dm_log dm_mod
[ 4464.318988] CPU: 90 PID: 0 Comm: swapper/90 Kdump: loaded Not tainted 4.18.0-224.el8.bz1856588.ppc64le #1
[ 4464.318990] NIP: c0000000007acc70 LR: c0000000007acc6c CTR: 0000000000000000
[ 4464.318991] REGS: c0000009e5b7b840 TRAP: 0700 Not tainted (4.18.0-224.el8.bz1856588.ppc64le)
[ 4464.318992] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 22000842 XER: 20040000
[ 4464.318996] CFAR: c000000000200e24 IRQMASK: 1
[ 4464.318996] GPR00: c0000000007acc6c c0000009e5b7bac0 c000000001ac0c00 0000000000000054
[ 4464.318996] GPR04: c0000009f15ecf90 c0000009f1675628 c0000009ed4d7f40 c000000001c90c00
[ 4464.318996] GPR08: 0000000000000007 c0000009f15f3980 00000009f0330000 7562202c38346164
[ 4464.318996] GPR12: 0000000000000000 c000000007f69280 c00a00000251da48 0000000000000000
[ 4464.318996] GPR16: 0000000000000001 c0000009f167bb00 0000000000000800 c0000009ff3d4980
[ 4464.318996] GPR20: c0000009f167bb10 5deadbeef0000100 5deadbeef0000200 0000000000187961
[ 4464.318996] GPR24: c0000009e5b7bb78 0000000000000000 0000000000000001 ffffffffffffffff
[ 4464.318996] GPR28: c00a00000251da40 c0000009f167bb10 c0000009f167bb10 0000000000000000
[ 4464.319013] NIP [c0000000007acc70] __list_del_entry_valid+0xd0/0x100
[ 4464.319015] LR [c0000000007acc6c] __list_del_entry_valid+0xcc/0x100
[ 4464.319015] Call Trace:
[ 4464.319017] [c0000009e5b7bac0] [c0000000007acc6c] __list_del_entry_valid+0xcc/0x100 (unreliable)
[ 4464.319020] [c0000009e5b7bb20] [c0000000004780f8] free_pcppages_bulk+0x1f8/0x940
[ 4464.319022] [c0000009e5b7bc20] [c00000000047bc80] free_unref_page+0xd0/0x100
[ 4464.319025] [c0000009e5b7bc50] [c0000000000bbfb8] xive_spapr_cleanup_queue+0x158/0x200
[ 4464.319028] [c0000009e5b7bcf0] [c0000000000b93dc] xive_teardown_cpu+0x1bc/0x240
[ 4464.319030] [c0000009e5b7bd30] [c00000000010bcb8] pseries_mach_cpu_die+0x78/0x2f0
[ 4464.319032] [c0000009e5b7bde0] [c00000000005c8f8] cpu_die+0x48/0x70
[ 4464.319034] [c0000009e5b7be00] [c000000000021cf0] arch_cpu_idle_dead+0x20/0x40
[ 4464.319036] [c0000009e5b7be20] [c0000000001b5254] do_idle+0x2f4/0x4c0
[ 4464.319038] [c0000009e5b7bea0] [c0000000001b566c] cpu_startup_entry+0x3c/0x40
[ 4464.319039] [c0000009e5b7bed0] [c00000000005c45c] start_secondary+0x7bc/0x8f0
[ 4464.319041] [c0000009e5b7bf90] [c00000000000ac70] start_secondary_prolog+0x10/0x14
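To make the suspected interleaving easier to picture outside the kernel, here is a minimal userspace sketch of the same pattern: one thread freeing entries onto a shared list (standing in for the dying CPU's xive_spapr_cleanup_queue()->free_unref_page() path) while another drains it on the assumption that the list's owner is already offline (standing in for the worker's page_alloc_cpu_dead()->drain_pages() path). This is an illustration, not kernel code: every name in it is hypothetical, the checks merely mimic CONFIG_DEBUG_LIST (lib/list_debug.c), and the program is deliberately racy, so on any given run it may report corruption, crash outright, or survive. Compile with gcc -pthread.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct node { struct node *next, *prev; };

/* Hypothetical stand-in for a zone's per-CPU free list (pcplist). */
static struct node pcplist = { &pcplist, &pcplist };
static atomic_int producer_done;

/* Simplified versions of the kernel's CONFIG_DEBUG_LIST checks. */
static void list_add_checked(struct node *new, struct node *head)
{
	struct node *next = head->next;

	if (next->prev != head || new == next) {
		fprintf(stderr, "list_add corruption/double add: new=%p\n",
			(void *)new);
		abort();
	}
	new->next = next;
	new->prev = head;
	next->prev = new;
	head->next = new;
}

static void list_del_checked(struct node *entry)
{
	if (entry->prev->next != entry || entry->next->prev != entry) {
		fprintf(stderr, "list_del corruption: entry=%p\n",
			(void *)entry);
		abort();
	}
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = NULL;
}

enum { NNODES = 1 << 20 };
static struct node *nodes;

/* "Dying vCPU" path: keeps handing entries back to the shared list,
 * with no lock taken. */
static void *dying_cpu(void *arg)
{
	for (int i = 0; i < NNODES; i++)
		list_add_checked(&nodes[i], &pcplist);
	atomic_store(&producer_done, 1);
	return NULL;
}

/* "Hotplug worker" path: drains the list on the assumption that the
 * other CPU is already offline -- also with no lock taken, because
 * that assumption is supposed to make a lock unnecessary. */
static void *hotplug_worker(void *arg)
{
	while (!atomic_load(&producer_done) || pcplist.next != &pcplist) {
		struct node *n = pcplist.next;
		if (n != &pcplist)
			list_del_checked(n);
	}
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	nodes = calloc(NNODES, sizeof(*nodes));
	if (!nodes)
		return 1;
	pthread_create(&a, NULL, dying_cpu, NULL);
	pthread_create(&b, NULL, hotplug_worker, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	puts("survived this run -- the race is timing-dependent, try again");
	return 0;
}

The kernel's pcplists are deliberately lock-free because only their owning CPU is supposed to touch them while it is online; pseries_cpu_die() returning early is what breaks that ownership rule here.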
(In reply to Michael Roth from comment #16)
> 
> did you see "Querying DEAD? cpu 34 (34) shows 2" earlier in the trace?
> 

Ah... I should have kept more context and these "Querying DEAD?" lines pop up so often that I don't remember :-\ I'll try again.

> i reproduced again on mihawk (trace below), this time I got the trace on the
> unplugged cpu rather than the hotplug worker thread. It seems like what's
> happening is the hotplug worker calls device_offline() on the target CPU,
> which at some point calls takedown_cpu()->__cpu_die()->pseries_cpu_die():
> 

Yes.

> 
> The function is supposed to wait for the CPU to finish pseries_mach_cpu_die,
> but it only tries to wait for some limited amount of time before it gives up
> waiting and just prints, e.g.:
> 
> Querying DEAD? cpu 314 (314) shows 2
> 
> 2 in this case meaning QCSS_NOT_STOPPED.
> 
> So back in the worker thread we continue on with other cleanup, including
> the call to page_alloc_cpu_dead()->drain_pages()->free_pcppages_bulk() which
> generates some of the traces I posted earlier.
> 
> Drain pages has the following assumptions:
> 
> /*
>  * Drain pcplists of all zones on the indicated processor.
>  *
>  * The processor must either be the current processor and the
>  * thread pinned to the current processor or a processor that
>  * is not online.
>  */

If the queried state was QCSS_NOT_STOPPED then it is possible that the processor is still online, right?

> static void drain_pages(unsigned int cpu)
> {
> 	struct zone *zone;
> 
> 	for_each_populated_zone(zone) {
> 		drain_pages_zone(cpu, zone);
> 	}
> }
> 
> so if the unplugged CPU at some point enters pseries_mach_cpu_die() (or is
> already in the middle of it), the per-cpu page cleanup might race with the
> worker thread calling drain_pages().
> 

Ugly :-\

> I'm not sure if we can expect to see this behavior if we don't see the
> "Querying DEAD?" messages though, since that suggests the unplugged CPU
> called set_cpu_current_state(cpu, CPU_STATE_OFFLINE) in
> pseries_mach_cpu_die() already, so it had already finished the page cleanup.
> maybe there's some other path though.
> 
> 
> If that's what's going on though, the question I guess is why is it okay for
> pseries_cpu_die() to give up on waiting for the CPU to go offline? If we
> can't risk hanging the worker thread with an indefinite wait maybe we need a
> longer wait time at least.
> 

This sleep based synchronization looks like a recipe for disaster... waiting longer might just hide the problem until next time. But yes, your questioning is clearly good: why does the worker thread give up in the first place? I mean, if the unplugged CPU never completes pseries_mach_cpu_die() then something went very wrong and I don't quite see the point in continuing execution.

FYI, I could reproduce with a rhel83 guest and I was hoping to capture a vmcore, but unfortunately kdump didn't work because of some other problem that is supposed to be fixed in the latest kernel... and later on I hit the host hard lockup issue of BZ #1733467 :-) I'll try again tomorrow.

(In reply to Greg Kurz from comment #17)
> (In reply to Michael Roth from comment #16)
> > 
> > did you see "Querying DEAD? cpu 34 (34) shows 2" earlier in the trace?
> > 
> 
> Ah... I should have kept more context and these "Querying DEAD?" lines pop
> up so often that I don't remember :-\ I'll try again.
> 
> > i reproduced again on mihawk (trace below), this time I got the trace on the
> > unplugged cpu rather than the hotplug worker thread. It seems like what's
> > happening is the hotplug worker calls device_offline() on the target CPU,
> > which at some point calls takedown_cpu()->__cpu_die()->pseries_cpu_die():
> > 
> 
> Yes.
> 
> > 
> > The function is supposed to wait for the CPU to finish pseries_mach_cpu_die,
> > but it only tries to wait for some limited amount of time before it gives up
> > waiting and just prints, e.g.:
> > 
> > Querying DEAD? cpu 314 (314) shows 2
> > 
> > 2 in this case meaning QCSS_NOT_STOPPED.
> > 
> > So back in the worker thread we continue on with other cleanup, including
> > the call to page_alloc_cpu_dead()->drain_pages()->free_pcppages_bulk() which
> > generates some of the traces I posted earlier.
> > 
> > Drain pages has the following assumptions:
> > 
> > /*
> >  * Drain pcplists of all zones on the indicated processor.
> >  *
> >  * The processor must either be the current processor and the
> >  * thread pinned to the current processor or a processor that
> >  * is not online.
> >  */
> 
> If the queried state was QCSS_NOT_STOPPED then it is possible that
> the processor is still online, right ?

It appears so. I added some traces to confirm, and although I haven't been able to reproduce with the traces in place, there is some interleaving that seems to violate drain_pages()'s assumption that the unplugged CPU is no longer running:

[ 1286.437719] pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 10000027
[ 1287.004749] xive: xive_teardown_cpu 39: > xive_ops->teardown_cpu
[ 1287.004752] xive: xive_teardown_cpu 39: < xive_ops->teardown_cpu
[ 1287.004753] xive: xive_teardown_cpu 39: > xive_cleanup_cpu_ipi
[ 1287.005439] Querying DEAD? cpu 39 (39) shows 2
[ 1287.005743] drain_pages called for cpu 39
[ 1287.005750] xive: xive_teardown_cpu 39: < xive_cleanup_cpu_ipi

# cpu 39 enters xive_cleanup_cpu_queues while unplug worker thread is still inside drain_pages()

[ 1287.005773] xive: xive_teardown_cpu 39: > xive_cleanup_cpu_queues
[ 1287.005774] drain_pages completed for cpu 39
[ 1287.005806] xive: xive_teardown_cpu 39: < xive_cleanup_cpu_queues
[ 1287.005882] cpu 39 (hwid 39) Ready to die...
[ 1287.098829] pseries-hotplug-cpu: Successfully removed CPU, drc index: 10000027

> > static void drain_pages(unsigned int cpu)
> > {
> > 	struct zone *zone;
> > 
> > 	for_each_populated_zone(zone) {
> > 		drain_pages_zone(cpu, zone);
> > 	}
> > }
> > 
> > so if the unplugged CPU at some point enters pseries_mach_cpu_die() (or is
> > already in the middle of it), the per-cpu page cleanup might race with the
> > worker thread calling drain_pages().
> > 
> 
> Ugly :-\
> 
> > I'm not sure if we can expect to see this behavior if we don't see the
> > "Querying DEAD?" messages though, since that suggests the unplugged CPU
> > called set_cpu_current_state(cpu, CPU_STATE_OFFLINE) in
> > pseries_mach_cpu_die() already, so it had already finished the page cleanup.
> > maybe there's some other path though.
> > 
> > 
> > If that's what's going on though, the question I guess is why is it okay for
> > pseries_cpu_die() to give up on waiting for the CPU to go offline? If we
> > can't risk hanging the worker thread with an indefinite wait maybe we need a
> > longer wait time at least.
> > 
> 
> This sleep based synchronization looks like a recipe for disaster... waiting
> longer might just hide the problem until next time. But yes, your questioning
> is clearly good: why does the worker thread give up in the first place ? I
> mean, if the unplugged CPU never completes pseries_mach_cpu_die() then
> something
> went very wrong and I don't quite see the point in continuing execution.

Agreed, the behavior has been around for nearly a decade it seems; maybe nobody has cared/noticed it since only the xive cleanup stuff is making the issue visible.

> FYI, I could reproduce with a rhel83 guest and I was hoping to capture a
> vmcore
> but unfortunately kdump didn't work because of some other problem that is
> supposed
> to be fixed in the latest kernel... and later on I hit the host hard lockup
> issue
> of BZ #1733467 :-) I'll try again tomorrow.

Might also be worth noting I hit the host lockup after a few hours of trying this workload as well. Maybe having an unplug loop in the workload might help with reproducing that as well.

(In reply to Michael Roth from comment #18)
> > This sleep based synchronization looks like a recipe for disaster... waiting
> > longer might just hide the problem until next time. But yes, your questioning
> > is clearly good: why does the worker thread give up in the first place ? I
> > mean, if the unplugged CPU never completes pseries_mach_cpu_die() then
> > something
> > went very wrong and I don't quite see the point in continuing execution.
> 
> Agreed, the behavior has been around for nearly a decade it seems, maybe
> nobody has cared/noticed it since only the xive cleanup stuff is making the
> issue visible
> 

Yeah, maybe someone should send an RFC that patches out all the giving-up-logic and wait for flames or clarification.

> 
> > FYI, I could reproduce with a rhel83 guest and I was hoping to capture a
> > vmcore
> > but unfortunately kdump didn't work because of some other problem that is
> > supposed
> > to be fixed in the latest kernel... and later on I hit the host hard lockup
> > issue
> > of BZ #1733467 :-) I'll try again tomorrow.
> 
> Might also be worth noting I hit the host lockup after a few hours of trying
> this workload as well. Maybe having an unplug loop in the workload might
> help with reproducing that as well.

Heh, dunno if this is related, but David reserved our witherspoon system all day, so I didn't investigate more. :)
(In reply to Greg Kurz from comment #19)
> (In reply to Michael Roth from comment #18)
> > > This sleep based synchronization looks like a recipe for disaster... waiting
> > > longer might just hide the problem until next time. But yes, your questioning
> > > is clearly good: why does the worker thread give up in the first place ? I
> > > mean, if the unplugged CPU never completes pseries_mach_cpu_die() then
> > > something
> > > went very wrong and I don't quite see the point in continuing execution.
> > 
> > Agreed, the behavior has been around for nearly a decade it seems, maybe
> > nobody has cared/noticed it since only the xive cleanup stuff is making the
> > issue visible
> > 
> 
> Yeah, maybe someone should send an RFC that patches out all the
> giving-up-logic
> and wait for flames or clarification.

Yah, makes sense. I'll work on getting an RFC out and get my flamesuit ready :)

(In reply to Michael Roth from comment #20)
> (In reply to Greg Kurz from comment #19)
> > (In reply to Michael Roth from comment #18)
> > > > This sleep based synchronization looks like a recipe for disaster... waiting
> > > > longer might just hide the problem until next time. But yes, your questioning
> > > > is clearly good: why does the worker thread give up in the first place ? I
> > > > mean, if the unplugged CPU never completes pseries_mach_cpu_die() then
> > > > something
> > > > went very wrong and I don't quite see the point in continuing execution.
> > > 
> > > Agreed, the behavior has been around for nearly a decade it seems, maybe
> > > nobody has cared/noticed it since only the xive cleanup stuff is making the
> > > issue visible
> > > 
> > 
> > Yeah, maybe someone should send an RFC that patches out all the
> > giving-up-logic
> > and wait for flames or clarification.
> 
> Yah, makes sense, I'll work on getting an RFC out and get my flamesuit ready
> :)

FWIW, I've made a quick'n'dirty hack to a -229 kernel to do just that and, as expected, I don't see the "Querying DEAD?" traces anymore, while I got plenty of them before. I'll let the test run over the WE.

(In reply to Greg Kurz from comment #21)
> (In reply to Michael Roth from comment #20)
> > (In reply to Greg Kurz from comment #19)
> > > (In reply to Michael Roth from comment #18)
> > > > > This sleep based synchronization looks like a recipe for disaster... waiting
> > > > > longer might just hide the problem until next time. But yes, your questioning
> > > > > is clearly good: why does the worker thread give up in the first place ? I
> > > > > mean, if the unplugged CPU never completes pseries_mach_cpu_die() then
> > > > > something
> > > > > went very wrong and I don't quite see the point in continuing execution.
> > > > 
> > > > Agreed, the behavior has been around for nearly a decade it seems, maybe
> > > > nobody has cared/noticed it since only the xive cleanup stuff is making the
> > > > issue visible
> > > > 
> > > 
> > > Yeah, maybe someone should send an RFC that patches out all the
> > > giving-up-logic
> > > and wait for flames or clarification.
> > 
> > Yah, makes sense, I'll work on getting an RFC out and get my flamesuit ready
> > :)
> 
> FWIW, I've made a quick'n'dirty hack to a -229 kernel to do just that and,
> as expected,
> I don't see the Querying DEAD? traces anymore, while I got plenty of them
> before. I'll
> let the test run over the WE.

The Witherspoon system I was using for this was moved to another location during the WE... it was planned maintenance I just forgot about :P so I'm re-running the test now.
BTW, I came across an old commit from benh that bumps the cpu die timeout to solve a race when a dying CPU still has an active timer. It also mentions the "Querying DEAD?" trace, and it seems to make it a prerequisite for the issue to happen.

commit 940ce422a367c8e65404a5ef1ff5969527a06410
Author: Benjamin Herrenschmidt <benh.org>
Date:   Sat Jul 31 15:04:15 2010 +1000

    powerpc/pseries: Increase cpu die timeout

I still don't understand why this timeout logic exists, but I'm not so sure now that getting rid of it is the way to go...

Anyway, this seems to be a guest kernel item. Updating component accordingly.

Also, a recent change, not upstream yet but already in mpe's powerpc/next, might be worth taking into account because it changes the way CPUs are offlined.

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=next&id=48f6e7f6d948b56489da027bc3284c709b939d28

commit 48f6e7f6d948b56489da027bc3284c709b939d28
Author: Nathan Lynch <nathanl.com>
Date:   Fri Jun 12 00:12:21 2020 -0500

    powerpc/pseries: remove cede offline state for CPUs

Hi Xujun,

Since this is clearly a guest side issue, does it also happen with a RHEL7.9 guest?

Cheers,

--
Greg

Would it be possible to make this BZ public to aid in upstream discussion?

I posted a patch using the sleep approach after failing to come up with a better solution:

https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-August/215933.html

but apparently this has even more of a history than we realized, and there's already been a previous patch where a sleep was added that never made it in:

https://lore.kernel.org/linuxppc-dev/1556752043.jyg2z3kgaw.astroid@bobo.none/

(In reply to Greg Kurz from comment #24)
> Hi Xujun,
> 
> Since this is clearly a guest side issue, does it also happen with a
> RHEL7.9 guest ?
> 
> Cheers,
> 
> --
> Greg

I have tested it and have not hit this problem so far.

I've talked to Michael Ellerman and asked to expedite putting this into his fixes branch.

Mike, the fix is now in mpe's tree (801980f6497946048709b9b09771a1729551d705).

Can you please prepare a downstream brew build of the fix ASAP - we'll need to verify this in order to justify the exception request.

(In reply to David Gibson from comment #31)
> Mike, the fix is now in mpe's tree
> (801980f6497946048709b9b09771a1729551d705).
> 
> Can you please prepare a downstream brew build of the fix ASAP - we'll need
> to verify this in order to justify the exception request.

I've kicked off a brew build at:

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=30871055

It should be ready within a few hours, assuming all goes well.

Previous brew build had an issue; this one built properly and is ready for verification:

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=30889108

I'll also attempt to verify today using a witherspoon system.

(In reply to Michael Roth from comment #33)
> Previous brew build had an issue, this one built properly and is ready for
> verification:
> 
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=30889108
> 
> I'll also attempt to verify today using a witherspoon system.

Backport of 801980f6497946048709b9b09771a1729551d705 submitted downstream.

Reproduced the original failure after 3 iterations with kernel-4.18.0-233. With this patch applied I ran 125 iterations overnight and found no more occurrences of the original issue.
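For context, a sketch of the direction taken by the commit referenced above: keep polling until the hypervisor actually reports the vCPU stopped, instead of giving up after 25 tries, and periodically warn if it is taking suspiciously long. This is a paraphrase of the approach discussed in this thread, not the literal upstream diff; the 120-second warning interval in particular is an assumption.

/* Sketch of the fixed pseries_cpu_die() polling loop -- a paraphrase
 * of the approach, not the literal upstream patch. The worker thread
 * no longer proceeds to the rest of the teardown while the vCPU is
 * still running; it just warns periodically if death takes long. */
static void pseries_cpu_die(unsigned int cpu)
{
	int cpu_status = 1;
	unsigned int pcpu = get_hard_smp_processor_id(cpu);
	unsigned long timeout = jiffies + msecs_to_jiffies(120000);

	while (true) {
		cpu_status = smp_query_cpu_stopped(pcpu);
		if (cpu_status == QCSS_STOPPED ||
		    cpu_status == QCSS_HARDWARE_ERROR)
			break;

		if (time_after(jiffies, timeout)) {
			pr_warn("CPU %i (hwid %i) didn't die after 120 seconds\n",
				cpu, pcpu);
			timeout = jiffies + msecs_to_jiffies(120000);
		}

		cond_resched();
	}

	paca_ptrs[cpu]->cpu_start = 0;
}

With pseries_cpu_die() no longer returning while the vCPU is still running, the hotplug worker cannot reach page_alloc_cpu_dead()'s drain_pages() concurrently with the dying CPU's XIVE queue teardown, which is exactly the interleaving captured in the earlier traces.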
(In reply to Michael Roth from comment #34)
> (In reply to Michael Roth from comment #33)
> > Previous brew build had an issue, this one built properly and is ready for
> > verification:
> > 
> > http://brewweb.devel.redhat.com/brew/taskinfo?taskID=30889108
> > 
> > I'll also attempt to verify today using a witherspoon system.
> 
> Backport of 801980f6497946048709b9b09771a1729551d705 submitted downstream.
> 
> Reproduced original failure after 3 iterations with kernel-4.18.0-233. With
> this patch applied I ran 125 iterations overnight and found no more
> occurrences of the original issue.

Hi Michael,

Could you provide a new scratch build for QE to test this bug fix?

(In reply to Xujun Ma from comment #37)
> (In reply to Michael Roth from comment #34)
> > (In reply to Michael Roth from comment #33)
> > > Previous brew build had an issue, this one built properly and is ready for
> > > verification:
> > > 
> > > http://brewweb.devel.redhat.com/brew/taskinfo?taskID=30889108
> > > 
> > > I'll also attempt to verify today using a witherspoon system.
> > 
> > Backport of 801980f6497946048709b9b09771a1729551d705 submitted downstream.
> > 
> > Reproduced original failure after 3 iterations with kernel-4.18.0-233. With
> > this patch applied I ran 125 iterations overnight and found no more
> > occurrences of the original issue.
> 
> Hi Michael
> 
> Could you provide a new scratch build for QE to test this bug fix?

Hi Xujun,

The earlier brew (http://brewweb.devel.redhat.com/brew/taskinfo?taskID=30889108) is based on the latest kernel-4.18.0-234.el8 RHEL 8.3 kernel. Not sure why the download links aren't clickable there anymore, but the RPMs are still available here:

http://brew-task-repos.usersys.redhat.com/repos/scratch/mroth/kernel/4.18.0/234.el8.bz1856588/

I've started a new brew just in case, but it should be the same code:

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=31009311

(In reply to Michael Roth from comment #38)
> (In reply to Xujun Ma from comment #37)
> > (In reply to Michael Roth from comment #34)
> > > (In reply to Michael Roth from comment #33)
> > > > Previous brew build had an issue, this one built properly and is ready for
> > > > verification:
> > > > 
> > > > http://brewweb.devel.redhat.com/brew/taskinfo?taskID=30889108
> > > > 
> > > > I'll also attempt to verify today using a witherspoon system.
> > > 
> > > Backport of 801980f6497946048709b9b09771a1729551d705 submitted downstream.
> > > 
> > > Reproduced original failure after 3 iterations with kernel-4.18.0-233. With
> > > this patch applied I ran 125 iterations overnight and found no more
> > > occurrences of the original issue.
> > 
> > Hi Michael
> > 
> > Could you provide a new scratch build for QE to test this bug fix?
> 
> Hi Xujun,
> 
> The earlier brew
> (http://brewweb.devel.redhat.com/brew/taskinfo?taskID=30889108) is based on
> latest kernel-4.18.0-234.el8 rhel 8.3 kernel. Not sure why the download
> links aren't clickable there anymore, but the RPMs are still available here:
> 
> 
> http://brew-task-repos.usersys.redhat.com/repos/scratch/mroth/kernel/4.18.0/
> 234.el8.bz1856588/
> 
> I've started a new brew just in case, but it should be the same code:
> 
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=31009311

I have tested the scratch build above and didn't hit this bug. Based on the test result, I think the bug has been fixed in this build.

Patch(es) available on kernel-4.18.0-236.el8

I have tested this build and didn't hit the same issue. Based on the test result, the bug has been fixed in this build; setting it to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:4431 |