RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September, as per pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The e-mail creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and tagged with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", will have a little "two-footprint" icon next to it, and will direct you to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). The same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.
Bug 1856588 - Guest crashed and hung when hot unplug vcpus
Summary: Guest crashed and hung when hot unplug vcpus
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel
Version: 8.3
Hardware: ppc64le
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 8.3
Assignee: David Gibson
QA Contact: Xujun Ma
URL:
Whiteboard:
Depends On: 1733467
Blocks: 1776265
 
Reported: 2020-07-14 02:33 UTC by Xujun Ma
Modified: 2020-11-04 21:15 UTC (History)
15 users

Fixed In Version: kernel-4.18.0-236.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-04 01:24:57 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 187055 0 None None None 2020-07-22 23:57:57 UTC
Red Hat Product Errata RHSA-2020:4431 0 None None None 2020-11-04 01:25:24 UTC

Description Xujun Ma 2020-07-14 02:33:28 UTC
Description of problem:
Guest crashed and hung when hot unplug vcpus

Version-Release number of selected component (if applicable):
host:
kernel-4.18.0-224.el8.ppc64le
qemu-kvm-5.0.0-0.scrmod+el8.3.0+7308+053b39e7.wrb200708.ppc64le
guest:
kernel-4.18.0-221.el8.ppc64le


How reproducible:
1/5

Steps to Reproduce:
1.Boot up guest with command
/usr/libexec/qemu-kvm \
 -m 40960 \
 -smp 1,maxcpus=384,cores=1,threads=1,sockets=384  \
 -nodefaults \
 -chardev stdio,mux=on,id=serial_id_serial0,server,nowait,signal=off \
 -device spapr-vty,id=serial111,chardev=serial_id_serial0 \
 -mon chardev=serial_id_serial0,mode=readline \
 -device virtio-scsi-pci,bus=pci.0 \
 -device scsi-hd,id=scsi-hd0,drive=scsi-hd0-dr0,bootindex=0 \
 -drive file=rhel830-ppc64le-virtio-scsi.qcow2,if=none,id=scsi-hd0-dr0,format=qcow2,cache=none \
 -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c4:e7:84 \
 -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on \
 -mon chardev=monitor,mode=readline -chardev socket,path=monitor,id=monitor,server,nowait,signal=off \
2.Hotplug vcpus to max then unplug them
for((i=1;i<384;i=i+1));do echo "device_add host-spapr-cpu-core,core-id=$i,id=core$i" |nc -U monitor; sleep 1;done;echo " info hotpluggable-cpus" |nc -U monitor;
for((i=1;i<384;i=i+1));do echo "device_del core$i" |nc -U monitor; sleep 1;done;echo " info hotpluggable-cpus" |nc -U monitor

3.

Actual results:
Guest crashed and hung when hot unplug vcpus.
guest log:
[ 1700.662709] cpu 1 (hwid 1) Ready to die...
[ 1701.209850] cpu 2 (hwid 2) Ready to die...
[ 1702.281112] cpu 3 (hwid 3) Ready to die...
[ 1703.319782] cpu 4 (hwid 4) Ready to die...
[ 1704.229688] cpu 5 (hwid 5) Ready to die...
[ 1705.439959] cpu 6 (hwid 6) Ready to die...
[ 1706.360061] cpu 7 (hwid 7) Ready to die...
[ 1707.269429] Querying DEAD? cpu 8 (8) shows 2
[ 1707.269530] cpu 8 (hwid 8) Ready to die...
[ 1708.309237] cpu 9 (hwid 9) Ready to die...
[ 1709.399520] Querying DEAD? cpu 10 (10) shows 2
[ 1709.399585] cpu 10 (hwid 10) Ready to die...
[ 1710.499807] cpu 11 (hwid 11) Ready to die...
[ 1711.499443] cpu 12 (hwid 12) Ready to die...
[ 1712.349385] cpu 13 (hwid 13) Ready to die...
[ 1713.408991] cpu 14 (hwid 14) Ready to die...
[ 1714.389145] cpu 15 (hwid 15) Ready to die...
[ 1715.449318] cpu 16 (hwid 16) Ready to die...
[ 1716.459444] cpu 17 (hwid 17) Ready to die...
[ 1717.499136] cpu 18 (hwid 18) Ready to die...
[ 1718.469254] cpu 19 (hwid 19) Ready to die...
[ 1719.529225] cpu 20 (hwid 20) Ready to die...
[ 1720.479599] cpu 21 (hwid 21) Ready to die...
[ 1721.470140] cpu 22 (hwid 22) Ready to die...
[ 1722.563510] cpu 23 (hwid 23) Ready to die...
[ 1723.513059] cpu 24 (hwid 24) Ready to die...
[ 1724.559695] cpu 25 (hwid 25) Ready to die...
[ 1725.529543] cpu 26 (hwid 26) Ready to die...
[ 1726.550875] cpu 27 (hwid 27) Ready to die...
[ 1727.631186] cpu 28 (hwid 28) Ready to die...
[ 1728.682236] cpu 29 (hwid 29) Ready to die...
[ 1729.722637] cpu 30 (hwid 30) Ready to die...
[ 1730.712917] cpu 31 (hwid 31) Ready to die...
[ 1731.659790] cpu 32 (hwid 32) Ready to die...
[ 1732.679879] cpu 33 (hwid 33) Ready to die...
[ 1733.739967] cpu 34 (hwid 34) Ready to die...
[ 1734.821156] cpu 35 (hwid 35) Ready to die...
[ 1735.710528] Querying DEAD? cpu 36 (36) shows 2
[ 1735.710841] cpu 36 (hwid 36) Ready to die...
[ 1736.790540] Querying DEAD? cpu 37 (37) shows 2
[ 1736.790693] cpu 37 (hwid 37) Ready to die...
[ 1737.750903] cpu 38 (hwid 38) Ready to die...
[ 1738.844373] cpu 39 (hwid 39) Ready to die...
[ 1739.790395] cpu 40 (hwid 40) Ready to die...
[ 1740.830772] cpu 41 (hwid 41) Ready to die...
[ 1741.901062] Querying DEAD? cpu 42 (42) shows 2
[ 1741.902063] cpu 42 (hwid 42) Ready to die...
[ 1742.862320] cpu 43 (hwid 43) Ready to die...
[ 1743.960948] cpu 44 (hwid 44) Ready to die...
[ 1744.970721] cpu 45 (hwid 45) Ready to die...
[ 1745.930536] cpu 46 (hwid 46) Ready to die...
[ 1747.040327] cpu 47 (hwid 47) Ready to die...
[ 1748.050426] cpu 48 (hwid 48) Ready to die...
[ 1749.010553] cpu 49 (hwid 49) Ready to die...
[ 1750.050393] cpu 50 (hwid 50) Ready to die...
[ 1751.090448] cpu 51 (hwid 51) Ready to die...
[ 1752.050652] cpu 52 (hwid 52) Ready to die...
[ 1753.040544] cpu 53 (hwid 53) Ready to die...
[ 1754.060560] cpu 54 (hwid 54) Ready to die...
[ 1755.050790] cpu 55 (hwid 55) Ready to die...
[ 1756.130884] cpu 56 (hwid 56) Ready to die...
[ 1757.110932] cpu 57 (hwid 57) Ready to die...
[ 1758.150774] cpu 58 (hwid 58) Ready to die...
[ 1759.230772] cpu 59 (hwid 59) Ready to die...
[ 1761.126767] cpu 60 (hwid 60) Ready to die...
[ 1761.601254] cpu 61 (hwid 61) Ready to die...
[ 1765.564157] cpu 62 (hwid 62) Ready to die...
[ 1766.142467] cpu 63 (hwid 63) Ready to die...
[ 1766.872245] cpu 64 (hwid 64) Ready to die...
[ 1767.353447] cpu 65 (hwid 65) Ready to die...
[ 1767.952096] Querying DEAD? cpu 66 (66) shows 2
[ 1767.952311] list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
[ 1767.952322] ------------[ cut here ]------------
[ 1767.952323] kernel BUG at lib/list_debug.c:56!
[ 1767.952325] Oops: Exception in kernel mode, sig: 5 [#1]
[ 1767.952326] LE SMP NR_CPUS=2048 NUMA pSeries
[ 1767.952328] Modules linked in: fuse nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat_ipv6 nf_nat_ipv6 nft_chain_route_ipv6 nft_chain_nat_ipv4 nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_chain_route_ipv4 ip6_tables nft_compat ip_set nf_tables nfnetlink uio_pdrv_genirq ip_tables xfs libcrc32c sd_mod sg xts vmx_crypto virtio_net net_failover failover virtio_scsi dm_multipath dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
[ 1767.952352] CPU: 66 PID: 0 Comm: swapper/66 Kdump: loaded Not tainted 4.18.0-221.el8.ppc64le #1
[ 1767.952354] NIP:  c0000000007ab50c LR: c0000000007ab508 CTR: 00000000000003ac
[ 1767.952355] REGS: c0000009e5a17840 TRAP: 0700   Not tainted  (4.18.0-221.el8.ppc64le)
[ 1767.952355] MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28000842  XER: 20040000
[ 1767.952360] CFAR: c0000000001ffe64 IRQMASK: 1 
[ 1767.952360] GPR00: c0000000007ab508 c0000009e5a17ac0 c000000001ac0700 0000000000000054 
[ 1767.952360] GPR04: c0000009f056cf90 c0000009f05f5628 c0000009ed40d654 c000000001c90700 
[ 1767.952360] GPR08: 0000000000000007 c0000009f0573980 00000009ef2b0000 7562202c38303230 
[ 1767.952360] GPR12: 0000000000000000 c0000007fff6ce80 c00a000002470208 0000000000000000 
[ 1767.952360] GPR16: 0000000000000001 c0000009f05fbb00 0000000000000800 c0000009ff3d4980 
[ 1767.952360] GPR20: c0000009f05fbb10 5deadbeef0000100 5deadbeef0000200 0000000000187961 
[ 1767.952360] GPR24: c0000009e5a17b78 0000000000000000 0000000000000001 ffffffffffffffff 
[ 1767.952360] GPR28: c00a000002470200 c0000009f05fbb10 c0000009f05fbb10 0000000000000000 
[ 1767.952375] NIP [c0000000007ab50c] __list_del_entry_valid+0xac/0x100
[ 1767.952376] LR [c0000000007ab508] __list_del_entry_valid+0xa8/0x100
[ 1767.952377] Call Trace:
[ 1767.952378] [c0000009e5a17ac0] [c0000000007ab508] __list_del_entry_valid+0xa8/0x100 (unreliable)
[ 1767.952381] [c0000009e5a17b20] [c000000000476e18] free_pcppages_bulk+0x1f8/0x940
[ 1767.952383] [c0000009e5a17c20] [c00000000047a9a0] free_unref_page+0xd0/0x100
[ 1767.952386] [c0000009e5a17c50] [c0000000000bc2a8] xive_spapr_cleanup_queue+0x148/0x1b0
[ 1767.952388] [c0000009e5a17cf0] [c0000000000b96dc] xive_teardown_cpu+0x1bc/0x240
[ 1767.952391] [c0000009e5a17d30] [c00000000010bf28] pseries_mach_cpu_die+0x78/0x2f0
[ 1767.952393] [c0000009e5a17de0] [c00000000005c8d8] cpu_die+0x48/0x70
[ 1767.952394] [c0000009e5a17e00] [c000000000021cf0] arch_cpu_idle_dead+0x20/0x40
[ 1767.952397] [c0000009e5a17e20] [c0000000001b4294] do_idle+0x2f4/0x4c0
[ 1767.952399] [c0000009e5a17ea0] [c0000000001b46a8] cpu_startup_entry+0x38/0x40
[ 1767.952400] [c0000009e5a17ed0] [c00000000005c43c] start_secondary+0x7bc/0x8f0
[ 1767.952403] [c0000009e5a17f90] [c00000000000ac70] start_secondary_prolog+0x10/0x14
[ 1767.952403] Instruction dump:
[ 1767.952405] 7d455378 38631100 f8010070 4ba54935 60000000 0fe00000 7c0802a6 3c62ff63 
[ 1767.952408] 386311b0 f8010070 4ba54919 60000000 <0fe00000> 7c0802a6 7c852378 7c641b78 
[ 1767.952412] ---[ end trace 03d951dc3f4aac76 ]---
I'm in purgatory
[    0.000000] radix-mmu: Page sizes from device-tree:
[    0.000000] radix-mmu: Page size shift = 12 AP=0x0
[    0.000000] radix-mmu: Page size shift = 16 AP=0x5
[    0.000000] radix-mmu: Page size shift = 21 AP=0x1
[    0.000000] radix-mmu: Page size shift = 30 AP=0x2
[    0.000000] lpar: Using radix MMU under hypervisor
[    0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000040000000 with 1.00 GiB pages
[    0.000000] radix-mmu: Mapped 0x0000000040000000-0x0000000048000000 with 2.00 MiB pages
[    0.000000] radix-mmu: Process table (____ptrval____) and radix root for kernel: (____ptrval____)
[    0.000000] Linux version 4.18.0-221.el8.ppc64le (mockbuild.eng.bos.redhat.com) (gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)) #1 SMP Thu Jun 25 20:53:41 UTC 2020
[    0.000000] Found initrd at 0xc00000000a730000:0xc00000000bc28387
[    0.000000] Using pSeries machine description
[    0.000000] printk: bootconsole [udbg0] enabled
[    0.000000] Partition configured for 384 cpus.
[    0.000000] CPU maps initialized for 1 thread per core
[    0.000000] NUMA disabled by user
[    0.000000] -----------------------------------------------------
[    0.000000] ppc64_pft_size    = 0x0
[    0.000000] phys_mem_size     = 0x48000000
[    0.000000] dcache_bsize      = 0x80
[    0.000000] icache_bsize      = 0x80
[    0.000000] cpu_features      = 0x0001e86f8f4f91a7
[    0.000000]   possible        = 0x0003fbffcf5fb1a7
[    0.000000]   always          = 0x00000003800081a1
[    0.000000] cpu_user_features = 0xdc0065c2 0xaee00000
[    0.000000] mmu_features      = 0x3c006041
[    0.000000] firmware_features = 0x00000085455a445f
[    0.000000] physical_start    = 0x8000000
[    0.000000] -----------------------------------------------------
[    0.000000] numa:   NODE_DATA [mem 0x44e73c80-0x44e7ffff]



Host log:

[ 7168.635667] NIP [c00800000f2de06c] kvmhv_run_single_vcpu+0x7a4/0xda0 [kvm_hv]
[ 7168.635668] LR [c00800000f2de06c] kvmhv_run_single_vcpu+0x7a4/0xda0 [kvm_hv]
[ 7168.635669] Call Trace:
[ 7168.635669] [c000003e0bb7f8b0] [c00800000f2de06c] kvmhv_run_single_vcpu+0x7a4/0xda0 [kvm_hv] (unreliable)
[ 7168.635671] [c000003e0bb7f980] [c00800000f2defe8] kvmppc_vcpu_run_hv+0x980/0x1060 [kvm_hv]
[ 7168.635673] [c000003e0bb7fa90] [c00800000f4bebbc] kvmppc_vcpu_run+0x34/0x48 [kvm]
[ 7168.635674] [c000003e0bb7fab0] [c00800000f4ba36c] kvm_arch_vcpu_ioctl_run+0x374/0x830 [kvm]
[ 7168.635675] [c000003e0bb7fba0] [c00800000f4a33b4] kvm_vcpu_ioctl+0x45c/0x7c0 [kvm]
[ 7168.635676] [c000003e0bb7fd10] [c000000000546950] do_vfs_ioctl+0xe0/0xaa0
[ 7168.635678] [c000003e0bb7fde0] [c0000000005474e4] sys_ioctl+0xc4/0x160
[ 7168.635679] [c000003e0bb7fe30] [c00000000000b408] system_call+0x5c/0x70
[ 7168.635680] Instruction dump:
[ 7168.635681] 614af804 7fa95000 409efcf8 7fe3fb78 48014d75 e8410018 4bfffce8 60000000 
[ 7168.635685] 60000000 2f9b0100 409efbfc 48014499 <e8410018> 4bfffbf0 60000000 60000000 
[ 7168.645747] watchdog: CPU 27 Hard LOCKUP
[ 7168.645748] watchdog: CPU 27 TB:3745071728345, last heartbeat TB:3736863793317 (16031ms ago)
[ 7168.645749] Modules linked in: vhost_net vhost vhost_iotlb tap tun nfnetlink bluetooth ecdh_generic rfkill rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bridge stp llc ib_isert kvm_hv iscsi_target_mod kvm i2c_dev ib_srpt target_core_mod ib_srp scsi_transport_srp rpcrdma ib_iser libiscsi rdma_ucm scsi_transport_iscsi ib_ipoib ib_umad rdma_cm ib_cm iw_cm mlx5_ib ib_uverbs ib_core ofpart powernv_flash at24 xts uio_pdrv_genirq uio ipmi_powernv ipmi_devintf ipmi_msghandler vmx_crypto mtd ibmpowernv opal_prd nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod sg nouveau mlx5_core ast drm_vram_helper drm_ttm_helper i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci mlxfw libata tls(t) tg3 drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
[ 7168.645801] CPU: 27 PID: 41801 Comm: qemu-kvm Kdump: loaded Tainted: G                 --------- -t - 4.18.0-224.el8.ppc64le #1
[ 7168.645802] NIP:  c00800000f2de06c LR: c00800000f2de06c CTR: c00000000001aba0
[ 7168.645804] REGS: c000003d261e3748 TRAP: 0100   Tainted: G                 --------- -t -  (4.18.0-224.el8.ppc64le)
[ 7168.645805] MSR:  9000000102803033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 28002482  XER: 00000000
[ 7168.645814] CFAR: c000000001c8a178 IRQMASK: 40000003d6f41f 
[ 7168.645816] GPR00: c00800000f2de06c c000003d261e38b0 c000000001ac0c00 c000003d261e3748 
[ 7168.645820] GPR04: c009dfc6696fd218 0000000000000000 0000000000000000 000000000000001b 
[ 7168.645824] GPR08: 0000000000000003 0000000000000000 0000003ffcdc0000 c00800000f2f2500 
[ 7168.645827] GPR12: c00000000001aba0 c000003ffffe1780 00007fffa8ff0000 00007ff0fc790000 
[ 7168.645831] GPR16: 00007fffa7e84410 c000003e0542a578 0000000000000001 c000000001345d10 
[ 7168.645834] GPR20: 000000000000001b c000000001c8a178 c000003e0542a578 000000000000001b 
[ 7168.645838] GPR24: 000000000000001b ffffffffffffffff 0040000003d6f41f 0000000000000100 
[ 7168.645842] GPR28: c000003e05420000 c000003d27fa0000 c000003fd336b400 c000003d8c4d2520 
[ 7168.645846] NIP [c00800000f2de06c] kvmhv_run_single_vcpu+0x7a4/0xda0 [kvm_hv]
[ 7168.645848] LR [c00800000f2de06c] kvmhv_run_single_vcpu+0x7a4/0xda0 [kvm_hv]
[ 7168.645848] Call Trace:
[ 7168.645850] [c000003d261e38b0] [c00800000f2de06c] kvmhv_run_single_vcpu+0x7a4/0xda0 [kvm_hv] (unreliable)
[ 7168.645852] [c000003d261e3980] [c00800000f2defe8] kvmppc_vcpu_run_hv+0x980/0x1060 [kvm_hv]
[ 7168.645854] [c000003d261e3a90] [c00800000f4bebbc] kvmppc_vcpu_run+0x34/0x48 [kvm]
[ 7168.645856] [c000003d261e3ab0] [c00800000f4ba36c] kvm_arch_vcpu_ioctl_run+0x374/0x830 [kvm]
[ 7168.645858] [c000003d261e3ba0] [c00800000f4a33b4] kvm_vcpu_ioctl+0x45c/0x7c0 [kvm]
[ 7168.645859] [c000003d261e3d10] [c000000000546950] do_vfs_ioctl+0xe0/0xaa0
[ 7168.645861] [c000003d261e3de0] [c0000000005474e4] sys_ioctl+0xc4/0x160
[ 7168.645863] [c000003d261e3e30] [c00000000000b408] system_call+0x5c/0x70
[ 7168.645864] Instruction dump:
[ 7168.645866] 614af804 7fa95000 409efcf8 7fe3fb78 48014d75 e8410018 4bfffce8 60000000 
[ 7168.645871] 60000000 2f9b0100 409efbfc 48014499 <e8410018> 4bfffbf0 60000000 60000000 
[ 7168.655811] watchdog: CPU 29 Hard LOCKUP
[ 7168.655812] watchdog: CPU 29 TB:3745076905496, last heartbeat TB:3736863793366 (16041ms ago)
[ 7168.655813] Modules linked in: vhost_net vhost vhost_iotlb tap tun nfnetlink bluetooth ecdh_generic rfkill rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bridge stp llc ib_isert kvm_hv iscsi_target_mod kvm i2c_dev ib_srpt target_core_mod ib_srp scsi_transport_srp rpcrdma ib_iser libiscsi rdma_ucm scsi_transport_iscsi ib_ipoib ib_umad rdma_cm ib_cm iw_cm mlx5_ib ib_uverbs ib_core ofpart powernv_flash at24 xts uio_pdrv_genirq uio ipmi_powernv ipmi_devintf ipmi_msghandler vmx_crypto mtd ibmpowernv opal_prd nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod sg nouveau mlx5_core ast drm_vram_helper drm_ttm_helper i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci mlxfw libata tls(t) tg3 drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
[ 7168.655849] CPU: 29 PID: 41993 Comm: qemu-kvm Kdump: loaded Tainted: G                 --------- -t - 4.18.0-224.el8.ppc64le #1
[ 7168.655850] NIP:  c00800000f2de06c LR: c00800000f2de06c CTR: c00000000001aba0
[ 7168.655852] REGS: c000003f1e75f748 TRAP: 0100   Tainted: G                 --------- -t -  (4.18.0-224.el8.ppc64le)
[ 7168.655852] MSR:  9000000102803033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 28002482  XER: 00000000
[ 7168.655859] CFAR: c000000001c8a178 IRQMASK: 40000003d6f41f 
[ 7168.655860] GPR00: c00800000f2de06c c000003f1e75f8b0 c000000001ac0c00 c000003f1e75f748 
[ 7168.655863] GPR04: c009dfc6696fd218 0000000000000000 0000000000000000 000000000000001d 
[ 7168.655866] GPR08: 0000000000000003 0000000000000000 0000003ffcf20000 c00800000f2f2500 
[ 7168.655869] GPR12: c00000000001aba0 c000003ffffdf480 00007fffa8ff0000 00007ff0ce7d0000 
[ 7168.655871] GPR16: 00007fffa7e84410 c000003e0542a578 0000000000000001 c000000001345d10 
[ 7168.655874] GPR20: 000000000000001d c000000001c8a178 c000003e0542a578 000000000000001d 
[ 7168.655876] GPR24: 000000000000001d ffffffffffffffff 0040000003d6f41f 0000000000000100 
[ 7168.655879] GPR28: c000003e05420000 c000003d27a70000 c000003fde0efa00 c000003e5ab6a6a0 
[ 7168.655882] NIP [c00800000f2de06c] kvmhv_run_single_vcpu+0x7a4/0xda0 [kvm_hv]
[ 7168.655883] LR [c00800000f2de06c] kvmhv_run_single_vcpu+0x7a4/0xda0 [kvm_hv]
[ 7168.655884] Call Trace:
[ 7168.655884] [c000003f1e75f8b0] [c00800000f2de06c] kvmhv_run_single_vcpu+0x7a4/0xda0 [kvm_hv] (unreliable)
[ 7168.655886] [c000003f1e75f980] [c00800000f2defe8] kvmppc_vcpu_run_hv+0x980/0x1060 [kvm_hv]
[ 7168.655888] [c000003f1e75fa90] [c00800000f4bebbc] kvmppc_vcpu_run+0x34/0x48 [kvm]
[ 7168.655889] [c000003f1e75fab0] [c00800000f4ba36c] kvm_arch_vcpu_ioctl_run+0x374/0x830 [kvm]
[ 7168.655890] [c000003f1e75fba0] [c00800000f4a33b4] kvm_vcpu_ioctl+0x45c/0x7c0 [kvm]
[ 7168.655892] [c000003f1e75fd10] [c000000000546950] do_vfs_ioctl+0xe0/0xaa0
[ 7168.655893] [c000003f1e75fde0] [c0000000005474e4] sys_ioctl+0xc4/0x160
[ 7168.655894] [c000003f1e75fe30] [c00000000000b408] system_call+0x5c/0x70
[ 7168.655895] Instruction dump:
[ 7168.655896] 614af804 7fa95000 409efcf8 7fe3fb78 48014d75 e8410018 4bfffce8 60000000 
[ 7168.655900] 60000000 2f9b0100 409efbfc 48014499 <e8410018> 4bfffbf0 60000000 

Expected results:
Hot unplug vcpus smoothly.

Additional info:

Comment 1 Qunfang Zhang 2020-07-14 03:32:01 UTC
Hi Xujun,

Can you confirm the regression happens from which version?

Thanks,
Qunfang

Comment 2 Michael Roth 2020-07-22 00:43:22 UTC
I'm unable to reproduce this with kernel-4.18.0-227.el8.ppc64le on both the host and the guest, and qemu-kvm 5.0, though I haven't identified any related fixes that have gone in.

I tried the original workload ~15 times, and then a sped-up workload which I ran for an hour, which was ~30 iterations of the original test loop.

I also tried it with echo 600000 >/proc/sys/vm/percpu_pagelist_fraction in the guest in hopes that it would trigger the free_pcppages_bulk() call more frequently.

Can we confirm the 1/5 reproducibility of this, or whether it is still reproducible? Are there other factors worth noting that might make a difference with reproducibility?

Also, what machine configuration was this run on? I did the tests on a Boston system with 128 CPUs and 64GB of memory.

Comment 3 Xujun Ma 2020-07-23 02:29:50 UTC
(In reply to Michael Roth from comment #2)
> I'm unable to reproduce this with kernel-4.18.0-227.el8.ppc64le in both host
> and guest, and qemu-kvm 5.0, though I haven't identified any related fixes
> that have gone in.
> 
> I tried the original workload ~15 times, and then a sped up workload which I
> ran for an hour which was ~30 iterations of the original test loop.
> 
> I also tried it with echo 600000 >/proc/sys/vm/percpu_pagelist_fraction in
> the guest to in hopes that it would trigger the free_pcppages_bulk() call
> more frequently
> 
> Can we confirm the 1/5 reproducible of this, or whether it is still
> reproducible? Are there other factors worth noting that might make a
> difference with reproducibility?
I'm not sure. From my side, it's easy to reproduce. Did you use threads=1?
> 
> Also, what machine configuration was this run on? I did the tests on a
> Boston system with 128 CPUs and 64GB of memory
Power9 host.

Comment 4 Xujun Ma 2020-07-23 02:34:01 UTC
(In reply to Qunfang Zhang from comment #1)
> Hi Xujun,
> 
> Can you confirm the regression happens from which version?
> 
> Thanks,
> Qunfang

I tested the first fast-train build and the latest slow-train build.
Both can reproduce this problem.

qemu-kvm-5.0.0-0.scrmod+el8.3.0+6312+1f7d6182
qemu-kvm-4.2.0-30.module+el8.3.0+7298+c26a06b8

Comment 5 David Gibson 2020-07-23 03:14:59 UTC
The host lockups here look like bug 1733467. I think we'll have to sort that out before seeing if there's any remaining bug here.

Comment 6 Qunfang Zhang 2020-07-23 03:23:53 UTC
Remove Regression keyword based on comment 4.

Comment 7 David Gibson 2020-07-23 03:27:16 UTC
Xujun,

Roughly how long did this take to reproduce for you?

I'm wondering if we might have a better way of triggering bug 1733467 here.

Comment 8 Xujun Ma 2020-07-23 06:49:38 UTC
(In reply to David Gibson from comment #7)
> Xujun,
> 
> Roughly how long did this take to reproduce for you?
> 
> I'm wondering if we might have a better way of triggering bug 1733467 here.

It was easy to reproduce before on Boston and Witherspoon machines; I just ran several loops.
But I can't reproduce it now on a Boston machine. I'm trying to figure out what the problem might be.

Comment 9 Michael Roth 2020-07-23 23:12:13 UTC
(In reply to Xujun Ma from comment #4)
> (In reply to Qunfang Zhang from comment #1)
> > Hi Xujun,
> > 
> > Can you confirm the regression happens from which version?
> > 
> > Thanks,
> > Qunfang
> 
> I tested the first fast train build and latest slow train build.
> All can reproduce this problem.
> 
> qemu-kvm-5.0.0-0.scrmod+el8.3.0+6312+1f7d6182
> qemu-kvm-4.2.0-30.module+el8.3.0+7298+c26a06b8

Can you confirm what kernel was used in this newer case? That would help in determining whether this might have been fixed downstream or not.

Comment 10 Xujun Ma 2020-07-26 15:22:04 UTC
I reproduced this bug again with the in-kernel irqchip in the guest on a Witherspoon host. I think it might be related to the irqchip type.

Comment 11 David Gibson 2020-07-27 06:38:47 UTC
I have also reproduced the guest crash on Witherspoon, and the host lockups did *not* appear.  So I think the two problems are unrelated, and that host happened to have also hit bug 1733467 triggered by something else on the system.

Mike, it sounds like this only happens with the in-kernel irqchip (which requires Witherspoon). We now have a Witherspoon machine (ibm-p9wr-09.pnr.lab.eng.bos.redhat.com), so could you continue investigating there, please?

Comment 12 Michael Roth 2020-07-27 15:38:20 UTC
FWIW, I ran the workload on 4 guests overnight without the in-kernel irqchip, using the original 4.18.0-224.el8.ppc64le kernel on a Boston system, and did not encounter the issue, so the in-kernel irqchip does seem likely to be a requirement. I will attempt to reproduce on a Witherspoon.

Comment 13 Michael Roth 2020-07-29 01:14:57 UTC
I reproduced a related trace on a Mihawk system using kernel 221. I will attempt to reproduce with 224 as in the original report, but assuming the code hasn't changed much, I think the above traces are from the CPU tearing itself down via pseries_mach_cpu_die(), whereas the trace below is from a device_offline(cpu) call handled by the hotplug worker thread. The below case calls free_pcppages_bulk() via the generic page_alloc_cpu_dead() callback instead of via xive_spapr_cleanup_queue()->free_unref_page()->free_pcppages_bulk().

The presence of the "cpu 312 (hwid 312) Ready to die" message suggests the dying CPU has already finished cleanup, but perhaps it executed free_pcppages_bulk() while the hotplug worker was still in the middle of free_pcppages_bulk(), and that's what resulted in the double add? I'm not sure what the locking protocol is there.

This doesn't really explain why we don't see it without the in-kernel irqchip, but if it's racy then it could just be a timing thing.


[  853.755706] pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 10000137
[  853.907647] IRQ 19: no longer affine to CPU311
[  853.907723] cpu 311 (hwid 311) Ready to die...
[  853.965435] pseries-hotplug-cpu: Successfully removed CPU, drc index: 10000137
[  854.771062] pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 10000138
[  854.879878] Querying DEAD? cpu 312 (312) shows 2
[  854.879971] list_add double add: new=c00a00000245ee48, prev=c0000009feb5ca80, next=c00a00000245ee48.
[  854.879974] cpu 312 (hwid 312) Ready to die...
[  854.880040] ------------[ cut here ]------------
[  854.880796] kernel BUG at lib/list_debug.c:31!
[  854.880829] Oops: Exception in kernel mode, sig: 5 [#1]
[  854.880860] LE SMP NR_CPUS=2048 NUMA pSeries
[  854.880892] Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nft_counter nf_nat_tftp nft_objref nf_conntrack_tftp tun bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 nft_chain_route_ipv6 nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack nft_chain_route_ipv4 ip6_tables nft_compat ip_set nf_tables nfnetlink sunrpc xts vmx_crypto ip_tables xfs libcrc32c sd_mod sg virtio_net net_failover virtio_scsi failover dm_mirror dm_region_hash dm_log dm_mod
[  854.881254] CPU: 334 PID: 7 Comm: kworker/u768:0 Kdump: loaded Not tainted 4.18.0-193.10.el8.bz1730194.ppc64le #1
[  854.881318] Workqueue: pseries hotplug workque pseries_hp_work_fn
[  854.881357] NIP:  c000000000784b20 LR: c000000000784b1c CTR: c000000007dfb280
[  854.881402] REGS: c0000009e584b3d0 TRAP: 0700   Not tainted  (4.18.0-193.10.el8.bz1730194.ppc64le)
[  854.881455] MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 48082228  XER: 20040000
[  854.881510] CFAR: c0000000001f9c24 IRQMASK: 1 
[  854.881510] GPR00: c000000000784b1c c0000009e584b650 c000000001954a00 0000000000000058 
[  854.881510] GPR04: 0000000071000001 0000000000000009 3435656534382e0d 0a61303030303032 
[  854.881510] GPR08: 0000000000000007 0000000000000007 0000000000000001 3562656639303030 
[  854.881510] GPR12: 0000000000002000 c000000007dfb280 c00a00000245ee48 0000000000000000 
[  854.881510] GPR16: c00a00000245ee48 c00a00000245ee40 00000000000917b8 c0000009feb5c980 
[  854.881510] GPR20: 0000000000000000 5deadbeef0000100 c0000009feb5ca80 c0000000019935d0 
[  854.881510] GPR24: c0000009e584b708 c000000001b02f08 fffffffff0000000 0000000000000001 
[  854.881510] GPR28: c00a000002469b00 0000000000000000 c00a00000245ee40 0000000000000100 
[  854.881887] NIP [c000000000784b20] __list_add_valid+0x90/0xc0
[  854.881926] LR [c000000000784b1c] __list_add_valid+0x8c/0xc0
[  854.881963] Call Trace:
[  854.881980] [c0000009e584b650] [c000000000784b1c] __list_add_valid+0x8c/0xc0 (unreliable)
[  854.882027] [c0000009e584b6b0] [c00000000045d3c0] free_pcppages_bulk+0x460/0x940
[  854.882073] [c0000009e584b7b0] [c00000000045dc98] page_alloc_cpu_dead+0x118/0x120
[  854.882119] [c0000009e584b800] [c000000000154fa8] cpuhp_invoke_callback.constprop.4+0xb8/0x760
[  854.882173] [c0000009e584b870] [c0000000001585d8] _cpu_down+0x188/0x380
[  854.882211] [c0000009e584b8e0] [c000000000156e1c] cpu_down+0x5c/0xa0
[  854.882251] [c0000009e584b910] [c0000000008ff7c4] cpu_subsys_offline+0x24/0x40
[  854.882296] [c0000009e584b930] [c0000000008f44e0] device_offline+0xf0/0x130
[  854.882335] [c0000009e584b970] [c0000000001080a4] dlpar_offline_cpu+0x1c4/0x2a0
[  854.882382] [c0000009e584ba30] [c0000000001084b8] dlpar_cpu_remove+0xb8/0x190
[  854.882428] [c0000009e584bab0] [c0000000001086bc] dlpar_cpu_remove_by_index+0x12c/0x150
[  854.882474] [c0000009e584bb40] [c000000000109a74] dlpar_cpu+0x94/0x600
[  854.882512] [c0000009e584bc00] [c0000000000fffe8] pseries_hp_work_fn+0x128/0x1e0
[  854.882559] [c0000009e584bc70] [c000000000183ed4] process_one_work+0x304/0x5d0
[  854.882605] [c0000009e584bd10] [c000000000184a1c] worker_thread+0xcc/0x7a0
[  854.882644] [c0000009e584bdc0] [c00000000018e9bc] kthread+0x1ac/0x1c0
[  854.882683] [c0000009e584be30] [c00000000000b7dc] ret_from_kernel_thread+0x5c/0x80
[  854.882728] Instruction dump:
[  854.882752] 4ba750ed 60000000 0fe00000 7c0802a6 7c641b78 3c62ff74 7d465378 7d054378 
[  854.882799] 3863f528 f8010070 4ba750c5 60000000 <0fe00000> 7c0802a6 3c62ff74 7d264b78 
[  854.882849] ---[ end trace 8fef7a65bb8c7def ]---
[  854.888093]                                                                                                                                                                                                       
[  854.888226] Sending IPI to other CPUs
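
For readers following the traces: the "kernel BUG at lib/list_debug.c" lines come from the kernel's list-debugging checks. The program below is a standalone, simplified paraphrase of those checks, written from memory, so treat the details as approximate (the real functions are __list_add_valid() and __list_del_entry_valid(), and they BUG() instead of returning); it only illustrates how a concurrent writer re-linking a neighbour produces exactly the "next->prev should be X, but was Y" and "list_add double add" messages seen above.

/* list_debug_sketch.c - simplified, standalone paraphrase of the checks in
 * lib/list_debug.c (approximate, not the kernel source).
 * Build and run: gcc -Wall -o list_debug_sketch list_debug_sketch.c && ./list_debug_sketch
 */
#include <stdbool.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

/* Stand-in for __list_add_valid(): refuse the insert when the neighbours no
 * longer point at each other, or when the entry is already linked at the
 * insertion point ("list_add double add"). */
static bool list_add_valid(struct list_head *new_entry,
                           struct list_head *prev, struct list_head *next)
{
        if (next->prev != prev || prev->next != next ||
            new_entry == prev || new_entry == next) {
                printf("list_add double add / corruption: new=%p, prev=%p, next=%p\n",
                       (void *)new_entry, (void *)prev, (void *)next);
                return false;           /* the kernel BUG()s at this point */
        }
        return true;
}

/* Stand-in for __list_del_entry_valid(): refuse the delete when the entry's
 * neighbours do not point back at it. */
static bool list_del_entry_valid(struct list_head *entry)
{
        if (entry->prev->next != entry) {
                printf("list_del corruption. prev->next should be %p, but was %p\n",
                       (void *)entry, (void *)entry->prev->next);
                return false;
        }
        if (entry->next->prev != entry) {
                printf("list_del corruption. next->prev should be %p, but was %p\n",
                       (void *)entry, (void *)entry->next->prev);
                return false;
        }
        return true;
}

int main(void)
{
        /* Two-entry circular list a <-> b, then simulate another CPU
         * re-linking b.prev before we try to delete a: the same kind of
         * inconsistency the racing free paths create in the per-cpu lists. */
        struct list_head a, b, elsewhere;

        a.next = &b; a.prev = &b;
        b.next = &a; b.prev = &a;
        elsewhere.next = elsewhere.prev = &elsewhere;

        b.prev = &elsewhere;            /* "concurrent" corruption */
        list_del_entry_valid(&a);       /* prints the familiar message */
        list_add_valid(&a, &b, b.next); /* fails too: a is already linked there */
        return 0;
}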

Comment 14 Michael Roth 2020-07-29 01:50:41 UTC
A slightly different trace on the above Mihawk system with the originally reported 4.18.0-224.el8 kernel:

[ 1537.419315] pseries-hotplug-cpu: Successfully removed CPU, drc index: 10000139
[ 1538.253044] pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000013a
[ 1538.360259] Querying DEAD? cpu 314 (314) shows 2
[ 1538.360736] BUG: Bad page state in process kworker/u768:3  pfn:95de1
[ 1538.360746] cpu 314 (hwid 314) Ready to die...
[ 1538.360784] page:c00a000002577840 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0
[ 1538.361881] flags: 0x5ffffc00000000()
[ 1538.361908] raw: 005ffffc00000000 5deadbeef0000100 5deadbeef0000200 0000000000000000
[ 1538.361955] raw: 0000000000000000 0000000000000000 00000000ffffff7f 0000000000000000
[ 1538.362002] page dumped because: nonzero mapcount
[ 1538.362033] Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nft_counter nf_nat_tftp nft_objref nf_conntrack_tftp tun bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6_tables nft_compat ip_set nf_tables nfnetlink sunrpc xts vmx_crypto ip_tables xfs libcrc32c sd_mod sg virtio_net net_failover virtio_scsi failover dm_mirror dm_region_hash dm_log dm_mod
[ 1538.362613] CPU: 0 PID: 548 Comm: kworker/u768:3 Kdump: loaded Not tainted 4.18.0-224.el8.bz1856588.ppc64le #1
[ 1538.362687] Workqueue: pseries hotplug workque pseries_hp_work_fn
[ 1538.362725] Call Trace:
[ 1538.362743] [c0000009d4adf590] [c000000000e0e0fc] dump_stack+0xb0/0xf4 (unreliable)
[ 1538.362789] [c0000009d4adf5d0] [c000000000475dfc] bad_page+0x12c/0x1b0
[ 1538.362827] [c0000009d4adf660] [c0000000004784bc] free_pcppages_bulk+0x5bc/0x940
[ 1538.362871] [c0000009d4adf760] [c000000000478c38] page_alloc_cpu_dead+0x118/0x120
[ 1538.362918] [c0000009d4adf7b0] [c00000000015b898] cpuhp_invoke_callback.constprop.5+0xb8/0x760
[ 1538.362969] [c0000009d4adf820] [c00000000015eee8] _cpu_down+0x188/0x340
[ 1538.363007] [c0000009d4adf890] [c00000000015d75c] cpu_down+0x5c/0xa0
[ 1538.363045] [c0000009d4adf8c0] [c00000000092c544] cpu_subsys_offline+0x24/0x40
[ 1538.363091] [c0000009d4adf8e0] [c0000000009212f0] device_offline+0xf0/0x130
[ 1538.363129] [c0000009d4adf920] [c00000000010aee4] dlpar_offline_cpu+0x1c4/0x2a0
[ 1538.363174] [c0000009d4adf9e0] [c00000000010b2f8] dlpar_cpu_remove+0xb8/0x190
[ 1538.363219] [c0000009d4adfa60] [c00000000010b4fc] dlpar_cpu_remove_by_index+0x12c/0x150
[ 1538.363264] [c0000009d4adfaf0] [c00000000010ca24] dlpar_cpu+0x94/0x800
[ 1538.363302] [c0000009d4adfc00] [c000000000102cc8] pseries_hp_work_fn+0x128/0x1e0
[ 1538.363347] [c0000009d4adfc70] [c00000000018aa84] process_one_work+0x304/0x5d0
[ 1538.363394] [c0000009d4adfd10] [c00000000018b5cc] worker_thread+0xcc/0x7a0
[ 1538.363433] [c0000009d4adfdc0] [c00000000019567c] kthread+0x1ac/0x1c0
[ 1538.363469] [c0000009d4adfe30] [c00000000000b7dc] ret_from_kernel_thread+0x5c/0x80

Comment 15 Greg Kurz 2020-07-29 10:47:53 UTC
I'm trying to reproduce on a Witherspoon system with a Fedora Rawhide guest. No crash yet, but I got this:

[  183.796986] cpu 34 (hwid 34) Ready to die...
[  184.677349] 
[  184.677384] =============================
[  184.677385] WARNING: suspicious RCU usage
[  184.677388] 5.8.0-0.rc5.20200715gite9919e11e219.1.fc33.ppc64le #1 Not tainted
[  184.677389] -----------------------------
[  184.677391] kernel/sched/core.c:567 suspicious rcu_dereference_check() usage!
[  184.677392] 
[  184.677392] other info that might help us debug this:
[  184.677392] 
[  184.677394] 
[  184.677394] RCU used illegally from offline CPU!
[  184.677394] rcu_scheduler_active = 2, debug_locks = 1
[  184.677397] 2 locks held by swapper/35/0:
[  184.677398]  #0: c0000009c4d0e518 (&base->lock){-.-.}-{2:2}, at: __mod_timer+0x1f4/0x7e0
[  184.677408]  #1: c00000000240ba78 (rcu_read_lock){....}-{1:2}, at: get_nohz_timer_target+0xc8/0x3e0
[  184.677413] 
[  184.677413] stack backtrace:
[  184.677416] CPU: 35 PID: 0 Comm: swapper/35 Not tainted 5.8.0-0.rc5.20200715gite9919e11e219.1.fc33.ppc64le #1
[  184.677419] Call Trace:
[  184.677422] [c0000009bda47890] [c000000000b85d7c] dump_stack+0x100/0x174 (unreliable)
[  184.677426] [c0000009bda478f0] [c000000000246204] lockdep_rcu_suspicious+0x13c/0x15c
[  184.677428] [c0000009bda47980] [c0000000001ddc8c] get_nohz_timer_target+0x34c/0x3e0
[  184.677431] [c0000009bda479f0] [c0000000002a1684] __mod_timer+0x514/0x7e0
[  184.677435] [c0000009bda47a80] [c0000000001b8a50] queue_delayed_work_on+0x180/0x210
[  184.677437] [c0000009bda47ad0] [c000000000b16620] debug_check_no_obj_freed+0x270/0x320
[  184.677440] [c0000009bda47b90] [c000000000543584] free_pcp_prepare+0x124/0x470
[  184.677443] [c0000009bda47bd0] [c000000000549208] free_unref_page+0x38/0x150
[  184.677446] [c0000009bda47c10] [c0000000000dc8b8] xive_spapr_cleanup_queue+0x128/0x1e0
[  184.677449] [c0000009bda47ca0] [c0000000000d9e14] xive_teardown_cpu+0x1a4/0x220
[  184.677452] [c0000009bda47ce0] [c00000000012f0b4] pseries_mach_cpu_die+0x84/0x350
[  184.677454] [c0000009bda47d90] [c000000000062b84] cpu_die+0x44/0x60
[  184.677457] [c0000009bda47db0] [c00000000001f2d0] arch_cpu_idle_dead+0x30/0x50
[  184.677460] [c0000009bda47dd0] [c0000000001ed524] do_idle+0x314/0x540
[  184.677462] [c0000009bda47ea0] [c0000000001edb1c] cpu_startup_entry+0x3c/0x40
[  184.677465] [c0000009bda47ed0] [c00000000006237c] start_secondary+0x7ac/0xb70
[  184.677467] [c0000009bda47f90] [c00000000000c654] start_secondary_prolog+0x10/0x14

Comment 16 Michael Roth 2020-07-29 15:40:50 UTC
(In reply to Greg Kurz from comment #15)
> I'm trying to reproduce on a witherspoon system with a fedora rawhide guest.
> No crash yet but I got this:
> 
> [  183.796986] cpu 34 (hwid 34) Ready to die...
> [  184.677349] 
> [  184.677384] =============================
> [  184.677385] WARNING: suspicious RCU usage
> [  184.677388] 5.8.0-0.rc5.20200715gite9919e11e219.1.fc33.ppc64le #1 Not
> tainted
> [  184.677389] -----------------------------
> [  184.677391] kernel/sched/core.c:567 suspicious rcu_dereference_check()
> usage!
> [  184.677392] 
> [  184.677392] other info that might help us debug this:
> [  184.677392] 
> [  184.677394] 
> [  184.677394] RCU used illegally from offline CPU!
> [  184.677394] rcu_scheduler_active = 2, debug_locks = 1
> [  184.677397] 2 locks held by swapper/35/0:
> [  184.677398]  #0: c0000009c4d0e518 (&base->lock){-.-.}-{2:2}, at:
> __mod_timer+0x1f4/0x7e0
> [  184.677408]  #1: c00000000240ba78 (rcu_read_lock){....}-{1:2}, at:
> get_nohz_timer_target+0xc8/0x3e0
> [  184.677413] 
> [  184.677413] stack backtrace:
> [  184.677416] CPU: 35 PID: 0 Comm: swapper/35 Not tainted
> 5.8.0-0.rc5.20200715gite9919e11e219.1.fc33.ppc64le #1
> [  184.677419] Call Trace:
> [  184.677422] [c0000009bda47890] [c000000000b85d7c] dump_stack+0x100/0x174
> (unreliable)
> [  184.677426] [c0000009bda478f0] [c000000000246204]
> lockdep_rcu_suspicious+0x13c/0x15c
> [  184.677428] [c0000009bda47980] [c0000000001ddc8c]
> get_nohz_timer_target+0x34c/0x3e0
> [  184.677431] [c0000009bda479f0] [c0000000002a1684] __mod_timer+0x514/0x7e0
> [  184.677435] [c0000009bda47a80] [c0000000001b8a50]
> queue_delayed_work_on+0x180/0x210
> [  184.677437] [c0000009bda47ad0] [c000000000b16620]
> debug_check_no_obj_freed+0x270/0x320
> [  184.677440] [c0000009bda47b90] [c000000000543584]
> free_pcp_prepare+0x124/0x470
> [  184.677443] [c0000009bda47bd0] [c000000000549208]
> free_unref_page+0x38/0x150
> [  184.677446] [c0000009bda47c10] [c0000000000dc8b8]
> xive_spapr_cleanup_queue+0x128/0x1e0
> [  184.677449] [c0000009bda47ca0] [c0000000000d9e14]
> xive_teardown_cpu+0x1a4/0x220
> [  184.677452] [c0000009bda47ce0] [c00000000012f0b4]
> pseries_mach_cpu_die+0x84/0x350
> [  184.677454] [c0000009bda47d90] [c000000000062b84] cpu_die+0x44/0x60
> [  184.677457] [c0000009bda47db0] [c00000000001f2d0]
> arch_cpu_idle_dead+0x30/0x50
> [  184.677460] [c0000009bda47dd0] [c0000000001ed524] do_idle+0x314/0x540
> [  184.677462] [c0000009bda47ea0] [c0000000001edb1c]
> cpu_startup_entry+0x3c/0x40
> [  184.677465] [c0000009bda47ed0] [c00000000006237c]
> start_secondary+0x7ac/0xb70
> [  184.677467] [c0000009bda47f90] [c00000000000c654]
> start_secondary_prolog+0x10/0x14

Did you see "Querying DEAD? cpu 34 (34) shows 2" earlier in the trace?

I reproduced again on Mihawk (trace below); this time I got the trace on the unplugged CPU rather than the hotplug worker thread. It seems like what's happening is that the hotplug worker calls device_offline() on the target CPU, which at some point calls takedown_cpu()->__cpu_die()->pseries_cpu_die():

static void pseries_cpu_die(unsigned int cpu)
{
        int tries;
        int cpu_status = 1;
        unsigned int pcpu = get_hard_smp_processor_id(cpu);

        if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
                cpu_status = 1;
                for (tries = 0; tries < 5000; tries++) {
                        if (get_cpu_current_state(cpu) == CPU_STATE_INACTIVE) {
                                cpu_status = 0;
                                break;
                        }
                        msleep(1);
                }
        } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {

                for (tries = 0; tries < 25; tries++) {
                        cpu_status = smp_query_cpu_stopped(pcpu);
                        if (cpu_status == QCSS_STOPPED ||
                            cpu_status == QCSS_HARDWARE_ERROR)
                                break;
                        cpu_relax();
                }
        }

        if (cpu_status != 0) {
                printk("Querying DEAD? cpu %i (%i) shows %i\n",
                       cpu, pcpu, cpu_status);
        }

        /* Isolation and deallocation are definitely done by
         * drslot_chrp_cpu.  If they were not they would be
         * done here.  Change isolate state to Isolate and
         * change allocation-state to Unusable.
         */
        paca_ptrs[cpu]->cpu_start = 0;
}

The function is supposed to wait for the CPU to finish pseries_mach_cpu_die(), but it only waits for a limited amount of time before it gives up and just prints, e.g.:

  Querying DEAD? cpu 314 (314) shows 2

2 in this case meaning QCSS_NOT_STOPPED.

So back in the worker thread we continue on with other cleanup, including the call to page_alloc_cpu_dead()->drain_pages()->free_pcppages_bulk(), which generates some of the traces I posted earlier.

drain_pages() documents the following assumption:

  /*
   * Drain pcplists of all zones on the indicated processor.
   *
   * The processor must either be the current processor and the
   * thread pinned to the current processor or a processor that
   * is not online.
   */
  static void drain_pages(unsigned int cpu)
  {
          struct zone *zone;

          for_each_populated_zone(zone) {
                  drain_pages_zone(cpu, zone);
          }
  }

so if the unplugged CPU at some point enters pseries_mach_cpu_die() (or is already in the middle of it), the per-cpu page cleanup might race with the worker thread calling drain_pages().

I'm not sure if we can expect to see this behavior if we don't see the "Querying DEAD?" messages, though, since that suggests the unplugged CPU had already called set_cpu_current_state(cpu, CPU_STATE_OFFLINE) in pseries_mach_cpu_die(), so it had already finished the page cleanup. Maybe there's some other path, though.


If that's what's going on, though, the question I guess is: why is it okay for pseries_cpu_die() to give up on waiting for the CPU to go offline? If we can't risk hanging the worker thread with an indefinite wait, maybe we need a longer wait time at least.
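
To make the suspected interleaving concrete, here is a small standalone user-space model (deliberately not kernel code; the names dying_cpu and hotplug_worker are only illustrative). One thread plays the unplugged CPU that is still freeing entries onto its per-cpu list, the other plays the hotplug worker that drains the same list after pseries_cpu_die() has given up waiting. The pointer accesses use C11 atomics only so the demo stays well-defined C; the push and pop sequences themselves are left unsynchronized, which is the lost-update race described above.

/* race_sketch.c - toy model of the drain_pages() vs dying-CPU race (not
 * kernel code).  Build: gcc -O2 -Wall -pthread -o race_sketch race_sketch.c
 * A mismatch between "pushed" and "drained" is the analogue of the list
 * corruption that the kernel's debug checks turn into a BUG().
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct node { struct node *next; };

#define N_NODES (1 << 20)

static struct node pool[N_NODES];
static _Atomic(struct node *) list_head;   /* shared "per-cpu" free list */
static long pushed, drained;               /* each written by one thread only */
static atomic_bool pusher_done;

/* Plays the unplugged CPU: still freeing onto its own list, the way
 * xive_spapr_cleanup_queue()->free_unref_page() does in the traces above. */
static void *dying_cpu(void *arg)
{
        for (long i = 0; i < N_NODES; i++) {
                pool[i].next = atomic_load(&list_head);
                atomic_store(&list_head, &pool[i]); /* push is not atomic as a whole */
                pushed++;
        }
        atomic_store(&pusher_done, true);
        return NULL;
}

/* Plays the hotplug worker: page_alloc_cpu_dead()->drain_pages() assumes the
 * other CPU is no longer running, so it pops with no synchronization. */
static void *hotplug_worker(void *arg)
{
        while (!atomic_load(&pusher_done) || atomic_load(&list_head)) {
                struct node *n = atomic_load(&list_head);
                if (n) {
                        atomic_store(&list_head, n->next); /* pop is not atomic as a whole */
                        drained++;
                }
        }
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, dying_cpu, NULL);
        pthread_create(&b, NULL, hotplug_worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("pushed=%ld drained=%ld%s\n", pushed, drained,
               pushed == drained ? " (lucky run, try again)"
                                 : "  <- entries lost or double-counted by the race");
        return 0;
}

On a multi-core box a typical run drains more or fewer entries than were pushed; in the kernel, the same class of interleaving on the real per-cpu page lists is what the list-debug checks catch and turn into the BUG()s in the traces.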



[ 4463.745739] pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 1000005a
[ 4464.318516] Querying DEAD? cpu 90 (90) shows 2
[ 4464.318935] list_del corruption. prev->next should be c00a00000251da48, but was c0000009f167bb10
[ 4464.318955] ------------[ cut here ]------------
[ 4464.318956] kernel BUG at lib/list_debug.c:53!
[ 4464.318959] Oops: Exception in kernel mode, sig: 5 [#1]
[ 4464.318960] LE SMP NR_CPUS=2048 NUMA pSeries
[ 4464.318962] Modules linked in: kvm xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nft_counter nf_nat_tftp nft_objref nf_conntrack_tftp tun bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6_tables nft_compat ip_set nf_tables nfnetlink sunrpc xts vmx_crypto ip_tables xfs libcrc32c sd_mod sg virtio_net net_failover virtio_scsi failover dm_mirror dm_region_hash dm_log dm_mod
[ 4464.318988] CPU: 90 PID: 0 Comm: swapper/90 Kdump: loaded Not tainted 4.18.0-224.el8.bz1856588.ppc64le #1
[ 4464.318990] NIP:  c0000000007acc70 LR: c0000000007acc6c CTR: 0000000000000000
[ 4464.318991] REGS: c0000009e5b7b840 TRAP: 0700   Not tainted  (4.18.0-224.el8.bz1856588.ppc64le)
[ 4464.318992] MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 22000842  XER: 20040000
[ 4464.318996] CFAR: c000000000200e24 IRQMASK: 1
[ 4464.318996] GPR00: c0000000007acc6c c0000009e5b7bac0 c000000001ac0c00 0000000000000054
[ 4464.318996] GPR04: c0000009f15ecf90 c0000009f1675628 c0000009ed4d7f40 c000000001c90c00
[ 4464.318996] GPR08: 0000000000000007 c0000009f15f3980 00000009f0330000 7562202c38346164
[ 4464.318996] GPR12: 0000000000000000 c000000007f69280 c00a00000251da48 0000000000000000
[ 4464.318996] GPR16: 0000000000000001 c0000009f167bb00 0000000000000800 c0000009ff3d4980
[ 4464.318996] GPR20: c0000009f167bb10 5deadbeef0000100 5deadbeef0000200 0000000000187961
[ 4464.318996] GPR24: c0000009e5b7bb78 0000000000000000 0000000000000001 ffffffffffffffff
[ 4464.318996] GPR28: c00a00000251da40 c0000009f167bb10 c0000009f167bb10 0000000000000000
[ 4464.319013] NIP [c0000000007acc70] __list_del_entry_valid+0xd0/0x100
[ 4464.319015] LR [c0000000007acc6c] __list_del_entry_valid+0xcc/0x100
[ 4464.319015] Call Trace:
[ 4464.319017] [c0000009e5b7bac0] [c0000000007acc6c] __list_del_entry_valid+0xcc/0x100 (unreliable)
[ 4464.319020] [c0000009e5b7bb20] [c0000000004780f8] free_pcppages_bulk+0x1f8/0x940
[ 4464.319022] [c0000009e5b7bc20] [c00000000047bc80] free_unref_page+0xd0/0x100
[ 4464.319025] [c0000009e5b7bc50] [c0000000000bbfb8] xive_spapr_cleanup_queue+0x158/0x200
[ 4464.319028] [c0000009e5b7bcf0] [c0000000000b93dc] xive_teardown_cpu+0x1bc/0x240
[ 4464.319030] [c0000009e5b7bd30] [c00000000010bcb8] pseries_mach_cpu_die+0x78/0x2f0
[ 4464.319032] [c0000009e5b7bde0] [c00000000005c8f8] cpu_die+0x48/0x70
[ 4464.319034] [c0000009e5b7be00] [c000000000021cf0] arch_cpu_idle_dead+0x20/0x40
[ 4464.319036] [c0000009e5b7be20] [c0000000001b5254] do_idle+0x2f4/0x4c0
[ 4464.319038] [c0000009e5b7bea0] [c0000000001b566c] cpu_startup_entry+0x3c/0x40
[ 4464.319039] [c0000009e5b7bed0] [c00000000005c45c] start_secondary+0x7bc/0x8f0
[ 4464.319041] [c0000009e5b7bf90] [c00000000000ac70] start_secondary_prolog+0x10/0x14

Comment 17 Greg Kurz 2020-07-29 16:51:10 UTC
(In reply to Michael Roth from comment #16)
> 
> did you see "Querying DEAD? cpu 34 (34) shows 2" earlier in the trace?
> 

Ah... I should have kept more context and these "Querying DEAD?" lines pop
up so often that I don't remember :-\ I'll try again.

> i reproduced again on mihawk (trace below), this time I got the trace on the
> unplugged cpu rather than the hotplug worker thread. It seems like what's
> happening is the hotplug worker calls device_offline() on the target CPU,
> which at some point calls takedown_cpu()->__cpu_die()->pseries_cpu_die():
> 

Yes.

> 
> The function is supposed to wait for the CPU to finish pseries_mach_cpu_die,
> but it only tries to wait for some limited amount of time before it gives up
> waiting and just prints, e.g.:
> 
>   Querying DEAD? cpu 314 (314) shows 2
> 
> 2 in this case meaning QCSS_NOT_STOPPED.
> 
> So back in the worker thread we continue on with other cleanup, including
> the call to page_alloc_cpu_dead()->drain_pages()->free_pcppages_bulk() which
> generates some of the traces I posted earlier.
> 
> Drain pages has the following assumptions:
> 
>   /*
>    * Drain pcplists of all zones on the indicated processor.
>    *
>    * The processor must either be the current processor and the
>    * thread pinned to the current processor or a processor that
>    * is not online.
>    */

If the queried state was QCSS_NOT_STOPPED then it is possible that
the processor is still online, right?

>   static void drain_pages(unsigned int cpu)
>   {
>           struct zone *zone;
> 
>           for_each_populated_zone(zone) {
>                   drain_pages_zone(cpu, zone);
>           }
>   }
> 
> so if the unplugged CPU at some point enters pseries_mach_cpu_die() (or is
> already in the middle of it), the per-cpu page cleanup might race with the
> worker thread calling drain_pages().
> 

Ugly :-\

> I'm not sure if we can expect to see this behavior if we don't see the
> "Querying DEAD?" messages though, since that suggests the unplugged CPU
> called set_cpu_current_state(cpu, CPU_STATE_OFFLINE) in
> pseries_mach_cpu_die() already, so it had already finished the page cleanup.
> maybe there's some other path though.
> 
> 
> If that's what's going on though, the question I guess is why is it okay for
> pseries_cpu_die() to give up on waiting for the CPU to go offline? If we
> can't risk hanging the worker thread with an indefinite wait maybe we need a
> longer wait time at least.
> 

This sleep-based synchronization looks like a recipe for disaster... waiting
longer might just hide the problem until next time. But yes, your questioning
is clearly good: why does the worker thread give up in the first place? I
mean, if the unplugged CPU never completes pseries_mach_cpu_die() then something
went very wrong, and I don't quite see the point in continuing execution.

FYI, I could reproduce with a rhel83 guest and I was hoping to capture a vmcore
but unfortunately kdump didn't work because of some other problem that is supposed
to be fixed in the latest kernel... and later on I hit the host hard lockup issue
of BZ #1733467 :-) I'll try again tomorrow.

Comment 18 Michael Roth 2020-07-29 22:57:51 UTC
(In reply to Greg Kurz from comment #17)
> (In reply to Michael Roth from comment #16)
> > 
> > did you see "Querying DEAD? cpu 34 (34) shows 2" earlier in the trace?
> > 
> 
> Ah... I should have kept more context and these "Querying DEAD?" lines pop
> up so often that I don't remember :-\ I'll try again.
> 
> > i reproduced again on mihawk (trace below), this time I got the trace on the
> > unplugged cpu rather than the hotplug worker thread. It seems like what's
> > happening is the hotplug worker calls device_offline() on the target CPU,
> > which at some point calls takedown_cpu()->__cpu_die()->pseries_cpu_die():
> > 
> 
> Yes.
> 
> > 
> > The function is supposed to wait for the CPU to finish pseries_mach_cpu_die,
> > but it only tries to wait for some limited amount of time before it gives up
> > waiting and just prints, e.g.:
> > 
> >   Querying DEAD? cpu 314 (314) shows 2
> > 
> > 2 in this case meaning QCSS_NOT_STOPPED.
> > 
> > So back in the worker thread we continue on with other cleanup, including
> > the call to page_alloc_cpu_dead()->drain_pages()->free_pcppages_bulk() which
> > generates some of the traces I posted earlier.
> > 
> > Drain pages has the following assumptions:
> > 
> >   /*
> >    * Drain pcplists of all zones on the indicated processor.
> >    *
> >    * The processor must either be the current processor and the
> >    * thread pinned to the current processor or a processor that
> >    * is not online.
> >    */
> 
> If the queried state was QCSS_NOT_STOPPED then it is possible that
> the processor is still online, right ?

It appears so. I added some traces to confirm, and although I haven't been able to reproduce the crash with the traces in place, there is some interleaving that seems to violate the drain_pages() assumption that the unplugged CPU is no longer running:

[ 1286.437719] pseries-hotplug-cpu: Attempting to remove CPU <NULL>, drc index: 10000027
[ 1287.004749] xive: xive_teardown_cpu 39: > xive_ops->teardown_cpu
[ 1287.004752] xive: xive_teardown_cpu 39: < xive_ops->teardown_cpu
[ 1287.004753] xive: xive_teardown_cpu 39: > xive_cleanup_cpu_ipi
[ 1287.005439] Querying DEAD? cpu 39 (39) shows 2
[ 1287.005743] drain_pages called for cpu 39
[ 1287.005750] xive: xive_teardown_cpu 39: < xive_cleanup_cpu_ipi
# cpu 39 enters xive_cleanup_cpu_queues while unplug worker thread is still inside drain_pages()
[ 1287.005773] xive: xive_teardown_cpu 39: > xive_cleanup_cpu_queues
[ 1287.005774] drain_pages completed for cpu 39
[ 1287.005806] xive: xive_teardown_cpu 39: < xive_cleanup_cpu_queues
[ 1287.005882] cpu 39 (hwid 39) Ready to die...
[ 1287.098829] pseries-hotplug-cpu: Successfully removed CPU, drc index: 10000027


> 
> >   static void drain_pages(unsigned int cpu)
> >   {
> >           struct zone *zone;
> > 
> >           for_each_populated_zone(zone) {
> >                   drain_pages_zone(cpu, zone);
> >           }
> >   }
> > 
> > so if the unplugged CPU at some point enters pseries_mach_cpu_die() (or is
> > already in the middle of it), the per-cpu page cleanup might race with the
> > worker thread calling drain_pages().
> > 
> 
> Ugly :-\
> 
> > I'm not sure if we can expect to see this behavior if we don't see the
> > "Querying DEAD?" messages though, since that suggests the unplugged CPU
> > called set_cpu_current_state(cpu, CPU_STATE_OFFLINE) in
> > pseries_mach_cpu_die() already, so it had already finished the page cleanup.
> > maybe there's some other path though.
> > 
> > 
> > If that's what's going on though, the question I guess is why is it okay for
> > pseries_cpu_die() to give up on waiting for the CPU to go offline? If we
> > can't risk hanging the worker thread with an indefinite wait maybe we need a
> > longer wait time at least.
> > 
> 
> This sleep based synchronization looks like a recipe for disaster... waiting
> longer might just hide the problem until next time. But yes, your questioning
> is clearly good: why does the worker thread give up in the first place ? I
> mean, if the unplugged CPU never completes pseries_mach_cpu_die() then
> something
> went very wrong and I don't quite see the point in continuing execution.

Agreed. The behavior has been around for nearly a decade, it seems; maybe nobody has cared/noticed it since only the XIVE cleanup stuff is making the issue visible.

> 
> FYI, I could reproduce with a rhel83 guest and I was hoping to capture a
> vmcore
> but unfortunately kdump didn't work because of some other problem that is
> supposed
> to be fixed in the latest kernel... and later on I hit the host hard lockup
> issue
> of BZ #1733467 :-) I'll try again tomorrow.

It might also be worth noting that I hit the host lockup after a few hours of trying this workload as well. Maybe having an unplug loop in the workload would help with reproducing that, too.

Comment 19 Greg Kurz 2020-07-30 17:11:57 UTC
(In reply to Michael Roth from comment #18)
> > This sleep based synchronization looks like a recipe for disaster... waiting
> > longer might just hide the problem until next time. But yes, your questioning
> > is clearly good: why does the worker thread give up in the first place ? I
> > mean, if the unplugged CPU never completes pseries_mach_cpu_die() then
> > something
> > went very wrong and I don't quite see the point in continuing execution.
> 
> Agreed, the behavior has been around for nearly a decade it seems, maybe
> nobody has cared/noticed it since only the xive cleanup stuff is making the
> issue visible 
> 

Yeah, maybe someone should send an RFC that patches out all the giving-up logic
and waits for flames or clarification.

> > 
> > FYI, I could reproduce with a rhel83 guest and I was hoping to capture a
> > vmcore
> > but unfortunately kdump didn't work because of some other problem that is
> > supposed
> > to be fixed in the latest kernel... and later on I hit the host hard lockup
> > issue
> > of BZ #1733467 :-) I'll try again tomorrow.
> 
> Might also be worth noting I hit the host lockup after a few hours of trying
> this workload as well. Maybe having an unplug loop in the workload might
> help with reproducing that as well.

Heh, I dunno if this is related, but David reserved our Witherspoon system all
day, so I didn't investigate more. :)

Comment 20 Michael Roth 2020-07-31 00:13:15 UTC
(In reply to Greg Kurz from comment #19)
> (In reply to Michael Roth from comment #18)
> > > This sleep based synchronization looks like a recipe for disaster... waiting
> > > longer might just hide the problem until next time. But yes, your questioning
> > > is clearly good: why does the worker thread give up in the first place ? I
> > > mean, if the unplugged CPU never completes pseries_mach_cpu_die() then
> > > something
> > > went very wrong and I don't quite see the point in continuing execution.
> > 
> > Agreed, the behavior has been around for nearly a decade it seems, maybe
> > nobody has cared/noticed it since only the xive cleanup stuff is making the
> > issue visible 
> > 
> 
> Yeah, maybe someone should send an RFC that patches out all the
> giving-up-logic
> and wait for flames or clarification.

Yah, makes sense, I'll work on getting an RFC out and get my flamesuit ready :)

Comment 21 Greg Kurz 2020-07-31 11:23:01 UTC
(In reply to Michael Roth from comment #20)
> (In reply to Greg Kurz from comment #19)
> > (In reply to Michael Roth from comment #18)
> > > > This sleep based synchronization looks like a recipe for disaster... waiting
> > > > longer might just hide the problem until next time. But yes, your questioning
> > > > is clearly good: why does the worker thread give up in the first place ? I
> > > > mean, if the unplugged CPU never completes pseries_mach_cpu_die() then
> > > > something
> > > > went very wrong and I don't quite see the point in continuing execution.
> > > 
> > > Agreed, the behavior has been around for nearly a decade it seems, maybe
> > > nobody has cared/noticed it since only the xive cleanup stuff is making the
> > > issue visible 
> > > 
> > 
> > Yeah, maybe someone should send an RFC that patches out all the
> > giving-up-logic
> > and wait for flames or clarification.
> 
> Yah, makes sense, I'll work on getting an RFC out and get my flamesuit ready
> :)

FWIW, I've made a quick'n'dirty hack to a -229 kernel to do just that and, as
expected, I don't see the "Querying DEAD?" traces anymore, while I got plenty
of them before. I'll let the test run over the WE.
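
In case it helps picture it: the hack basically turns the bounded loop sketched in comment #18 into an open-ended wait, i.e. keep querying until the CPU actually reports dead and never bail out. Same caveat as before, this is a sketch with placeholder names (query_cpu_dead(), CPU_IS_DEAD), not the actual diff:

  /* Sketch of "patch out the give-up logic": never stop waiting. */
  for (;;) {
          status = query_cpu_dead(cpu);   /* placeholder query */
          if (status == CPU_IS_DEAD)
                  break;
          cpu_relax();
  }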

Comment 22 Greg Kurz 2020-08-03 09:21:28 UTC
(In reply to Greg Kurz from comment #21)
> (In reply to Michael Roth from comment #20)
> > (In reply to Greg Kurz from comment #19)
> > > (In reply to Michael Roth from comment #18)
> > > > > This sleep based synchronization looks like a recipe for disaster... waiting
> > > > > longer might just hide the problem until next time. But yes, your questioning
> > > > > is clearly good: why does the worker thread give up in the first place ? I
> > > > > mean, if the unplugged CPU never completes pseries_mach_cpu_die() then
> > > > > something
> > > > > went very wrong and I don't quite see the point in continuing execution.
> > > > 
> > > > Agreed, the behavior has been around for nearly a decade it seems, maybe
> > > > nobody has cared/noticed it since only the xive cleanup stuff is making the
> > > > issue visible 
> > > > 
> > > 
> > > Yeah, maybe someone should send an RFC that patches out all the
> > > giving-up-logic
> > > and wait for flames or clarification.
> > 
> > Yah, makes sense, I'll work on getting an RFC out and get my flamesuit ready
> > :)
> 
> FWIW, I've made a quick'n'dirty hack to a -229 kernel to do just that and,
> as expected,
> I don't see the Querying DEAD? traces anymore, while I got plenty of them
> before. I'll
> let the test run over the WE.

The Witherspoon system I was using for this was moved to another location
during the WE... it was planned maintenance that I had just forgotten about :P
so I'm re-running the test now.

BTW, I came across an old commit from benh that bumps the cpu die timeout
to solve a race where a dying CPU still has an active timer. It also
mentions the "Querying DEAD?" trace and seems to treat it as a prerequisite
for the issue to happen.

commit 940ce422a367c8e65404a5ef1ff5969527a06410
Author: Benjamin Herrenschmidt <benh.org>
Date:   Sat Jul 31 15:04:15 2010 +1000

    powerpc/pseries: Increase cpu die timeout

I still don't understand why this timeout logic exists, but I'm no longer
so sure that getting rid of it is the way to go...

Comment 23 Greg Kurz 2020-08-03 13:13:53 UTC
Anyway, this seems to be a guest kernel item. Updating component accordingly.

Also, a recent change, not upstream yet but already in mpe's powerpc/next,
might be worth taking into account because it changes the way CPUs are
offlined.

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=next&id=48f6e7f6d948b56489da027bc3284c709b939d28

commit 48f6e7f6d948b56489da027bc3284c709b939d28
Author: Nathan Lynch <nathanl.com>
Date:   Fri Jun 12 00:12:21 2020 -0500

    powerpc/pseries: remove cede offline state for CPUs

Comment 24 Greg Kurz 2020-08-03 13:57:08 UTC
Hi Xujun,

Since this is clearly a guest-side issue, does it also happen with a
RHEL7.9 guest?

Cheers,

--
Greg

Comment 25 Michael Roth 2020-08-04 11:50:54 UTC
Would it be possible to make this BZ public to aid in the upstream discussion?

I posted a patch using the sleep approach after failing to come up with a better solution:

  https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-August/215933.html

but apparently this has even more history than we realized, and there was already a previous patch that added a sleep but never made it in:

  https://lore.kernel.org/linuxppc-dev/1556752043.jyg2z3kgaw.astroid@bobo.none/
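
Relative to the bounded loop sketched in comment #18, the idea in both postings boils down to sleeping between the DEAD queries (and letting the wait run much longer) instead of spinning, so the dying vCPU actually gets time to complete pseries_mach_cpu_die(). Roughly, with the same placeholder names and an arbitrary sleep value, not the literal patch:

  /* msleep() comes from <linux/delay.h> */
  for (tries = 0; tries < 25; tries++) {
          status = query_cpu_dead(cpu);   /* placeholder query */
          if (status == CPU_IS_DEAD)
                  break;
          /*
           * was: cpu_relax();  sleeping here gives the dying vCPU a
           * chance to be scheduled and finish pseries_mach_cpu_die().
           */
          msleep(100);                    /* arbitrary value for the sketch */
  }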

Comment 26 Xujun Ma 2020-08-05 02:46:42 UTC
(In reply to Greg Kurz from comment #24)
> Hi Xujun,
> 
> Since this is clearly a guest side issue, does it also happen with a
> RHEL7.9 guest ?
> 
> Cheers,
> 
> --
> Greg

I have tested it and have not hit this problem so far.

Comment 27 David Gibson 2020-08-17 03:37:02 UTC
I've talked to Michael Ellerman and asked him to expedite putting this into his fixes branch.

Comment 31 David Gibson 2020-08-21 00:56:09 UTC
Mike, the fix is now in mpe's tree (801980f6497946048709b9b09771a1729551d705).

Can you please prepare a downstream brew build of the fix ASAP - we'll need to verify this in order to justify the exception request.

Comment 32 Michael Roth 2020-08-21 03:47:36 UTC
(In reply to David Gibson from comment #31)
> Mike, the fix is now in mpe's tree
> (801980f6497946048709b9b09771a1729551d705).
> 
> Can you please prepare a downstream brew build of the fix ASAP - we'll need
> to verify this in order to justify the exception request.

I've kicked off a brew build at:

  http://brewweb.devel.redhat.com/brew/taskinfo?taskID=30871055

It should be ready within a few hours, assuming all goes well.

Comment 33 Michael Roth 2020-08-21 15:52:01 UTC
The previous brew build had an issue; this one built properly and is ready for verification:

  http://brewweb.devel.redhat.com/brew/taskinfo?taskID=30889108

I'll also attempt to verify today using a Witherspoon system.

Comment 34 Michael Roth 2020-08-23 22:04:19 UTC
(In reply to Michael Roth from comment #33)
> Previous brew build had an issue, this one built properly and is ready for
> verification:
> 
>   http://brewweb.devel.redhat.com/brew/taskinfo?taskID=30889108
> 
> I'll also attempt to verify today using a witherspoon system.

Backport of 801980f6497946048709b9b09771a1729551d705 submitted downstream.

Reproduced the original failure after 3 iterations with kernel-4.18.0-233. With this patch applied, I ran 125 iterations overnight and found no more occurrences of the original issue.

Comment 37 Xujun Ma 2020-08-28 01:33:13 UTC
(In reply to Michael Roth from comment #34)
> (In reply to Michael Roth from comment #33)
> > Previous brew build had an issue, this one built properly and is ready for
> > verification:
> > 
> >   http://brewweb.devel.redhat.com/brew/taskinfo?taskID=30889108
> > 
> > I'll also attempt to verify today using a witherspoon system.
> 
> Backport of 801980f6497946048709b9b09771a1729551d705 submitted downstream.
> 
> Reproduced original failure after 3 iterations with kernel-4.18.0-233. With
> this patch applied I ran 125 iterations overnight and found no more
> occurrences of the original issue.

Hi Michael

Could you provide a new scratch build for QE to test this bug fix?

Comment 38 Michael Roth 2020-08-28 19:02:16 UTC
(In reply to Xujun Ma from comment #37)
> (In reply to Michael Roth from comment #34)
> > (In reply to Michael Roth from comment #33)
> > > Previous brew build had an issue, this one built properly and is ready for
> > > verification:
> > > 
> > >   http://brewweb.devel.redhat.com/brew/taskinfo?taskID=30889108
> > > 
> > > I'll also attempt to verify today using a witherspoon system.
> > 
> > Backport of 801980f6497946048709b9b09771a1729551d705 submitted downstream.
> > 
> > Reproduced original failure after 3 iterations with kernel-4.18.0-233. With
> > this patch applied I ran 125 iterations overnight and found no more
> > occurrences of the original issue.
> 
> Hi Michael
> 
> Could you provide a new scratch build for QE to test this bug fix?

Hi Xujun,

The earlier brew (http://brewweb.devel.redhat.com/brew/taskinfo?taskID=30889108) is based on the latest RHEL 8.3 kernel, kernel-4.18.0-234.el8. I'm not sure why the download links aren't clickable there anymore, but the RPMs are still available here:

  http://brew-task-repos.usersys.redhat.com/repos/scratch/mroth/kernel/4.18.0/234.el8.bz1856588/

I've started a new brew just in case, but it should be the same code:

  http://brewweb.devel.redhat.com/brew/taskinfo?taskID=31009311

Comment 39 Xujun Ma 2020-08-31 14:39:50 UTC
(In reply to Michael Roth from comment #38)
> (In reply to Xujun Ma from comment #37)
> > (In reply to Michael Roth from comment #34)
> > > (In reply to Michael Roth from comment #33)
> > > > Previous brew build had an issue, this one built properly and is ready for
> > > > verification:
> > > > 
> > > >   http://brewweb.devel.redhat.com/brew/taskinfo?taskID=30889108
> > > > 
> > > > I'll also attempt to verify today using a witherspoon system.
> > > 
> > > Backport of 801980f6497946048709b9b09771a1729551d705 submitted downstream.
> > > 
> > > Reproduced original failure after 3 iterations with kernel-4.18.0-233. With
> > > this patch applied I ran 125 iterations overnight and found no more
> > > occurrences of the original issue.
> > 
> > Hi Michael
> > 
> > Could you provide a new scratch build for QE to test this bug fix?
> 
> Hi Xujun,
> 
> The earlier brew
> (http://brewweb.devel.redhat.com/brew/taskinfo?taskID=30889108) is based on
> latest kernel-4.18.0-234.el8 rhel 8.3 kernel. Not sure why the download
> links aren't clickable there anymore, but the RPMs are still available here:
> 
>  
> http://brew-task-repos.usersys.redhat.com/repos/scratch/mroth/kernel/4.18.0/
> 234.el8.bz1856588/
> 
> I've started a new brew just in case, but it should be the same code:
> 
>   http://brewweb.devel.redhat.com/brew/taskinfo?taskID=31009311

I have tested the scratch build above and didn't hit this bug.
Based on the test result, I think the bug has been fixed in this build.

Comment 43 Frantisek Hrbata 2020-09-10 08:17:55 UTC
Patch(es) available on kernel-4.18.0-236.el8

Comment 48 Xujun Ma 2020-09-15 07:30:31 UTC
I have tested this build and didn't hit the same issue.
Based on the test result, the bug has been fixed in this build; setting it to VERIFIED.

Comment 52 errata-xmlrpc 2020-11-04 01:24:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4431

