Description of problem:
Failed to hot-unplug a pc-dimm device.

Version-Release number of selected component (if applicable):
host:
kernel-4.18.0-252.el8.ppc64le
qemu-kvm-4.2.0-35.module+el8.4.0+8705+34397d87.ppc64le
guest:
kernel-4.18.0-252.el8.ppc64le

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with this command:
/usr/libexec/qemu-kvm \
 -smp 4 \
 -m 4G,slots=16,maxmem=20G \
 -nodefaults \
 -device spapr-vty,id=serial111,chardev=serial \
 -chardev socket,path=serial,id=serial,server,nowait,signal=off \
 -device virtio-scsi-pci,bus=pci.0 \
 -device scsi-hd,id=scsi-hd0,drive=scsi-hd0-dr0,bootindex=0 \
 -drive file=rhel840-ppc64le-virtio-scsi.qcow2,if=none,id=scsi-hd0-dr0,format=qcow2,cache=none \
 -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c4:e7:84 \
 -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on \
 -mon chardev=monitor,mode=readline \
 -chardev socket,path=monitor,id=monitor,server,nowait,signal=off
2. Run the hotplug and unplug commands in a loop:
echo "object_add memory-backend-ram,id=mem1,size=4G"|nc -U monitor;
for((i=1;i<=50;i++));do echo "device_add pc-dimm,id=dimm1,memdev=mem1,node=0" |nc -U monitor;sleep 5;echo "device_del dimm1" |nc -U monitor;sleep 5;done

Actual results:
Hot-unplug of the pc-dimm device failed in the 15th loop iteration.

QEMU 4.2.0 monitor - type 'help' for more information
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1,node=0
(qemu)
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) device_del dimm1
(qemu)
[The same device_add/device_del exchange repeats without error for the first 14 iterations.]
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1,node=0
Duplicate ID 'dimm1' for device
Try "help device_add" for more information
(qemu)
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) device_del dimm1
Error: Memory unplug already in progress for device dimm1

dmesg in the guest:
[ 252.911762] pseries-hotplug-mem: Attempting to hot-remove 16 LMB(s) at 80000010
[ 253.020976] Offlined Pages 4096
[ 253.055256] Offlined Pages 4096
[ 253.085625] Offlined Pages 4096
[ 253.110270] Offlined Pages 4096
[ 253.132497] Offlined Pages 4096
[ 253.160455] Offlined Pages 4096
[ 253.223452] Offlined Pages 4096
[ 253.251284] Offlined Pages 4096
[ 253.281638] Offlined Pages 4096
[ 253.308033] Offlined Pages 4096
[ 253.368658] Offlined Pages 4096
[ 253.390495] Offlined Pages 4096
[ 253.410955] Offlined Pages 4096
[ 253.431035] Offlined Pages 4096
[ 253.547874] Offlined Pages 4096
[ 253.571757] pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs
[ 253.574060] radix-mmu: Mapped 0x0000000100000000-0x0000000110000000 with 2.00 MiB pages
[ 253.580330] radix-mmu: Mapped 0x0000000110000000-0x0000000120000000 with 2.00 MiB pages
[ 253.588992] radix-mmu: Mapped 0x0000000120000000-0x0000000130000000 with 2.00 MiB pages
[ 253.602504] radix-mmu: Mapped 0x0000000130000000-0x0000000140000000 with 2.00 MiB pages
[ 253.617876] radix-mmu: Mapped 0x0000000140000000-0x0000000150000000 with 2.00 MiB pages
[ 253.634560] radix-mmu: Mapped 0x0000000150000000-0x0000000160000000 with 2.00 MiB pages
[ 253.642112] radix-mmu: Mapped 0x0000000160000000-0x0000000170000000 with 2.00 MiB pages
[ 253.662792] radix-mmu: Mapped 0x0000000170000000-0x0000000180000000 with 2.00 MiB pages
[ 253.687809] radix-mmu: Mapped 0x0000000180000000-0x0000000190000000 with 2.00 MiB pages
[ 253.707920] radix-mmu: Mapped 0x0000000190000000-0x00000001a0000000 with 2.00 MiB pages
[ 253.718375] radix-mmu: Mapped 0x00000001a0000000-0x00000001b0000000 with 2.00 MiB pages
[ 253.744106] radix-mmu: Mapped 0x00000001b0000000-0x00000001c0000000 with 2.00 MiB pages
[ 253.765718] radix-mmu: Mapped 0x00000001c0000000-0x00000001d0000000 with 2.00 MiB pages
[ 253.784612] radix-mmu: Mapped 0x00000001d0000000-0x00000001e0000000 with 2.00 MiB pages
[ 253.786238] radix-mmu: Mapped 0x00000001e0000000-0x00000001f0000000 with 2.00 MiB pages

Expected results:
Hot-unplug succeeds.

Additional info:
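The guest dmesg in the description shows a request to hot-remove 16 LMBs, 15 "Offlined Pages" lines, and then the "indexed-count-remove failed" rollback. A minimal sketch of a helper that summarizes such an excerpt is shown here. The function name summarize_lmb_removal and its output format are hypothetical, and the message patterns are taken only from the excerpt in this report, so other kernel versions may word them differently.

```shell
# Hypothetical helper: summarize a pseries memory hot-remove attempt.
# Reads dmesg text on stdin and prints one line:
#   requested=<n> offlined=<n> status=<ok|failed|unknown>
# The patterns below are copied from the dmesg excerpt in this report.
summarize_lmb_removal() {
    awk '
        /pseries-hotplug-mem: Attempting to hot-remove/ {
            # The requested LMB count is the field just before "LMB(s)".
            for (i = 2; i <= NF; i++)
                if ($i == "LMB(s)") requested = $(i - 1)
        }
        /Offlined Pages/ { offlined++ }
        /pseries-hotplug-mem: Memory indexed-count-remove failed/ { failed = 1 }
        END {
            if (failed)
                status = "failed"
            else if (requested > 0 && offlined == requested)
                status = "ok"
            else
                status = "unknown"
            printf "requested=%d offlined=%d status=%s\n", requested, offlined, status
        }
    '
}
```

In the guest, this could be fed with something like "dmesg | summarize_lmb_removal" right after a device_del. A "failed" status with fewer offlined LMBs than requested matches the situation reported above.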
*** Bug 1901838 has been marked as a duplicate of this bug. ***
Did you add "movable_node" to the kernel command line in the guest?
It wasn't reproduced on RHEL8.4 AV build qemu-kvm-5.2.0-0.module+el8.4.0+8855+a9e237a9.ppc64le
(In reply to Min Deng from comment #2)
> Did you add "movable_node" to kernel line in the guest ?

Yes, I did.
Following the steps in the description, I could not reproduce it on my side. I talked with xuma, and it was not always reproducible there either. Thanks.
The scenario that actually reproduces the problem is hot-unplugging the pc-dimm device while the guest is booting.

Steps to Reproduce:
1. Boot the guest with this command:
/usr/libexec/qemu-kvm \
 -nodefaults \
 -m 8192 \
 -smp 32,maxcpus=32,cores=16,threads=1,sockets=2 \
 -chardev stdio,mux=on,id=serial_id_serial0,server,nowait,signal=off \
 -device spapr-vty,id=serial111,chardev=serial_id_serial0 \
 -mon chardev=serial_id_serial0,mode=readline \
 -device virtio-scsi-pci,bus=pci.0 \
 -device scsi-hd,id=scsi-hd0,drive=scsi-hd0-dr0,bootindex=0 \
 -drive file=rhel840-ppc64le-virtio-scsi.qcow2,if=none,id=scsi-hd0-dr0,format=qcow2,cache=none \
 -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c4:e7:84 \
 -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on
2. Log in to the guest and hotplug a pc-dimm device with HMP commands:
(hmp)object_add memory-backend-ram,id=mem1,size=4G
(hmp)device_add pc-dimm,id=dimm1,memdev=mem1,node=0
3. Reboot the guest and hot-unplug the pc-dimm device while the guest is booting:
(guest)reboot
(hmp)device_del dimm1
4. Check the memory information and the dmesg log.

The hot-unplug fails. If the guest is rebooted again, the pc-dimm device is actually removed.
QE can't reproduce it with the original description on x86 or ppc. Following comment 6 from Xujun, a similar problem was reproducible on x86 with these builds:
qemu-kvm-5.2.0-0.scrmod+el8.3.0+8644+8675f3f8.rc0.x86_64
kernel-4.18.0-252.el8.x86_64

Steps, following the comment above:
(qemu) object_add memory-backend-ram,id=mem1,size=4G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1,node=0
(qemu) info memory-devices
Memory device [dimm]: "dimm1"
  addr: 0x240000000
  slot: 0
  node: 0
  size: 4294967296
  memdev: /objects/mem1
  hotplugged: true
  hotpluggable: true
(qemu) system_reset    - reboot guest
(qemu) device_del dimm1    - unplug
(qemu) info memory-devices
Memory device [dimm]: "dimm1"
  addr: 0x240000000
  slot: 0
  node: 0
  size: 4294967296
  memdev: /objects/mem1
  hotplugged: true
  hotpluggable: true

Then run system_reset again in HMP (reboot again) and check memory-devices (check the dimm again):
(qemu) info memory-devices    - dimm is still there
Memory device [dimm]: "dimm1"
  addr: 0x240000000
  slot: 0
  node: 0
  size: 4294967296
  memdev: /objects/mem1
  hotplugged: true
  hotpluggable: true

Actual results:
The pc-dimm was still there.
Expected results:
I'm not sure at the moment whether it should be removed or not, since the hotplugged memory was always attached to the guest.
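The check above relies on reading "info memory-devices" by eye. A minimal sketch of automating it follows. The helper names dimm_listed, query_memory_devices and wait_for_unplug are hypothetical, and the sketch assumes an HMP monitor reachable through a UNIX socket (named "monitor" by default, as in the description of this bug) and output formatted like the listings in this comment.

```shell
# Hypothetical helper: succeed if `info memory-devices` output on stdin
# lists a DIMM with the given id. The match string follows the listings above.
dimm_listed() {
    local id=$1
    grep -qF "Memory device [dimm]: \"$id\""
}

# Hypothetical hook: print `info memory-devices` output.
# Assumes an HMP monitor on a UNIX socket; MONITOR_SOCK is an assumed variable.
query_memory_devices() {
    echo "info memory-devices" | nc -U "${MONITOR_SOCK:-monitor}"
}

# Poll until the DIMM is no longer listed, or give up after `tries` attempts.
# Returns 0 when the DIMM is gone, 1 when it is still listed at the end.
wait_for_unplug() {
    local id=$1 tries=${2:-10} delay=${3:-1}
    local i=0
    while [ "$i" -lt "$tries" ]; do
        if ! query_memory_devices | dimm_listed "$id"; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

For example, "wait_for_unplug dimm1 30 2" after a device_del would wait up to about a minute. A polling check is used because device_del only requests the unplug, and the device stays listed until the guest has released the memory.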
(In reply to Xujun Ma from comment #6)
> The actual reproducible scenario is that hot unplug the pc-dimm device when
> guest is booting.
>
> Steps to Reproduce:
> 1.Boot guest with command
> /usr/libexec/qemu-kvm \
> -nodefaults \
> -m 8192 \

Hmm... I hardly believe it is even possible for device_add to
succeed if you don't specify 'slots' and 'maxmem' as well.

(qemu) device_add pc-dimm,id=dimm1,memdev=mem1
Error: no slots where allocated, please specify the 'slots' option

> -smp 32,maxcpus=32,cores=16,threads=1,sockets=2 \
> -chardev stdio,mux=on,id=serial_id_serial0,server,nowait,signal=off \
> -device spapr-vty,id=serial111,chardev=serial_id_serial0 \
> -mon chardev=serial_id_serial0,mode=readline \
> -device virtio-scsi-pci,bus=pci.0 \
> -device scsi-hd,id=scsi-hd0,drive=scsi-hd0-dr0,bootindex=0 \
> -drive
> file=rhel840-ppc64le-virtio-scsi.qcow2,if=none,id=scsi-hd0-dr0,format=qcow2,
> cache=none \
> -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c4:e7:84 \
> -netdev
> tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on \
> 2.Login guest and run hotplug pc-dimm device with hmp command
> (hmp)object_add memory-backend-ram,id=mem1,size=4G
> (hmp)device_add pc-dimm,id=dimm1,memdev=mem1,node=0
> 3.Reboot guest and hot unplog the pc-dimm device when guest is booting.
> (guest)reboot
> (hmp)device_del dimm1

PC-DIMM hot unplug support is negotiated with the guest during CAS.
If device_del occurs between machine reset and CAS, the machine
code currently rejects it on POWER and prints out an error.

(qemu) device_del dimm1
Error: Memory hot unplug not supported for this guest

Is this what you're seeing ?

> 4.Check memory infomation and dmesg log
>
> Will find failed to hot unplug it.if reboot guest again,the pc-dimm device
> will really be remove.
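To tell the cases in this thread apart when scripting a reproducer, the text that device_del prints can be classified. The sketch below is only an illustration: the function name classify_device_del_reply and the labels it prints are hypothetical, and the two error strings are copied from the monitor output quoted in this bug rather than from any documented list of QEMU messages.

```shell
# Hypothetical helper: map the HMP reply to `device_del <dimm>` onto a label.
#   rejected-before-cas : request arrived between machine reset and CAS
#   unplug-pending      : an earlier unplug request has not completed yet
#   other-error         : some other "Error:" line was printed
#   accepted            : no error printed; the guest still has to release
#                         the memory before the device really goes away
classify_device_del_reply() {
    local reply=$1
    case $reply in
        *"Memory hot unplug not supported for this guest"*)
            echo "rejected-before-cas" ;;
        *"Memory unplug already in progress"*)
            echo "unplug-pending" ;;
        *"Error:"*)
            echo "other-error" ;;
        *)
            echo "accepted" ;;
    esac
}
```

Note that "accepted" does not mean the unplug succeeded. As the description of this bug shows, the request can be accepted and then fail inside the guest.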
*** Bug 1903024 has been marked as a duplicate of this bug. ***
*** Bug 1903023 has been marked as a duplicate of this bug. ***
(In reply to Greg Kurz from comment #8)
> (In reply to Xujun Ma from comment #6)
> > The actual reproducible scenario is that hot unplug the pc-dimm device when
> > guest is booting.
> >
> > Steps to Reproduce:
> > 1.Boot guest with command
> > /usr/libexec/qemu-kvm \
> > -nodefaults \
> > -m 8192 \
>
> Hmm... I hardly believe it is even possible for device_add to
> succeed if you don't specify 'slots' and 'maxmem' as well.

Yes, that was my input mistake. I actually used "-m 4G,slots=16,maxmem=20G \"

>
> (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
> Error: no slots where allocated, please specify the 'slots' option
>
> > -smp 32,maxcpus=32,cores=16,threads=1,sockets=2 \
> > -chardev stdio,mux=on,id=serial_id_serial0,server,nowait,signal=off \
> > -device spapr-vty,id=serial111,chardev=serial_id_serial0 \
> > -mon chardev=serial_id_serial0,mode=readline \
> > -device virtio-scsi-pci,bus=pci.0 \
> > -device scsi-hd,id=scsi-hd0,drive=scsi-hd0-dr0,bootindex=0 \
> > -drive
> > file=rhel840-ppc64le-virtio-scsi.qcow2,if=none,id=scsi-hd0-dr0,format=qcow2,
> > cache=none \
> > -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c4:e7:84 \
> > -netdev
> > tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on \
> > 2.Login guest and run hotplug pc-dimm device with hmp command
> > (hmp)object_add memory-backend-ram,id=mem1,size=4G
> > (hmp)device_add pc-dimm,id=dimm1,memdev=mem1,node=0
> > 3.Reboot guest and hot unplog the pc-dimm device when guest is booting.
> > (guest)reboot
> > (hmp)device_del dimm1
>
> PC-DIMM hot unplug support is negotiated with the guest during CAS.
> If device_del occurs between machine reset and CAS, the machine
> code currently rejects it on POWER and prints out an error.
>
> (qemu) device_del dimm1
> Error: Memory hot unplug not supported for this guest
>
>
> Is this what you're seeing ?

No, this problem didn't happen at that stage.

> > 4.Check memory infomation and dmesg log
> >
> > Will find failed to hot unplug it.if reboot guest again,the pc-dimm device
> > will really be remove.
(In reply to Xujun Ma from comment #11)
[...]
> > > 3.Reboot guest and hot unplog the pc-dimm device when guest is booting.
> > > (guest)reboot
> > > (hmp)device_del dimm1
> >
> > PC-DIMM hot unplug support is negotiated with the guest during CAS.
> > If device_del occurs between machine reset and CAS, the machine
> > code currently rejects it on POWER and prints out an error.
> >
> > (qemu) device_del dimm1
> > Error: Memory hot unplug not supported for this guest
> >
> >
> > Is this what you're seeing ?
> No,this problem didn't happend on that stage.

What is the guest doing at the time you do device_del ?

Still shutting down ? In SLOF or grub ? Already booting the kernel ?

I suspect it isn't SLOF/grub otherwise you'd hit the above error.

> > > 4.Check memory infomation and dmesg log
> > >
> > > Will find failed to hot unplug it.if reboot guest again,the pc-dimm device
> > > will really be remove.
(In reply to Greg Kurz from comment #12)
> (In reply to Xujun Ma from comment #11)
> [...]
> > > > 3.Reboot guest and hot unplog the pc-dimm device when guest is booting.
> > > > (guest)reboot
> > > > (hmp)device_del dimm1
> > >
> > > PC-DIMM hot unplug support is negotiated with the guest during CAS.
> > > If device_del occurs between machine reset and CAS, the machine
> > > code currently rejects it on POWER and prints out an error.
> > >
> > > (qemu) device_del dimm1
> > > Error: Memory hot unplug not supported for this guest
> > >
> > >
> > > Is this what you're seeing ?
> > No,this problem didn't happend on that stage.
>
> What is the guest doing at the time you do device_del ?
>
> Still shutting down ? In SLOF or grub ? Already booting the kernel ?
>
> I suspect it isn't SLOF/grub otherwise you'd hit the above error.

Already booting the kernel.

> > > > 4.Check memory infomation and dmesg log
> > > >
> > > > Will find failed to hot unplug it.if reboot guest again,the pc-dimm device
> > > > will really be remove.
Discussed it with Greg and set the ITM to 10. Feel free to update it if there are any concerns, thanks a lot.
Hi Greg,
QE tried this bug on the following build:
qemu-kvm-4.2.0-39.module+el8.4.0+9248+2cae4f71

Steps:
/usr/libexec/qemu-kvm \
 -name guest=nrs,debug-threads=on \
 -machine pseries,accel=kvm,cap-ccf-assist=off \
 -m size=20G,slots=256,maxmem=100G \
 -smp 4,sockets=4,cores=1,threads=1 \
 -uuid d7987973-2467-43ff-b8d2-acefc6ac59e5 \
 -display none \
 -no-user-config \
 -nodefaults \
 -chardev socket,id=charmonitor,path=/tmp/qmp,server,nowait \
 -mon chardev=charmonitor,id=monitor,mode=control \
 -rtc base=utc \
 -boot strict=on \
 -device qemu-xhci,id=usb,bus=pci.0,addr=0x3 \
 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 \
 -drive file=rhel840-ppc64le-virtio-scsi.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 \
 -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 \
 -msg timestamp=on \
 -monitor stdio \
 -chardev socket,id=serial_id_serial0,path=/tmp/S,server,nowait \
 -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
 -monitor unix:/tmp/monitor3,server,nowait \
 -object memory-backend-ram,id=mem1,size=20G

QEMU 4.2.0 monitor - type 'help' for more information
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1,node=0
(qemu) system_reset
(qemu) device_del dimm1
Error: Memory hot unplug not supported for this guest
(qemu) device_del dimm1
Error: Memory hot unplug not supported for this guest
(qemu) device_del dimm1
Error: Memory hot unplug not supported for this guest
(qemu) device_del dimm1
Error: Memory hot unplug not supported for this guest
(qemu) device_del dimm1

According to the results, we can still see the error message. Does it work as designed?
Thanks.
Just one missing piece:

https://lists.nongnu.org/archive/html/qemu-devel/2020-12/msg03566.html

This patch should hopefully get merged upstream this week. I'll post the backport right away.
Hi Greg,
Since the fix is not ready in the target build, I will reassign it, and we probably need a new ITM for this bug. Greg, could you also suggest a preferred ITM?
Thanks a lot.
Best regards,
Min
Discussed it with Greg and set the ITM to a new one. Thanks.
In my opinion this fix is an improvement, and it is worthwhile for us to have it on RHEL8.4 AV as well (new bz1914069), thanks.
Discussed it with Greg, set it as ITM 16.
Hi Danilo,
Could you help check this bug's status? Since there is already a fixed build, maybe it should be in on_qa status.
Thanks
Min
(In reply to Min Deng from comment #34)
> Hi Danilo
> Could you help to check this bug's status since there's fixed build already,
> maybe it should be in on_qa status ? Thanks
> Min

Hi Danilo,
Please ignore my request, I know the reason now: the last fixed build failed, so we need to remove it from the bug and wait for the new build.

Thanks
(In reply to Min Deng from comment #35)
> (In reply to Min Deng from comment #34)
> > Hi Danilo
> > Could you help to check this bug's status since there's fixed build already,
> > maybe it should be in on_qa status ? Thanks
> > Min
>
> Hi Danilo,
> Please ignore me,I know the reason now, it was failure fixed build last
> time, we need to remove it from the bug and wait for the new build, thanks.
>
> Thanks

Hi Danilo,
Could you please check when QE can get the new fixed build? The build currently mentioned in this bug isn't the new fixed build. Thanks.

Best regards
Min
Patch is not acked yet.
(In reply to Danilo Cesar Lemes de Paula from comment #38) > Patch is not acked yet. It is now.
Verified the bug on the following build:
qemu-kvm-4.2.0-44.module+el8.4.0+9776+c5744f20.ppc64le
kernel-4.18.0-281.el8.ppc64le

Steps:
Refer to comment 24.
Actual results:
The pc-dimm can be removed successfully at the early boot stage.
Expected results:
The pc-dimm can be removed successfully at the early boot stage.

The original issue should be fixed, thanks a lot.
(In reply to Min Deng from comment #41)
> Verified the bug on the following build
> qemu-kvm-4.2.0-44.module+el8.4.0+9776+c5744f20.ppc64le
> kernel-4.18.0-281.el8.ppc64le
> Steps,
> refer to comment 24
> actual results,
> The pci-dimm can be removed successfully at early booting stage
> expected results,
> The pci-dimm can be removed successfully at early booting stage
> The original issue should be fixed, thanks a lot.

Hi Danilo,
The verification and regression tests passed on the QE side, so I am moving this bug to verified. Could you please add this bug to the errata later on? Thanks a lot.

Thanks
Min
Hi Danilo, Could you help to check if the bz is in the errata now ? Thanks a lot Min
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:1762