Bug 1589647

Summary: Hot-unplugging memory did not succeed, but the memory was removed after rebooting the guest.
Product: Red Hat Enterprise Linux 7 Reporter: Min Deng <mdeng>
Component: qemu-kvm-rhev Assignee: Serhii Popovych <spopovyc>
Status: CLOSED CANTFIX QA Contact: Min Deng <mdeng>
Severity: high Docs Contact:
Priority: high    
Version: 7.6 CC: dgibson, dzheng, mdeng, michen, qzhang, spopovyc, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: ppc64le   
OS: Unspecified   
Last Closed: 2018-06-12 04:02:20 UTC Type: Bug

Description Min Deng 2018-06-11 05:16:42 UTC
Description of problem:
Hot-unplugging memory did not succeed, but after rebooting the guest the memory was removed.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.12.0-3.el7.ppc64le
kernel-3.10.0-862.el7.ppc64le - host and guest
kernel-3.10.0-901.el7.ppc64le - host and guest


How reproducible:
always

Steps to Reproduce:
1. Boot up a guest:
  /usr/libexec/qemu-kvm -name guest=nrs,debug-threads=on -machine pseries,accel=kvm,usb=off,dump-guest-core=off -m size=8388608k,slots=256,maxmem=419430400k -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -numa node,nodeid=0,cpus=0-1,mem=4096 -numa node,nodeid=1,cpus=2-3,mem=4096 -uuid d7987973-2467-43ff-b8d2-acefc6ac59e5 -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/tmp/qmp,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -boot strict=on -device qemu-xhci,id=usb,bus=pci.0,addr=0x3 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 -drive file=rhel75-ppc64le-virtio-scsi.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -netdev tap,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:8a:8b,bus=pci.0,addr=0x1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on -monitor stdio -chardev socket,id=serial_id_serial0,path=/tmp/S,server,nowait -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 -monitor unix:/tmp/monitor3,server,nowait

2. Run info numa:
  (qemu) info numa
2 nodes
node 0 cpus: 0 1
node 0 size: 4096 MB
node 0 plugged: 0 MB
node 1 cpus: 2 3
node 1 size: 4096 MB
node 1 plugged: 0 MB

3. Hot-plug a 10G DIMM:
(qemu) object_add memory-backend-ram,id=mem1,size=10G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1

4. (qemu) info numa
2 nodes
node 0 cpus: 0 1
node 0 size: 14336 MB
node 0 plugged: 10240 MB
node 1 cpus: 2 3
node 1 size: 4096 MB
node 1 plugged: 0 MB

5. Hot-unplug the DIMM, then issue the same command again:
(qemu) device_del dimm1
(qemu) device_del dimm1
Memory unplug already in progress for device dimm1
Note: something is wrong here as well. The unplug does not appear to have completed. Check the NUMA info:

6. (qemu) info numa
2 nodes
node 0 cpus: 0 1
node 0 size: 14336 MB  
node 0 plugged: 10240 MB  -- memory is still present at this point
node 1 cpus: 2 3
node 1 size: 4096 MB
node 1 plugged: 0 MB

7. Reboot the guest. In my opinion, step 6 and step 7 should have the same output, but the memory was eventually removed.
(qemu) system_reset
(qemu) info numa
2 nodes
node 0 cpus: 0 1
node 0 size: 4096 MB
node 0 plugged: 0 MB
node 1 cpus: 2 3
node 1 size: 4096 MB
node 1 plugged: 0 MB


Actual results:
In step 6 the memory was still present, so QE considers the hot-unplug unsuccessful. After rebooting, however, the memory was removed. This seems unreasonable.

Expected results:
In my opinion, step 6 and step 7 should have the same output.
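
The comparison between step 6 and step 7 can be checked mechanically. Below is a minimal sketch that parses the `info numa` monitor output and reports which nodes changed their plugged size. The function names (parse_info_numa, plugged_changes) are made up for this illustration and are not part of QEMU; the sample text is the output from steps 6 and 7 above.

```python
import re

# Matches lines such as "node 0 size: 14336 MB" or "node 0 plugged: 10240 MB".
_NODE_LINE = re.compile(r"node (\d+) (size|plugged): (\d+) MB")


def parse_info_numa(text):
    """Parse HMP `info numa` output into {node_id: {"size": MB, "plugged": MB}}."""
    nodes = {}
    for line in text.splitlines():
        match = _NODE_LINE.search(line)
        if match:
            node_id, field, value = match.groups()
            nodes.setdefault(int(node_id), {})[field] = int(value)
    return nodes


def plugged_changes(before, after):
    """Return {node_id: (plugged_before, plugged_after)} for nodes that differ."""
    return {
        node: (before[node]["plugged"], after[node]["plugged"])
        for node in before
        if node in after and before[node]["plugged"] != after[node]["plugged"]
    }


STEP6 = """\
2 nodes
node 0 cpus: 0 1
node 0 size: 14336 MB
node 0 plugged: 10240 MB
node 1 cpus: 2 3
node 1 size: 4096 MB
node 1 plugged: 0 MB
"""

STEP7 = """\
2 nodes
node 0 cpus: 0 1
node 0 size: 4096 MB
node 0 plugged: 0 MB
node 1 cpus: 2 3
node 1 size: 4096 MB
node 1 plugged: 0 MB
"""

CHANGES = plugged_changes(parse_info_numa(STEP6), parse_info_numa(STEP7))
```

With the two outputs from this report, CHANGES is {0: (10240, 0)}: node 0 loses the 10240 MB DIMM only across the reset, not after device_del.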

Additional info:

Comment 2 Serhii Popovych 2018-06-11 05:54:28 UTC
This comes from bz1528178#c8. It seems the hot-plugged memory is in use in the guest and its contents can't be moved, so the kernel reports the following:

  pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs

This looks the same as bz1432302, apart from the error message; that bug is a duplicate of bz1245892.

Comment 3 David Gibson 2018-06-12 04:02:20 UTC
This is pretty much expected behaviour, and it isn't easy to change.

The PAPR hotplug protocol doesn't have a way for the guest to definitively report an unplug failure to the host, so the host can't tell the difference between a true failure and a hot unplug which is just taking a long time to complete.

We could implement a timeout on the host side, but it's not clear we have a good way of cancelling a hotplug process that's already been signalled to the guest.
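
For illustration only, here is a rough sketch of what a deadline check could look like in a management layer sitting on top of QMP. It does not cancel anything; it only classifies a device_del request as completed, pending, or overdue based on whether a DEVICE_DELETED event for the device has been seen. The function name unplug_status, the "overdue" label, and the 60-second default are assumptions made up for this example, and "overdue" still cannot tell a true failure from a slow unplug.

```python
import json


def unplug_status(qmp_lines, device_id, requested_at, now, timeout_s=60.0):
    """Classify a device_del request from newline-delimited QMP messages.

    Returns "completed" if a DEVICE_DELETED event names device_id,
    "overdue" if none was seen and the (hypothetical) deadline has passed,
    and "pending" otherwise.
    """
    for line in qmp_lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if msg.get("event") != "DEVICE_DELETED":
            continue
        # Events for child objects may carry only "path", so use .get().
        if msg.get("data", {}).get("device") == device_id:
            return "completed"
    if now - requested_at > timeout_s:
        return "overdue"
    return "pending"


# Example input: one unrelated event and one matching DEVICE_DELETED event.
SAMPLE_EVENTS = [
    '{"event": "RESET", "data": {"guest": false}}',
    '{"event": "DEVICE_DELETED", "data": {"device": "dimm1", "path": "/machine/peripheral/dimm1"}}',
]
```

In the scenario from this report no DEVICE_DELETED event arrives before the reset, so such a check would move from "pending" to "overdue" and stay there until system_reset.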

So, this is basically treated as a hot-unplug which is taking a very long time to complete.  When the guest is reset, we no longer need to consider the guest's state, and so we're able to immediately complete the request.
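
The behaviour described above can be pictured with a small toy model. This is not QEMU code; the class and method names are invented for the illustration. It only captures the three observations from the report: a repeated device_del is rejected while an unplug is pending, the plugged size does not change until the guest releases the memory, and a reset completes every pending unplug.

```python
class PendingUnplugModel:
    """Toy model of the host-side view of DIMM hot-unplug (illustrative only)."""

    def __init__(self):
        self.dimms = {}              # DIMM id -> size in MB
        self.unplug_pending = set()  # DIMM ids with an unplug request in flight

    def device_add(self, dimm_id, size_mb):
        self.dimms[dimm_id] = size_mb

    def device_del(self, dimm_id):
        """Request an unplug; returns an error string, or "" on acceptance."""
        if dimm_id not in self.dimms:
            raise KeyError(dimm_id)
        if dimm_id in self.unplug_pending:
            return f"Memory unplug already in progress for device {dimm_id}"
        # The request is only signalled to the guest; nothing is removed yet.
        self.unplug_pending.add(dimm_id)
        return ""

    def guest_released(self, dimm_id):
        """The guest gave the memory back, so the host can finish the unplug."""
        if dimm_id in self.unplug_pending:
            self._complete(dimm_id)

    def system_reset(self):
        """After a reset the guest state no longer matters: finish everything."""
        for dimm_id in list(self.unplug_pending):
            self._complete(dimm_id)

    def plugged_mb(self):
        return sum(self.dimms.values())

    def _complete(self, dimm_id):
        self.unplug_pending.discard(dimm_id)
        del self.dimms[dimm_id]
```

Replaying the steps from the description: after device_add("dimm1", 10240) and device_del("dimm1"), plugged_mb() stays at 10240 because the guest never releases the memory; after system_reset() it drops to 0.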