Bug 1589647 - Hot-unplugging memory did not complete successfully, but after rebooting the guest the memory was removed
Summary: Hot-unplugging memory did not complete successfully, but after rebooting the guest the memory was removed
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.6
Hardware: ppc64le
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Serhii Popovych
QA Contact: Min Deng
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-11 05:16 UTC by Min Deng
Modified: 2018-06-15 09:04 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-12 04:02:20 UTC
Target Upstream Version:
Embargoed:



Description Min Deng 2018-06-11 05:16:42 UTC
Description of problem:
Hot-unplugging memory did not complete successfully, but after rebooting the guest the memory was removed.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.12.0-3.el7.ppc64le
kernel-3.10.0-862.el7.ppc64le - host and guest
kernel-3.10.0-901.el7.ppc64le - host and guest


How reproducible:
always

Steps to Reproduce:
1.boot up a guest 
  /usr/libexec/qemu-kvm -name guest=nrs,debug-threads=on \
    -machine pseries,accel=kvm,usb=off,dump-guest-core=off \
    -m size=8388608k,slots=256,maxmem=419430400k -realtime mlock=off \
    -smp 4,sockets=4,cores=1,threads=1 \
    -numa node,nodeid=0,cpus=0-1,mem=4096 -numa node,nodeid=1,cpus=2-3,mem=4096 \
    -uuid d7987973-2467-43ff-b8d2-acefc6ac59e5 -display none -no-user-config -nodefaults \
    -chardev socket,id=charmonitor,path=/tmp/qmp,server,nowait \
    -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -boot strict=on \
    -device qemu-xhci,id=usb,bus=pci.0,addr=0x3 \
    -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 \
    -drive file=rhel75-ppc64le-virtio-scsi.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 \
    -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
    -netdev tap,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,id=hostnet0,vhost=on \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:8a:8b,bus=pci.0,addr=0x1 \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on -monitor stdio \
    -chardev socket,id=serial_id_serial0,path=/tmp/S,server,nowait \
    -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
    -monitor unix:/tmp/monitor3,server,nowait

2.info numa
  (qemu) info numa
2 nodes
node 0 cpus: 0 1
node 0 size: 4096 MB
node 0 plugged: 0 MB
node 1 cpus: 2 3
node 1 size: 4096 MB
node 1 plugged: 0 MB

3.
(qemu) object_add memory-backend-ram,id=mem1,size=10G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1

4.(qemu) info numa  
2 nodes
node 0 cpus: 0 1
node 0 size: 14336 MB
node 0 plugged: 10240 MB
node 1 cpus: 2 3
node 1 size: 4096 MB
node 1 plugged: 0 MB

5.
(qemu) device_del dimm1  
(qemu) device_del dimm1   
Memory unplug already in progress for device dimm1 
Note:
Something is wrong here as well; the unplug does not appear to have completed successfully. Check the NUMA info:

6.(qemu) info numa
2 nodes
node 0 cpus: 0 1
node 0 size: 14336 MB  
node 0 plugged: 10240 MB  --memory still here so far
node 1 cpus: 2 3
node 1 size: 4096 MB
node 1 plugged: 0 MB

7.reboot guest            -- in my opinion, steps 6 and 7 should produce the same output, but the memory was only removed after the reboot.
(qemu) system_reset
(qemu) info numa
2 nodes
node 0 cpus: 0 1
node 0 size: 4096 MB
node 0 plugged: 0 MB
node 1 cpus: 2 3
node 1 size: 4096 MB
node 1 plugged: 0 MB
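When scripting this reproduction, the HMP "info numa" text can be checked programmatically instead of by eye. A minimal Python sketch (the helper name and parsing approach are my own, not part of QE's tooling):

```python
# Hypothetical helper: parse HMP "info numa" output into
# {node: {"size_mb": ..., "plugged_mb": ...}} for scripted checks.
import re

def parse_info_numa(text):
    nodes = {}
    for line in text.splitlines():
        m = re.match(r"node (\d+) (size|plugged): (\d+) MB", line.strip())
        if m:
            node, field, mb = int(m.group(1)), m.group(2), int(m.group(3))
            nodes.setdefault(node, {})[field + "_mb"] = mb
    return nodes

# Output captured at step 6, after device_del dimm1:
sample = """\
2 nodes
node 0 cpus: 0 1
node 0 size: 14336 MB
node 0 plugged: 10240 MB
node 1 cpus: 2 3
node 1 size: 4096 MB
node 1 plugged: 0 MB
"""
numa = parse_info_numa(sample)
print(numa[0]["plugged_mb"])  # 10240 -> dimm1 is still accounted after device_del
```

A script driving the monitor could assert that node 0's "plugged" value returns to 0 after the unplug, which is exactly the check that fails at step 6.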


Actual results:
The memory was still present in step 6, so QE considers the hot-unplug unsuccessful; yet after rebooting, the memory was removed. This is inconsistent.

Expected results:
In my opinion, steps 6 and 7 should produce the same output.

Additional info:

Comment 2 Serhii Popovych 2018-06-11 05:54:28 UTC
This comes from bz1528178#c8. It seems the hot-plugged memory is in use by the guest and its contents cannot be moved, so the kernel reports the following:

  pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs

This looks like bz1432302, except for the error message, which duplicates bz1245892.

Comment 3 David Gibson 2018-06-12 04:02:20 UTC
This is pretty much expected behaviour, and it isn't easy to change.

The PAPR hotplug protocol doesn't have a way for the guest to definitively report an unplug failure to the host, so the host can't tell the difference between a true failure and a hot unplug which is just taking a long time to complete.

We could implement a timeout on the host side, but it's not clear we have a good way of cancelling a hotplug process that's already been signalled to the guest.

So, this is basically treated as a hot-unplug which is taking a very long time to complete.  When the guest is reset, we no longer need to consider the guest's state, and so we're able to immediately complete the request.
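The timeout idea could look like the following host-side sketch. DEVICE_DELETED is the real QMP event QEMU emits once a device_del completes; the function and the shape of the event stream here are hypothetical:

```python
# Sketch of a host-side unplug timeout: scan a stream of parsed QMP
# events for DEVICE_DELETED of the given device, giving up after a
# deadline. Only the DEVICE_DELETED event name is real QMP.
import time

def wait_for_unplug(events, device_id, timeout_s, now=time.monotonic):
    deadline = now() + timeout_s
    for ev in events:                      # each ev: a parsed QMP event dict
        if now() > deadline:
            return False                   # still "in progress" as far as we know
        if (ev.get("event") == "DEVICE_DELETED"
                and ev.get("data", {}).get("device") == device_id):
            return True                    # guest completed the unplug
    return False
```

In the stuck case described above, no DEVICE_DELETED ever arrives, so the caller would time out; the unresolved part is that there is no clearly safe way to cancel an unplug that has already been signalled to the guest.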

