Bug 1320447 - [RFE] Report memory hotunplug failure
Summary: [RFE] Report memory hotunplug failure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Peter Krempa
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1228543
Reported: 2016-03-23 09:10 UTC by Milan Zamazal
Modified: 2016-11-03 18:40 UTC

Fixed In Version: libvirt-1.3.4-1.el7
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-03 18:40:07 UTC
Target Upstream Version:
Embargoed:


Links
System ID: Red Hat Product Errata RHSA-2016:2577
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: libvirt security, bug fix, and enhancement update
Last Updated: 2016-11-03 12:07:06 UTC

Description Milan Zamazal 2016-03-23 09:10:26 UTC
When trying to detach a hotplugged memory device (and possibly other devices) via `virsh detach-device' (or virDomainDetachDeviceFlags), libvirt doesn't report failure of the action.  It always returns with success, perhaps after some timeout, and emits a corresponding event only in case of device removal success.

While this is a documented feature, it has some important drawbacks:

- If the device detach fails quickly, libvirt unnecessarily waits for 5 seconds before it times out and returns from the virDomainDetachDeviceFlags call.
- After the virDomainDetachDeviceFlags call finishes, the caller can check for success (either by watching for the VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED event or by checking the domain XML) but can't distinguish between a failure and a pending request.  If the device removal event is not received and the device is still present in the domain XML, it may mean either that the operation is still in progress or that it has already failed.  So the caller is uncertain about the result.

QEMU already emits an event on memory hotunplug failure, for example:

  event ACPI_DEVICE_OST at 1458308024.447310 for domain centos: {"info":{"device":"dimm0","source":3,"status":132,"slot":"0","slot-type":"DIMM"}} ... action start
  event ACPI_DEVICE_OST at 1458308024.461017 for domain centos: {"info":{"device":"dimm0","source":3,"status":1,"slot":"0","slot-type":"DIMM"}} ... action failure
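
As a rough illustration of how a consumer could tell the two sample events apart, the Python sketch below parses the JSON payload and maps the status value to an outcome. The meanings assigned to the status values (132 for the start of the action, 1 for failure) come from the annotations on the sample lines above. The function name and the outcome strings are invented for this sketch and are not part of QEMU or libvirt.

```python
import json

# Values observed in the sample events above; the constant names are this sketch's own.
EJECT_REQUEST_SOURCE = 3   # "source" value present in both sample events
STATUS_IN_PROGRESS = 132   # annotated "action start" in the sample output
STATUS_FAILURE = 1         # annotated "action failure" in the sample output


def classify_ost_event(payload):
    """Return (device, outcome) for the JSON payload of an ACPI_DEVICE_OST event.

    outcome is "in-progress", "failure", or "other" for anything this sketch
    does not recognize.
    """
    info = json.loads(payload)["info"]
    device = info.get("device")
    if info.get("source") != EJECT_REQUEST_SOURCE:
        return device, "other"
    status = info.get("status")
    if status == STATUS_IN_PROGRESS:
        return device, "in-progress"
    if status == STATUS_FAILURE:
        return device, "failure"
    return device, "other"


if __name__ == "__main__":
    samples = [
        '{"info":{"device":"dimm0","source":3,"status":132,"slot":"0","slot-type":"DIMM"}}',
        '{"info":{"device":"dimm0","source":3,"status":1,"slot":"0","slot-type":"DIMM"}}',
    ]
    for sample in samples:
        print(classify_ost_event(sample))
```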

A suggested improvement is to watch for the failure event from QEMU and react accordingly in libvirt, which means:

- Returning from virDomainDetachDeviceFlags immediately not only after receiving a QEMU success event but also after receiving a QEMU failure event.
- Returning failure (instead of success) in case of device removal failure before the call times out.
- Emitting a newly introduced libvirt event on device removal failure.

That would solve both drawbacks described above, and the result of the operation would be clear (immediately, in common cases) after returning from virDomainDetachDeviceFlags:

- If failure is returned then hotunplug failed.
- If success is returned and the device is no longer present in the domain XML or the device removal success event has been received then hotunplug was successful.
- Otherwise the operation is still pending and the caller should watch for the corresponding events to be informed about the final result.
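
The three cases above can be sketched as a small decision function. This is only an illustration of the proposed semantics from the caller's point of view. The function name, its parameters, and the returned strings are hypothetical and do not correspond to any libvirt API.

```python
def detach_outcome(api_failed, removed_event_seen, device_in_xml):
    """Interpret the result of a detach request under the proposed semantics.

    api_failed         -- True if virDomainDetachDeviceFlags reported failure
    removed_event_seen -- True if the device removal success event was received
    device_in_xml      -- True if the device is still present in the domain XML
    """
    if api_failed:
        # Failure returned: the hotunplug failed.
        return "failed"
    if removed_event_seen or not device_in_xml:
        # Success returned and the device is gone (or the event confirmed it).
        return "succeeded"
    # Success returned but nothing confirms the removal yet.
    return "pending"


if __name__ == "__main__":
    print(detach_outcome(api_failed=False, removed_event_seen=False, device_in_xml=True))
```

A caller that gets "pending" would keep watching for the corresponding events to learn the final result.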

Comment 4 Peter Krempa 2016-04-13 11:40:06 UTC
The functionality was added upstream by:

commit 0ad64e20d8f8ce49645b3147ab3bcbf2ae5de41a
Author: Peter Krempa <pkrempa>
Date:   Fri Apr 1 17:48:20 2016 +0200

    qemu: process: Wire up ACPI OST events to notify users of failed memory unplug
    
    Since qemu is now able to notify us that the guest rejected the memory
    unplug operation we can relay this to the user and make the API fail
    right away.
    
    Additionally document the possible values from the ACPI docs for future
    reference.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1320447

commit 650e8d2c590260dabc1f7426565792da4ccb74ab
Author: Peter Krempa <pkrempa>
Date:   Fri Apr 1 16:41:08 2016 +0200

    qemu: monitor: Add support for ACPI_DEVICE_OST event handling
    
    The event is emitted on ACPI OSPM Status Indication events.
    
    ACPI standard documentation describes the method as:
    
    This object is an optional control method that is invoked by OSPM to
    indicate processing status to the platform. During device ejection,
    device hot add, or other event processing, OSPM may need to perform
    specific handshaking with the platform. OSPM may also need to indicate
    to the platform its inability to complete a requested operation; for
    example, when a user presses an ejection button for a device that is
    currently in use or is otherwise currently incapable of being ejected.
    In this case, the processing of the ACPI Eject Request notification by
    OSPM fails. OSPM may indicate this failure to the platform through the
    invocation of the _OST control method. As a result of the status
    notification indicating ejection failure, the platform may take certain
    action including reissuing the notification or perhaps turning on an
    appropriate indicator light to signal the failure to the user.

commit 5be120710e7865b1ee198d398176b75253fb0b3f
Author: Peter Krempa <pkrempa>
Date:   Wed Mar 30 18:09:45 2016 +0200

    Add VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED event
    
    Since we didn't opt to use one single event for device lifecycle for a
    VM we are missing one last event if the device removal failed. This
    event will be emitted once we asked to eject the device but for some
    reason it is not possible.
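
For illustration, a client using libvirt-python could subscribe to both the existing removal event and the new failure event roughly as sketched below. The RemovalTracker class and the register helper are made up for this example. The sketch assumes a libvirt-python build that exposes VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED and assumes an event loop is running (for example via virEventRegisterDefaultImpl and virEventRunDefaultImpl), which is not shown here.

```python
class RemovalTracker(object):
    """Records the last reported removal outcome per device alias (hypothetical helper)."""

    def __init__(self):
        self.outcomes = {}

    # Both callbacks take (connection, domain, device alias, opaque user data).
    def on_removed(self, conn, dom, dev_alias, opaque):
        self.outcomes[dev_alias] = "removed"

    def on_removal_failed(self, conn, dom, dev_alias, opaque):
        self.outcomes[dev_alias] = "removal-failed"


def register(conn, tracker):
    """Subscribe the tracker to removal events for all domains on conn.

    libvirt is imported lazily so the tracker itself can be used without it.
    Returns the two callback IDs handed back by libvirt.
    """
    import libvirt
    removed_id = conn.domainEventRegisterAny(
        None, libvirt.VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED,
        tracker.on_removed, None)
    failed_id = conn.domainEventRegisterAny(
        None, libvirt.VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED,
        tracker.on_removal_failed, None)
    return removed_id, failed_id


if __name__ == "__main__":
    # Simulate event delivery without a live connection.
    tracker = RemovalTracker()
    tracker.on_removal_failed(None, None, "dimm0", None)
    print(tracker.outcomes)
```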

Comment 6 Jingjing Shao 2016-05-25 01:38:20 UTC
Hi Peter,

I cannot find any error info when hot-unplugging the memory device from a guest without an OS, and the device info is still present in the guest XML.

[root@ibm-x3850x5-05 jishao]# rpm -q libvirt
libvirt-1.3.4-1.el7.x86_64

(1)start a guest without OS
[root@ibm-x3850x5-05 images]# virsh start r7.2
Domain r7.2 started

(2)[root@ibm-x3850x5-05 jishao]# virsh dumpxml r7.2 | grep dim -A4
[root@ibm-x3850x5-05 jishao]# 
[root@ibm-x3850x5-05 jishao]# 

(3)[root@ibm-x3850x5-05 jishao]# cat memdevice.xml 
<memory model='dimm'>
<target>
<size unit='MiB'>500</size>
<node>0</node>
</target>
</memory>

(4)[root@ibm-x3850x5-05 jishao]# virsh attach-device r7.2  memdevice.xml 
Device attached successfully

(5)[root@ibm-x3850x5-05 jishao]# virsh dumpxml r7.2 | grep dim -A4
    <memory model='dimm'>
      <target>
        <size unit='KiB'>512000</size>
        <node>0</node>
      </target>
      <alias name='dimm0'/>
      <address type='dimm' slot='0' base='0x100000000'/>
    </memory>

(6)[root@ibm-x3850x5-05 jishao]# virsh detach-device r7.2  memdevice.xml 
Device detached successfully

(7)[root@ibm-x3850x5-05 jishao]# echo $?
    0

(8)[root@ibm-x3850x5-05 jishao]# virsh dumpxml r7.2 | grep dim -A4
    <memory model='dimm'>
      <target>
        <size unit='KiB'>512000</size>
        <node>0</node>
      </target>
      <alias name='dimm0'/>
      <address type='dimm' slot='0' base='0x100000000'/>
    </memory>

Comment 7 Peter Krempa 2016-05-25 05:54:55 UTC
(In reply to Jingjing Shao from comment #6)
> Hi Peter,
> 
> I cannot find any error info when hot-unplugging the memory device from a
> guest without an OS, and the device info is still present in the guest XML

The notification that hot-unplug failed is delivered only when there is an OS that rejects the memory unplug request. Without an OS, it behaves as it did until now: no event is delivered and the device stays in the XML.
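
Since no event arrives at all in the no-OS case, a caller that waits for the result needs its own timeout and has to treat silence as "still pending" rather than as failure. A minimal Python sketch of that idea follows. The OutcomeWaiter class and its outcome strings are hypothetical and not part of libvirt.

```python
import threading


class OutcomeWaiter(object):
    """Waits for a removal outcome reported from an event callback (hypothetical helper)."""

    def __init__(self):
        self._reported = threading.Event()
        self.outcome = None

    def report(self, outcome):
        # Intended to be called from an event callback,
        # e.g. with "removed" or "removal-failed".
        self.outcome = outcome
        self._reported.set()

    def wait(self, timeout):
        # No event within the timeout means the request is still pending,
        # which is what happens when the guest has no OS to answer it.
        if self._reported.wait(timeout):
            return self.outcome
        return "pending"


if __name__ == "__main__":
    waiter = OutcomeWaiter()
    print(waiter.wait(0.1))
```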

Comment 8 Luyao Huang 2016-06-15 07:39:13 UTC
Verify this bug with libvirt-1.3.5-1.el7.x86_64:

0. use stap to watch the qemu monitor, and virsh event to watch libvirt events, in two windows:

# stap qemu-monitor.stp 
  0.000 begin

# virsh event rhel7.0-rhel --all --loop



1. prepare a guest with an OS and memory hotplug enabled:

# virsh dumpxml rhel7.0-rhel
<domain type='kvm' id='5'>
  <name>rhel7.0-rhel</name>
  <uuid>67c7a123-5415-4136-af62-a2ee098ba6cd</uuid>
  <maxMemory slots='16' unit='KiB'>15243264</maxMemory>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
...
  <cpu>
    <numa>
      <cell id='0' cpus='0,2' memory='2097152' unit='KiB'/>
      <cell id='1' cpus='1,3' memory='2097152' unit='KiB'/>
    </numa>
  </cpu>

2. hotplug a 1G memory device:

# cat mem2.xml 
    <memory model='dimm'>
      <target>
        <size unit='G'>1</size>
        <node>0</node>
      </target>
    </memory>

# virsh attach-device rhel7.0-rhel mem2.xml
Device attached successfully

3. the monitor commands show up in the stap window:

329.486 > 0x7ffa84231d30 {"execute":"object-add","arguments":{"qom-type":"memory-backend-ram","id":"memdimm0","props":{"size":1073741824}},"id":"libvirt-50"}
329.493 < 0x7ffa84231d30 {"return": {}, "id": "libvirt-50"}
329.493 > 0x7ffa84231d30 {"execute":"device_add","arguments":{"driver":"pc-dimm","node":"0","memdev":"memdimm0","id":"dimm0"},"id":"libvirt-51"}
329.518 < 0x7ffa84231d30 {"return": {}, "id": "libvirt-51"}

4. hot-unplug the 1G memory device:

# virsh detach-device rhel7.0-rhel mem2.xml
error: Failed to detach device from mem2.xml
error: operation failed: unplug of device was rejected by the guest

5. check the stap window:

426.483 > 0x7ffa84231d30 {"execute":"device_del","arguments":{"id":"dimm0"},"id":"libvirt-55"}
426.488 < 0x7ffa84231d30 {"return": {}, "id": "libvirt-55"}
426.493 ! 0x7ffa84231d30 {"timestamp": {"seconds": 1465973894, "microseconds": 201574}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, "status": 132, "slot": "0", "slot-type": "DIMM"}}}
426.669 ! 0x7ffa84231d30 {"timestamp": {"seconds": 1465973894, "microseconds": 377542}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, "status": 1, "slot": "0", "slot-type": "DIMM"}}}

6. check the virsh event window:

# virsh event rhel7.0-rhel --all --loop
...
event 'device-removal-failed' for domain rhel7.0-rhel: dimm0
...

7. log in to the guest and check dmesg:

# dmesg
...
[  751.307067] ACPI: \_SB_.MP00: ACPI_NOTIFY_EJECT_REQUEST event
[  751.329258] Offlined Pages 32768
[  751.338795] Offlined Pages 32768
[  751.352214] Offlined Pages 32768
[  751.359408] Offlined Pages 32768
[  751.372588] Offlined Pages 32768
[  751.376824] memory memory41: Offline failed.

8. retest with a guest without an OS:

# virsh start rhel7.0-rhel-noos
Domain rhel7.0-rhel-noos started

# virsh attach-device rhel7.0-rhel-noos mem2.xml
Device attached successfully

# virsh detach-device rhel7.0-rhel-noos mem2.xml
Device detached successfully

No ACPI_DEVICE_OST event appears in the stap window.

9. test libvirt-python with libvirt-python-1.3.5-1.el7.x86_64:

# python
Python 2.7.5 (default, Oct 11 2015, 17:47:16) 
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> conn = libvirt.open()
>>> lista = dir(libvirt)
>>> "VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED" in lista
True

10. check virsh event --list output:

# virsh event --list
lifecycle
reboot
rtc-change
watchdog
io-error
graphics
io-error-reason
control-error
block-job
disk-change
tray-change
pm-wakeup
pm-suspend
balloon-change
pm-suspend-disk
device-removed
block-job-2
tunable
agent-lifecycle
device-added
migration-iteration
job-completed
device-removal-failed

11. test a successful unplug to make sure the existing behavior is not broken:

# cat mem1.xml
    <memory model='dimm'>
      <target>
        <size unit='KiB'>131072</size>
        <node>0</node>
      </target>
    </memory>

# virsh attach-device rhel7.0-rhel mem1.xml
Device attached successfully

# virsh detach-device rhel7.0-rhel mem1.xml
Device detached successfully

No device-removal-failed event appears in the event window.

Comment 10 errata-xmlrpc 2016-11-03 18:40:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html

