Bug 1660410 - 'device-removal-failed' event can not be triggerred
Summary: 'device-removal-failed' event can not be triggerred
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.0
Hardware: x86_64
OS: Linux
high
high
Target Milestone: rc
: 8.0
Assignee: Peter Krempa
QA Contact: yanqzhan@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-12-18 09:54 UTC by Dan Zheng
Modified: 2020-11-14 05:45 UTC (History)
13 users (show)

Fixed In Version: libvirt-5.0.0-10.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-07 10:41:09 UTC
Type: Bug
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:2395 0 None None None 2019-08-07 10:41:35 UTC

Description Dan Zheng 2018-12-18 09:54:55 UTC
Description of problem:
Detaching a big memory(1G) device from RHEL 8 guest can not trigger 'device-removal-failed' event, but it did with RHEL 7 guest.

Version-Release number of selected component (if applicable):

libvirt-4.5.0-15.module+el8+2529+a9686a4d.x86_64
qemu-kvm-2.12.0-48.module+el8+2529+a9686a4d.x86_64
kernel-4.18.0-48.el8.x86_64
guest kernel: 4.18.0-48.el8.x86_64 



How reproducible:
100%

Steps to Reproduce:
1. Configure a guest with below setting:
  <maxMemory slots='16' unit='KiB'>15243264</maxMemory>
  <memory unit='KiB'>5242880</memory>

  <cpu>
    <numa>
      <cell id='0' cpus='0,2' memory='2097152' unit='KiB'/>
      <cell id='1' cpus='1,3' memory='2097152' unit='KiB'/>
    </numa>
  </cpu>


2. Prepare a memory device xml
#cat mem.xml

    <memory model='dimm'>
      <target>
        <size unit='KiB'>1048576</size>
        <node>0</node>
      </target>
    </memory>

3. Start the guest and detach the memory device
# virsh detach-device guest mem.xml
Device detached successfully

4. Monitor the event
# virsh event --all --loop
<no device-removal-failed event triggered>

5. Change RHEL 8 guest to RHEL 7 guest (Kernel 3.10.0-957.el7.x86_64) and repeat steps 1-4. The event can be triggered.
event 'device-removal-failed' for domain migrate-throttle: dimm0



Actual results:
See above

Expected results:
The event should be triggered. 

Additional info:

Comparing the messages between using RHEL 7 and RHEL 8 guest, "status": 1 is checked in libvirt codes, while "status": 130 is returned with RHEL 8 guest.


With RHEL 7 guest:

 224.667 ! 0x7f22e8026560 {"timestamp": {"seconds": 1545108097, "microseconds": 108042}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, "status": 132, "slot": "0", "slot-type": "DIMM"}}}
224.711 ! 0x7f22e8026560 {"timestamp": {"seconds": 1545108097, "microseconds": 152480}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, ***"status": 1***, "slot": "0", "slot-type": "DIMM"}}}
4.364 > 0x7f22e00020b0 {"execute":"device_del","arguments":{"id":"dimm0"},"id":"libvirt-27"}


With RHEL 8 guest:

4.365 < 0x7f22e00020b0 {"return": {}, "id": "libvirt-27"}
4.366 ! 0x7f22e00020b0 {"timestamp": {"seconds": 1545107876, "microseconds": 807362}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, "status": 132, "slot": "0", "slot-type": "DIMM"}}}
4.456 ! 0x7f22e00020b0 {"timestamp": {"seconds": 1545107876, "microseconds": 897304}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, ***"status": 130***, "slot": "0", "slot-type": "DIMM"}}}

Comment 1 Peter Krempa 2018-12-18 10:12:04 UTC
The reported status value 130 (0x82) is documented as specific to the notification type so we'll need some clarification on what other codes than '1' mean failure.

See qemuProcessHandleAcpiOstInfo for how this is handled.

Comment 2 Jaroslav Suchanek 2019-01-08 16:01:55 UTC
Igor, can you please have a look at comment 0 and check if this is not a regression lower in the stack? Thanks.

Comment 4 Igor Mammedov 2019-01-17 13:04:39 UTC
Difference is that RHEL7 emits status 1 on failure (ACPI6.2A Table 6-205: Non-specific failure) while RHEL8 emits 130 (0x82) (ACPI6.2A Table 6-207: Device Busy). i.e. RHEL8 is correct as well it just emits a specific failure reason.
QEMU/Guest parts in stack work as expected (agreement between QEMU and libvirt was that we do not invent a new protocol and reuse ACPI spec values here).

If libvirt makes up event device-removal-failed based on status 1 it also should do the same for status codes 0x80-0x83, note status 0x84 is not yet failure (perhaps for it the waiting period should be renewed)

Comment 5 Peter Krempa 2019-01-23 09:45:49 UTC
Fixed upstream:

commit ab5d49d46c4f11baae232e04cacd514de1d86fe9 (HEAD -> master, origin/master, origin/HEAD)
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Tue Jan 22 12:14:39 2019 +0100

    qemu: process: Handle all failure values for dimms in qemuProcessHandleAcpiOstInfo
    
    Hanlde all the possible failure codes as per ACPI standard documented in
    the function header.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1660410
    
    Signed-off-by: Peter Krempa <pkrempa@redhat.com>
    Reviewed-by: Ján Tomko <jtomko@redhat.com>

commit f2f14e3f45d24121f82d4c17de6d9ed770cd164a
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Tue Jan 22 10:35:53 2019 +0100

    qemu: process: Improve documentation of values handled by qemuProcessHandleAcpiOstInfo
    
    We forgot to document the specific fields for the 0x103 and 0x200
    sources which are tied to device removal and device hotplug
    respectively.
    
    The value description is based on the ACPI 6.2A standard Table 6-207 and
    Table 6-208. At the time of writing of this patch the standard can be
    accessed e.g. at:
    
    https://www.uefi.org/sites/default/files/resources/ACPI%206_2_A_Sept29.pdf
    
    Signed-off-by: Peter Krempa <pkrempa@redhat.com>
    Reviewed-by: Ján Tomko <jtomko@redhat.com>

Comment 7 Peter Krempa 2019-01-23 09:54:25 UTC
Present in v5.0.0-88-gab5d49d46c upstream

Comment 11 yanqzhan@redhat.com 2019-07-04 08:11:54 UTC
Verify this bug with:
libvirt-daemon-5.0.0-11.module+el8.0.1+3459+e357ef2f.x86_64
libvirt-debugsource-5.0.0-11.module+el8.0.1+3459+e357ef2f.x86_64
qemu-kvm-3.1.0-27.module+el8.0.1+3253+c5371cb3.x86_64
python3-libvirt-5.0.0-4.module+el8.0.1+3317+6e0079ab.x86_64

Steps:
1.detach dimm for rhel8 guest(kernel-4.18.0-80.el8.x86_64):

error: Failed to detach device from mem.xml
error: operation failed: unplug of device was rejected by the guest

# virsh event --loop --all
event 'device-removal-failed' for domain rhel8.0-local: dimm0

#python /usr/share/doc/python3-libvirt/examples/event-test.py
myDomainEventDeviceRemovalFailedCallback: Domain rhel8.0-local(35) failed to remove device: dimm0

# cat /var/log/libvirt/libvirtd.log|grep " info :"|grep dimm
2019-07-04 07:00:19.048+0000: 14489: info : qemuMonitorSend:1081 : QEMU_MONITOR_SEND_MSG: mon=0x7f4a8001b400 msg={"execute":"device_del","arguments":{"id":"dimm0"},"id":"libvirt-17"}
2019-07-04 07:00:19.050+0000: 14487: info : qemuMonitorIOWrite:549 : QEMU_MONITOR_IO_WRITE: mon=0x7f4a8001b400 buf={"execute":"device_del","arguments":{"id":"dimm0"},"id":"libvirt-17"}
2019-07-04 07:00:19.055+0000: 14487: info : qemuMonitorJSONIOProcessLine:211 : QEMU_MONITOR_RECV_EVENT: mon=0x7f4a8001b400 event={"timestamp": {"seconds": 1562223619, "microseconds": 55224}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, "status": 132, "slot": "0", "slot-type": "DIMM"}}}
2019-07-04 07:00:20.590+0000: 14487: info : qemuMonitorJSONIOProcessLine:211 : QEMU_MONITOR_RECV_EVENT: mon=0x7f4a8001b400 event={"timestamp": {"seconds": 1562223620, "microseconds": 589588}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, "status": 130, "slot": "0", "slot-type": "DIMM"}}}

2.for rhel7 guest(kernel-3.10.0-1060.el7.x86_64):

event 'device-removal-failed' for domain avocado-vt-vm1: dimm0

myDomainEventDeviceRemovalFailedCallback: Domain avocado-vt-vm1(39) failed to remove device: dimm0

myDomainEventDeviceRemovalFailedCallback: Domain avocado-vt-vm1(36) failed to remove device: dimm0

# tail -f /var/log/libvirt/libvirtd.log|grep " info :"|grep dimm
2019-07-04 07:29:31.299+0000: 14491: info : qemuMonitorSend:1081 : QEMU_MONITOR_SEND_MSG: mon=0x7f4a803296f0 msg={"execute":"device_del","arguments":{"id":"dimm0"},"id":"libvirt-17"}
2019-07-04 07:29:31.300+0000: 14487: info : qemuMonitorIOWrite:549 : QEMU_MONITOR_IO_WRITE: mon=0x7f4a803296f0 buf={"execute":"device_del","arguments":{"id":"dimm0"},"id":"libvirt-17"}
2019-07-04 07:29:31.307+0000: 14487: info : qemuMonitorJSONIOProcessLine:211 : QEMU_MONITOR_RECV_EVENT: mon=0x7f4a803296f0 event={"timestamp": {"seconds": 1562225371, "microseconds": 306814}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, "status": 132, "slot": "0", "slot-type": "DIMM"}}}
2019-07-04 07:29:31.311+0000: 14487: info : qemuMonitorJSONIOProcessLine:211 : QEMU_MONITOR_RECV_EVENT: mon=0x7f4a803296f0 event={"timestamp": {"seconds": 1562225371, "microseconds": 310124}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, "status": 1, "slot": "0", "slot-type": "DIMM"}}}

3. # vim /usr/src/debug/libvirt-5.0.0-11.module+el8.0.1+3459+e357ef2f.x86_64/src/qemu/qemu_process.c
...
1332  * Meaning of fields reported by the event according to the ACPI standard:
1333  * @source:
1334  *  0x00 - 0xff: Notification values, as passed at the request time
1335  *  0x100: Operating System Shutdown Processing
1336  *  0x103: Ejection processing
1337  *  0x200: Insertion processing
1338  *  other values are reserved
1339  *
1340  * @status:
1341  *   general values
1342  *     0x00: success
1343  *     0x01: non-specific failure
1344  *     0x02: unrecognized notify code
1345  *     0x03 - 0x7f: reserved
1346  *     other values are specific to the notification type (see below)
1347  *
1348  *   for the 0x100 source the following additional codes are standardized:
1349  *     0x80: OS Shutdown request denied
1350  *     0x81: OS Shutdown in progress
1351  *     0x82: OS Shutdown completed
1352  *     0x83: OS Graceful shutdown not supported
1353  *     other higher values are reserved
1354  *
1355  *  for the 0x003 (Ejection request) and 0x103 (Ejection processing) source
1356  *  the following additional codes are standardized:
1357  *     0x80: Device ejection not supported by OSPM
1358  *     0x81: Device in use by application
1359  *     0x82: Device Busy
1360  *     0x83: Ejection dependency is busy or not supported for ejection by OSPM
1361  *     0x84: Ejection is in progress (pending)
1362  *     other higher values are reserved
1363  *
1364  *  for the 0x200 source the following additional codes are standardized:
1365  *     0x80: Device insertion in progress (pending)
1366  *     0x81: Device driver load failure
1367  *     0x82: Device insertion not supported by OSPM
1368  *     0x83-0x8F: Reserved
1369  *     0x90-0x9F: Insertion failure - Resources Unavailable as described by the
1370  *                                    following bit encodings:
1371  *                                    Bit [3]: Bus or Segment Numbers
1372  *                                    Bit [2]: Interrupts
1373  *                                    Bit [1]: I/O
1374  *                                    Bit [0]: Memory
1375  *     other higher values are reserved
...

Comment 12 yanqzhan@redhat.com 2019-07-04 11:31:40 UTC
For rhel6 guest, 'device-removal-failed' event can not be triggered due to https://bugzilla.redhat.com/show_bug.cgi?id=1727069.

Comment 14 errata-xmlrpc 2019-08-07 10:41:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2395


Note You need to log in before you can comment on or make changes to this bug.