Description of problem: Detaching a big memory(1G) device from RHEL 8 guest can not trigger 'device-removal-failed' event, but it did with RHEL 7 guest. Version-Release number of selected component (if applicable): libvirt-4.5.0-15.module+el8+2529+a9686a4d.x86_64 qemu-kvm-2.12.0-48.module+el8+2529+a9686a4d.x86_64 kernel-4.18.0-48.el8.x86_64 guest kernel: 4.18.0-48.el8.x86_64 How reproducible: 100% Steps to Reproduce: 1. Configure a guest with below setting: <maxMemory slots='16' unit='KiB'>15243264</maxMemory> <memory unit='KiB'>5242880</memory> <cpu> <numa> <cell id='0' cpus='0,2' memory='2097152' unit='KiB'/> <cell id='1' cpus='1,3' memory='2097152' unit='KiB'/> </numa> </cpu> 2. Prepare a memory device xml #cat mem.xml <memory model='dimm'> <target> <size unit='KiB'>1048576</size> <node>0</node> </target> </memory> 3. Start the guest and detach the memory device # virsh detach-device guest mem.xml Device detached successfully 4. Monitor the event # virsh event --all --loop <no device-removal-failed event triggered> 5. Change RHEL 8 guest to RHEL 7 guest (Kernel 3.10.0-957.el7.x86_64) and repeat steps 1-4. The event can be triggered. event 'device-removal-failed' for domain migrate-throttle: dimm0 Actual results: See above Expected results: The event should be triggered. Additional info: Comparing the messages between using RHEL 7 and RHEL 8 guest, "status": 1 is checked in libvirt codes, while "status": 130 is returned with RHEL 8 guest. With RHEL 7 guest: 224.667 ! 0x7f22e8026560 {"timestamp": {"seconds": 1545108097, "microseconds": 108042}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, "status": 132, "slot": "0", "slot-type": "DIMM"}}} 224.711 ! 0x7f22e8026560 {"timestamp": {"seconds": 1545108097, "microseconds": 152480}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, ***"status": 1***, "slot": "0", "slot-type": "DIMM"}}} 4.364 > 0x7f22e00020b0 {"execute":"device_del","arguments":{"id":"dimm0"},"id":"libvirt-27"} With RHEL 8 guest: 4.365 < 0x7f22e00020b0 {"return": {}, "id": "libvirt-27"} 4.366 ! 0x7f22e00020b0 {"timestamp": {"seconds": 1545107876, "microseconds": 807362}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, "status": 132, "slot": "0", "slot-type": "DIMM"}}} 4.456 ! 0x7f22e00020b0 {"timestamp": {"seconds": 1545107876, "microseconds": 897304}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, ***"status": 130***, "slot": "0", "slot-type": "DIMM"}}}
The reported status value 130 (0x82) is documented as specific to the notification type so we'll need some clarification on what other codes than '1' mean failure. See qemuProcessHandleAcpiOstInfo for how this is handled.
Igor, can you please have a look at comment 0 and check if this is not a regression lower in the stack? Thanks.
Difference is that RHEL7 emits status 1 on failure (ACPI6.2A Table 6-205: Non-specific failure) while RHEL8 emits 130 (0x82) (ACPI6.2A Table 6-207: Device Busy). i.e. RHEL8 is correct as well it just emits a specific failure reason. QEMU/Guest parts in stack work as expected (agreement between QEMU and libvirt was that we do not invent a new protocol and reuse ACPI spec values here). If libvirt makes up event device-removal-failed based on status 1 it also should do the same for status codes 0x80-0x83, note status 0x84 is not yet failure (perhaps for it the waiting period should be renewed)
Fixed upstream: commit ab5d49d46c4f11baae232e04cacd514de1d86fe9 (HEAD -> master, origin/master, origin/HEAD) Author: Peter Krempa <pkrempa> Date: Tue Jan 22 12:14:39 2019 +0100 qemu: process: Handle all failure values for dimms in qemuProcessHandleAcpiOstInfo Hanlde all the possible failure codes as per ACPI standard documented in the function header. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1660410 Signed-off-by: Peter Krempa <pkrempa> Reviewed-by: Ján Tomko <jtomko> commit f2f14e3f45d24121f82d4c17de6d9ed770cd164a Author: Peter Krempa <pkrempa> Date: Tue Jan 22 10:35:53 2019 +0100 qemu: process: Improve documentation of values handled by qemuProcessHandleAcpiOstInfo We forgot to document the specific fields for the 0x103 and 0x200 sources which are tied to device removal and device hotplug respectively. The value description is based on the ACPI 6.2A standard Table 6-207 and Table 6-208. At the time of writing of this patch the standard can be accessed e.g. at: https://www.uefi.org/sites/default/files/resources/ACPI%206_2_A_Sept29.pdf Signed-off-by: Peter Krempa <pkrempa> Reviewed-by: Ján Tomko <jtomko>
Present in v5.0.0-88-gab5d49d46c upstream
Verify this bug with: libvirt-daemon-5.0.0-11.module+el8.0.1+3459+e357ef2f.x86_64 libvirt-debugsource-5.0.0-11.module+el8.0.1+3459+e357ef2f.x86_64 qemu-kvm-3.1.0-27.module+el8.0.1+3253+c5371cb3.x86_64 python3-libvirt-5.0.0-4.module+el8.0.1+3317+6e0079ab.x86_64 Steps: 1.detach dimm for rhel8 guest(kernel-4.18.0-80.el8.x86_64): error: Failed to detach device from mem.xml error: operation failed: unplug of device was rejected by the guest # virsh event --loop --all event 'device-removal-failed' for domain rhel8.0-local: dimm0 #python /usr/share/doc/python3-libvirt/examples/event-test.py myDomainEventDeviceRemovalFailedCallback: Domain rhel8.0-local(35) failed to remove device: dimm0 # cat /var/log/libvirt/libvirtd.log|grep " info :"|grep dimm 2019-07-04 07:00:19.048+0000: 14489: info : qemuMonitorSend:1081 : QEMU_MONITOR_SEND_MSG: mon=0x7f4a8001b400 msg={"execute":"device_del","arguments":{"id":"dimm0"},"id":"libvirt-17"} 2019-07-04 07:00:19.050+0000: 14487: info : qemuMonitorIOWrite:549 : QEMU_MONITOR_IO_WRITE: mon=0x7f4a8001b400 buf={"execute":"device_del","arguments":{"id":"dimm0"},"id":"libvirt-17"} 2019-07-04 07:00:19.055+0000: 14487: info : qemuMonitorJSONIOProcessLine:211 : QEMU_MONITOR_RECV_EVENT: mon=0x7f4a8001b400 event={"timestamp": {"seconds": 1562223619, "microseconds": 55224}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, "status": 132, "slot": "0", "slot-type": "DIMM"}}} 2019-07-04 07:00:20.590+0000: 14487: info : qemuMonitorJSONIOProcessLine:211 : QEMU_MONITOR_RECV_EVENT: mon=0x7f4a8001b400 event={"timestamp": {"seconds": 1562223620, "microseconds": 589588}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, "status": 130, "slot": "0", "slot-type": "DIMM"}}} 2.for rhel7 guest(kernel-3.10.0-1060.el7.x86_64): event 'device-removal-failed' for domain avocado-vt-vm1: dimm0 myDomainEventDeviceRemovalFailedCallback: Domain avocado-vt-vm1(39) failed to remove device: dimm0 myDomainEventDeviceRemovalFailedCallback: Domain avocado-vt-vm1(36) failed to remove device: dimm0 # tail -f /var/log/libvirt/libvirtd.log|grep " info :"|grep dimm 2019-07-04 07:29:31.299+0000: 14491: info : qemuMonitorSend:1081 : QEMU_MONITOR_SEND_MSG: mon=0x7f4a803296f0 msg={"execute":"device_del","arguments":{"id":"dimm0"},"id":"libvirt-17"} 2019-07-04 07:29:31.300+0000: 14487: info : qemuMonitorIOWrite:549 : QEMU_MONITOR_IO_WRITE: mon=0x7f4a803296f0 buf={"execute":"device_del","arguments":{"id":"dimm0"},"id":"libvirt-17"} 2019-07-04 07:29:31.307+0000: 14487: info : qemuMonitorJSONIOProcessLine:211 : QEMU_MONITOR_RECV_EVENT: mon=0x7f4a803296f0 event={"timestamp": {"seconds": 1562225371, "microseconds": 306814}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, "status": 132, "slot": "0", "slot-type": "DIMM"}}} 2019-07-04 07:29:31.311+0000: 14487: info : qemuMonitorJSONIOProcessLine:211 : QEMU_MONITOR_RECV_EVENT: mon=0x7f4a803296f0 event={"timestamp": {"seconds": 1562225371, "microseconds": 310124}, "event": "ACPI_DEVICE_OST", "data": {"info": {"device": "dimm0", "source": 3, "status": 1, "slot": "0", "slot-type": "DIMM"}}} 3. # vim /usr/src/debug/libvirt-5.0.0-11.module+el8.0.1+3459+e357ef2f.x86_64/src/qemu/qemu_process.c ... 1332 * Meaning of fields reported by the event according to the ACPI standard: 1333 * @source: 1334 * 0x00 - 0xff: Notification values, as passed at the request time 1335 * 0x100: Operating System Shutdown Processing 1336 * 0x103: Ejection processing 1337 * 0x200: Insertion processing 1338 * other values are reserved 1339 * 1340 * @status: 1341 * general values 1342 * 0x00: success 1343 * 0x01: non-specific failure 1344 * 0x02: unrecognized notify code 1345 * 0x03 - 0x7f: reserved 1346 * other values are specific to the notification type (see below) 1347 * 1348 * for the 0x100 source the following additional codes are standardized: 1349 * 0x80: OS Shutdown request denied 1350 * 0x81: OS Shutdown in progress 1351 * 0x82: OS Shutdown completed 1352 * 0x83: OS Graceful shutdown not supported 1353 * other higher values are reserved 1354 * 1355 * for the 0x003 (Ejection request) and 0x103 (Ejection processing) source 1356 * the following additional codes are standardized: 1357 * 0x80: Device ejection not supported by OSPM 1358 * 0x81: Device in use by application 1359 * 0x82: Device Busy 1360 * 0x83: Ejection dependency is busy or not supported for ejection by OSPM 1361 * 0x84: Ejection is in progress (pending) 1362 * other higher values are reserved 1363 * 1364 * for the 0x200 source the following additional codes are standardized: 1365 * 0x80: Device insertion in progress (pending) 1366 * 0x81: Device driver load failure 1367 * 0x82: Device insertion not supported by OSPM 1368 * 0x83-0x8F: Reserved 1369 * 0x90-0x9F: Insertion failure - Resources Unavailable as described by the 1370 * following bit encodings: 1371 * Bit [3]: Bus or Segment Numbers 1372 * Bit [2]: Interrupts 1373 * Bit [1]: I/O 1374 * Bit [0]: Memory 1375 * other higher values are reserved ...
For rhel6 guest, 'device-removal-failed' event can not be triggered due to https://bugzilla.redhat.com/show_bug.cgi?id=1727069.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2395