Bug 1119784

Summary: QMP: extend block events with error information
Product: Red Hat Enterprise Linux 7 Reporter: Luiz Capitulino <lcapitulino>
Component: libvirt Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.1 CC: armbru, dyuan, eblake, fromani, huding, jdenemar, juzhang, kwolf, michen, mzhan, pbonzini, qzhang, rbalakri, shu, shyu, sluo, virt-bugs, virt-maint, xfu, xuzhang
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-1.2.8-6.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1117445 Environment:
Last Closed: 2015-03-05 07:41:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1117445    
Bug Blocks:    

Description Luiz Capitulino 2014-07-15 13:47:56 UTC
This BZ is about the libvirt side work that will be required when the RHEL-only BLOCK_IO_ERROR event extensions are added upstream as described below.

+++ This bug was initially created as a clone of Bug #1117445 +++

In RHEL6, we have extended the BLOCK_IO_ERROR event to contain the following fields:

o __com.redhat_reason: string representing an enum value (i.e. "eio", "eperm", "enospc" or "eother")
o __com.redhat_debug_info.errno: errno value as an integer
o __com.redhat_debug_info.message: error message returned by strerror()
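
As a rough illustration of how a management client might read these RHEL-only fields, the sketch below pulls them out of a decoded event. The helper name redhat_error_info is hypothetical, and the sample payload is only modeled on the fields listed above; it is not captured from a real guest.

```python
import json


def redhat_error_info(event):
    """Return (reason, errno, message) from a BLOCK_IO_ERROR event dict.

    Hypothetical helper: any field that is absent comes back as None.
    """
    data = event.get("data", {})
    debug = data.get("__com.redhat_debug_info", {})
    return (
        data.get("__com.redhat_reason"),
        debug.get("errno"),
        debug.get("message"),
    )


# Illustrative payload built from the field names above, not real output.
sample = json.loads("""
{"event": "BLOCK_IO_ERROR",
 "data": {"device": "drive-virtio-disk0",
          "__com.redhat_reason": "enospc",
          "__com.redhat_debug_info": {"errno": 28,
                                      "message": "No space left on device"},
          "operation": "write",
          "action": "stop"}}
""")

print(redhat_error_info(sample))
```

Returning None for missing fields lets the same helper run against a QEMU build that does not carry the RHEL extensions.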

Since then we have carried those extensions forward:

o RHEL6 (original request): bug 586349 and bug 624607
o RHEL7.0: bug 971938 and bug 895041
o RHEL7.1: bug 1116772

It's time to add this feature upstream, possibly for BLOCK_JOB_ERROR too. I'll add design ideas in the comments.

--- Additional comment from RHEL Product and Program Management on 2014-07-08 13:37:54 EDT ---

Since this bug report was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Luiz Capitulino on 2014-07-08 13:52:53 EDT ---

Here are some design considerations when doing this for upstream:

o We may want the extension(s) in BLOCK_IO_ERROR and BLOCK_JOB_ERROR events
o query-block and query-block-jobs must contain this info too
o The errno integer should be dropped
o Having the error string from strerror() is probably fine
o For the "reason" field, we have two options: a QAPI enum containing the most common errnos; or, instead of having a "reason" at all, we could distinguish only between ENOSPC and all other errors, for example with a "no-space-error" bool
o If we go with the QAPI enum containing the most common errnos, then we should have a catch-all for unknown errnos (e.g. "unknown-error-code")
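
To make the two options for the "reason" field concrete, here is a minimal sketch in Python. It is not QEMU code: the table KNOWN_REASONS and both function names are hypothetical, the enum strings simply reuse the RHEL values listed earlier, and "unknown-error-code" is the catch-all name suggested above.

```python
import errno

# Option 1: an enum of the most common errnos plus a catch-all.
# Hypothetical table; the strings reuse the existing RHEL reason values.
KNOWN_REASONS = {
    errno.EIO: "eio",
    errno.EPERM: "eperm",
    errno.ENOSPC: "enospc",
}


def error_reason(err):
    """Map an errno to an enum string, falling back to the catch-all."""
    return KNOWN_REASONS.get(err, "unknown-error-code")


# Option 2: no enum at all, only a flag that singles out ENOSPC.
def no_space_error(err):
    """Return True only when the errno is ENOSPC."""
    return err == errno.ENOSPC


for err in (errno.EIO, errno.ENOSPC, errno.EACCES):
    print(err, error_reason(err), no_space_error(err))
```

The second option loses detail but gives clients a stable boolean to act on, while the first needs the catch-all so that new errnos never produce an invalid enum value.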

Here is an example (taking the QAPI enum as the solution for the "reason" field):

{ "event": "BLOCK_IO_ERROR",
    "data": { "device": "ide0-hd1",
              "operation": "write",
              "action": "stop",
              "error-reason": "eio",
              "error-message": "I/O error" },
    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }

Comment 1 Eric Blake 2014-10-03 15:03:28 UTC
Upstream patch proposed.
https://www.redhat.com/archives/libvir-list/2014-October/msg00124.html

Comment 4 Xuesong Zhang 2014-11-21 08:46:06 UTC
Tested with the following builds; this bug is verified.

libvirt-1.2.8-7.el7.x86_64
qemu-kvm-rhev-2.1.2-10.el7.x86_64
kernel-3.10.0-205.el7.x86_64

Scenario 1: the reason for BLOCK_IO_ERROR is reported when the device is unplugged
1. Start a guest with a USB disk.
# virsh dumpxml rhel7|grep disk  -A5
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/sdc'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
    </disk>
# virsh start rhel7
Domain rhel7 started

2. Run the following dd operation inside the guest.
# dd if=/dev/urandom of=/dev/vdb bs=1M
3. Unplug the USB stick from the host.

4. The BLOCK_IO_ERROR events in libvirtd.log contain "reason": "Input/output error":
2014-11-20 12:28:24.613+0000: 2245: debug : qemuMonitorIOProcess:399 : QEMU_MONITOR_IO_PROCESS: mon=0x7fc168001570 buf={"timestamp": {"seconds": 1416486504, "microseconds": 612675}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk1", "com.redhat_debug_info": {"message": "Input/output error", "errno": 5}, "nospace": false, "com.redhat_reason": "eio", "reason": "Input/output error", "operation": "write", "action": "report"}}^M
{"timestamp": {"seconds": 1416486504, "microseconds": 612796}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk1", "com.redhat_debug_info": {"message": "Input/output error", "errno": 5}, "nospace": false, "com.redhat_reason": "eio", "reason": "Input/output error", "operation": "write", "action": "report"}}^M
{"timestamp": {"seconds": 1416486504, "microseconds": 612889}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk1", "com.redhat_debug_info": {"message": "Input/output error", "errno": 5}, "nospace": false, "com.redhat_reason": "eio", "reason": "Input/output error", "operation": "write", "action": "report"}}^M
{"timestamp": {"seconds": 1416486504 len=1023
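
The monitor buffer logged above holds several JSON objects separated by CR/LF, and the last one is cut off. A minimal sketch of pulling the "reason" strings out of such a buffer follows; it assumes the text after "buf=" has already been extracted from the log line, and the function name iter_events and the shortened sample buffer are illustrative only.

```python
import json


def iter_events(buf):
    """Yield each complete JSON object in buf, stopping at a truncated tail."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(buf):
        # Skip the CR/LF separators between objects.
        while pos < len(buf) and buf[pos].isspace():
            pos += 1
        if pos >= len(buf):
            break
        try:
            obj, pos = decoder.raw_decode(buf, pos)
        except json.JSONDecodeError:
            break  # incomplete object at the end of the buffer
        yield obj


# Shortened, illustrative buffer in the same shape as the log excerpt.
buf = (
    '{"event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk1", '
    '"nospace": false, "reason": "Input/output error", '
    '"operation": "write", "action": "report"}}\r\n'
    '{"timestamp": {"seconds": 1416486504'
)

reasons = [
    event["data"]["reason"]
    for event in iter_events(buf)
    if event.get("event") == "BLOCK_IO_ERROR"
]
print(reasons)
```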


Scenario 2: report reason of BLOCK_IO_ERROR while no space
1. Create a 500 MiB LVM logical volume.
  LV Path                /dev/vg_flang/lv_flang
  LV Name                lv_flang
  VG Name                vg_flang
  LV UUID                NhJ6o5-B49l-tqjj-3Hsa-XaNj-jA8T-dhijel
  LV Write Access        read/write
  LV Creation host, time localhost.localdomain, 2014-11-21 15:59:51 +0800
  LV Status              available
  # open                 0
  LV Size                500.00 MiB
  Current LE             125
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:3

2. Create a new guest on this logical volume, choosing qcow2 as the disk format. The disk section of the dumpxml output should look like the following:

    <disk type='block' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='native'/>
      <source dev='/dev/vg_flang/lv_flang'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </disk>

3. The guest is paused, and "reason": "No space left on device" appears in libvirtd.log as expected.
2014-11-21 08:37:19.263+0000: 7613: debug : qemuMonitorIOProcess:399 : QEMU_MONITOR_IO_PROCESS: mon=0x7f4af00095b0 buf={"timestamp": {"seconds": 1416559039, "microseconds": 262858}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "__com.redhat_debug_info": {"message": "No space left on device", "errno": 28}, "nospace": true, "__com.redhat_reason": "enospc", "reason": "No space left on device", "operation": "write", "action": "stop"}}
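
A client that consumes this event could use the "nospace" flag to decide how to react and the "reason" string only for display. The sketch below is a hedged illustration of that split: summarize_io_error is a hypothetical helper, and the sample dict is an abbreviated copy of the event logged above.

```python
def summarize_io_error(event):
    """Build a one-line, human-readable summary of a BLOCK_IO_ERROR event.

    Hypothetical helper: "nospace" drives the classification, while
    "reason" is treated as free-form text and may be absent.
    """
    data = event["data"]
    kind = "out of space" if data.get("nospace") else "other I/O error"
    return "{}: {} failed ({}): {}; action={}".format(
        data["device"],
        data["operation"],
        kind,
        data.get("reason", "unknown"),
        data["action"],
    )


# Abbreviated copy of the event from the log line above.
event = {
    "event": "BLOCK_IO_ERROR",
    "data": {
        "device": "drive-virtio-disk0",
        "nospace": True,
        "reason": "No space left on device",
        "operation": "write",
        "action": "stop",
    },
}

print(summarize_io_error(event))
```

Keying the decision on the boolean rather than on the message text avoids depending on the exact strerror() wording.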

Comment 6 errata-xmlrpc 2015-03-05 07:41:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html