Bug 1117445

Summary: QMP: extend block events with error information
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.1
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Reporter: Luiz Capitulino <lcapitulino>
Assignee: Luiz Capitulino <lcapitulino>
QA Contact: Virtualization Bugs <virt-bugs>
CC: armbru, eblake, flang, hhuang, huding, juzhang, knoel, kwolf, lcapitulino, michen, pbonzini, qzhang, rbalakri, shu, sluo, virt-maint, xfu
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-kvm-rhev-2.1.2-2.el7
Doc Type: Bug Fix
: 1119784 (view as bug list) Environment:
Last Closed: 2015-03-05 09:48:02 UTC
Type: Bug
Bug Blocks: 1119784

Description Luiz Capitulino 2014-07-08 17:36:33 UTC
In RHEL6, we have extended the BLOCK_IO_ERROR event to contain the following fields:

o __com.redhat_reason: string representing an enum value (i.e. "eio", "eperm", "enospc" or "eother")
o __com.redhat_debug_info.errno: errno value as an integer
o __com.redhat_debug_info.message: error message returned by strerror()
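
A minimal Python sketch of reading these fields from an event's "data" object. It assumes the dotted names denote keys nested under "__com.redhat_debug_info", which is how the event logs later in this report show them; the function name is hypothetical.

```python
def rhel_error_info(data):
    """Return (reason, errno, message) from the RHEL-specific event keys.

    Missing keys yield None, so the helper also works on events from
    builds that do not carry the downstream extension.
    """
    debug = data.get("__com.redhat_debug_info", {})
    return (
        data.get("__com.redhat_reason"),
        debug.get("errno"),
        debug.get("message"),
    )
```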

Since then we have carried those extensions forward:

o RHEL6 (original request): bug 586349 and bug 624607
o RHEL7.0: bug 971938 and bug 895041
o RHEL7.1: bug 1116772

It's time to add this feature upstream, possibly for BLOCK_JOB_ERROR too. I'll add design ideas in the comments.

Comment 2 Luiz Capitulino 2014-07-08 17:52:53 UTC
Here are some design considerations when doing this for upstream:

o We may want the extension(s) in BLOCK_IO_ERROR and BLOCK_JOB_ERROR events
o query-block and query-block-jobs must contain this info too
o The errno integer should be dropped
o Having the error string from strerror() is probably fine
o For the "reason" field, we have two options: a QAPI enum containing the most common errnos; or, instead of having a "reason" at all, we could distinguish only between ENOSPC and all other errors, say, with a "no-space-error" bool
o If we go with the QAPI enum containing the most common errnos, then we should have a catch-all for unknown errnos (e.g. "unknown-error-code")
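
The enum option with a catch-all, as described in the list above, might be sketched in Python as follows. This is only an illustration of the idea, not QEMU code: the reason strings are borrowed from the RHEL6 extension, "eother" stands in for the catch-all, and the table and function names are hypothetical.

```python
import errno
import os

# Hypothetical table of the "most common" errnos; anything else falls
# through to the catch-all value.
_REASONS = {
    errno.EIO: "eio",
    errno.EPERM: "eperm",
    errno.ENOSPC: "enospc",
}

def error_fields(err):
    """Build the proposed event keys for a positive errno value."""
    return {
        "error-reason": _REASONS.get(err, "eother"),  # catch-all for unknown errnos
        "error-message": os.strerror(err),            # text is platform dependent
    }
```

The alternative design, a single bool, would reduce the table to a comparison against errno.ENOSPC.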

Here goes an example (taking the QAPI enum as solution for the "reason" field):

{ "event": "BLOCK_IO_ERROR",
    "data": { "device": "ide0-hd1",
              "operation": "write",
              "action": "stop",
              "error-reason": "eio",
              "error-message": "I/O error" },
    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }

Comment 3 Markus Armbruster 2014-07-16 14:22:05 UTC
Not relevant for RHEL-7, but might be wanted upstream anyway: make the information added to query-block and query-block-jobs visible in info block and info block-jobs.

Comment 4 Luiz Capitulino 2014-07-23 19:57:39 UTC
RFC series posted upstream:

http://lists.nongnu.org/archive/html/qemu-devel/2014-07/msg03235.html

Comment 5 Luiz Capitulino 2014-09-11 13:45:48 UTC
Posted v1 some time ago. It has already been applied in the block tree:

http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg05346.html

Comment 6 Luiz Capitulino 2014-10-09 13:17:03 UTC
This is actually about qemu-kvm-rhev, so fixing the component.

Comment 10 Miroslav Rezanina 2014-10-10 07:34:11 UTC
Fix included in qemu-kvm-rhev-2.1.2-2.el7

Comment 12 langfang 2014-10-28 08:06:00 UTC
Reproduced this bug on the following version:
Host:
# uname -r
3.10.0-191.el7.x86_64
# rpm -q qemu-kvm-rhev
qemu-kvm-rhev-2.1.0-3.el7.x86_64

Steps:
1. Boot the guest with /dev/sdb (a USB storage device on the host).

2. Run the following dd operation inside the guest.
# dd if=/dev/urandom of=/dev/vda bs=1M

3. Unplug the USB storage device from the host.

Results:
QEMU:
...
block I/O error in device 'drive-virtio-disk1': Input/output error (5)
block I/O error in device 'drive-virtio-disk1': Input/output error (5)
block I/O error in device 'drive-virtio-disk1': Input/output error (5)
...

#telnet $ip 4444
...
{"timestamp": {"seconds": 1414478362, "microseconds": 665870}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk1", "__com.redhat_debug_info": {"message": "Input/output error", "errno": 5}, "__com.redhat_reason": "eio", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1414478362, "microseconds": 665908}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk1", "__com.redhat_debug_info": {"message": "Input/output error", "errno": 5}, "__com.redhat_reason": "eio", "operation": "write", "action": "stop"}}
....
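
For anyone repeating this test, the events can be picked out of a captured QMP session with a short Python sketch like the one below. It assumes the session was saved one JSON message per line, as in the paste above; the function name is hypothetical.

```python
import json

def block_io_errors(lines):
    """Yield the "data" dict of every BLOCK_IO_ERROR event in a QMP capture."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except ValueError:
            continue  # skip elided or non-JSON lines such as "..."
        if isinstance(msg, dict) and msg.get("event") == "BLOCK_IO_ERROR":
            yield msg.get("data", {})
```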

{"execute":"query-block"}
...
{"io-status": "failed", "device": "drive-virtio-disk1", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 15513354240, "filename": "/dev/sdd", "format": "raw", "actual-size": 0, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "raw", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "/dev/sdd", "encryption_key_missing": false}, "type": "unknown"}...


Test on latest version:

Version:
# uname -r
3.10.0-191.el7.x86_64
# rpm -q qemu-kvm-rhev
qemu-kvm-rhev-2.1.2-5.el7.x86_64

Steps: same as in the reproduction above.

Results:
QEMU:
...
block I/O error in device 'drive-virtio-disk1': Input/output error (5)
block I/O error in device 'drive-virtio-disk1': Input/output error (5)
block I/O error in device 'drive-virtio-disk1': Input/output error (5)
block I/O error in device 'drive-virtio-disk1': Input/output error (5)
block I/O error in device 'drive-virtio-disk1': Input/output error (5)
...
#telnet $IP 4444
....
{"timestamp": {"seconds": 1414479655, "microseconds": 267117}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk1", "__com.redhat_debug_info": {"message": "Input/output error", "errno": 5}, "nospace": false, "__com.redhat_reason": "eio", "reason": "Input/output error", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1414479655, "microseconds": 267165}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk1", "__com.redhat_debug_info": {"message": "Input/output error", "errno": 5}, "nospace": false, "__com.redhat_reason": "eio", "reason": "Input/output error", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1414479655, "microseconds": 267206}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk1", "__com.redhat_debug_info": {"message": "Input/output error", "errno": 5}, "nospace": false, "__com.redhat_reason": "eio", "reason": "Input/output error", "operation": "write", "action": "stop"}}
....

{"execute":"query-block"}
..
 {"io-status": "failed", "device": "drive-virtio-disk1", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 15513354240, "filename": "/dev/sdd", "format": "raw", "actual-size": 0, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "raw", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "/dev/sdd", "encryption_key_missing": false}, "type": "unknown"}...
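
A small Python sketch for pulling the per-device "io-status" out of a query-block reply. It assumes the full reply has the usual QMP shape, a "return" key holding a list of device entries; the paste above shows only one elided entry of that list. The function name is hypothetical.

```python
def io_status_by_device(reply):
    """Map device name -> io-status for entries that report one."""
    return {
        entry["device"]: entry["io-status"]
        for entry in reply.get("return", [])
        if "io-status" in entry
    }
```

With the entry shown above this yields "failed" for drive-virtio-disk1 on both builds, which is the basis for the "same output" observation.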


Checking the results against comment 2:
o We may want the extension(s) in BLOCK_IO_ERROR and BLOCK_JOB_ERROR events

o query-block and query-block-jobs must contain this info too ---> {"execute":"query-block"} returns the same info on the unfixed and fixed versions ---> not fixed

o The errno integer should be dropped ---> the errno integer is not dropped; the event still contains "errno": 5 ---> not fixed

o Having the error string from strerror() is probably fine ---> the event contains "reason": "Input/output error" ---> fixed

o For the "reason" field, we have two options: a QAPI enum containing the most common errnos; or, instead of having a "reason" at all, we could only distinguish between ENOSPC and all the other errors. Say, having "no-space-error" bool ---> the event contains "nospace": false ---> fixed
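
The event part of this checklist can be cross-checked by diffing the keys of the "data" object between the two builds. A Python sketch, using the key sets from the events pasted above (values trimmed; the function name is hypothetical):

```python
def data_key_diff(old_data, new_data):
    """Return (added, removed) key lists between two event "data" dicts."""
    added = sorted(set(new_data) - set(old_data))
    removed = sorted(set(old_data) - set(new_data))
    return added, removed

# Keys taken from the qemu-kvm-rhev-2.1.0-3 and 2.1.2-5 events above.
old = {"device": "drive-virtio-disk1",
       "__com.redhat_debug_info": {"message": "Input/output error", "errno": 5},
       "__com.redhat_reason": "eio", "operation": "write", "action": "stop"}
new = dict(old, nospace=False, reason="Input/output error")

print(data_key_diff(old, new))  # (['nospace', 'reason'], [])
```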



Additional info:

1) Tried to trigger a "No space left on device" error

Results:
(qemu) info status 
VM status: paused (prelaunch)

#telnet $ip 4444
..
{"timestamp": {"seconds": 1414481471, "microseconds": 854961}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk", "__com.redhat_debug_info": {"message": "No space left on device", "errno": 28}, "nospace": true, "__com.redhat_reason": "enospc", "reason": "No space left on device", "operation": "write", "action": "stop"}}

2) Tried to trigger a block job error

Steps:
1. Boot the guest with a USB device:
 ...-drive file=/dev/sdd,if=none,id=drive-virtio-disk1,format=raw,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=on,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=2..

2. Create a snapshot over QMP:
#telnet $ip 4444
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive-virtio-disk1","snapshot-file":"/root/sn1","format": "qcow2" } }


3. Start block-stream on the USB device and hot-unplug the device while the job is running:
{ "execute": "block-stream", "arguments": { "device": "drive-virtio-disk1"}}


Results:
...
{"timestamp": {"seconds": 1414482202, "microseconds": 191604}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive-virtio-disk1", "operation": "read", "action": "report"}} ---> some information seems to be missing (e.g. reason, debug info)
..
{"timestamp": {"seconds": 1414482202, "microseconds": 191735}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk1", "len": 15513354240, "offset": 134217728, "speed": 0, "type": "stream", "error": "Input/output error"}}
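
Since BLOCK_JOB_ERROR carries no error text here, a client that wants one can pair it with the "error" string of the BLOCK_JOB_COMPLETED event for the same device. A Python sketch of that pairing, assuming events arrive in order as parsed dicts; the function name is hypothetical.

```python
def job_failures(events):
    """Pair each BLOCK_JOB_ERROR with the error text of the next
    BLOCK_JOB_COMPLETED event for the same device."""
    pending = {}   # device -> "data" of the last BLOCK_JOB_ERROR seen
    results = []
    for ev in events:
        data = ev.get("data", {})
        dev = data.get("device")
        if ev.get("event") == "BLOCK_JOB_ERROR":
            pending[dev] = data
        elif ev.get("event") == "BLOCK_JOB_COMPLETED" and dev in pending:
            err = pending.pop(dev)
            results.append({
                "device": dev,
                "operation": err.get("operation"),
                "error": data.get("error"),  # None if the job completed cleanly
            })
    return results
```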

Hi Luiz,

   Please take a look at the test results above; the bug does not seem to be fully fixed according to comment 2. Thanks.

Best regards,
fang lang

Comment 13 Luiz Capitulino 2014-10-29 16:57:23 UTC
The design changed upstream. We ended up merging a simpler implementation that adds the keys "nospace" and "reason" to the BLOCK_IO_ERROR event only.
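
A client consuming the merged keys might look like the Python sketch below. It branches on the "nospace" bool and treats "reason" purely as display text; the message wording and the function name are illustrative assumptions, not part of QEMU.

```python
def describe_block_io_error(data):
    """Turn a BLOCK_IO_ERROR "data" dict into a one-line description."""
    reason = data.get("reason", "unknown")  # human-readable; not parsed further
    if data.get("nospace", False):
        return "out of space on %s: %s" % (data["device"], reason)
    return "%s error on %s: %s" % (data["operation"], data["device"], reason)
```

Deciding on the bool rather than on the string keeps the client independent of the strerror() text, which can differ between hosts.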

Comment 14 langfang 2014-10-30 01:27:47 UTC
(In reply to Luiz Capitulino from comment #13)
> The design changed upstream. We ended up merging a simpler implementation
> that adds the keys "nospace" and "reason" to the BLOCK_IO_ERROR event only.


Results:

1) "BLOCK_IO_ERROR"
..
{"timestamp": {"seconds": 1414479655, "microseconds": 267117}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk1", "__com.redhat_debug_info": {"message": "Input/output error", "errno": 5}, "nospace": false, "__com.redhat_reason": "eio", "reason": "Input/output error", "operation": "write", "action": "stop"}}
...

{"timestamp": {"seconds": 1414481471, "microseconds": 854961}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk", "__com.redhat_debug_info": {"message": "No space left on device", "errno": 28}, "nospace": true, "__com.redhat_reason": "enospc", "reason": "No space left on device", "operation": "write", "action": "stop"}}

2) "BLOCK_JOB_ERROR"
...
{"timestamp": {"seconds": 1414482202, "microseconds": 191604}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive-virtio-disk1", "operation": "read", "action": "report"}}
...

3)"query-block"
....
{"io-status": "failed", "device": "drive-virtio-disk1", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 15513354240, "filename": "/dev/sdd", "format": "raw", "actual-size": 0, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "raw", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "/dev/sdd", "encryption_key_missing": false}, "type": "unknown"} ---> same output on the unfixed and fixed versions
...


Hi Luiz,
   Thanks for your review. As you said, are these the expected results for this bug? Is there any plan to fix the other items in the future (e.g. "BLOCK_JOB_ERROR" info)? Thanks.

Comment 15 Luiz Capitulino 2014-10-30 14:19:23 UTC
>    Thanks for your review. As you said, are these the expected results for this bug?

Yes.

> Is there any plan to fix the other items in the future (e.g. "BLOCK_JOB_ERROR" info)?

Not at this moment.

Comment 16 langfang 2014-10-31 03:30:14 UTC
(In reply to Luiz Capitulino from comment #15)
> >    Thanks for your review. As you said, are these the expected results for this bug?
> 
> Yes.
> 
> > Is there any plan to fix the other items in the future (e.g. "BLOCK_JOB_ERROR" info)?
> 
> Not at this moment.

Per comment 15, this bug can be marked as verified. Thanks.

Comment 19 errata-xmlrpc 2015-03-05 09:48:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html