Bug 624607
Summary: | [qemu] [rhel6] guest installation stop (pause) on 'eother' event over COW disks (thin-provisioning) | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Haim <hateya> | |
Component: | qemu-kvm | Assignee: | Luiz Capitulino <lcapitulino> | |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | |
Severity: | high | Docs Contact: | ||
Priority: | medium | |||
Version: | 6.0 | CC: | antillon.maurizio, armbru, danken, ehabkost, hateya, kgrainge, kwolf, llim, mgoldboi, mkenneth, szhou, tburke, virt-maint, xtian, yeylon | |
Target Milestone: | rc | |||
Target Release: | --- | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | qemu-kvm-0.12.1.2-2.119.el6 | Doc Type: | Bug Fix | |
Doc Text: |
IMPORTANT: this is an internal interface consumed only by libvirt. Users should only know about libvirt related impact and new functionality (which is not described here).
Cause: The BLOCK_IO_ERROR event provides limited error information.
Consequence: Debugging of I/O related errors is limited.
Change: Add more information to the BLOCK_IO_ERROR event.
Result: It's now easier to debug I/O related errors.
|
Story Points: | --- | |
Clone Of: | ||||
: | QMPBlockError (view as bug list) | Environment: | ||
Last Closed: | 2011-05-19 11:29:41 UTC | Type: | --- | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 559201, 580954, 643019 |
Description
Haim
2010-08-17 08:29:49 UTC
We report EPERM, this is probably EACCES.

Yes, the cause was an EACCES.

We have two possible solutions for this one:

1. Wait for the new error framework upstream. This is the Right Thing, as the new framework is going to allow for inclusion of error objects in events, which is the right fix for this BZ. The problem is that the work on the new framework has barely started, and we don't know how long it will take or how hard it's going to be to backport it.

2. Just extend RHEL6's vendor extension. This is easy to do, but we obviously can't go too far, as this solution doesn't escalate and that interface doesn't and won't exist upstream either.

So, maybe we can try to do 1, and if it fails we do 2. In any case I believe libvirt will have to be updated too.

This is mostly a debugging feature, so it would already be very helpful to have the concrete errno value in the QMP message even if libvirt doesn't use it (bonus points for an additional human-readable error message via strerror). When given a stopped VM that has failed, I can easily detach libvirt/VDSM and attach manually with netcat to the QMP socket. Actually, QE was able to do that themselves and provide me with the QMP error message.

However, in the reported cases with 6.0, I had to log in on that machine myself, install debuginfos, attach gdb and set a breakpoint at the right place, continue the VM and catch the next error this way, just to find the error number somewhere in the backtrace.

Of course, this would still be a workaround until the Right Thing is available, but it should be very easy to implement.

(In reply to comment #10)
> This is mostly a debugging feature, so it would already be very helpful to have
> the concrete errno value in the QMP message even if libvirt doesn't use it
> (bonus points for an additional human-readable error message via strerror).
> When given a stopped VM that has failed, I can easily detach libvirt/VDSM and
> attach manually with netcat to the QMP socket. Actually, QE was able to do that
> themselves and provide me with the QMP error message.
>
> However, in the reported cases with 6.0, I had to log in on that machine
> myself, install debuginfos, attach gdb and set a breakpoint at the right place,
> continue the VM and catch the next error this way, just to find the error
> number somewhere in the backtrace.

Ouch. Does it happen often? If it does, I'll consider fixing it for the Z stream.

Not too often, I did it this way like three or four times. I think having it in 6.1 would be good enough.

(In reply to comment #10)
> This is mostly a debugging feature, so it would already be very helpful to have
> the concrete errno value in the QMP message even if libvirt doesn't use it
> (bonus points for an additional human-readable error message via strerror).

Let me confirm I got this right. Adding the errno value in the QMP event and a human message to stderr would be enough to solve this issue for rhel6.1?

I'm asking because that solution is unlikely to be visible to regular users. IOW, regular users are going to see what's reported by libvirt, like what is described in the original report. However, it's very unlikely we'll get the Right Thing in time. Just want to be sure we're on the same page.

I think we are on the same page. Of course, I'd be happy to see the Right Thing with integration in libvirt and VDSM, but I understand that this won't be ready for 6.1. So I think it would already be a major improvement if attaching to QMP manually were enough, so that you wouldn't need to use gdb to debug problems.

Changed the 'version' field by accident; changing it back to 6.0 and updating the correct field (which is 'Target Release').

Reproduced this bug on 113.
{"timestamp": {"seconds": 1293087190, "microseconds": 26525}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "__com.redhat_reason": "eperm", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1293088121, "microseconds": 201391}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "__com.redhat_reason": "enospc", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1293093292, "microseconds": 341860}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "__com.redhat_reason": "eio", "operation": "write", "action": "stop"}}

And verified this bug on qemu-kvm-0.12.1.2-2.128.el6.x86_64.

(qemu) block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
{"timestamp": {"seconds": 1293093630, "microseconds": 236418}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "__com.redhat_debug_info": {"message": "Operation not permitted", "errno": 1}, "__com.redhat_reason": "eperm", "operation": "write", "action": "stop"}}

(qemu) block I/O error in device 'drive-virtio-disk0': No space left on device (28)
{"timestamp": {"seconds": 1293094234, "microseconds": 457275}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "__com.redhat_debug_info": {"message": "No space left on device", "errno": 28}, "__com.redhat_reason": "enospc", "operation": "write", "action": "stop"}}

(qemu) block I/O error in device 'drive-virtio-disk0': Input/output error (5)
{"timestamp": {"seconds": 1293093711, "microseconds": 499425}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "__com.redhat_debug_info": {"message": "Input/output error", "errno": 5}, "__com.redhat_reason": "eio", "operation": "write", "action": "stop"}}

From the monitor and QMP messages above, the debuggability of the BLOCK_IO_ERROR event has improved.
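For anyone reading such events off the QMP socket by hand, the following is a minimal Python sketch of how one line of that output could be summarized. It is not part of qemu-kvm, libvirt or VDSM; the function name `describe_block_io_error` is hypothetical, and the sample event is copied from the verification output in this report. It assumes each QMP message arrives as a single JSON line and treats the `__com.redhat_reason` and `__com.redhat_debug_info` fields as optional, since builds before the fix do not send the latter.

```python
import errno
import json

# Sample event copied from the verification output in this bug report.
SAMPLE_EVENT = (
    '{"timestamp": {"seconds": 1293093630, "microseconds": 236418}, '
    '"event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", '
    '"__com.redhat_debug_info": {"message": "Operation not permitted", "errno": 1}, '
    '"__com.redhat_reason": "eperm", "operation": "write", "action": "stop"}}'
)


def describe_block_io_error(line):
    """Summarize one QMP line; return None if it is not a BLOCK_IO_ERROR event.

    Hypothetical helper for illustration only.
    """
    msg = json.loads(line)
    if msg.get("event") != "BLOCK_IO_ERROR":
        return None
    data = msg["data"]
    summary = "{}: {} failed, action={}".format(
        data["device"], data["operation"], data["action"]
    )
    # Vendor extension present both before and after the fix.
    reason = data.get("__com.redhat_reason")
    if reason is not None:
        summary += ", reason=" + reason
    # Vendor extension added by the fix: concrete errno plus its message.
    debug = data.get("__com.redhat_debug_info")
    if debug is not None:
        code = debug["errno"]
        # Map the number to its symbolic name using the local errno table.
        name = errno.errorcode.get(code, "unknown")
        summary += ", errno={} ({}): {}".format(code, name, debug["message"])
    return summary


if __name__ == "__main__":
    print(describe_block_io_error(SAMPLE_EVENT))
```

The symbolic name is looked up on the machine running the script, so it matches the host's meaning only when both use the same errno numbering (as Linux hosts do for the values shown here).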
verified on:
vdsm-4.9-34.el6.x86_64
libvirt-0.8.6-1.el6.x86_64
qemu-kvm-0.12.1.2-2.113.el6_0.3.x86_64

Installed a fresh operating system on a 4G COW disk, followed the log and saw that lvextend was initiated when the high water mark was reached, and the disk was extended from 0.5 to 2G.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
IMPORTANT: this is an internal interface consumed only by libvirt. Users should only know about libvirt related impact and new functionality (which is not described here).
Cause: The BLOCK_IO_ERROR event provides limited error information.
Consequence: Debugging of I/O related errors is limited.
Change: Add more information to the BLOCK_IO_ERROR event.
Result: It's now easier to debug I/O related errors.

Hi Luiz, does this sound about right for the external errata text?

When starting a virtual machine that uses thin-provisioning (COW) disk, QEMU would fail to connect to the virtual I/O disk and the virtual machine would go into the pause state without returning much error information. QEMU now returns more verbose error information to help you debug any I/O-related errors.

Hi Kate,

I think I would change 'QEMU would fail' to 'QEMU could fail' or 'in an I/O failure scenario'...; otherwise the text is good.

To be honest, I'm not 100% sure it makes sense to report this change to users, but it won't hurt either.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html