Bug 1566153
Summary: IOERROR pause code lost after resuming a VM while I/O error is still present

Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.5
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Milan Zamazal <mzamazal>
Assignee: Markus Armbruster <armbru>
QA Contact: CongLi <coli>
CC: aliang, chayang, chhu, coli, dyuan, jdenemar, jherrman, jiyan, jsuchane, juzhang, knoel, lmen, michal.skrivanek, michen, mtessun, ngu, virt-maint, xuwei, xuzhang, yanqzhan, yhong
Target Milestone: rc
Target Release: ---
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-kvm-rhev-2.12.0-2.el7
Doc Type: Bug Fix
Doc Text: Under certain circumstances, resuming a paused guest generated redundant "VIR_DOMAIN_PAUSED_UNKNOWN" error messages in the libvirt log. This update corrects the event sending order when resuming guests, which prevents the errors being logged.
Clones: 1569614, 1582122 (view as bug list)
Last Closed: 2018-11-01 11:07:33 UTC
Type: Bug
Regression: ---
Bug Blocks: 1526025, 1582122
This issue can be reproduced:

pkg version:
libvirt-3.9.0-14.el7_5.2.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.1.x86_64

Steps:
1. Run a VM with a disk attached.
2. Cause an I/O error by overwriting on the disk:
   # dd if=/dev/zero of=/mnt/test.img bs=1M count=50
3. Wait until the VM gets paused.
4. Resume the VM.
5. Check events and libvirtd.log (refer to attachment "QeLogs_BugIoerror").

Results:

1. Issue reproduces:

# grep ' pause' libvirtd-ioerror.log
2018-04-12 07:20:47.791+0000: 6083: debug : qemuProcessHandleIOError:971 : Transitioned guest V to paused state due to IO error
2018-04-12 07:21:48.272+0000: 6083: debug : qemuProcessHandleResume:808 : Transitioned guest V out of paused into resumed state
2018-04-12 07:21:48.272+0000: 6083: debug : qemuProcessHandleStop:756 : Transitioned guest V to paused state, reason unknown
2018-04-12 07:25:32.687+0000: 6083: debug : qemuProcessHandleResume:808 : Transitioned guest V out of paused into resumed state

2. From the following content, the events in order are: BLOCK_IO_ERROR - RESUME - STOP - SHUTDOWN - RESUME. QEMU did not send the I/O error message again for the second pause (STOP), so it seems libvirt cannot do anything to highlight the I/O error or its reason.

# grep QEMU_MONITOR_RECV_EVENT libvirtd-ioerror.log
2018-04-12 07:20:47.791+0000: 6083: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f9928000fb0 event={"timestamp": {"seconds": 1523517647, "microseconds": 791396}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk1", "nospace": true, "__com.redhat_reason": "enospc", "node-name": "#block368", "reason": "No space left on device", "operation": "write", "action": "stop"}}
...
2018-04-12 07:21:48.272+0000: 6083: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f9928000fb0 event={"timestamp": {"seconds": 1523517708, "microseconds": 250598}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk1", "nospace": true, "__com.redhat_reason": "enospc", "node-name": "#block368", "reason": "No space left on device", "operation": "write", "action": "stop"}}
2018-04-12 07:21:48.272+0000: 6083: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f9928000fb0 event={"timestamp": {"seconds": 1523517708, "microseconds": 255772}, "event": "RESUME"}
2018-04-12 07:21:48.272+0000: 6083: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f9928000fb0 event={"timestamp": {"seconds": 1523517708, "microseconds": 259194}, "event": "STOP"}
2018-04-12 07:25:29.118+0000: 6083: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f9928000fb0 event={"timestamp": {"seconds": 1523517929, "microseconds": 118290}, "event": "SHUTDOWN", "data": {"guest": false}}
2018-04-12 07:25:32.687+0000: 6083: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f997c0162a0 event={"timestamp": {"seconds": 1523517932, "microseconds": 687012}, "event": "RESUME"}

Created attachment 1420763 [details]
QeLogs_BugIoerror
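The grep-based check above can be automated. A minimal sketch (not libvirt tooling; the line format is taken from the log excerpts above, and the function name is hypothetical):

```python
import re

# libvirtd debug lines for domain state transitions look like:
# "... qemuProcessHandleStop:756 : Transitioned guest V to paused state, reason unknown"
TRANSITION = re.compile(r"qemuProcessHandle(\w+):\d+ : (Transitioned guest \S+ .+)")

def pause_transitions(lines):
    """Extract (handler, message) pairs from libvirtd state-transition log lines."""
    out = []
    for line in lines:
        m = TRANSITION.search(line)
        if m:
            out.append((m.group(1), m.group(2)))
    return out

# Sample lines copied from the log excerpt above.
sample = [
    "2018-04-12 07:20:47.791+0000: 6083: debug : qemuProcessHandleIOError:971 : Transitioned guest V to paused state due to IO error",
    "2018-04-12 07:21:48.272+0000: 6083: debug : qemuProcessHandleResume:808 : Transitioned guest V out of paused into resumed state",
    "2018-04-12 07:21:48.272+0000: 6083: debug : qemuProcessHandleStop:756 : Transitioned guest V to paused state, reason unknown",
]

transitions = pause_transitions(sample)
# The bug signature: the pause after the resume carries no I/O error reason.
assert transitions[-1][0] == "Stop" and "reason unknown" in transitions[-1][1]
```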
Right, we call "cont" and get several BLOCK_IO_ERROR events followed by a RESUME event, a successful return from "cont", and a STOP event.

2018-04-11 14:52:43.918+0000: 1628: info : qemuMonitorIOWrite:543 : QEMU_MONITOR_IO_WRITE: mon=0x7fc76401a810 buf={"execute":"cont","id":"libvirt-2482"}
2018-04-11 14:52:43.922+0000: 1628: info : qemuMonitorIOProcess:438 : QEMU_MONITOR_IO_PROCESS: mon=0x7fc76401a810 buf={"timestamp": {"seconds": 1523458363, "microseconds": 922090}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-ua-29ea81dc-a9a1-4f41-83ad-7715422e1079", "nospace": false, "__com.redhat_reason": "eio", "node-name": "#block109", "reason": "Input/output error", "operation": "read", "action": "stop"}}
2018-04-11 14:52:43.923+0000: 1628: info : qemuMonitorIOProcess:438 : QEMU_MONITOR_IO_PROCESS: mon=0x7fc76401a810 buf={"timestamp": {"seconds": 1523458363, "microseconds": 922346}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-ua-29ea81dc-a9a1-4f41-83ad-7715422e1079", "nospace": false, "__com.redhat_reason": "eio", "node-name": "#block109", "reason": "Input/output error", "operation": "write", "action": "stop"}}
2018-04-11 14:52:43.928+0000: 1628: info : qemuMonitorIOProcess:438 : QEMU_MONITOR_IO_PROCESS: mon=0x7fc76401a810 buf={"timestamp": {"seconds": 1523458363, "microseconds": 928242}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-ua-29ea81dc-a9a1-4f41-83ad-7715422e1079", "nospace": false, "__com.redhat_reason": "eio", "node-name": "#block109", "reason": "Input/output error", "operation": "write", "action": "stop"}}
2018-04-11 14:52:43.942+0000: 1628: info : qemuMonitorIOProcess:438 : QEMU_MONITOR_IO_PROCESS: mon=0x7fc76401a810 buf={"timestamp": {"seconds": 1523458363, "microseconds": 942444}, "event": "RESUME"}
{"return": {}, "id": "libvirt-2482"}
2018-04-11 14:52:43.951+0000: 1628: info : qemuMonitorIOProcess:438 : QEMU_MONITOR_IO_PROCESS: mon=0x7fc76401a810 buf={"timestamp": {"seconds": 1523458363, "microseconds": 951301}, "event": "STOP"}

I'm moving this to QEMU for further investigation.

I asked yanqzhan for a more detailed reproducer, and got this in a pastebin:

[io-error] Deliberately leading to domain I/O errors

Setup: Prepare a small partition for this testing, for example /dev/sda3 (11M):
# mkfs.ext3 /dev/sda3

1. Define and start a pool:
# cat fs-pool.xml
<pool type='fs'>
  <name>mypool</name>
  <source>
    <device path='/dev/sda3'/>
    <format type='auto'/>
  </source>
  <target>
    <path>/var/lib/libvirt/images/mypool</path>
  </target>
</pool>
# virsh pool-define fs-pool.xml
# virsh pool-build mypool
# virsh pool-start mypool
Check the pool is working fine:
# df -h

2. Create a volume in the pool:
# cat vol-disk-template.xml
<volume>
  <name>disk1.img</name>
  <capacity unit='M'>100</capacity>
  <allocation unit='M'>0</allocation>
  <target>
    <path>/var/lib/libvirt/images/mypool/disk1.img</path>
    <format type='raw'/>
  </target>
</volume>
# virsh vol-create mypool vol-disk-template.xml

3. Add the volume to a guest as a secondary disk, then start the guest:
# virsh edit testvm
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/mypool/disk1.img'/>
  <target dev='vdb' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
</disk>
# virsh start testvm

4. In the guest, try to write some data (bigger than 50M) to the secondary disk:
# fdisk /dev/vdb
In the fdisk prompt, enter "n, p, 1, enter, enter, w" in order.
(It will be vda if the first disk does not use the virtio driver; use "# fdisk -l" to determine the 2nd virtual disk of the guest.)
# mkfs.ext3 /dev/vdb1
# mount /dev/vdb1 /mnt
# dd if=/dev/zero of=/mnt/test.img bs=1M count=50

5. On the host, check the partition status: 100% used.
# df -h | grep sda3
(Note: you can delete the volume and then reuse this partition.)

Simplified reproducer with the upper layers peeled off:

1. Create a scratch image:
   $ dd if=/dev/zero of=scratch.img bs=1M count=100

   Size doesn't actually matter.

2. Prepare blkdebug configuration:

   $ cat >blkdebug.conf <<EOF
   [inject-error]
   event = "write_aio"
   errno = "5"
   EOF

   Note that errno 5 is EIO.

3. Run a guest with an additional scratch disk, i.e. with additional arguments
   -drive if=none,id=scratch-drive,format=raw,werror=stop,file=blkdebug:blkdebug.conf:scratch.img
   -device virtio-blk-pci,id=scratch,drive=scratch-drive

   The blkdebug part makes all writes to the scratch drive fail with EIO. The werror=stop pauses the guest on write errors.

4. Connect to the QMP socket e.g. like this:
   $ socat UNIX:/your/qmp/socket READLINE,history=$HOME/.qmp_history,prompt='QMP> '

   Issue QMP command 'qmp_capabilities':
   QMP> { "execute": "qmp_capabilities" }

5. Boot the guest.

6. In the guest, write to the scratch disk, e.g. like this:

   # dd if=/dev/zero of=/dev/vdb count=1

   Do double-check the device specified with of= is actually the scratch device!

7. Issue QMP command 'cont':
   QMP> { "execute": "cont" }

Actual results for upstream QEMU 2.12.0-rc3:

* After step 6, I get these QMP events:

{"timestamp": {"seconds": 1523950690, "microseconds": 13256}, "event": "BLOCK_IO_ERROR", "data": {"device": "scratch-drive", "nospace": false, "node-name": "#block634", "reason": "Input/output error", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1523950690, "microseconds": 14614}, "event": "STOP"}

* After step 7, I get:

{"timestamp": {"seconds": 1523950730, "microseconds": 551845}, "event": "BLOCK_IO_ERROR", "data": {"device": "scratch-drive", "nospace": false, "node-name": "#block634", "reason": "Input/output error", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1523950730, "microseconds": 553629}, "event": "RESUME"}
{"return": {}}
{"timestamp": {"seconds": 1523950730, "microseconds": 555970}, "event": "STOP"}

Events BLOCK_IO_ERROR and RESUME arrive before the reply to "cont", event STOP afterwards. This is consistent with what I see in the libvirt logs from comment#3, and the log excerpt in comment#4.
The BLOCK_IO_ERROR event "overtakes" the RESUME event here. This matches the order of libvirt events observed by the reporter: IO_ERROR, RESUME, PAUSED.

The reporter further observed that virsh domstate --reason reports "unknown" after resume. Note that QEMU's BLOCK_IO_ERROR event still carries the correct "reason" then. It looks like the reason gets lost on the way to virsh domstate, possibly because libvirt is not prepared for this order of QEMU events.

I think we have two possible bugs here:

#1 qemu-kvm bug: event ordering. I'm not saying this is a bug in need of fixing; I am saying we should find out whether it is one.

#2 libvirt bug: failure to cope with said event ordering. Even if #1 is deemed a genuine bug, libvirt may want to cope with unfixed QEMU versions.

Jiri, what do you think?

#1 The ordering is pretty weird; IMHO it would be more logical to fail the "cont" command, but I'm afraid this won't be possible because it's completely asynchronous.

#2 Libvirt reacts to each event separately. When it sees BLOCK_IO_ERROR and the configuration says it should stop guest CPUs in such a case, the domain is marked as paused because of an I/O error. Then it sees RESUME and marks the domain as running. Once STOP arrives, the domain is switched to paused with an unknown reason.

We could theoretically remember the last I/O error and consider it valid later when we see STOP, but that sounds too hacky and fragile. Or we could just call query-status when we see the STOP event, because it is able to give us the details we need to set the appropriate stop reason. This is a bit better, until we start thinking about it more: libvirt would report paused/unknown for the time frame between the STOP event and the reply from query-status, as we'd have to schedule this command to be processed asynchronously, because another synchronous command may be running at the time. So ideally, we'd just get the reason with the STOP event itself, so that we don't have to query it explicitly.
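To make the "remember the last I/O error" idea discussed above concrete, here is a toy sketch of an event-driven state tracker. This is not libvirt code; all names are hypothetical, and it only illustrates why caching the last BLOCK_IO_ERROR would recover the pause reason for the event order seen in the logs:

```python
# Toy model of a libvirt-like domain state tracker. The cache of the last
# BLOCK_IO_ERROR lets a later bare STOP event be attributed to the I/O error.
PAUSED_IOERROR = "paused (I/O error)"
PAUSED_UNKNOWN = "paused (unknown)"
RUNNING = "running"

class DomainStateTracker:
    def __init__(self):
        self.state = RUNNING
        self.last_io_error = None  # reason string from the last BLOCK_IO_ERROR

    def handle_event(self, event, data=None):
        if event == "BLOCK_IO_ERROR":
            self.last_io_error = data.get("reason")
            if data.get("action") == "stop":
                self.state = PAUSED_IOERROR
        elif event == "RESUME":
            self.state = RUNNING
        elif event == "STOP":
            # Without the cache this would have to be PAUSED_UNKNOWN,
            # which is exactly the reported symptom.
            self.state = PAUSED_IOERROR if self.last_io_error else PAUSED_UNKNOWN

# Event order observed in the logs: BLOCK_IO_ERROR, RESUME, STOP.
t = DomainStateTracker()
t.handle_event("BLOCK_IO_ERROR", {"reason": "Input/output error", "action": "stop"})
t.handle_event("RESUME")
t.handle_event("STOP")
assert t.state == PAUSED_IOERROR
```

The fragility pointed out above is visible here too: the tracker has no way to know when the cached error has gone stale, which is why fixing the event order in QEMU (or querying query-status) is the preferable direction.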
Unless it's easy to fix the order of events, of course.

I'm going to clone this bug for tracking possible libvirt work (comment#8 item #2). This bug will be for tracking qemu-kvm work (item #1).

Reproduced with latest upstream. The culprit is vm_prepare_start():

    /* Ensure that a STOP/RESUME pair of events is emitted if a
     * vmstop request was pending.  The BLOCK_IO_ERROR event, for
     * example, according to documentation is always followed by
     * the STOP event.
     */
    if (runstate_is_running()) {
        qapi_event_send_stop(&error_abort);
        res = -1;
    } else {
        replay_enable_events();
        cpu_enable_ticks();
        runstate_set(RUN_STATE_RUNNING);
        vm_state_notify(1, RUN_STATE_RUNNING);
    }

    /* We are sending this now, but the CPUs will be resumed shortly later */
    qapi_event_send_resume(&error_abort);

    return res;

When resuming a stopped guest, we take the else branch before we get to sending RESUME. vm_state_notify() runs virtio_vmstate_change(), among other things. This restarts I/O, triggering the BLOCK_IO_ERROR event.

Proposed upstream patch:
Message-Id: <20180423084518.2426-1-armbru>
https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg03639.html

Fix included in qemu-kvm-rhev-2.12.0-2.el7

(In reply to Markus Armbruster from comment #8)
> Simplified reproducer with the upper layers peeled off:
>
> 1. Create a scratch image
>    $ dd if=/dev/zero of=scratch.img bs=1M count=100
>
>    Size doesn't actually matter.
>
> 2. Prepare blkdebug configuration:
>
>    $ cat >blkdebug.conf <<EOF
>    [inject-error]
>    event = "write_aio"
>    errno = "5"
>    EOF
>
>    Note that errno 5 is EIO.
>
> 3. Run a guest with an additional scratch disk, i.e. with additional
>    arguments
>    -drive if=none,id=scratch-drive,format=raw,werror=stop,file=blkdebug:blkdebug.conf:scratch.img
>    -device virtio-blk-pci,id=scratch,drive=scratch-drive
>
>    The blkdebug part makes all writes to the scratch drive fail with
>    EIO. The werror=stop pauses the guest on write errors.
>
> 4. Connect to the QMP socket e.g. like this:
>    $ socat UNIX:/your/qmp/socket READLINE,history=$HOME/.qmp_history,prompt='QMP> '
>
>    Issue QMP command 'qmp_capabilities':
>    QMP> { "execute": "qmp_capabilities" }
>
> 5. Boot the guest.
>
> 6. In the guest, write to the scratch disk, e.g. like this:
>
>    # dd if=/dev/zero of=/dev/vdb count=1
>
>    Do double-check the device specified with of= is actually the
>    scratch device!
>
> 7. Issue QMP command 'cont':
>    QMP> { "execute": "cont" }

Verified this bug on qemu-kvm-rhev-2.12.0-2.el7.x86_64:

After step 6:
{"timestamp": {"seconds": 1526610722, "microseconds": 326202}, "event": "BLOCK_IO_ERROR", "data": {"device": "scratch-drive", "nospace": false, "__com.redhat_reason": "eio", "node-name": "#block421", "reason": "Input/output error", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1526610722, "microseconds": 335677}, "event": "STOP"}

After step 7:
{"timestamp": {"seconds": 1526610729, "microseconds": 156841}, "event": "RESUME"}
{"timestamp": {"seconds": 1526610729, "microseconds": 157809}, "event": "BLOCK_IO_ERROR", "data": {"device": "scratch-drive", "nospace": false, "__com.redhat_reason": "eio", "node-name": "#block421", "reason": "Input/output error", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1526610729, "microseconds": 158925}, "event": "STOP"}

The RESUME event is now triggered before the BLOCK_IO_ERROR event; the event ordering RESUME -> BLOCK_IO_ERROR -> STOP occurs as expected. Thanks.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3443
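The before/after event ordering described above can be modeled with a toy sketch. This is not QEMU code: the deferred STOP (the vmstop request raised while I/O restarts, processed later by the main loop) is only mimicked, to show why emitting RESUME before restarting I/O yields the verified ordering:

```python
def prepare_start(fixed):
    """Toy model of event emission around QEMU's vm_prepare_start().

    vm_state_notify() restarting I/O immediately re-hits the injected write
    error: BLOCK_IO_ERROR is emitted right away, while the resulting stop
    request is only processed later (modeled as a deferred STOP).
    """
    events = []
    pending = []

    def vm_state_notify():
        events.append("BLOCK_IO_ERROR")  # restarted I/O fails again
        pending.append("STOP")           # stop request, handled later

    if fixed:
        # Fixed ordering: announce RESUME before I/O is restarted.
        events.append("RESUME")
        vm_state_notify()
    else:
        # Buggy ordering: I/O restarts (and fails) before RESUME is sent.
        vm_state_notify()
        events.append("RESUME")

    events.extend(pending)  # main loop processes the pending stop request
    return events

assert prepare_start(fixed=False) == ["BLOCK_IO_ERROR", "RESUME", "STOP"]
assert prepare_start(fixed=True) == ["RESUME", "BLOCK_IO_ERROR", "STOP"]
```

The two assertions reproduce the orders seen in the bug: the buggy sequence matches the libvirt logs from comment#3 (where the late bare STOP downgrades the pause reason to "unknown"), and the fixed sequence matches the verification run on qemu-kvm-rhev-2.12.0-2.el7.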
Created attachment 1420445 [details]
libvirt log

Description of problem:

When a VM is paused due to erroneous storage, libvirt emits a corresponding life cycle event with the VIR_DOMAIN_PAUSED_IOERROR reason and then a VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON event. The state of the VM is also set appropriately:

# virsh -r domstate 2 --reason
paused (I/O error)

When the VM is then resumed manually while the I/O error still persists, it gets paused again immediately. However, in that case the life cycle event contains the VIR_DOMAIN_PAUSED_UNKNOWN reason. The I/O error is also no longer reported when asking for the VM state:

# virsh -r domstate 2 --reason
paused (unknown)

Additionally, the order of incoming events is weird, as follows:
- IO_ERROR event
- RESUME event
- PAUSED event

That means the real pause reason is lost.

Version-Release number of selected component (if applicable):
libvirt-3.9.0-14.el7_5.2.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.1.x86_64
kernel-3.10.0-862.el7.x86_64

How reproducible:
100% reproducible on some setups. Not reproducible on other setups.

Steps to Reproduce:
1. Run a VM with a disk attached.
2. Cause an I/O error on the disk, such as blocking an iSCSI disk with iptables.
3. Wait until the VM gets paused.
4. Resume the VM.

Actual results:
The VM gets paused with the UNKNOWN pause reason. The order of events is IOERROR - RESUME - PAUSED.

Expected results:
The VM gets paused with the IOERROR pause reason. The order of events is RESUME - PAUSED - IOERROR or RESUME - IOERROR - PAUSED.

Additional info:
libvirtd.log is attached.