1090079 – vdsm reports guest as paused on any IO error, even if libvirt/qemu policy is set to "report"


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	vdsm
Sub Component:
Version:	3.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	3.3.3
Assignee:	Francesco Romani
QA Contact:	Pavel Novotny
Docs Contact:
URL:
Whiteboard:	virt
Depends On:	1064630
Blocks:

Reported:	2014-04-22 13:58 UTC by rhev-integ
Modified:	2019-08-15 03:52 UTC
CC List:	11 users
Fixed In Version:	vdsm-4.13.2-0.14.el6ev
Doc Type:	Bug Fix
Doc Text:	Previously, VDSM would report that virtual machines experiencing any I/O error were in a paused state. This was caused by the logic used by VDSM to check I/O errors received from libvirt. Now, the logic used to check such errors has been revised so that VDSM detects the nature of the error, allowing I/O errors to be correctly reported and handled.
Clone Of:	1064630
Environment:
Last Closed:	2014-05-27 08:57:34 UTC
oVirt Team:	---
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2014:0548	normal	SHIPPED_LIVE	vdsm 3.3.3 bug fix update	2014-05-27 12:56:53 UTC
oVirt gerrit	25157	None	None	None	Never
oVirt gerrit	26023	None	None	None	Never
oVirt gerrit	27016	None	None	None	Never

Comment 3 Pavel Novotny 2014-05-05 14:54:40 UTC

Verified in vdsm-4.13.2-0.14.el6ev.x86_64 (is36).

Verification steps:

I used two local storage domains on the host:
one "regular" under `/mnt/localstorage/`,
and a second "flakey" one under `/mnt/errstorage/`, simulating I/O errors using the `dmsetup` utility:
-~-
# dd if=/dev/zero of=/tmp/virtualblock.img bs=4096 count=1M
1048576+0 records in
1048576+0 records out
4294967296 bytes (4,3 GB) copied, 50,7873 s, 84,6 MB/s
# losetup /dev/loop7 /tmp/virtualblock.img 
# mkfs.ext4 /dev/loop7 
mke2fs 1.41.12 (17-May-2010)
Discarding device blocks: done                            
...
...
### the following command creates a flakey device (table read from stdin): I/O works for 9 s, then fails for 1 s, repeatedly
# dmsetup create errdev0
0 8388608 flakey /dev/loop7 0 9 1
# mkdir /mnt/errstorage
# chown -R vdsm:kvm /mnt/errstorage
# mount /dev/mapper/errdev0 /mnt/errstorage/
-~-
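
For reference, the table line passed to `dmsetup` above can be derived from the image size. The following is only a sketch in Python; the helper name `flakey_table` is hypothetical and not part of any tool. It assumes the dm-flakey table layout `<start> <length> flakey <device> <offset> <up interval> <down interval>`, with lengths counted in 512-byte sectors.

```python
SECTOR_SIZE = 512  # device-mapper tables count 512-byte sectors


def flakey_table(image_bytes, backing_dev, up_s, down_s, offset=0):
    """Build a one-line dm-flakey table covering the whole image (hypothetical helper)."""
    if image_bytes % SECTOR_SIZE:
        raise ValueError("image size must be a multiple of 512 bytes")
    sectors = image_bytes // SECTOR_SIZE
    # <start> <length> flakey <device> <offset> <up interval> <down interval>
    return "0 %d flakey %s %d %d %d" % (sectors, backing_dev, offset, up_s, down_s)


# Size written by: dd bs=4096 count=1M
image_bytes = 4096 * 1024 * 1024
table = flakey_table(image_bytes, "/dev/loop7", up_s=9, down_s=1)
print(table)
```

With the values used in this verification, the device is available for 9 seconds and then returns errors for 1 second, in a repeating cycle.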

In the RHEVM GUI, add both local storage domains (/mnt/localstorage as master SD).
Create a new VM with two disks: one "healthy" disk on the `localstorage` domain and a second "flakey" disk (1G) on the `errorstorage` domain.

In the RHEVM DB, update both disks to propagate errors to the guest (via psql): UPDATE base_disks SET propagate_errors = 'On';
Restart the ovirt-engine service.
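
To make the expected outcome of this setting explicit, here is a rough sketch of how the flag is assumed to end up on the qemu command line. The function name `expected_werror` is hypothetical. The mapping itself (propagate errors "on" gives libvirt `error_policy='enospace'`, otherwise `'stop'`) is my understanding of VDSM's behaviour for this release and should be treated as an assumption, not as VDSM's actual code.

```python
def expected_werror(propagate_errors):
    """Return the qemu werror value expected for a propagate_errors setting.

    Hypothetical helper; the VDSM-side mapping is an assumption.
    """
    # Assumed VDSM mapping to libvirt's disk error_policy attribute.
    error_policy = "enospace" if propagate_errors.lower() == "on" else "stop"
    # libvirt spells the policy "enospace"; qemu spells it "enospc".
    return {"enospace": "enospc", "stop": "stop"}[error_policy]


print(expected_werror("On"))
```

Under that assumption, a disk with propagate_errors set to 'On' should show up as werror=enospc, meaning only out-of-space errors pause the guest and other write errors are reported to it.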

In the RHEVM GUI, install the guest OS *on the healthy disk* (I used Fedora 19).
In the guest, mount the second, flakey disk at `/mnt/errdisk/` and run some I/O operation on it.
I used `dd`: # dd if=/dev/zero of=/mnt/errdisk/test bs=1000 count=1M
and after a few seconds I got a burst of I/O errors "Buffer I/O error on device vdb, logical block ...".

Results:
The qemu process runs with correct parameter 'werror=enospc'.
After the I/O errors, the guest is still running.
Both QEMU/VDSM and RHEVM also report the guest as running.
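
The behaviour verified above corresponds to distinguishing the action that libvirt attaches to each I/O error event. The following is a minimal sketch of that idea, not the actual VDSM patch: the class `GuestState` and the status strings are hypothetical, and the three constants mirror the values of libvirt's `virDomainEventIOErrorAction` enum so that the example runs without the libvirt bindings.

```python
# Values of libvirt's virDomainEventIOErrorAction enum, copied here so the
# sketch runs without the libvirt bindings installed.
IO_ERROR_NONE = 0
IO_ERROR_PAUSE = 1
IO_ERROR_REPORT = 2


class GuestState(object):
    """Hypothetical stand-in for the state a management agent tracks per guest."""

    def __init__(self):
        self.status = "Up"
        self.pause_code = None

    def on_io_error(self, device, reason, action):
        """Handle one I/O error event and return the resulting status."""
        if action == IO_ERROR_PAUSE:
            # The hypervisor really stopped the guest: record why.
            self.status = "Paused"
            self.pause_code = reason.upper()
        # For REPORT and NONE the guest keeps running, so the status is
        # left untouched; the error is only something to log.
        return self.status


guest = GuestState()
print(guest.on_io_error("vdb", "eio", IO_ERROR_REPORT))
print(guest.on_io_error("vdb", "enospc", IO_ERROR_PAUSE))
```

The buggy behaviour described in this report would be equivalent to dropping the `action` check and marking the guest as paused on every event.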

Comment 5 errata-xmlrpc 2014-05-27 08:57:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0548.html
