Bug 1005654 - two VMs have been paused due to a storage IO error
Summary: two VMs have been paused due to a storage IO error
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: x86_64
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 3.4.0
Assignee: Nobody
QA Contact:
URL:
Whiteboard: storage
Duplicates: 1005652 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-09-09 05:32 UTC by lijin
Modified: 2016-02-10 19:41 UTC (History)
14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-12-03 16:07:26 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
log file (327.60 KB, application/zip)
2013-09-09 05:32 UTC, lijin
no flags Details
new-log files (1.86 MB, application/zip)
2013-09-24 08:41 UTC, lijin
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 624607 0 medium CLOSED [qemu] [rhel6] guest installation stop (pause) on 'eother' event over COW disks (thin-provisioning) 2021-02-22 00:41:40 UTC

Internal Links: 624607

Description lijin 2013-09-09 05:32:26 UTC
Created attachment 795491 [details]
log file

Description of problem:
Ran several Windows guests in RHEV-M 3.2 and ran iozone for a few days.
Two guests (win7-64, win8-64) were paused after running iozone for 5 days.

Version-Release number of selected component (if applicable):
RHEV-toolsSetup_3.2_13.iso
virtio-win-prewhql-68
qemu-kvm-rhev-0.12.1.2-2.375.el6.x86_64
kernel-2.6.32-369.el6.x86_64
vdsm-4.10.2-22.0.el6ev.x86_64
libvirt-0.10.2-18.el6_4.8.x86_64

How reproducible:
once in two weeks

Steps to Reproduce:
1. Boot 7 different Windows guests, including win7-64 and win8-64, in RHEV-M 3.2.
2. Run iozone on the system disk in each guest for more than 5 days;
   the iozone command: iozone.exe -az -b c:\aaaa -g 30g -y 64k

Actual results:
The win7-64 and win8-64 guests were paused, and RHEV-M showed the warning "VM has been paused due to a storage IO error".
There are several messages in /var/log/libvirt/qemu/w8-64-longevity.log and w7-64-longevity.log reading "block I/O error in device 'drive-virtio-disk0': Input/output error (5)".

Expected results:
Guests keep working fine, with no pause.

Additional info:
The attachment contains some log files; if anything else is needed, please let me know.
The qemu-kvm command line:
/usr/libexec/qemu-kvm -name w7-64-longevity -S -M rhel6.4.0 -cpu Penryn -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -uuid 3739b388-9179-4080-a755-51e8b48bdfdc -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=6Server-6.4.0.4.el6,serial=44454C4C-3100-1057-8054-B3C04F573258,uuid=3739b388-9179-4080-a755-51e8b48bdfdc -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/w7-64-longevity.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2013-09-02T06:19:13,driftfix=slew -no-shutdown -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/rhev/data-center/mnt/10.66.90.115:_vol_s2bcao228623_isopool/87fa60aa-1dac-45c0-b0ca-66f8b9841d9f/images/11111111-1111-1111-1111-111111111111/RHEV-toolsSetup_3.2_13.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/rhev/data-center/fab0800a-b0fb-462c-821a-07b75eac0a03/3d2facdc-1605-4ccc-8f39-e2af02c77258/images/314979f2-27d7-4af8-b604-2ed8d24147d8/588501ce-4682-47ee-bdd1-c1b81d1d0ee1,if=none,id=drive-virtio-disk0,format=raw,serial=314979f2-27d7-4af8-b604-2ed8d24147d8,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:08:97,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/w7-64-longevity.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/w7-64-longevity.org.qemu.guest_agent.0,server,nowait -device 
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice port=5904,tls-port=5905,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -k en-us -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -incoming tcp:[::]:49153 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
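The pause behaviour itself follows from the disk's error policy: the -drive options in the command line above include werror=stop,rerror=stop, which tell qemu-kvm to pause the guest on a storage read/write error rather than pass the error through to the guest OS. A minimal sketch (the drive string below is abbreviated from the command line above; the elided path is a placeholder):

```shell
# Sketch: pull the error-policy options out of the -drive string. werror=stop
# and rerror=stop make qemu-kvm pause the VM on a storage I/O error, which is
# exactly the "VM has been paused due to a storage IO error" event seen here.
drive_opts='file=/rhev/data-center/.../588501ce-4682-47ee-bdd1-c1b81d1d0ee1,if=none,id=drive-virtio-disk0,format=raw,cache=none,werror=stop,rerror=stop,aio=threads'

# print each option on its own line and keep only the error policies
echo "$drive_opts" | tr ',' '\n' | grep -E '^(werror|rerror)='
```

So a paused guest on NFS hiccup is the configured behaviour, not a crash; the open question in this bug is why the I/O errors occurred at all.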

Comment 1 Itamar Heim 2013-09-09 15:05:55 UTC
*** Bug 1005652 has been marked as a duplicate of this bug. ***

Comment 2 Ronen Hod 2013-09-10 12:10:03 UTC
See also https://bugzilla.redhat.com/show_bug.cgi?id=624607#c3

Comment 5 Ayal Baron 2013-09-24 07:38:03 UTC
The attached vdsm log is not from the correct time. Please look for 'abnormal' in the vdsm logs, find the time corresponding to the problem, and attach the vdsm log and the libvirt log from the same time.
Thanks.

Comment 6 Ayal Baron 2013-09-24 07:40:20 UTC
(In reply to Ayal Baron from comment #5)
> The attached vdsm log is not from the correct time. Please look for
> 'abnormal' in the vdsm logs, find the time corresponding to the problem,
> and attach the vdsm log and the libvirt log from the same time.
> Thanks.

Please also attach /var/log/messages for relevant times
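The triage step asked for in comments 5 and 6 can be sketched as follows. The vdsm log line below is a simplified, fabricated example of the format (real entries carry more '::'-separated fields); on a real host the grep would run against /var/log/vdsm/vdsm.log instead of a temp file:

```shell
# Hypothetical sketch: find the 'abnormal' stop event in a vdsm log and
# extract its timestamp, which identifies the window to pull from the
# libvirt log and /var/log/messages. Sample line fabricated for illustration.
log=$(mktemp)
printf '%s\n' \
  'Thread-123::INFO::2013-09-09 00:10:52,123::vm::abnormal vm stop device virtio-disk0 error eio' \
  > "$log"

# the timestamp is the third '::'-separated field; drop the milliseconds
ts=$(grep 'abnormal' "$log" | awk -F'::' '{print $3}' | cut -d',' -f1)
echo "$ts"
rm -f "$log"
```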

Comment 8 lijin 2013-09-24 08:41:15 UTC
Created attachment 802098 [details]
new-log files

Comment 9 Ayal Baron 2013-09-24 09:30:19 UTC
Your NFS server is acting up:

Sep  9 00:10:50 localhost kernel: nfs: server 10.66.90.115 not responding, timed out

This is a storage issue.
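That diagnosis can be confirmed by counting the NFS timeout messages around the pause time. A sketch using sample lines modeled on the /var/log/messages excerpt above (the second timestamp is fabricated; on a real host, point the grep at /var/log/messages itself):

```shell
# Sketch: count kernel NFS timeout messages to confirm the storage-side
# diagnosis. Sample lines modeled on the excerpt quoted in comment 9.
msglog=$(mktemp)
printf '%s\n' \
  'Sep  9 00:10:50 localhost kernel: nfs: server 10.66.90.115 not responding, timed out' \
  'Sep  9 00:11:20 localhost kernel: nfs: server 10.66.90.115 not responding, timed out' \
  > "$msglog"

hits=$(grep -c 'nfs: server .* not responding' "$msglog")
echo "$hits"
rm -f "$msglog"
```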

