Bug 1005654

Summary: two VMs have been paused due to a storage I/O error
Product: Red Hat Enterprise Virtualization Manager
Reporter: lijin <lijin>
Component: vdsm
Assignee: Nobody <nobody>
Status: CLOSED NOTABUG
QA Contact:
Severity: medium
Docs Contact:
Priority: high
Version: 3.2.0
CC: abaron, acanan, amureini, bazulay, bcao, hateya, iheim, kwolf, lijin, lpeer, nobody, rhod, stefanha, yeylon
Target Milestone: ---
Keywords: Triaged
Target Release: 3.4.0
Hardware: x86_64
OS: Unspecified
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-12-03 16:07:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
  log file (flags: none)
  new-log files (flags: none)

Description lijin 2013-09-09 05:32:26 UTC
Created attachment 795491 [details]
log file

Description of problem:
Ran several Windows guests on RHEV-M 3.2 and ran iozone for a few days.
Two guests (win7-64, win8-64) were paused after running iozone for 5 days.

Version-Release number of selected component (if applicable):
RHEV-toolsSetup_3.2_13.iso
virtio-win-prewhql-68
qemu-kvm-rhev-0.12.1.2-2.375.el6.x86_64
kernel-2.6.32-369.el6.x86_64
vdsm-4.10.2-22.0.el6ev.x86_64
libvirt-0.10.2-18.el6_4.8.x86_64

How reproducible:
once in two weeks

Steps to Reproduce:
1. Boot 7 different Windows guests, including win7-64 and win8-64, on RHEV-M 3.2.
2. Run iozone on the system disk in each guest for more than 5 days.
   The iozone command: iozone.exe -az -b c:\aaaa -g 30g -y 64k

Actual results:
The win7-64 and win8-64 guests were paused, and RHEV-M shows a warning message: "VM has been paused due to a storage IO error".
There are several messages in /var/log/libvirt/qemu/w8-64-longevity.log and w7-64-longevity.log such as "block I/O error in device 'drive-virtio-disk0': Input/output error (5)".
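The pause events can be counted straight from the libvirt qemu logs. A minimal sketch; to keep it self-contained, a sample line stands in for the real /var/log/libvirt/qemu/w7-64-longevity.log:

```shell
# Count the block I/O error events in a libvirt qemu log.
# LOG is a stand-in for /var/log/libvirt/qemu/w7-64-longevity.log.
LOG=$(mktemp)
echo "block I/O error in device 'drive-virtio-disk0': Input/output error (5)" > "$LOG"
ERRORS=$(grep -c "block I/O error" "$LOG")
echo "$ERRORS I/O error event(s) found"
```

On the hypervisor the same grep, run over both VM logs, gives one line per pause event.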

Expected results:
Guests keep working; no pauses.

Additional info:
The attachment contains some log files; if you need anything else, please let me know.
The qemu-kvm command line (as launched by vdsm):
/usr/libexec/qemu-kvm -name w7-64-longevity -S -M rhel6.4.0 -cpu Penryn -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -uuid 3739b388-9179-4080-a755-51e8b48bdfdc -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=6Server-6.4.0.4.el6,serial=44454C4C-3100-1057-8054-B3C04F573258,uuid=3739b388-9179-4080-a755-51e8b48bdfdc -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/w7-64-longevity.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2013-09-02T06:19:13,driftfix=slew -no-shutdown -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/rhev/data-center/mnt/10.66.90.115:_vol_s2bcao228623_isopool/87fa60aa-1dac-45c0-b0ca-66f8b9841d9f/images/11111111-1111-1111-1111-111111111111/RHEV-toolsSetup_3.2_13.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/rhev/data-center/fab0800a-b0fb-462c-821a-07b75eac0a03/3d2facdc-1605-4ccc-8f39-e2af02c77258/images/314979f2-27d7-4af8-b604-2ed8d24147d8/588501ce-4682-47ee-bdd1-c1b81d1d0ee1,if=none,id=drive-virtio-disk0,format=raw,serial=314979f2-27d7-4af8-b604-2ed8d24147d8,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:08:97,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/w7-64-longevity.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/w7-64-longevity.org.qemu.guest_agent.0,server,nowait -device 
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice port=5904,tls-port=5905,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -k en-us -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -incoming tcp:[::]:49153 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

Comment 1 Itamar Heim 2013-09-09 15:05:55 UTC
*** Bug 1005652 has been marked as a duplicate of this bug. ***

Comment 2 Ronen Hod 2013-09-10 12:10:03 UTC
See also https://bugzilla.redhat.com/show_bug.cgi?id=624607#c3

Comment 5 Ayal Baron 2013-09-24 07:38:03 UTC
The attached vdsm log is not from the correct time. Please search for 'abnormal' in the vdsm logs, find the time corresponding to the problem, and attach that vdsm log together with the libvirt log from the same time.
Thanks.

Comment 6 Ayal Baron 2013-09-24 07:40:20 UTC
(In reply to Ayal Baron from comment #5)
> The attached vdsm log is not from the correct time, please look for
> 'abnormal' in the vdsm logs, find the time corresponding to the problem,
> attach this vdsm log and the libvirt log from the same time.
> Thanks.

Please also attach /var/log/messages for relevant times

Comment 8 lijin 2013-09-24 08:41:15 UTC
Created attachment 802098 [details]
new-log files

Comment 9 Ayal Baron 2013-09-24 09:30:19 UTC
Your NFS server is acting up:

Sep  9 00:10:50 localhost kernel: nfs: server 10.66.90.115 not responding, timed out

this is a storage issue.
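The diagnosis can be double-checked on the hypervisor. A hedged sketch: the syslog line below is copied from this report (a temp file stands in for /var/log/messages so the snippet is self-contained), and the live probes against 10.66.90.115 are only suggestions, left commented out:

```shell
# Confirm the NFS timeouts recorded in syslog.
# MESSAGES is a stand-in for /var/log/messages.
MESSAGES=$(mktemp)
echo "Sep  9 00:10:50 localhost kernel: nfs: server 10.66.90.115 not responding, timed out" > "$MESSAGES"
TIMEOUTS=$(grep -c "not responding, timed out" "$MESSAGES")
echo "$TIMEOUTS NFS timeout(s) logged"
# Live probes one could run against the NFS server itself:
#   rpcinfo -t 10.66.90.115 nfs    # is nfsd answering over TCP?
#   showmount -e 10.66.90.115      # are the exports still visible?
```

A nonzero timeout count in the window where vdsm logged the pause points at the storage side rather than qemu-kvm, which matches the NOTABUG resolution.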