Bug 851858

Summary: qemu-img: cannot resume a vm that was paused due to EIO on NFS storage although storage is available
Product: Red Hat Enterprise Linux 6
Reporter: Dafna Ron <dron>
Component: qemu-kvm
Assignee: Asias He <asias>
Status: CLOSED DUPLICATE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: high
Version: 6.5
CC: acathrow, areis, bsarathy, chayang, dyasny, hateya, iheim, juzhang, mkenneth, tburke, virt-maint, ykaul
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2012-10-15 16:07:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Attachments: logs (flags: none)

Description Dafna Ron 2012-08-26 15:23:57 UTC
Created attachment 607072: logs

Description of problem:

In a two-host cluster with NFS storage, I blocked the storage from the host using iptables.
After the VMs paused, I removed the block and activated the hosts; once the storage was active, I selected all 10 of my VMs and ran them.
One of the VMs refuses to start due to EIO, even though the domain is available and all the other VMs have started.
I've reproduced this several times with the same VMs, and each time a different VM pauses and refuses to start.

Version-Release number of selected component (if applicable):

qemu-img-rhev-0.12.1.2-2.298.el6_3.x86_64
libvirt-0.9.10-21.el6.x86_64
vdsm-4.9.6-30.0.el6_3.x86_64

How reproducible:

100%

Steps to Reproduce:
1. In a two-host cluster with NFS storage, run VMs that have XP installed and are actively writing
2. Block connectivity to the storage domain from both hosts
3. When the VMs pause, remove the iptables rule, then select all VMs and run them
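
Steps 2 and 3 can be sketched as a pair of iptables commands. The report does not show the rule that was actually used, so the chain (OUTPUT), the DROP target and the server address 192.0.2.10 below are assumptions. The functions only print the commands, so they can be reviewed before being run as root on each host.

```shell
# Sketch only: the exact rule used in this report is not shown.
# NFS_SERVER is a hypothetical address (documentation range).
NFS_SERVER="${NFS_SERVER:-192.0.2.10}"

# Print the command that drops all outgoing traffic to the NFS server (step 2).
block_rule() {
    printf 'iptables -A OUTPUT -d %s -j DROP\n' "$1"
}

# Print the matching delete command that removes that rule again (step 3).
unblock_rule() {
    printf 'iptables -D OUTPUT -d %s -j DROP\n' "$1"
}

block_rule "$NFS_SERVER"
unblock_rule "$NFS_SERVER"
```

Using the same rule specification with -A and -D keeps the removal exact, so no unrelated rules are touched when connectivity is restored.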
  
Actual results:

All but one of the VMs are resumed.
When I try to run that VM again, it keeps getting EIO errors and pauses.

Expected results:

We should be able to resume all VMs.

Additional info: libvirt, vdsm and logs for 2 VMs (XP-6, which has the issue, and XP-10, which ran).

[root@gold-vdsd tmp]# vdsClient -s 0 continue 553cd58e-2295-4995-a4ee-71724f63ee49
	code = 0
	message = Done
[root@gold-vdsd tmp]# vdsClient -s 0 list table
63205116-5547-4e7c-b89f-c6cf8502f09d  23829  XP-10                Up                                       
2c78a0af-9e68-4e3b-a8f7-93346d19c3c9  24042  XP-8                 Up                                       
29ce48d2-966c-447b-809e-ae26303be112  25350  XP-5                 Up                                       
553cd58e-2295-4995-a4ee-71724f63ee49  23984  XP-6                 Paused                                   
50737895-2cee-42aa-8aaf-734e7891a99b  25423  XP-9                 Up                                       
985d5a5b-41ed-4b51-8f02-6886a4e3b223  24082  XP-7                 Up                                       
68640442-defe-4186-a67a-974fa33dfcf5  23488  XP-3                 Up                                       
7c4ee4f9-31bf-4dcd-8ca3-57d3988a1bbf  23787  XP-4                 Up                                       
1845bf08-b103-421a-aeb6-127d22486e30  23684  XP-2                 Up                                       
fc0643e6-dddc-4662-b3a3-a8b3b27924fd  23189  XP-1                 Up       
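
With this many VMs it can help to pick out the ones still paused from the table above and generate a `continue` call for each. This is a minimal sketch, assuming the four-column layout shown here (UUID, PID, name, status) and VM names without spaces. `resume_commands` is a hypothetical helper name, and it only prints the commands rather than running them.

```shell
# Read `vdsClient -s 0 list table` output on stdin and print one
# `vdsClient -s 0 continue <uuid>` command per VM whose status is Paused.
# Assumes columns: UUID, PID, name, status (as in the table above).
resume_commands() {
    awk '$4 == "Paused" { print "vdsClient -s 0 continue " $1 }'
}

# Usage (on the host): vdsClient -s 0 list table | resume_commands
```

For the table above, the only line printed would be the `continue` call for XP-6, which is the same command already tried at the top of this transcript.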

[root@gold-vdsd tmp]# virsh -r list
 Id    Name                           State
----------------------------------------------------
 71    XP-1                           running
 72    XP-3                           running
 73    XP-2                           running
 74    XP-4                           running
 75    XP-10                          running
 76    XP-6                           paused
 77    XP-8                           running
 78    XP-7                           running
 79    XP-5                           running
 80    XP-9                           running


-bash-4.1$ qemu-img info  /rhev/data-center/f2b5703d-6449-461d-a837-2bfd9dcf0201/2045e517-a65b-437d-8b2b-45018a5aaa23/images/7c68816e-51bf-4c98-bb18-2eb775f763c2/33ec3754-1617-454a-8ed6-6fdfdb5967a0
image: /rhev/data-center/f2b5703d-6449-461d-a837-2bfd9dcf0201/2045e517-a65b-437d-8b2b-45018a5aaa23/images/7c68816e-51bf-4c98-bb18-2eb775f763c2/33ec3754-1617-454a-8ed6-6fdfdb5967a0
file format: qcow2
virtual size: 15G (16106127360 bytes)
disk size: 334M
cluster_size: 65536
backing file: ../7c68816e-51bf-4c98-bb18-2eb775f763c2/3e03e69e-4e92-4ba3-ace5-2b02bae9e929 (actual path: /rhev/data-center/f2b5703d-6449-461d-a837-2bfd9dcf0201/2045e517-a65b-437d-8b2b-45018a5aaa23/images/7c68816e-51bf-4c98-bb18-2eb775f763c2/../7c68816e-51bf-4c98-bb18-2eb775f763c2/3e03e69e-4e92-4ba3-ace5-2b02bae9e929)

bash-4.1$ qemu-img check  /rhev/data-center/f2b5703d-6449-461d-a837-2bfd9dcf0201/2045e517-a65b-437d-8b2b-45018a5aaa23/images/7c68816e-51bf-4c98-bb18-2eb775f763c2/33ec3754-1617-454a-8ed6-6fdfdb5967a0
No errors were found on the image.
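
The check above covers the image it was given, and that image has a backing file. One possible follow-up, not done in this report, is to run the same check on the backing file. This is a sketch, assuming the `backing file: ... (actual path: ...)` line format printed by `qemu-img info` above. `backing_path` is a hypothetical helper name.

```shell
# Read `qemu-img info` output on stdin and print the resolved backing-file
# path, taken from the "(actual path: ...)" part of the "backing file:" line.
# Prints nothing if no such line is present.
backing_path() {
    sed -n 's/^backing file: .*(actual path: \(.*\))$/\1/p'
}

# Usage: qemu-img info "$IMAGE" | backing_path
```

The printed path can then be passed to `qemu-img check` in the same way as the top-level image.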

Comment 2 Chao Yang 2012-08-27 07:19:51 UTC
FYI:
Bug 740509 - cannot resume vm's that were paused due to disconnection to SD in NFS storage type

Comment 3 Dor Laor 2012-09-02 14:58:00 UTC
(In reply to comment #2)
> FYI:
> Bug 740509 - cannot resume vm's that were paused due to disconnection to SD
> in NFS storage type

Thanks for finding the exact source of this same bug!
Dafna, do you agree to close this as a duplicate?
Since it's only about Windows XP + IDE + a rare case for the storage, I'd rather keep on posting (closing) this case too.
Dor

Comment 4 Dafna Ron 2012-09-02 15:05:07 UTC
sure. it's your call Dor :)

Comment 5 Ademar Reis 2012-10-15 16:07:16 UTC
(In reply to comment #4)
> sure. it's your call Dor :)

Done, thanks.

*** This bug has been marked as a duplicate of bug 740509 ***