Created attachment 607072 [details]
logs
Description of problem:
In a two-host cluster with NFS storage, I blocked access to the storage from the host using iptables.
After the VMs paused, I removed the block and activated the hosts. Once the storage was active, I selected all 10 of my VMs and ran them.
One of the VMs refuses to start due to EIO, even though the storage domain is available and all the other VMs have started.
I've reproduced this several times with the same VMs, and each time a different VM pauses and refuses to start.
Version-Release number of selected component (if applicable):
qemu-img-rhev-0.12.1.2-2.298.el6_3.x86_64
libvirt-0.9.10-21.el6.x86_64
vdsm-4.9.6-30.0.el6_3.x86_64
How reproducible:
100%
Steps to Reproduce:
1. In a two-host cluster with NFS storage, run VMs that have XP installed and are writing to disk
2. Block connectivity to the storage domain from both hosts
3. When the VMs pause, remove the iptables rule, then select all VMs and run them
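Steps 2 and 3 can be sketched roughly as below. This is only a sketch: the report does not show the actual iptables rule, so the OUTPUT chain, the DROP target, and the storage server address passed as the first argument are assumptions, and `run`, `block_storage` and `unblock_storage` are hypothetical helper names. With `DRY_RUN=1` the helpers only print the command instead of executing it.

```shell
#!/bin/bash
# Hypothetical helpers. The OUTPUT/DROP rule and the storage address are
# assumptions, not the rule used in the original reproduction.

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

block_storage() {
    # Step 2: drop all outbound traffic to the storage server ($1).
    run iptables -A OUTPUT -d "$1" -j DROP
}

unblock_storage() {
    # Step 3: delete the same rule once the VMs have paused.
    run iptables -D OUTPUT -d "$1" -j DROP
}
```

Run on each host as root, for example `block_storage <storage address>`, wait for the VMs to pause, then `unblock_storage <storage address>`.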
Actual results:
All but one of the VMs are resumed.
When I try to run that VM again, it keeps getting EIO errors and pauses.
Expected results:
We should be able to resume all VMs.
Additional info: libvirt and vdsm logs, plus logs for two VMs (XP-6, which has the issue, and XP-10, which ran).
[root@gold-vdsd tmp]# vdsClient -s 0 continue 553cd58e-2295-4995-a4ee-71724f63ee49
code = 0
message = Done
[root@gold-vdsd tmp]# vdsClient -s 0 list table
63205116-5547-4e7c-b89f-c6cf8502f09d 23829 XP-10 Up
2c78a0af-9e68-4e3b-a8f7-93346d19c3c9 24042 XP-8 Up
29ce48d2-966c-447b-809e-ae26303be112 25350 XP-5 Up
553cd58e-2295-4995-a4ee-71724f63ee49 23984 XP-6 Paused
50737895-2cee-42aa-8aaf-734e7891a99b 25423 XP-9 Up
985d5a5b-41ed-4b51-8f02-6886a4e3b223 24082 XP-7 Up
68640442-defe-4186-a67a-974fa33dfcf5 23488 XP-3 Up
7c4ee4f9-31bf-4dcd-8ca3-57d3988a1bbf 23787 XP-4 Up
1845bf08-b103-421a-aeb6-127d22486e30 23684 XP-2 Up
fc0643e6-dddc-4662-b3a3-a8b3b27924fd 23189 XP-1 Up
[root@gold-vdsd tmp]# virsh -r list
Id Name State
----------------------------------------------------
71 XP-1 running
72 XP-3 running
73 XP-2 running
74 XP-4 running
75 XP-10 running
76 XP-6 paused
77 XP-8 running
78 XP-7 running
79 XP-5 running
80 XP-9 running
-bash-4.1$ qemu-img info /rhev/data-center/f2b5703d-6449-461d-a837-2bfd9dcf0201/2045e517-a65b-437d-8b2b-45018a5aaa23/images/7c68816e-51bf-4c98-bb18-2eb775f763c2/33ec3754-1617-454a-8ed6-6fdfdb5967a0
image: /rhev/data-center/f2b5703d-6449-461d-a837-2bfd9dcf0201/2045e517-a65b-437d-8b2b-45018a5aaa23/images/7c68816e-51bf-4c98-bb18-2eb775f763c2/33ec3754-1617-454a-8ed6-6fdfdb5967a0
file format: qcow2
virtual size: 15G (16106127360 bytes)
disk size: 334M
cluster_size: 65536
backing file: ../7c68816e-51bf-4c98-bb18-2eb775f763c2/3e03e69e-4e92-4ba3-ace5-2b02bae9e929 (actual path: /rhev/data-center/f2b5703d-6449-461d-a837-2bfd9dcf0201/2045e517-a65b-437d-8b2b-45018a5aaa23/images/7c68816e-51bf-4c98-bb18-2eb775f763c2/../7c68816e-51bf-4c98-bb18-2eb775f763c2/3e03e69e-4e92-4ba3-ace5-2b02bae9e929)
bash-4.1$ qemu-img check /rhev/data-center/f2b5703d-6449-461d-a837-2bfd9dcf0201/2045e517-a65b-437d-8b2b-45018a5aaa23/images/7c68816e-51bf-4c98-bb18-2eb775f763c2/33ec3754-1617-454a-8ed6-6fdfdb5967a0
No errors were found on the image.
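When repeating the reproduction, the VMs that are still paused can be picked out of the `vdsClient -s 0 list table` output with a small filter. This is a sketch that assumes the four-column layout shown above (vmId, pid, name, status) and VM names without spaces; `paused_vms` is a hypothetical helper name.

```shell
#!/bin/bash
# Print "<vmId> <name>" for every VM whose status column is "Paused".
# Assumes the four-column layout shown in this report:
#   vmId  pid  name  status
# and VM names that contain no whitespace.
paused_vms() {
    awk '$4 == "Paused" { print $1, $3 }'
}

# Usage on the host (requires vdsm):
#   vdsClient -s 0 list table | paused_vms
```

Each vmId it prints can then be passed to `vdsClient -s 0 continue <vmId>`, as was done for XP-6 above.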
(In reply to comment #2)
> FYI:
> Bug 740509 - cannot resume vm's that were paused due to disconnection to SD
> in NFS storage type
Thanks for finding this exact source of the same bug!
Dafna, do you agree to clone this as a duplicate?
Since it's only about Windows XP + IDE + a rare case for the storage, I'd rather keep on posting (closing) this case too.
Dor