Description of the problem
--------------------------
A distributed-replicate gluster volume was used as the virtual machine image store in RHEV. Two app VMs were created with their disk images on the gluster volume. After a few rebalance operations triggered by add-brick and remove-brick, the app VMs went into a paused state.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.7.1-4.el6rhs

How Reproducible:
-----------------
Always

Actual Result:
--------------
After a few rebalance operations, the app VMs went into a paused state.

Expected Result:
----------------
After rebalance, the app VMs should be in the running state. No errors should be seen during normal operation of the app VMs.
I missed the steps to reproduce in comment 0.

Steps to reproduce:
-------------------
0. Add RHGS nodes to a gluster-enabled cluster in RHEVM
1. Create a 2x2 distributed-replicate volume and start it
2. Optimize the volume for virt-store and set RHEV ownership on that volume
3. Use this volume as a Data Domain of type glusterfs (this makes the gluster volume the VM image store in RHEV)
4. Create a few app VMs in RHEV that have their root disk images on the gluster volume
5. Install an OS in the app VMs
6. Add more bricks to the gluster volume and perform a rebalance
7. After every rebalance operation, check the state of the app VMs
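The gluster-side steps (1, 2 and 6) can be sketched as a command sequence. This is only an illustration, not the exact commands used in the test run: the volume name `vmstore`, the host names and the brick paths are hypothetical, and uid/gid 36 is assumed to be the vdsm:kvm ownership that RHEV expects. The script only builds and prints the commands; it does not execute them.

```python
"""Sketch of the gluster CLI calls behind steps 1, 2 and 6.

All names below (volume, hosts, brick paths) are hypothetical placeholders.
"""

VOLUME = "vmstore"  # hypothetical volume name


def create_and_tune(volume, bricks):
    """Steps 1 and 2: create a replica-2 volume, start it, tune it for VM images."""
    return [
        # Four bricks with 'replica 2' give a 2x2 distributed-replicate layout.
        ["gluster", "volume", "create", volume, "replica", "2", *bricks],
        ["gluster", "volume", "start", volume],
        # Apply the predefined 'virt' option group (virt-store optimization).
        ["gluster", "volume", "set", volume, "group", "virt"],
        # Assumption: 36:36 is the vdsm:kvm owner RHEV needs on the volume.
        ["gluster", "volume", "set", volume, "storage.owner-uid", "36"],
        ["gluster", "volume", "set", volume, "storage.owner-gid", "36"],
    ]


def expand_and_rebalance(volume, new_bricks):
    """Step 6: add one more replica pair, then start and poll a rebalance."""
    return [
        ["gluster", "volume", "add-brick", volume, *new_bricks],
        ["gluster", "volume", "rebalance", volume, "start"],
        ["gluster", "volume", "rebalance", volume, "status"],
    ]


if __name__ == "__main__":
    initial = [
        "rhgs1:/rhs/brick1/b1", "rhgs2:/rhs/brick1/b1",
        "rhgs3:/rhs/brick1/b2", "rhgs4:/rhs/brick1/b2",
    ]
    extra = ["rhgs1:/rhs/brick2/b3", "rhgs2:/rhs/brick2/b3"]
    for argv in create_and_tune(VOLUME, initial) + expand_and_rebalance(VOLUME, extra):
        print(" ".join(argv))
```

Steps 0, 3, 4 and 5 are done through RHEVM and are not covered by this sketch.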
Observed a few errors in the QEMU logs. The QEMU logs record VM errors and are located under '/var/log/libvirt/qemu'.

Following is the error message from appvm1, /var/log/libvirt/qemu/appvm1.log

<snip>
block I/O error in device 'drive-virtio-disk0': Transport endpoint is not connected (107)
block I/O error in device 'drive-virtio-disk0': Transport endpoint is not connected (107)
block I/O error in device 'drive-virtio-disk0': Transport endpoint is not connected (107)
block I/O error in device 'drive-virtio-disk0': Transport endpoint is not connected (107)
</snip>
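When checking several VMs after each rebalance, it may help to count these errors per device instead of reading the logs by hand. The following is a minimal sketch, assuming the log lines have exactly the format shown in the snippet above; the function name `count_block_io_errors` is made up for this example.

```python
import re
from collections import Counter

# Matches lines such as:
# block I/O error in device 'drive-virtio-disk0': Transport endpoint is not connected (107)
BLOCK_IO_ERROR = re.compile(
    r"block I/O error in device '(?P<device>[^']+)': "
    r"(?P<message>.+?) \((?P<errno>\d+)\)"
)


def count_block_io_errors(lines):
    """Return a Counter keyed by (device, errno) for all block I/O error lines."""
    counts = Counter()
    for line in lines:
        match = BLOCK_IO_ERROR.search(line)
        if match:
            counts[(match["device"], int(match["errno"]))] += 1
    return counts


if __name__ == "__main__":
    # Sample input copied from the appvm1 snippet; a real run would instead
    # iterate over an open log file under /var/log/libvirt/qemu.
    sample = [
        "block I/O error in device 'drive-virtio-disk0': "
        "Transport endpoint is not connected (107)",
    ] * 4
    for (device, err), n in count_block_io_errors(sample).items():
        print(f"{device}: errno {err} seen {n} time(s)")
```

Errno 107 is ENOTCONN on Linux, which is what the "Transport endpoint is not connected" text corresponds to.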
I tried to reproduce this issue in a QEMU/KVM + RHGS integration environment and did not see it. I will retry with RHEV + RHGS integration. I am removing the blocker flag, as this issue is not yet consistently reproducible.
Removing the keyword REGRESSION, as this issue is not always hit, as mentioned in comment 5.
I couldn't reproduce this issue again after a few attempts. At the end of regression testing, I will CLOSE this bug if it's no longer reproducible. Thanks
This issue is caused by the issue found in https://bugzilla.redhat.com/show_bug.cgi?id=1243542

Closing this bug as a DUPLICATE of https://bugzilla.redhat.com/show_bug.cgi?id=1243542

*** This bug has been marked as a duplicate of bug 1243542 ***