| Summary: | QEMU crashes after resume (cont) with gluster backed volumes | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Shanzhi Yu <shyu> |
| Component: | qemu-kvm | Assignee: | Jeff Cody <jcody> |
| Status: | CLOSED WORKSFORME | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.0 | CC: | dyuan, hhuang, juzhang, mazhang, mzhan, pkrempa, rbalakri, shyu, virt-bugs, virt-maint, xuhan |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-03-03 20:20:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
Does the same happen if you don't use RDMA transport? Do you have your InfiniBand connection properly configured? Please provide debug logs of the libvirt daemon AND the vm log of the VM that crashed/disappeared. (/var/log/libvirt/qemu/rhel6.log)

(In reply to Peter Krempa from comment #2)
> Does the same happen if you don't use RDMA transport?

Yes.

> Do you have your InfiniBand connection properly configured?

What does this mean? I can use glusterfs well.

> Please provide debug logs of the libvirt daemon AND the vm log of the VM
> that crashed/disappeared. (/var/log/libvirt/qemu/rhel6.log)

The libvirtd log and the VM log are attached.

Created attachment 826007 [details]
libvirtd log
Created attachment 826008 [details]
vm log
Created attachment 826105 [details]
libvirtd.log
Created attachment 826107 [details]
guest log
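For reference, debug logs like the attached libvirtd.log are produced by enabling debug outputs in /etc/libvirt/libvirtd.conf; a sketch along these lines (the exact filter set used for the attached log is not recorded in this bug, so the values below are one reasonable choice):

```
log_filters="1:qemu 1:libvirt 3:event 3:util"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
```

After editing the file, restart the libvirtd service for the settings to take effect.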
According to the libvirtd log, qemu crashes:

```
2013-11-19 15:02:46.129+0000: 19917: debug : qemuMonitorIO:708 : Error on monitor Unable to read from monitor: Connection reset by peer
2013-11-19 15:02:46.129+0000: 19917: debug : virEventPollUpdateHandle:147 : EVENT_POLL_UPDATE_HANDLE: watch=14 events=12
2013-11-19 15:02:46.129+0000: 19917: debug : virEventPollInterruptLocked:710 : Skip interrupt, 1 139653503248512
2013-11-19 15:02:46.129+0000: 19917: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7f037c007120
2013-11-19 15:02:46.129+0000: 19917: debug : qemuMonitorIO:731 : Triggering EOF callback
2013-11-19 15:02:46.137+0000: 19917: debug : qemuProcessHandleMonitorEOF:293 : Received EOF on 0x7f037400e870 'rhel6.4'
2013-11-19 15:02:46.137+0000: 19917: debug : qemuProcessHandleMonitorEOF:311 : Monitor connection to 'rhel6.4' closed without SHUTDOWN event; assuming the domain crashed
2013-11-19 15:02:46.137+0000: 19917: debug : virObjectRef:293 : OBJECT_REF: obj=0x7f03801509d0
2013-11-19 15:02:46.137+0000: 19917: debug : qemuProcessStop:4140 : Shutting down VM 'rhel6.4' pid=20171 flags=0
```

I'm going to re-assign this to the qemu component for further investigation. I wasn't able to reproduce the issue in my environment, so I can't provide any additional information. Please attach a stack trace of the crashed qemu to aid the qemu developers in finding the issue.

*** Bug 1030749 has been marked as a duplicate of this bug. ***

Looks like a dupe of, or at least related to, bug 1031877.

I have been unable to reproduce this bug in my environment as well, both on a RHEL7 guest and on my normal F19 dev machine running RHEL7 qemu binaries. I've tried both the glusterfs lib version for RHEL7 as well as the latest from git, and I still cannot reproduce.

Both Peter and Jeff failed to reproduce it... Can you test once more and give us more details about your environment?
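The crash signature quoted above can be pulled out of a large debug log with a simple filter; a minimal sketch (the default log path is an assumption, point it at the attached libvirtd.log instead):

```shell
#!/bin/sh
# Sketch: filter a libvirtd debug log down to the monitor-EOF crash signature
# seen in this bug. The default path is an assumption; pass the attached
# libvirtd.log as the first argument instead.
LOG="${1:-/var/log/libvirt/libvirtd.log}"
grep -E 'Connection reset by peer|closed without SHUTDOWN' "$LOG"
```

The two patterns match the `qemuMonitorIORead` error and the `qemuProcessHandleMonitorEOF` line where libvirt concludes the domain crashed.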
(In reply to Jeff Cody from comment #13)
> I have been unable to reproduce this bug in my environment as well, both on
> a RHEL7 guest and my normal F19 dev machine running RHEL7 qemu binaries.
> I've tried both the glusterfs lib version for RHEL7, as well as latest from
> git, and I still cannot reproduce.

Hi Jeff, I can reproduce it with the latest qemu-kvm-rhev & libvirt. Please note that I test on a guest without a healthy OS; I can't reproduce it with a healthy guest.

```
# rpm -q libvirt qemu-kvm-rhev glusterfs
libvirt-1.1.1-22.el7.x86_64
qemu-kvm-rhev-1.5.3-45.el7.x86_64
glusterfs-3.4.0.59rhs-1.el7.x86_64
```

1. Prepare a guest with a glusterfs volume as the source disk:

```
# virsh dumpxml rhel6 | grep disk -A 4
<disk type='network' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source protocol='gluster' name='gluster-vol1/test.img'>
    <host name='10.66.5.78' port='24007'/>
  </source>
--
</disk>
<controller type='scsi' index='0' model='virtio-scsi'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</controller>

# qemu-img info gluster://10.66.5.78/gluster-vol1/test.img
image: gluster://10.66.5.78/gluster-vol1/test.img
file format: qcow2
virtual size: 100G (107374182400 bytes)
disk size: 194K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
```

2. Start the guest and suspend/resume it:

```
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhel6                          shut off

# virsh start rhel6
Domain rhel6 started

# virsh list
 Id    Name                           State
----------------------------------------------------
 55    rhel6                          running

# virsh suspend rhel6
Domain rhel6 suspended

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 55    rhel6                          paused

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhel6                          shut off
```

3. Check the error info:

```
# grep error /tmp/libvirtd.log
2014-02-10 11:23:38.288+0000: 26424: error : qemuMonitorIORead:552 : Unable to read from monitor: Connection reset by peer
```

*** Bug 1031877 has been marked as a duplicate of this bug. ***

Shanzhi,
> Please note that I do test on a guest without a healthy OS;
> I can't reproduce it with a healthy guest.
What do you mean by "healthy" os?
(In reply to Jeff Cody from comment #18)
> Shanzhi,
>
> > Please note that I do test on a guest without a healthy OS;
> > I can't reproduce it with a healthy guest.
>
> What do you mean by "healthy" os?

Install the OS (RHEL 6.x) on the guest and make sure the guest is in running status.

(In reply to Shanzhi Yu from comment #19)
> > What do you mean by "healthy" os?
>
> Install the OS (RHEL 6.x) on the guest and make sure the guest is in
> running status.

I'm still confused by this differentiation - are you still able to reproduce this BZ? Can you give me more information about what you mean by healthy vs. unhealthy guest? Thanks!

(In reply to Jeff Cody from comment #20)
> I'm still confused by this differentiation - are you still able to reproduce
> this BZ? Can you give me more information about what you mean by healthy vs.
> unhealthy guest? Thanks!

Hi Jeff,

Previously, I reproduced it when I tried to suspend/resume a guest without an OS installed (just define/start a guest with a clean source file). Currently, I fail to reproduce it. I use the latest libvirt/qemu versions on RHEL 7:

```
# rpm -q libvirt qemu-kvm-rhev glusterfs
libvirt-1.2.8-6.el7.x86_64
qemu-kvm-rhev-2.1.2-7.el7.x86_64
glusterfs-3.6.0.29-2.el7.x86_64
```

Closing this, as we are unable to reproduce.
Description of problem:
Fail to resume a guest which uses a glusterfs volume.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-1.5.3-19.el7.x86_64
libvirt-1.1.1-12.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a guest using a glusterfs volume as the source disk:

```
# virsh dumpxml rhel6
..
<disk type='network' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source protocol='gluster' name='gluster-vol1/rhel6-qcow2.img'>
    <host name='10.66.106.22' port='24007' transport='rdma'/>
  </source>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
<controller type='scsi' index='0' model='virtio-scsi'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</controller>
..
```

2. Suspend the guest:

```
# virsh start rhel6; virsh suspend rhel6; virsh list --all
Domain rhel6 started
Domain rhel6 suspended
 Id    Name                           State
----------------------------------------------------
 72    rhel6                          paused
```

3. Resume the paused guest and check the guest status:

```
# virsh resume rhel6; virsh list --all
Domain rhel6 resumed
 Id    Name                           State
----------------------------------------------------
 -     rhel6                          shut off
```

Notices:
1. If I modify "<target dev='vda' bus='virtio'/>" to "<target dev='sda' bus='scsi'/>", or delete the lines "<controller type='scsi' index='0' model='virtio-scsi'> .. </controller>" in the guest XML, and then retest, there is no problem; the guest resumes successfully.
2. If I use a file type disk as the source, it works well in all of the situations above.

Actual results:
As above.

Expected results:
The guest should be resumable in step 3.

Additional info:
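For reference, the first workaround in the Notices amounts to a disk stanza like the following sketch, derived from the XML in the steps above: the `<target>` line switches the disk to the scsi bus, and the virtio PCI `<address>` line is dropped so libvirt can assign a suitable drive address on the virtio-scsi controller.

```
<disk type='network' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source protocol='gluster' name='gluster-vol1/rhel6-qcow2.img'>
    <host name='10.66.106.22' port='24007' transport='rdma'/>
  </source>
  <target dev='sda' bus='scsi'/>
</disk>
```

As noted, either this change or removing the virtio-scsi `<controller>` element entirely avoided the crash in the reporter's testing.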