Bug 1031943 - QEMU crashes after resume (cont) with gluster backed volumes
Summary: QEMU crashes after resume (cont) with gluster backed volumes
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Jeff Cody
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 1030749 1031877 (view as bug list)
Depends On:
Blocks:
 
Reported: 2013-11-19 08:09 UTC by Shanzhi Yu
Modified: 2015-03-03 20:20 UTC (History)
11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-03 20:20:57 UTC
Target Upstream Version:


Attachments (Terms of Use)
libvirtd log (1.38 MB, text/plain)
2013-11-19 10:40 UTC, Shanzhi Yu
no flags Details
vm log (15.79 KB, text/plain)
2013-11-19 10:41 UTC, Shanzhi Yu
no flags Details
libvirtd.log (2.52 MB, text/x-log)
2013-11-19 15:16 UTC, Shanzhi Yu
no flags Details
guest log (4.54 KB, text/x-log)
2013-11-19 15:17 UTC, Shanzhi Yu
no flags Details

Description Shanzhi Yu 2013-11-19 08:09:27 UTC
Description of problem:

Fails to resume a guest that uses a glusterfs volume

Version-Release number of selected component (if applicable):

qemu-kvm-rhev-1.5.3-19.el7.x86_64
libvirt-1.1.1-12.el7.x86_64

How reproducible:

100%

Steps to Reproduce:

1. Create a guest using a glusterfs volume as the source disk
# virsh dumpxml rhel6
..
<disk type='network' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source protocol='gluster' name='gluster-vol1/rhel6-qcow2.img'>
        <host name='10.66.106.22' port='24007' transport='rdma'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </controller>

..

2. suspend the guest

# virsh start rhel6 ;virsh suspend rhel6;virsh list --all
Domain rhel6 started

Domain rhel6 suspended

 Id    Name                           State
----------------------------------------------------
 72    rhel6                          paused


3. resume the paused guest and check guest status

# virsh resume rhel6;virsh list --all

Domain rhel6 resumed

 Id    Name                           State
----------------------------------------------------
      rhel6                          shut off
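
The state flip above (paused, then unexpectedly shut off after `virsh resume`) can be detected from a script by parsing the tabular `virsh list --all` output. A minimal sketch in Python, assuming the column layout shown above; the `parse_virsh_list` helper is hypothetical, not part of libvirt's tooling:

```python
def parse_virsh_list(output):
    """Parse tabular `virsh list --all` output into a {name: state} dict."""
    states = {}
    for line in output.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] == "Id":
            continue  # blank line or header row
        if len(tokens) == 1 and set(tokens[0]) == {"-"}:
            continue  # separator row of dashes
        if tokens[0] == "-" or tokens[0].isdigit():
            tokens = tokens[1:]  # drop the Id column ("-" for inactive domains)
        if len(tokens) >= 2:
            states[tokens[0]] = " ".join(tokens[1:])
    return states

after_resume = """\
 Id    Name                           State
----------------------------------------------------
      rhel6                          shut off
"""
assert parse_virsh_list(after_resume)["rhel6"] == "shut off"
```

A monitoring loop could call this right after `virsh resume` and flag the bug whenever the domain state is anything other than "running".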

Notices:

1. If I change "<target dev='vda' bus='virtio'/>" to
"<target dev='sda' bus='scsi'/>", or
delete the lines "<controller type='scsi' index='0' model='virtio-scsi'>
..
</controller>" from the guest XML,

and then retest, there is no problem: the guest resumes successfully.

2. If I use a file type disk as the source, it works well in all of the situations above.
 



Actual results:

as above

Expected results:

The guest should resume successfully in step 3.

Additional info:

Comment 2 Peter Krempa 2013-11-19 09:54:37 UTC
Does the same happen if you don't use RDMA transport?

Do you have your InfiniBand connection properly configured?

Please provide debug logs of the libvirt daemon AND the vm log of the VM that crashed/disappeared. ( /var/log/libvirt/qemu/rhel6.log )

Comment 3 Shanzhi Yu 2013-11-19 10:38:36 UTC
(In reply to Peter Krempa from comment #2)
> Does the same happen if you don't use RDMA transport?
> 

yes

> Do you have your InfiniBand connection properly configured?
> 

What does this mean? I can use glusterfs without any problems.

> Please provide debug logs of the libvirt daemon AND the vm log of the VM
> that crashed/disappeared. ( /var/log/libvirt/qemu/rhel6.log )

The libvirtd log and vm log are attached.

Comment 4 Shanzhi Yu 2013-11-19 10:40:06 UTC
Created attachment 826007 [details]
libvirtd log

Comment 5 Shanzhi Yu 2013-11-19 10:41:13 UTC
Created attachment 826008 [details]
vm log

Comment 8 Shanzhi Yu 2013-11-19 15:16:21 UTC
Created attachment 826105 [details]
libvirtd.log

Comment 9 Shanzhi Yu 2013-11-19 15:17:12 UTC
Created attachment 826107 [details]
guest log

Comment 10 Peter Krempa 2013-11-22 14:05:30 UTC
According to the libvirtd log qemu crashes:

2013-11-19 15:02:46.129+0000: 19917: debug : qemuMonitorIO:708 : Error on monitor Unable to read from monitor: Connection reset by peer
2013-11-19 15:02:46.129+0000: 19917: debug : virEventPollUpdateHandle:147 : EVENT_POLL_UPDATE_HANDLE: watch=14 events=12
2013-11-19 15:02:46.129+0000: 19917: debug : virEventPollInterruptLocked:710 : Skip interrupt, 1 139653503248512
2013-11-19 15:02:46.129+0000: 19917: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7f037c007120
2013-11-19 15:02:46.129+0000: 19917: debug : qemuMonitorIO:731 : Triggering EOF callback
2013-11-19 15:02:46.137+0000: 19917: debug : qemuProcessHandleMonitorEOF:293 : Received EOF on 0x7f037400e870 'rhel6.4'
2013-11-19 15:02:46.137+0000: 19917: debug : qemuProcessHandleMonitorEOF:311 : Monitor connection to 'rhel6.4' closed without SHUTDOWN event; assuming the domain crashed
2013-11-19 15:02:46.137+0000: 19917: debug : virObjectRef:293 : OBJECT_REF: obj=0x7f03801509d0
2013-11-19 15:02:46.137+0000: 19917: debug : qemuProcessStop:4140 : Shutting down VM 'rhel6.4' pid=20171 flags=0
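
The log excerpt above contains the decisive line: libvirt saw the monitor connection drop without a preceding SHUTDOWN event, so it concluded the qemu process crashed. A minimal sketch of checking a libvirtd debug log for that signature (the `qemu_crashed` helper is hypothetical, not part of libvirt):

```python
def qemu_crashed(log_text):
    """Return True if libvirtd logged a monitor EOF without a SHUTDOWN
    event, i.e. it assumed qemu crashed rather than exiting cleanly."""
    needle = "closed without SHUTDOWN event"
    return any(needle in line for line in log_text.splitlines())

sample = ("2013-11-19 15:02:46.137+0000: 19917: debug : "
          "qemuProcessHandleMonitorEOF:311 : Monitor connection to 'rhel6.4' "
          "closed without SHUTDOWN event; assuming the domain crashed")
assert qemu_crashed(sample)
```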

I'm going to re-assign this to the qemu component for further investigation.

I wasn't able to reproduce the issue in my environment, so I can't provide any additional information. Please attach a stack trace of the crashed qemu to aid the qemu developers in finding the issue.

Comment 11 Peter Krempa 2013-11-22 14:18:08 UTC
*** Bug 1030749 has been marked as a duplicate of this bug. ***

Comment 12 Ademar Reis 2013-12-10 19:47:48 UTC
Looks like a dupe of, or at least related to, bug 1031877.

Comment 13 Jeff Cody 2014-01-28 18:35:52 UTC
I have been unable to reproduce this bug in my environment as well, both on a RHEL7 guest and on my normal F19 dev machine running RHEL7 qemu binaries.  I've tried both the glusterfs library version for RHEL7 and the latest from git, and I still cannot reproduce.

Comment 14 Ademar Reis 2014-01-28 19:11:07 UTC
Both Peter and Jeff failed to reproduce it... Can you test once more and give us more details about your environment?

Comment 15 Shanzhi Yu 2014-02-10 11:38:13 UTC
(In reply to Jeff Cody from comment #13)
> I have been unable to reproduce this bug in my environment as well, both on
> a RHEL7 guest and my normal F19 dev machine running RHEL7 qemu binaries. 
> I've tried both the glusterfs lib version for RHEL7, as well as latest from
> git, and I still cannot reproduce.

Hi Jeff,
I can reproduce it with the latest qemu-kvm-rhev & libvirt.
Please note that I do test on guest without an healthy os,
I can't reproduce it with an healthy guest.

# rpm -q libvirt qemu-kvm-rhev glusterfs
libvirt-1.1.1-22.el7.x86_64
qemu-kvm-rhev-1.5.3-45.el7.x86_64
glusterfs-3.4.0.59rhs-1.el7.x86_64

1. prepare guest with glusterfs volume as source disk

# virsh dumpxml rhel6|grep disk -A 4
    <disk type='network' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source protocol='gluster' name='gluster-vol1/test.img'>
        <host name='10.66.5.78' port='24007'/>
      </source>
--
    </disk>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </controller>
# qemu-img info gluster://10.66.5.78/gluster-vol1/test.img
image: gluster://10.66.5.78/gluster-vol1/test.img
file format: qcow2
virtual size: 100G (107374182400 bytes)
disk size: 194K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
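
The `qemu-img info` output above is plain "key: value" text, so a script can consume it directly. A minimal sketch (the `parse_qemu_img_info` helper is hypothetical; nested keys such as those under "Format specific information" are flattened):

```python
def parse_qemu_img_info(text):
    """Parse human-readable `qemu-img info` output into a flat dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

sample = """\
image: gluster://10.66.5.78/gluster-vol1/test.img
file format: qcow2
virtual size: 100G (107374182400 bytes)
"""
info = parse_qemu_img_info(sample)
assert info["file format"] == "qcow2"
```

Newer qemu-img builds can also emit machine-readable output via `qemu-img info --output=json`, which avoids parsing the human-readable form entirely.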

2. start guest and suspend/resume it
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhel6                          shut off
# virsh start rhel6
Domain rhel6 started

# virsh list 
 Id    Name                           State
----------------------------------------------------
 55    rhel6                          running
# virsh suspend rhel6
Domain rhel6 suspended

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 55    rhel6                          paused
# virsh list  --all
 Id    Name                           State
----------------------------------------------------
 -     rhel6                          shut off
3. Error info
# grep error /tmp/libvirtd.log 
2014-02-10 11:23:38.288+0000: 26424: error : qemuMonitorIORead:552 : Unable to read from monitor: Connection reset by peer

Comment 16 Ademar Reis 2014-04-18 15:05:19 UTC
*** Bug 1031877 has been marked as a duplicate of this bug. ***

Comment 18 Jeff Cody 2014-07-22 20:29:57 UTC
Shanzhi,

> Please note that I do test on guest without an healthy os,
> I can't reproduce it with an healthy guest.

What do you mean by "healthy" os?

Comment 19 Shanzhi Yu 2014-07-23 02:36:46 UTC
(In reply to Jeff Cody from comment #18)
> Shanzhi,
> 
> > Please note that I do test on guest without an healthy os,
> > I can't reproduce it with an healthy guest.
> 
> What do you mean by "healthy" os?

Install OS(RHEL6.X) on guest and make sure guest is running status

Comment 20 Jeff Cody 2014-11-05 20:13:28 UTC
(In reply to Shanzhi Yu from comment #19)
> (In reply to Jeff Cody from comment #18)
> > Shanzhi,
> > 
> > > Please note that I do test on guest without an healthy os,
> > > I can't reproduce it with an healthy guest.
> > 
> > What do you mean by "healthy" os?
> 
> Install OS(RHEL6.X) on guest and make sure guest is running status

I'm still confused by this differentiation - are you able to reproduce this BZ still?  Can you give me more information by what you mean by healthy vs unhealthy guest?  Thanks!

Comment 21 Shanzhi Yu 2014-11-11 05:24:00 UTC
(In reply to Jeff Cody from comment #20)
> (In reply to Shanzhi Yu from comment #19)
> > (In reply to Jeff Cody from comment #18)
> > > Shanzhi,
> > > 
> > > > Please note that I do test on guest without an healthy os,
> > > > I can't reproduce it with an healthy guest.
> > > 
> > > What do you mean by "healthy" os?
> > 
> > Install OS(RHEL6.X) on guest and make sure guest is running status
> 
> I'm still confused by this differentiation - are you able to reproduce this
> BZ still?  Can you give me more information by what you mean by healthy vs
> unhealthy guest?  Thanks!

Hi Jeff,

Previously, I reproduced it when I tried to suspend/resume a guest without an OS installed (just define/start a guest with a clean source file).

Currently, I fail to reproduce it.
I used the latest libvirt/qemu versions on RHEL 7:

# rpm -q libvirt qemu-kvm-rhev glusterfs
libvirt-1.2.8-6.el7.x86_64
qemu-kvm-rhev-2.1.2-7.el7.x86_64
glusterfs-3.6.0.29-2.el7.x86_64

Comment 22 Jeff Cody 2015-03-03 20:20:57 UTC
Closing this, as we are unable to reproduce.

