Bug 765584 (GLUSTER-3852) - KVM migration works once, fails second time
Summary: KVM migration works once, fails second time
Keywords:
Status: CLOSED WORKSFORME
Alias: GLUSTER-3852
Product: GlusterFS
Classification: Community
Component: fuse
Version: 3.3-beta
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: shishir gowda
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 848343
 
Reported: 2011-12-07 16:54 UTC by Stephan Ellis
Modified: 2013-12-09 01:28 UTC
CC: 3 users

Fixed In Version: glusterfs-3.4.0qa6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 848343
Environment:
Last Closed: 2012-12-21 05:50:49 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
/var/log/glusterfs directory from hyper1 (30.74 KB, application/x-gzip)
2011-12-07 14:20 UTC, Stephan Ellis
/var/log/glusterfs directory from hyper2 (25.83 KB, application/x-gzip)
2011-12-07 14:20 UTC, Stephan Ellis
Second migration failing using native fuse client (strace) (228.07 KB, text/plain)
2011-12-08 12:24 UTC, Stephan Ellis

Description Stephan Ellis 2011-12-07 13:56:10 UTC
Forgot to mention:

Running CentOS 6 with all current updates and gluster 3.3 beta 2.  Replicate volume bricks are on hyper1 and hyper2 (the KVM hosts).  Migration works fine on a gfs2 volume on the same hosts.
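
For reference, the setup looks roughly like this (the volume name "gstor" and the brick paths are guesses based on the mount point in the logs, not taken from this report):

  # replica 2 across the two KVM hosts
  gluster volume create gstor replica 2 hyper1:/bricks/gstor hyper2:/bricks/gstor
  gluster volume start gstor

  # native fuse mount, run on both hosts
  mount -t glusterfs hyper1:/gstor /mnt/gstor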

Comment 1 Stephan Ellis 2011-12-07 14:20:10 UTC
Created attachment 724

Comment 2 Stephan Ellis 2011-12-07 14:20:39 UTC
Created attachment 725

Comment 3 Stephan Ellis 2011-12-07 14:21:30 UTC
Using the same volume mounted via NFS, migration works flawlessly.
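
For comparison, the NFS mount was presumably along these lines (Gluster's built-in NFS server only speaks NFSv3; the exact options used aren't recorded here):

  mount -t nfs -o vers=3,tcp hyper1:/gstor /mnt/gstor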

Comment 4 Stephan Ellis 2011-12-07 16:54:43 UTC
I have two libvirt-based KVM hosts.  I've set up a gluster replicate volume between them, with XFS as the underlying filesystem for the bricks.  The gluster volume is mounted on both hosts using the native fuse client.  Migrating a VM from hyper1 to hyper2 works, but the subsequent migration from hyper2 back to hyper1 fails.  The same holds in reverse: if the first migration is hyper2 -> hyper1, the second migration hyper1 -> hyper2 fails.
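
To reproduce (the second command is the one traced in a later comment; the first is the assumed mirror of it):

  # first migration succeeds
  virsh migrate --live ar-lab qemu+ssh://hyper2/system
  # migrating the guest back then fails
  virsh migrate --live ar-lab qemu+ssh://hyper1/system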

Relevant log lines from hyper1 when migrating a second time:

Dec  7 10:53:00 hyper1 libvirtd: 10:53:00.063: info : qemudDispatchServer:1398 : Turn off polkit auth for privileged client 18310
Dec  7 10:53:00 hyper1 libvirtd: 10:53:00.094: info : qemuSecurityDACSetOwnership:40 : Setting DAC user and group on '/mnt/gstor/ar-lab.img' to '107:107'
Dec  7 10:53:00 hyper1 libvirtd: 10:53:00.114: info : qemudDispatchSignalEvent:397 : Received unexpected signal 17
Dec  7 10:53:00 hyper1 libvirtd: 10:53:00.220: info : qemudDispatchSignalEvent:397 : Received unexpected signal 17
Dec  7 10:53:00 hyper1 libvirtd: 10:53:00.225: info : brProbeVnetHdr:449 : Enabling IFF_VNET_HDR
Dec  7 10:53:00 hyper1 kernel: device vnet1 entered promiscuous mode
Dec  7 10:53:00 hyper1 libvirtd: 10:53:00.229: info : brProbeVnetHdr:449 : Enabling IFF_VNET_HDR
Dec  7 10:53:00 hyper1 kernel: vsCore: port 2(vnet1) entering forwarding state
Dec  7 10:53:00 hyper1 kernel: device vnet2 entered promiscuous mode
Dec  7 10:53:00 hyper1 libvirtd: 10:53:00.233: info : brProbeVnetHdr:449 : Enabling IFF_VNET_HDR
Dec  7 10:53:00 hyper1 kernel: vsPrivate: port 4(vnet2) entering forwarding state
Dec  7 10:53:00 hyper1 kernel: device vnet5 entered promiscuous mode
Dec  7 10:53:00 hyper1 kernel: vsCluster: port 2(vnet5) entering forwarding state
Dec  7 10:53:00 hyper1 libvirtd: 10:53:00.254: info : qemudDispatchSignalEvent:397 : Received unexpected signal 17
Dec  7 10:53:00 hyper1 libvirtd: 10:53:00.310: info : udevGetDeviceProperty:116 : udev reports device 'vnet2' does not have property 'DRIVER'
Dec  7 10:53:00 hyper1 libvirtd: 10:53:00.310: info : udevGetDeviceProperty:116 : udev reports device 'vnet2' does not have property 'PCI_CLASS'
Dec  7 10:53:00 hyper1 libvirtd: 10:53:00.317: info : udevGetDeviceProperty:116 : udev reports device 'vnet1' does not have property 'DRIVER'
Dec  7 10:53:00 hyper1 libvirtd: 10:53:00.317: info : udevGetDeviceProperty:116 : udev reports device 'vnet1' does not have property 'PCI_CLASS'
Dec  7 10:53:00 hyper1 libvirtd: 10:53:00.322: info : udevGetDeviceProperty:116 : udev reports device 'vnet5' does not have property 'DRIVER'
Dec  7 10:53:00 hyper1 libvirtd: 10:53:00.322: info : udevGetDeviceProperty:116 : udev reports device 'vnet5' does not have property 'PCI_CLASS'
Dec  7 10:53:02 hyper1 libvirtd: 10:53:02.091: info : qemuSecurityDACRestoreSecurityFileLabel:80 : Restoring DAC user and group on '/mnt/gstor/ar-lab.img'
Dec  7 10:53:02 hyper1 libvirtd: 10:53:02.091: info : qemuSecurityDACSetOwnership:40 : Setting DAC user and group on '/mnt/gstor/ar-lab.img' to '0:0'
Dec  7 10:53:02 hyper1 kernel: vsCore: port 2(vnet1) entering disabled state
Dec  7 10:53:02 hyper1 kernel: device vnet1 left promiscuous mode
Dec  7 10:53:02 hyper1 kernel: vsCore: port 2(vnet1) entering disabled state
Dec  7 10:53:02 hyper1 kernel: vsPrivate: port 4(vnet2) entering disabled state
Dec  7 10:53:02 hyper1 kernel: device vnet2 left promiscuous mode
Dec  7 10:53:02 hyper1 kernel: vsPrivate: port 4(vnet2) entering disabled state
Dec  7 10:53:02 hyper1 kernel: vsCluster: port 2(vnet5) entering disabled state
Dec  7 10:53:02 hyper1 kernel: device vnet5 left promiscuous mode
Dec  7 10:53:02 hyper1 kernel: vsCluster: port 2(vnet5) entering disabled state
Dec  7 10:53:02 hyper1 libvirtd: 10:53:02.346: error : qemudDomainMigrateFinish2:11763 : internal error guest unexpectedly quit

Comment 5 Amar Tumballi 2011-12-08 00:28:31 UTC
I suspect an issue with O_DIRECT in open(). Can we get 'strace -f -v' output?
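
A quick sanity check for that theory (a sketch; the image path is taken from the log lines above) is to try a direct-I/O read on the fuse mount, which older fuse clients reject with EINVAL:

  dd if=/mnt/gstor/ar-lab.img of=/dev/null bs=4096 count=1 iflag=direct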

Comment 6 Stephan Ellis 2011-12-08 12:24:06 UTC
Created attachment 727


command was:

strace -f -v -o migtrace.txt virsh migrate --live ar-lab qemu+ssh://hyper1/system

Comment 7 Amar Tumballi 2012-12-21 05:50:49 UTC
Hi Stephan,

There has been some work done recently on hosting VM images on top of GlusterFS. Can you please confirm whether it's fixed in the latest qa releases? (As of today that is glusterfs-3.4.0qa6.) For us this is working fine.
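
To check which version is installed, for example:

  glusterfs --version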

