---Problem Description---
Migration fails using virt-manager in RHEL6.1 Alpha. While migrating the guest to the destination, the below error is noticed in virt-manager:

Unable to migrate guest: Unknown failure : Unable to migrate guest: Unknown failure
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 45, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/migrate.py", line 523, in _async_migrate
    vm.migrate(dstconn, migrate_uri, rate, live, secure, meter=meter)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1141, in migrate
    self._backend.migrate(destconn.vmm, flags, newname, interface, rate)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 521, in migrate
    if ret is None:raise libvirtError('virDomainMigrate() failed', dom=self)
libvirtError: Unknown failure

---uname output---
Linux hostname.in.ibm.com 2.6.32-118.el6.x86_64 #1 SMP Tue Feb 22 11:15:55 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

Machine Type = x3650

---Debugger---
A debugger is not configured

---Steps to Reproduce---
1. Open virt-manager, add the destination host using File---> Add Connection.
2. Mount the storage on the destination.
3. When the guest is in a running state, right-click and choose to Migrate it to the destination.
4. It fails with the error: Unable to migrate guest: Unknown failure

=Comment: #1=================================================
Created an attachment for dmesg, /var/log/messages and sosreport and virt-manager.log of the destination

=Comment: #2=================================================
1. Server architecture(s) (please list all affected) (x86/POWER6/Z/etc.): x86
2. Server type (9117-MMA/HS20/s390/etc.): x3650
3. General component (desktop/kernel/base OS/dev tools/etc.): virt-manager
4. Other components involved (ixgbe/java/emulex/etc.): NA
5. Does the server have the latest GA firmware? yes
6. Has the problem been shown to occur on more than one system? yes
7. Is a tested patch available? If yes to the above, has it been approved upstream? NA
8. What is the latest official Red Hat build on which this bug has been seen? RHEL6.1 Alpha
Created attachment 486439 [details] Created an attachment for dmesg, /var/log/messages and sosreport and virt-manager.log of the source
Created attachment 486440 [details] Created an attachment for dmesg, /var/log/messages and sosreport and virt-manager.log of the destination
Does it happen when you live migrate using qemu-kvm directly?
------- Comment From santwana.samantray.com 2011-03-21 02:16 EDT-------
(In reply to comment #8)
> Does it happens when you live migrate using qemu-kvm directly?

Hello Redhat,

When I migrate the guest using qemu-kvm, it happens fine and the migration is successful.

Thanks,
Santwana
Created attachment 486990 [details]
virsh debug files for source and destination

------- Comment on attachment From santwana.samantray.com 2011-03-23 05:31 EDT-------
Hello Redhat,

I verified this issue in RHEL6.1 Beta (k.v-2.6.32-122.el6) by migrating the guest using virsh. The migration fails and the guest is killed, while the below error is noticed:

virsh migrate --live <guest> qemu+ssh://destination/system
error: Unknown failure

I have created an attachment for the virsh debug files, both for source and destination.

Thanks,
Santwana
Moving to libvirt.

Are there any error messages in /var/log/libvirt/qemu/$vmname.log on the source or destination host?
what version of qemu? bug 678524 details a qemu bug with exec: migration that doesn't appear to have been fixed in time for RHEL 6.1 beta that might be responsible for this failure.
------- Comment From santwana.samantray.com 2011-03-24 10:42 EDT-------
Hello Redhat,

The below error is noticed in /var/log/libvirt/qemu/$vmname.log on the destination host:

qemu: re-open of /var/lib/libvirt/images/rhel5.6-x64.img failed wth error -13
reopening of drives failed

The guest's disk storage is mounted on the destination host, and the output of getsebool virt_use_nfs is:

virt_use_nfs --> on

Manually, I am able to start the guest on the destination, with its storage on the mounted NFS path.

The version of qemu-kvm installed is: qemu-kvm-0.12.1.2-2.150.

Thanks,
Santwana
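The "-13" in the qemu log line above looks like a negated errno value. As a minimal sketch, assuming a Linux host (where errno 13 is EACCES) and using a hypothetical helper name, the number can be decoded like this:

```python
import errno
import os


def describe_errno(code):
    """Return 'NAME: message' for an errno value, accepting the negated form too."""
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names such as 'EACCES'.
    name = errno.errorcode.get(number, "UNKNOWN")
    return "%s: %s" % (name, os.strerror(number))


if __name__ == "__main__":
    # Assumption: the value printed by qemu is -errno.
    print(describe_errno(-13))
```

On Linux this points at a permission problem on the image file rather than a missing file or an I/O error.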
Hi IBM,

Would you please provide the following information for the reproducer environment?

1. What are the versions of libvirt, virt-manager, qemu-kvm and kernel on both the source machine and the destination machine? Please check them separately.
2. Would you please tell us how the guest was created? And please dumpxml the guest that hits this bug during migration.
3. Would you please show how you ran the mount command to mount the NFS storage on both the source machine and the destination machine?
4. Would you please show us the "ll $/mount/point" output?
5. What is the "getenforce" result on both the source machine and the destination machine?
Created attachment 487717 [details]
Dumpxml of the guest

------- Comment on attachment From santwana.samantray.com 2011-03-26 02:33 EDT-------
Hello Redhat,

Please find the details below:

1. What is the version of libvirt , virt-manager , qemu-kvm, kernel on both source machine and destination machine . Please check them separately.

Source:
libvirt-0.8.7-11.el6.x86_64
libvirt-python-0.8.7-11.el6.x86_64
libvirt-cim-0.5.11-2.el6.x86_64
libvirt-client-0.8.7-11.el6.x86_64
virt-manager-0.8.6-3.el6.noarch
qemu-kvm-0.12.1.2-2.150.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.150.el6.x86_64
kernel-2.6.32-122.el6.x86_64

Destination:
libvirt-0.8.7-11.el6.x86_64
libvirt-python-0.8.7-11.el6.x86_64
libvirt-client-0.8.7-11.el6.x86_64
libvirt-cim-0.5.11-2.el6.x86_64
virt-manager-0.8.6-3.el6.noarch
qemu-kvm-tools-0.12.1.2-2.150.el6.x86_64
qemu-kvm-0.12.1.2-2.150.el6.x86_64
kernel-2.6.32-122.el6.x86_64

2. Would you please tell how the guest was created . And please dumpxml the buggy guest xml , which meet this bug during migration ?

The guest was created using virt-manager using ISO as the installation source. I have created an attachment for the dumpxml for the guest.

3. Would you please show how did you perform the mount command to mount NFS storage on both source machine and destination machine ?

The storage path of the guest was exported in the NFS server, using the below options:

/var/lib/libvirt/images/ *(rw,no_root_squash,sync)

exportfs
/var/lib/libvirt/images
                <world>

And it was mounted on the destination using:

mount -t nfs servername:/var/lib/libvirt/images /var/lib/libvirt/images

I was able to start and stop the guest (storage on the mounted path) on the destination using virsh.

4. Would you please show us the "ll $/mount/point" output ?

ll /var/lib/libvirt/images/
total 27028000
-rw-------. 1 qemu qemu  8589934592 Mar 25 16:53 rhel5.6-32.img
-rw-r--r--. 1 root root 10737418240 Mar 26  2011 rhel5-64.raw
-rw-------. 1 root root 11811160064 Mar 25 20:03 rhel5.6-x64.img

where rhel5.6-x64.img is the guest's image which is being migrated.

5. What is the "getenforce" result on both source machine and destination machine ?

Source: Permissive
Destination: Permissive

Thanks,
Santwana
Hi IBM , Would you please update your pkgs to the version under http://veillard.com/libvirt/6.1/x86_64/ . And try if you still meet this problem Thanks Vivian Bian
------- Comment From santwana.samantray.com 2011-03-30 06:46 EDT-------
Hello Redhat,

While installing the kernel package, it's giving an error for the kernel-firmware dependency:

error: Failed dependencies:
kernel-firmware >= 2.6.32-125.el6 is needed by kernel-2.6.32-125.el6.x86_64

Please provide the same package for installation.

Thanks,
Santwana
http://veillard.com/libvirt/6.1/ is a yum repository, just list it in /etc/yum.conf

-----------------------
[libvirt-updates]
name=libvirt updates
baseurl=http://veillard.com/libvirt/6.1
enabled=1
gpgcheck=0
----------------------

kernel-firmware being platform agnostic, it's in the noarch subdirectory.

Daniel
------- Comment From santwana.samantray.com 2011-04-05 08:30 EDT-------
Hello Redhat,

I updated the packages at http://veillard.com/libvirt/6.1/, for source and destination before migrating the guest. However, the problem is still reproducible, migration fails with an error [error: Unknown failure]

The kernel version is: 2.6.32-125.el6.x86_64 and below packages are installed:
libvirt-0.8.7-15.el6.x86_64
libvirt-client-0.8.7-15.el6.x86_64
libvirt-python-0.8.7-15.el6.x86_64
libvirt-devel-0.8.7-15.el6.x86_64
qemu-kvm-0.12.1.2-2.153.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.153.el6.x86_64

Thanks,
Santwana
------- Comment From vahegde1.ibm.com 2011-04-19 00:32 EDT-------
Hello Daniel/Red Hat,

This issue is still reproducible with your custom build. We wanted to understand when this will get fixed. Will it be part of the RHEL6.1 cycle?

Vasant
> 3. Would you please show how did you perform the mount command to mount NFS
> storage on both source machine and destination machine ?
> The storage path of the guest was exported in the NFS server, using the below
> options:
> /var/lib/libvirt/images/ *(rw,no_root_squash,sync)
>
> exportfs
> /var/lib/libvirt/images
> <world>
>
> And it was mounted on the destination using:
> mount -t nfs servername:/var/lib/libvirt/images /var/lib/libvirt/images.

Is this saying that the *source* virtualization host is acting as the NFS server, or is the NFS server completely separate from both the src + dst virt hosts ?

The storage setup on both hosts must be identical. You cannot use a local filesystem on the src host, and a NFS filesystem on the dest - both must be NFS. This is a limitation of the way we currently deal with file ownership & labelling at the end of migration.
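The "both must be NFS" requirement can be checked on each host before migrating. The following is a minimal sketch, assuming Linux /proc/mounts-style text as input; fstype_for_path and the two sample mount tables are hypothetical illustrations and not part of libvirt:

```python
import os


def fstype_for_path(mounts_text, path):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` uses the /proc/mounts layout: device, mount point, type, ...
    The longest matching mount point wins; later entries win ties.
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


# Hypothetical mount tables modelled on the setup described in this bug.
SRC_MOUNTS = "/dev/sda1 / ext3 rw 0 0\n"
DST_MOUNTS = (
    "/dev/sda1 / ext3 rw 0 0\n"
    "servername:/var/lib/libvirt/images /var/lib/libvirt/images nfs rw 0 0\n"
)
IMAGE = "/var/lib/libvirt/images/rhel5.6-x64.img"

if __name__ == "__main__":
    src = fstype_for_path(SRC_MOUNTS, IMAGE)
    dst = fstype_for_path(DST_MOUNTS, IMAGE)
    print("source: %s, destination: %s, match: %s" % (src, dst, src == dst))
```

On a real host the first argument would be the contents of /proc/mounts. The sketch ignores escaped characters in mount points (such as \040 for a space).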
(In reply to comment #16)
> ------- Comment From santwana.samantray.com 2011-04-05 08:30 EDT-------
> Hello Redhat,
>
> I updated the packages at http://veillard.com/libvirt/6.1/, for source and
> destination before migrating the guest. However, the problem is still
> reproducible, migration fails with an error[error: Unknown failure]
>
> The kernel version is: 2.6.32-125.el6.x86_64 and below packages are installed:
> libvirt-0.8.7-15.el6.x86_64
> libvirt-client-0.8.7-15.el6.x86_64
> libvirt-python-0.8.7-15.el6.x86_64
> libvirt-devel-0.8.7-15.el6.x86_64
> qemu-kvm-0.12.1.2-2.153.el6.x86_64

That is still insufficient. According to bug 678524, you need at least qemu-kvm-0.12.1.2-2.154.el6 to be sure that you are not hitting the qemu bug. Looking at http://veillard.com/libvirt/6.1/, I see that Daniel has since upgraded to .158; can you please re-pull from that repo and test again?
------- Comment From ryanh.com 2011-04-20 09:34 EDT-------
> Is this saying that the *source* virtualization host is acting as the NFS
> server, or is the NFS server completely separate from both the src + dst virt
> hosts ?
> The storage setup on both hosts must be identical. You cannot use a local
> filesystem on the src host, and a NFS filesystem on the dest - both must be
> NFS.
> This is a limitation of the way we currently deal with file ownership &
> labelling at the end of migration.

Is this a new restriction? We're just doing the simple NFS setup for migration as documented in the Virt Guide

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization/sect-Virtualization-KVM_live_migration-Share_storage_example_NFS_for_a_simple_migration.html

This works fine in RHEL6.0
------- Comment From ryanh.com 2011-04-20 09:45 EDT-------
Additional details we discussed:

NFS exports on source:
/var/lib/libvirt/images/ *(rw,no_root_squash,sync)

Selinux is disabled on both systems.

Of note, on destination (nfs client):

ls -al /var/lib/libvirt/images/rhel5.6-x64.img
-rw-------. 1 nobody nobody 11811160064 Apr 19 16:56 /var/lib/libvirt/images/rhel5.6-x64.img

On source, this is owned by: root.root
My understanding is that it has always been the case that source and destination have to use the same filesystem type for sane results, although it may have been a recent development where violating this constraint changed from working by luck to guaranteed failure. Which means that this is really a bug in the Virtualization Guide for recommending a non-working solution.
> Is this a new restriction?

No, this restriction was present in RHEL5 & RHEL6.

> We're just doing the simple NFS setup for migration
> as documented in the Virt Guide
>
> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization/sect-Virtualization-KVM_live_migration-Share_storage_example_NFS_for_a_simple_migration.html
> This works fine in RHEL6.0

I'm surprised if that is the case. Perhaps it was just getting lucky because QEMU didn't try to re-open the disk image when migration completes & so you never noticed that QEMU has lost permission on the disk image.
------- Comment From ryanh.com 2011-04-20 10:42 EDT------- Clearly we do need Virt guide update. As far as the reopen, I'll have to look at qemu source code, but I was pretty sure that we changed QEMU back in RHEL5.4.z or 5.5 to do re-open of image to ensure consistency in the case of NFS migration. I'd be shocked if RHEL6 didn't have this in place, and I know this basic config was functioning in RHEL6. In any case, we're going to switch to having both hosts be nfs clients to see if this resolves our issue or if we still are seeing trouble.
------- Comment From aliguori.com 2011-04-20 11:00 EDT-------
(In reply to comment #26)
> Clearly we do need Virt guide update.
> As far as the reopen, I'll have to look at qemu source code, but I was pretty
> sure that we changed QEMU back in RHEL5.4.z or 5.5 to do re-open of image to
> ensure consistency in the case of NFS migration. I'd be shocked if RHEL6
> didn't have this in place, and I know this basic config was functioning in
> RHEL6.
> In any case, we're going to switch to having both hosts be nfs clients to see
> if this resolves our issue or if we still are seeing trouble.

I still don't understand why this is failing in the first place.

Is libvirt detecting whether an image is NFS and behaving differently than if it's a local file system?
> I still don't understand why this is failing in the first place.
>
> Is libvirt detecting whether an image is NFS and behaving differently than if
> it's a local file system?

Whenever a VM shuts down, libvirt will reset its disks' ownership and/or labelling. The exception to this is if the guest was migrated off *and* the file is on a shared filesystem. Thus, since the src was just using 'ext3' as the filesystem, and not NFS as the dst was, access to the disks was revoked when the src QEMU was shut down. This then prevents the dst QEMU from accessing the file.
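The rule described in this comment can be written down as a small truth-table sketch. This models only the behaviour as stated here and is not libvirt's actual code; the function and parameter names are hypothetical:

```python
def resets_disk_ownership(migrated_away, on_shared_fs):
    """Model of the rule stated above.

    On VM shutdown the disk ownership/label is reset, except when the guest
    was migrated off AND its disk lives on a shared filesystem.
    """
    return not (migrated_away and on_shared_fs)


if __name__ == "__main__":
    # Source host in this bug: image on local ext3 (not seen as shared), guest migrated off.
    print(resets_disk_ownership(migrated_away=True, on_shared_fs=False))
    # Both hosts are NFS clients of a separate server: ownership is left alone.
    print(resets_disk_ownership(migrated_away=True, on_shared_fs=True))
```

In the first case the source resets ownership on an image the destination QEMU still needs, which is consistent with the permission error reported on the destination.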
------- Comment From ryanh.com 2011-04-20 13:08 EDT------- For migration, the src will only be destroyed (not shutdown) if the target is running on the destination. In our case, we're seeing errors on the destination when attempting to open the image, migration never starts. How does this disk image access revocation impact migration ?
------- Comment From santwana.samantray.com 2011-04-21 05:27 EDT-------
Hello Redhat,

I verified this issue by using a separate NFS Server, apart from the source and destination. The source and the destination are NFS clients. The guest was created on the mounted filesystem on the source. The migration happens fine between the source and the destination, and the issue isn't noticed.

However, if the source virtualization host is acting as the NFS server, then the migration is failing.

virsh migrate --live <guest> qemu+ssh://dest/system
error: unable to set user and group to '107:107' on '/var/lib/libvirt/images/guest.img': Invalid argument.

Thanks,
Santwana
> virsh migrate --live <guest> qemu+ssh://dest/system
> error: unable to set user and group to '107:107' on
> '/var/lib/libvirt/images/guest.img': Invalid argument.

Oh, did that NFS server have 'root_squash' enabled ? If so, you need to disable it ('no_root_squash' in /etc/exports).

If you need to run with a root squashing NFS server, then you would need to edit /etc/libvirt/qemu.conf and set 'dynamic_ownership=0', and then the mgmt app / admin must create all VM disks with ownership qemu:qemu itself, rather than letting libvirt auto set ownership.
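Whether an export squashes root can be read off its /etc/exports line. Below is a minimal sketch, assuming the common "path client(options)" layout and the documented default of root_squash when no_root_squash is absent; squashes_root is a hypothetical helper that handles one line and ignores comments and line continuations:

```python
import re


def squashes_root(export_line):
    """Return True if root squashing applies to any client on this export line."""
    clients = export_line.split()[1:]
    if not clients:
        # No client list: default options apply, and root_squash is the default.
        return True
    for client in clients:
        match = re.search(r"\(([^)]*)\)", client)
        options = match.group(1).split(",") if match else []
        if "all_squash" in options or "no_root_squash" not in options:
            return True
    return False


if __name__ == "__main__":
    # Export line quoted earlier in this bug.
    print(squashes_root("/var/lib/libvirt/images/ *(rw,no_root_squash,sync)"))
    # Same export without the option (hypothetical variant).
    print(squashes_root("/var/lib/libvirt/images/ *(rw,sync)"))
```

If the function reports squashing, either the export needs no_root_squash or the dynamic_ownership=0 route described above applies.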
Also, if you see the following ownership [1] under the mount point, please mount with NFS version 3, like:

# mount -o vers=3 $NFS_server:/export/dir /mnt/point

[1] ownership
ls -al /var/lib/libvirt/images/rhel5.6-x64.img
-rw-------. 1 nobody nobody 11811160064 Apr 19 16:56
------- Comment From ryanh.com 2011-04-21 08:22 EDT------- I'm still confused about libvirt migration and disk ownership. At what point does libvirt do all of these modifications? before starting the migration? after but before destroying source? and where? on the source or the destination?
------- Comment From santwana.samantray.com 2011-04-25 09:14 EDT-------
Hello Redhat,

I verified this issue on RHEL6.1 Snap4 (k.v-2.6.32-131.0.1.el6.x86_64) with qemu-kvm-0.12.1.2-2.158.el6.x86_64 installed, and the migration still fails using virsh.

I used the source virtualization host as the NFS server.
The selinux and firewall were disabled on both the source and destination.
The nfs server had ("no_root_squash") in the /etc/exports file and /etc/libvirt/qemu.conf had dynamic_ownership=0 set.
The guest's disk image was mounted using:

mount -o vers=3 $NFS_server:/export/dir /mnt/point

The below error is noticed in /var/log/libvirt/qemu/guest.log after the migration fails:

qemu: could not open disk image /var/lib/libvirt/images/rhel5.6-x64.img: Permission denied
qemu: re-open of /var/lib/libvirt/images/rhel5.6-x64.img failed wth error -13
reopening of drives failed
2011-04-25 18:33:19.215: shutting down

Permissions on the NFS mount point containing the guest's disk path on the destination are as below:

ls -al /var/lib/libvirt/images/
-rw-------. 1 root root 11811160064 Apr 25 18:29 rhel5.6-x64.img

Thanks,
Santwana
(In reply to comment #35)
> I used the source virtualization host as the NFS server.
> The selinux and firewall were disabled on both the source and destination.
> The nfs server had ("no_root_squash") in the /etc/exports file and
> /etc/libvirt/qemu.conf had dynamic_ownership=0 set.

Questions:
1. Where is the NFS server?
   a. on the source machine
   b. on the destination machine
   c. on a third machine, which is just the NFS server
2. Where did you set "dynamic_ownership=0"?
   a. on the source machine
   b. on the destination machine
   c. on the NFS server, the third machine
   d. on all of the source machine, destination machine and NFS server
3. What is the original ownership of the guest image on the NFS server?
   a. root:root
   b. qemu:qemu

Note: in /etc/libvirt/qemu.conf, the default ownership for QEMU processes is qemu:qemu. So if you leave user = "root" and group = "root" commented out, with "dynamic_ownership=0" set, you won't be able to start the qemu process. "Permission denied" will be reported.

> The guest's disk image was mounted using,
> mount -o vers=3 $NFS_server:/export/dir /mnt/point.
>

Questions:
1. Where did you mount the NFS export?
   a. destination machine only
   b. both source and destination machines
2. What is the original ownership of the guest image on the source and destination machines after mounting the NFS dir?
   a. root:root
   b. qemu:qemu
   c. others

> Below error is noticed in the /var/log/libvirt/qemu/guest.log after the
> migration fails.
> qemu: could not open disk image /var/lib/libvirt/images/rhel5.6-x64.img:
> Permission denied

Questions:
1. Where did you get this result?
   a. on the source machine when starting the guest
   b. on the destination machine when migrating the guest
2. Have you uncommented user = "root" and group = "root" in /etc/libvirt/qemu.conf? If not, this "Permission denied" error is expected.
------- Comment From santwana.samantray.com 2011-04-28 07:52 EDT-------
Hello Redhat,

In my previous comment, the source machine was acting as the NFS server, rather than a third machine. As mentioned earlier, if I use a separate NFS server (apart from the source machine), the migration passes. Please find my responses to the queries below.

(In reply to comment #36)
> (In reply to comment #35)
> > I used the source virtualization host as the NFS server.
> > The selinux and firewall were disabled on both the source and destination.
> > The nfs server had ("no_root_squash") in the /etc/exports file and
> > /etc/libvirt/qemu.conf had dynamic_ownership=0 set.
> Questions :
> 1. where is the NFS ?
> a. on the source machine
> b. on the destination machine
> c. on the third machine , which is just NFS server

In this case, the NFS server is a. on the source machine.

> 2. where did you set "dynamic_ownership=0" ?
> a. on the source machine
> b. on the destination machine
> c. on the NFS server , the third machine .
> d. all of the source machine , destination machine and NFS server set it

I set dynamic_ownership=0 on both a. the source machine and b. the destination machine.

> 3. what is the original ownership for that guest image on the NFS server ?
> a. root:root
> b. qemu:qemu

The original ownership of the guest image on the NFS server is b. qemu:qemu

> Note: in /etc/libvirt/qemu.conf , the default ownership for QEMU processes is
> qemu:qemu . So if you get user = "root" and group = "root" commented ,with the
> "dynamic_ownership=0" set, you won't be able to start the qemu process.
> "Permission denied" will prompt up .
>
> > The guest's disk image was mounted using,
> > mount -o vers=3 $NFS_server:/export/dir /mnt/point.
> >
> Questions:
> 1. Where did you mount the NFS server ?
> a. destination machine only
> b. both source and destination machines

The NFS export was mounted on the destination machine only.

> 2. What is the original ownership for guest image on Source and Destination
> machines after mounting the NFS dir ?
> a. root:root
> b. qemu:qemu
> c. others

The original ownership was b. qemu:qemu

> > Below error is noticed in the /var/log/libvirt/qemu/guest.log after the
> > migration fails.
> > qemu: could not open disk image /var/lib/libvirt/images/rhel5.6-x64.img:
> > Permission denied
>
> Questions:
> 1. Where did you get this result ?
> a. on source machine when starting the guest
> b. on destination machine when migrating the guest

I got this result on the destination machine after migrating the guest.

> 2. Have you uncommented the user = "root" and group = "root" in
> /etc/libvirt/qemu.conf ? If not , this "Permission denied" error is expected .

Earlier it was commented out by default. After uncommenting it and migrating, the error isn't noticed anymore and the migration passes.

Thanks,
Santwana
------- Comment From santwana.samantray.com 2011-05-12 02:31 EDT-------
Hello Redhat,

I verified this issue in RHEL6.1 RC2 (k.v-2.6.32-131.0.13.el6) using 2 systems, where the NFS Server was the source host itself, and another system for destination.

Initially, the migration was still failing with the error:

virsh migrate --live <guest> qemu+ssh://dest/system
error: unable to set user and group to '107:107' on '/var/lib/libvirt/images/guest.img'

After setting the dynamic_ownership=0 on both source and destination, and uncommenting the user = "root" and group = "root" in /etc/libvirt/qemu.conf, the migration passes.

Since migration is happening fine after making the above changes, we can close this bug. Please share your thoughts.

Thanks.
Santwana
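To confirm that both hosts carry the three settings named in this comment, the effective values can be read from the file. This is a minimal sketch, assuming simple "key = value" lines with "#" comments; read_qemu_conf is a hypothetical helper, the sample text is illustrative rather than a real qemu.conf, and multi-line list values are not handled:

```python
def read_qemu_conf(text):
    """Return a dict of uncommented key = value settings from qemu.conf-style text."""
    settings = {}
    for raw_line in text.splitlines():
        # Drop comments; commented-out defaults therefore do not count as set.
        line = raw_line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


# Illustrative fragment with the workaround applied.
SAMPLE_CONF = """
#user = "qemu"
user = "root"
group = "root"
dynamic_ownership = 0
"""

if __name__ == "__main__":
    conf = read_qemu_conf(SAMPLE_CONF)
    for key in ("user", "group", "dynamic_ownership"):
        print("%s = %s" % (key, conf.get(key, "<not set>")))
```

On a real host the text would come from /etc/libvirt/qemu.conf, and libvirtd has to be restarted for changes to that file to take effect.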
------- Comment From onmahaja.com 2011-05-20 08:46 EDT------- Hello Redhat, I have tried this with running SLES11 SP1 guest . And it works absolutely fine for me too. Thanks, Onkar Mahajan
Closing per comment 38.