Red Hat Bugzilla – Bug 1254307
live migration failed when VM deployed from boot from volume on Ceph back end
Last modified: 2016-11-30 15:48:07 EST
Created attachment 1063978
Description of problem:
live migration failed when VM deployed from image with ephemeral storage on Ceph back end
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Step 1: Create a volume from an image
Step 2: Launch an instance, choosing the boot-from-volume option
Step 3: Live-migrate from compute01 to compute02; after the live migration, the instance still stays on compute01.
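The reproduction steps above roughly correspond to the following CLI sequence (image, network, and instance names are hypothetical placeholders, not from the original report):

```shell
# Step 1: create a bootable volume from an image (size and name assumed)
cinder create --image-id <IMAGE-ID> --display-name boot-vol 10

# Step 2: boot an instance from that volume
nova boot --flavor m1.small --boot-volume <VOLUME-ID> \
    --nic net-id=<NET-ID> test-vm

# Step 3: live-migrate it to the other compute node
nova live-migration <VM-ID> compute02
```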
2015-08-14 13:39:31.073 44429 ERROR nova.virt.libvirt.driver [-] [instance: c0cb269f-635b-4276-a66e-ab851dc9cf31] Live Migration edit failure: operation failed: Failed to connect to remote libvirt URI qemu+tcp://compute02.praveendo720pr.org/system: unable to connect to server at 'compute02.praveendo720pr.org:16509': No route to host
1. Edited /etc/sysconfig/libvirtd and changed LIBVIRTD_ARGS="--listen"
2. Edited /etc/libvirt/libvirtd.conf:
listen_tls = 0
listen_tcp = 1
3. Restarted libvirtd to make sure it is listening on 16509
4. Also had to edit iptables to accept 16509
But got a different error:
2015-08-17 16:10:52.964 6976 ERROR nova.virt.libvirt.driver [-] [instance: 64cf52f3-ca21-4a3a-b305-150170f35d83] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+tcp://r14nova3.r14.rcbd.lab/system: authentication failed: No authentication callback available
Basically I guess we need instructions to set up the access over TCP between the compute nodes. Is this supposed to work out of the box?
Why is it using qemu+tcp? Shouldn't it be using ssh?
(In reply to Ian Pilcher from comment #7)
> Why is it using qemu+tcp? Shouldn't it be using ssh?
Answering my own question, it looks like this is controlled by the
live_migration_uri setting in /etc/nova/nova.conf. So changing this
setting from qemu+tcp to qemu+ssh and enabling password-less ssh
between the nova user on all of the compute nodes might work.
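For reference, the setting lives in the [libvirt] section of /etc/nova/nova.conf. A hypothetical way to apply the change on each compute node, assuming crudini is available (any editor works just as well):

```shell
# Switch nova's live-migration transport from TCP to SSH.
# The %s placeholder is expanded by nova to the destination hostname.
crudini --set /etc/nova/nova.conf libvirt live_migration_uri 'qemu+ssh://%s/system'

# Restart nova-compute so the new URI takes effect
systemctl restart openstack-nova-compute
```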
Just changed to qemu+ssh. Apparently there is more to this; I might have to set up password-less ssh.
2015-08-18 03:28:24.902 62777 ERROR nova.virt.libvirt.driver [-] [instance: 64cf52f3-ca21-4a3a-b305-150170f35d83] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+ssh://r14nova2.r14.rcbd.lab/system: Cannot recv data: Host key verification failed.: Connection reset by peer
2015-08-18 03:28:25.348 62777 ERROR nova.virt.libvirt.driver [-] [instance: 64cf52f3-ca21-4a3a-b305-150170f35d83] Migration operation has aborted
Set up password-less ssh.
2015-08-18 03:49:18.143 65169 ERROR nova.virt.libvirt.driver [-] [instance: 64cf52f3-ca21-4a3a-b305-150170f35d83] Live Migration failure: unable to connect to server at 'r14nova2.r14.rcbd.lab:49152': No route to host
In order to live-migrate using ssh you need to change the nova user's shell in /etc/passwd from /sbin/nologin to /bin/bash, and allow the nova user on every compute node to connect to the others. There's a bug about it upstream: https://bugs.launchpad.net/nova/+bug/1428553
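A minimal sketch of that workaround, run as root on each compute node (the peer hostname is taken from the errors above as an example; adapt it per node):

```shell
# Give the nova service user a login shell so ssh works
usermod -s /bin/bash nova

# Generate a passphrase-less key for the nova user, if it has none
su - nova -c 'ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa'

# Distribute the public key to the nova user on each peer compute node
su - nova -c 'ssh-copy-id nova@r14nova2.r14.rcbd.lab'

# Pre-accept the peer's host key to avoid "Host key verification failed"
su - nova -c 'ssh-keyscan r14nova2.r14.rcbd.lab >> ~/.ssh/known_hosts'
```

Note the upstream bug linked above: giving the nova user a shell and ssh access widens the attack surface, which is why this approach is flagged as a security concern.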
The ssh setting you recommended is said to be a security issue.
We made it work with qemu+tcp, which is the default setting; basically network and firewall configs.
1. Nova configuration on all nova-compute nodes
live_migration_uri in nova.conf is set to qemu+tcp, which is the default in the OSP7 deployment.
Verify live_migration_flag is set up properly.
You don't have to restart openstack-nova-compute if you don't edit the conf file.
2. Libvirt Configuration on all nova compute nodes
Enable libvirt listen flag
listen_tls = 0
listen_tcp = 1
auth_tcp = "none"
systemctl restart libvirtd
3. Firewall configuration on all nova-compute nodes
-A INPUT -p tcp -m multiport --ports 16509 -m comment --comment "libvirt" -j ACCEPT
-A INPUT -p tcp -m multiport --ports 49152:49216 -m comment --comment "migration" -j ACCEPT
systemctl restart iptables
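Sections 1-3 above could be collected into an enablement script along these lines, run on every nova-compute node (a sketch only; appending to libvirtd.conf is crude, and `service iptables save` assumes the RHEL 7 iptables-services package):

```shell
#!/bin/bash
# Sketch: enable qemu+tcp live migration on a nova-compute node.

# Section 2: make libvirtd listen on TCP 16509 without TLS or auth
sed -i 's/^#\?LIBVIRTD_ARGS=.*/LIBVIRTD_ARGS="--listen"/' /etc/sysconfig/libvirtd
cat >> /etc/libvirt/libvirtd.conf <<'EOF'
listen_tls = 0
listen_tcp = 1
auth_tcp = "none"
EOF
systemctl restart libvirtd

# Section 3: open the libvirt control port and the migration port range
iptables -I INPUT -p tcp -m multiport --ports 16509 \
    -m comment --comment "libvirt" -j ACCEPT
iptables -I INPUT -p tcp -m multiport --ports 49152:49216 \
    -m comment --comment "migration" -j ACCEPT
service iptables save
```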
Create a VM (boot from volume or boot from image)
nova show <VM-ID>, note the current host from OS-EXT-SRV-ATTR:host
nova host-list, note down the available hosts and pick a host name
nova live-migration <VM-ID> <new-host-name>
nova show <VM-ID>; OS-EXT-SRV-ATTR:host should show the new host
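The verification steps above, as one sequence (the sleep is an arbitrary pause to let the migration complete; poll the status in practice):

```shell
nova show <VM-ID> | grep OS-EXT-SRV-ATTR:host   # note the current host
nova host-list                                  # pick a target host
nova live-migration <VM-ID> <new-host-name>
sleep 30                                        # allow the migration to finish
nova show <VM-ID> | grep OS-EXT-SRV-ATTR:host   # should show the new host
```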
Ian Pilcher and I have used the new enable_live_migration.sh script that was built and included in the .tgz on Jenkins, and our initial testing shows that live migrations now function properly. We brought up an instance and, through Horizon, did a Live Migrate and selected a new compute node. It successfully showed the instance being migrated from the original to the new host. A continuous ping to the floating IP dropped only one packet during the process.
We also validated that, after executing the script, a puppet run did not revert any of the settings the script had changed, so all looks good.
Rajini or Audra,
is this still an issue in JS-6.0?
Do we close this BZ or retarget it to OSP10 or 11?
We had a script in JS 4.0, but in 5.0 and 6.0 it worked out of the box. We don't use the script.
This bug can be closed.