Created attachment 1063978 [details]
LiveMigdefect

Description of problem:
Live migration fails when a VM is deployed from an image with ephemeral storage on a Ceph back end.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a volume from an image.
2. Launch an instance, choosing the "boot from volume" option.
3. Live-migrate the instance from compute01 to compute02; after the live migration, the instance still stays on compute01.

Actual results:
2015-08-14 13:39:31.073 44429 ERROR nova.virt.libvirt.driver [-] [instance: c0cb269f-635b-4276-a66e-ab851dc9cf31] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+tcp://compute02.praveendo720pr.org/system: unable to connect to server at 'compute02.praveendo720pr.org:16509': No route to host

Expected results:
Successful migration

Additional info:
1. Edited /etc/sysconfig/libvirtd and changed LIBVIRTD_ARGS="--listen"
2. In /etc/libvirt/libvirtd.conf set:
   listen_tls = 0
   listen_tcp = 1
3. Restarted libvirtd to make sure it is listening on 16509
4. Also had to edit iptables to accept 16509

But now a different error:
2015-08-17 16:10:52.964 6976 ERROR nova.virt.libvirt.driver [-] [instance: 64cf52f3-ca21-4a3a-b305-150170f35d83] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+tcp://r14nova3.r14.rcbd.lab/system: authentication failed: No authentication callback available
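The listen configuration from the additional info can be sanity-checked with commands like these (a sketch; 16509 is libvirt's default TCP listen port, and the hostname is taken from the log above — adjust for your environment):

```shell
# Confirm libvirtd is actually listening on its TCP port after the restart
ss -tln | grep 16509

# Try the same remote URI nova uses, from the source compute node;
# this should list the remote domains if the port and auth are set up
virsh -c qemu+tcp://compute02.praveendo720pr.org/system list
```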
Basically I guess we need instructions for setting up the access over TCP between the compute nodes. Is this supposed to work out of the box?
Why is it using qemu+tcp? Shouldn't it be using ssh?
(In reply to Ian Pilcher from comment #7)
> Why is it using qemu+tcp? Shouldn't it be using ssh?

Answering my own question: it looks like this is controlled by the live_migration_uri setting in /etc/nova/nova.conf. So changing this setting from qemu+tcp to qemu+ssh and enabling password-less ssh for the nova user between all of the compute nodes might work.
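A sketch of the change described above (the %s placeholder is expanded by nova with the target hostname; on this release the option sits in the [libvirt] section, but verify the section name against your nova.conf):

```
[libvirt]
# default is qemu+tcp://%s/system; switch to the ssh transport
live_migration_uri = qemu+ssh://%s/system
```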
Just changed to qemu+ssh. Apparently there is more to this; I might have to set up the password-less ssh.

2015-08-18 03:28:24.902 62777 ERROR nova.virt.libvirt.driver [-] [instance: 64cf52f3-ca21-4a3a-b305-150170f35d83] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+ssh://r14nova2.r14.rcbd.lab/system: Cannot recv data: Host key verification failed.: Connection reset by peer
2015-08-18 03:28:25.348 62777 ERROR nova.virt.libvirt.driver [-] [instance: 64cf52f3-ca21-4a3a-b305-150170f35d83] Migration operation has aborted
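"Host key verification failed" means the nova user has no known_hosts entry for the peer node. One possible workaround is an ssh client config for the nova user (a sketch only; disabling host key checking trades away security, so scoping it to the compute hosts is advisable):

```
# ~nova/.ssh/config on each compute node
Host r14nova*.r14.rcbd.lab
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
```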
Set up password-less ssh. Now:

2015-08-18 03:49:18.143 65169 ERROR nova.virt.libvirt.driver [-] [instance: 64cf52f3-ca21-4a3a-b305-150170f35d83] Live Migration failure: unable to connect to server at 'r14nova2.r14.rcbd.lab:49152': No route to host
In order to live migrate using ssh you need to change the nova user's shell in /etc/passwd from /sbin/nologin to /bin/bash, and allow the nova user on every compute node to connect to the others. There's an upstream bug about this: https://bugs.launchpad.net/nova/+bug/1428553
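A sketch of that workaround, run as root on each compute node (hostnames are examples from this thread, and /var/lib/nova is assumed as the nova user's home; as the upstream bug notes, giving the service user a login shell is a security trade-off):

```shell
# Give the nova service user a login shell so ssh works
usermod -s /bin/bash nova

# Generate a passphrase-less key for nova (skip if one already exists)
su - nova -c "ssh-keygen -t rsa -N '' -f /var/lib/nova/.ssh/id_rsa"

# Authorize the key on each peer compute node
su - nova -c "ssh-copy-id nova@r14nova2.r14.rcbd.lab"
```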
The ssh setup you recommended is said to be a security issue. We made it work with qemu+tcp, which is the default setting; basically network and firewall configs.

1. Nova configuration on all nova-compute nodes
   live_migration_uri in nova.conf is set to qemu+tcp, which is the default in the OSP7 deployment.
   Verify live_migration_flag is set up properly.
   No need to restart openstack-nova-compute if you don't edit the conf file.

2. Libvirt configuration on all nova-compute nodes
   Enable the libvirt listen flag in /etc/sysconfig/libvirtd:
   LIBVIRTD_ARGS="--listen"
   In /etc/libvirt/libvirtd.conf:
   listen_tls = 0
   listen_tcp = 1
   auth_tcp = "none"
   Restart libvirt:
   systemctl restart libvirtd

3. Firewall configuration on all nova-compute nodes
   In /etc/sysconfig/iptables:
   -A INPUT -p tcp -m multiport --ports 16509 -m comment --comment "libvirt" -j ACCEPT
   -A INPUT -p tcp -m multiport --ports 49152:49216 -m comment --comment "migration" -j ACCEPT
   Restart iptables:
   systemctl restart iptables

4. Test
   Create a VM (boot from volume or boot from image).
   nova show <VM-ID>; note the current host from OS-EXT-SRV-ATTR:host.
   nova host-list; note the available hosts and pick a host name.
   nova live-migration <VM-ID> <new-host-name>
   nova show <VM-ID>; OS-EXT-SRV-ATTR:host should show the new host.
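Before running the nova live-migration test in step 4, the TCP path between two compute nodes can be checked directly (a sketch; replace the placeholder with a real compute hostname):

```shell
# Should list the remote domains if port 16509 is open and auth_tcp = "none";
# "No route to host" here points at the firewall, not at nova
virsh -c qemu+tcp://<other-compute-node>/system list --all
```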
Ian Pilcher and I have used the new enable_live_migration.sh script that was built and included in the .tgz on Jenkins, and our initial testing shows that live migrations now function properly. We brought up an instance and, through Horizon, did a Live Migrate and selected a new compute node. It was successfully migrated from the original host to the new one. A continuous ping to the floating IP dropped only one ping during the process. We also validated that after executing the script, a puppet run did not revert any of the settings to their values prior to the script being run, so all looks good.
Rajini or Audra, is this still an issue in JS-6.0? Do we close this BZ or retarget it to OSP10 or 11?
We had a script in JS 4.0, but in 5.0 and 6.0 it worked out of the box. We don't use the script.
This bug can be closed