Bug 1254307

Summary: live migration failed when VM deployed from boot from volume on Ceph back end
Product: Red Hat OpenStack Reporter: Rajini Karthik <rajini.karthik>
Component: openstack-nova Assignee: Eoghan Glynn <eglynn>
Status: CLOSED NOTABUG QA Contact: nlevinki <nlevinki>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.0 (Kilo) CC: arkady_kanevsky, audra_cooper, berrange, bkopilov, cdevine, christopher_dearborn, dasmith, eglynn, ipilcher, John_walsh, kbader, kchamart, kschinck, kurt_hey, mburns, morazi, rajini.karthik, randy_perryman, rsussman, sbauza, sferdjao, sgordon, smerrow, sreichar, srevivo, vromanso, wayne_allen, yrabl
Target Milestone: --- Keywords: ZStream
Target Release: 7.0 (Kilo)   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-30 20:48:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1172300    
Description Flags
LiveMigdefect none

Description Rajini Karthik 2015-08-17 16:35:24 UTC
Created attachment 1063978

Description of problem:
Live migration failed when the VM was deployed from an image with ephemeral storage on a Ceph back end.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
Step 1: Create a volume from an image.
Step 2: Launch an instance using the boot-from-volume option.
Step 3: Live-migrate the instance from compute01 to compute02. After the live migration, the instance is still on compute01.

Actual results:
2015-08-14 13:39:31.073 44429 ERROR nova.virt.libvirt.driver [-] [instance: c0cb269f-635b-4276-a66e-ab851dc9cf31] Live Migration edit failure: operation failed: Failed to connect to remote libvirt URI qemu+tcp://compute02.praveendo720pr.org/system: unable to connect to server at 'compute02.praveendo720pr.org:16509': No route to host

Expected results:
Successful migration

Additional info:

1. Edited /etc/sysconfig/libvirtd and changed LIBVIRTD_ARGS="--listen"
2. Edited /etc/libvirt/libvirtd.conf:
listen_tls = 0
listen_tcp = 1
3. Restarted libvirtd to make sure it is listening on 16509
4. Also had to edit iptables to accept 16509

But that produced a different error:
2015-08-17 16:10:52.964 6976 ERROR nova.virt.libvirt.driver [-] [instance: 64cf52f3-ca21-4a3a-b305-150170f35d83] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+tcp://r14nova3.r14.rcbd.lab/system: authentication failed: No authentication callback available
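
One way to tell the two failures above apart (port unreachable versus authentication) is to probe the libvirt TCP port from the source compute node. This is a minimal sketch, assuming bash and the coreutils `timeout` command are available; `port_open` is a hypothetical helper name, not part of nova or libvirt.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Relies on bash's built-in /dev/tcp redirection, so no extra tools are needed.
port_open() {
    local host=$1 port=$2
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example, run on the source compute node (host name taken from the log above):
#   port_open r14nova3.r14.rcbd.lab 16509 && echo "libvirt port reachable"
```

If the port is reachable but the migration still fails with "authentication failed", the problem is most likely in the libvirt authentication settings rather than in the network or firewall.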

Comment 5 Rajini Karthik 2015-08-17 18:00:03 UTC
Basically, I guess we need instructions to set up access over TCP between the compute nodes. Is this supposed to work out of the box?

Comment 7 Ian Pilcher 2015-08-17 18:04:41 UTC
Why is it using qemu+tcp?  Shouldn't it be using ssh?

Comment 8 Ian Pilcher 2015-08-17 18:15:44 UTC
(In reply to Ian Pilcher from comment #7)
> Why is it using qemu+tcp?  Shouldn't it be using ssh?

Answering my own question, it looks like this is controlled by the
live_migration_uri setting in /etc/nova/nova.conf.  So changing this
setting from qemu+tcp to qemu+ssh and enabling password-less ssh
between the nova user on all of the compute nodes might work.
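
To see which transport a compute node is currently configured for, the active value can be read straight from nova.conf. A minimal sketch; `show_live_migration_uri` is a hypothetical helper name, and it assumes the option appears at most once in the file (in Kilo it normally lives in the [libvirt] section, with `%s` standing in for the target host name).

```shell
#!/bin/bash
# Hypothetical helper: print the active (uncommented) live_migration_uri value
# from a nova.conf-style file. Prints nothing if the option is not set.
show_live_migration_uri() {
    local conf=${1:-/etc/nova/nova.conf}
    sed -n 's/^[[:space:]]*live_migration_uri[[:space:]]*=[[:space:]]*//p' "$conf"
}

# Example:
#   show_live_migration_uri /etc/nova/nova.conf
```

A value starting with qemu+tcp means libvirtd must listen on TCP on the target node; a value starting with qemu+ssh means the connecting user needs working ssh access to it.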

Comment 9 Rajini Karthik 2015-08-18 03:30:44 UTC
Just changed to qemu+ssh. Apparently there is more to this; I might have to set up password-less ssh.

2015-08-18 03:28:24.902 62777 ERROR nova.virt.libvirt.driver [-] [instance: 64cf52f3-ca21-4a3a-b305-150170f35d83] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+ssh://r14nova2.r14.rcbd.lab/system: Cannot recv data: Host key verification failed.: Connection reset by peer
2015-08-18 03:28:25.348 62777 ERROR nova.virt.libvirt.driver [-] [instance: 64cf52f3-ca21-4a3a-b305-150170f35d83] Migration operation has aborted

Comment 10 Rajini Karthik 2015-08-18 03:52:01 UTC
Did set up password-less ssh.

2015-08-18 03:49:18.143 65169 ERROR nova.virt.libvirt.driver [-] [instance: 64cf52f3-ca21-4a3a-b305-150170f35d83] Live Migration failure: unable to connect to server at 'r14nova2.r14.rcbd.lab:49152': No route to host
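
The errors hit so far fall into three groups, and each points at a different layer. The sketch below maps a nova-compute log line to a rough category; `classify_migration_error` and the category names are hypothetical, and the mapping only reflects the messages quoted in this report.

```shell
#!/bin/bash
# Hypothetical helper: map a live-migration error line to a likely cause.
# The patterns are the error strings quoted in this bug report.
classify_migration_error() {
    case "$1" in
        *"No route to host"*)
            # Port blocked or host unreachable (16509 for libvirt, 49152+ for migration data)
            echo "firewall" ;;
        *"No authentication callback available"*)
            # libvirtd expects authentication on the TCP socket
            echo "libvirt-auth" ;;
        *"Host key verification failed"*)
            # ssh does not trust the target host key for the connecting user
            echo "ssh-host-key" ;;
        *)
            echo "unknown" ;;
    esac
}

# Example:
#   classify_migration_error "$(grep 'Live Migration failure' /var/log/nova/nova-compute.log | tail -n 1)"
```

Note that "No route to host" appears for two different ports in this report: 16509 (the libvirt control connection) and 49152 (the migration data connection), so both need to be reachable.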

Comment 11 Yogev Rabl 2015-08-18 07:11:15 UTC
In order to live migrate using ssh, you need to change the nova user's login shell in /etc/passwd from /sbin/nologin to /bin/bash and allow the nova user on every compute node to connect to the others. There's a bug about it upstream: https://bugs.launchpad.net/nova/+bug/1428553
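
Before changing anything, it may help to check what login shell the nova user currently has on each compute node. A minimal sketch; `login_shell_of` is a hypothetical helper name, and it assumes a standard colon-separated passwd file.

```shell
#!/bin/bash
# Hypothetical helper: print the login shell (7th field) of a user
# from a passwd-format file.
login_shell_of() {
    local user=$1 passwd_file=${2:-/etc/passwd}
    awk -F: -v u="$user" '$1 == u { print $7 }' "$passwd_file"
}

# Example:
#   login_shell_of nova
```

If this prints /sbin/nologin, the nova user cannot open an ssh session, which is consistent with qemu+ssh migrations failing.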

Comment 12 Rajini Karthik 2015-08-18 16:41:30 UTC
The ssh setting you recommended is described as a security issue.
We made it work with qemu+tcp, which is the default setting; it came down to network and firewall configuration.

1. Nova configuration on all nova-compute nodes
live_migration_uri in nova.conf is set to qemu+tcp, which is the default in the OSP7 deployment
Verify that live_migration_flag is set properly
There is no need to restart openstack-nova-compute if you don't edit the conf file

2. Libvirt configuration on all nova-compute nodes
Enable the libvirt listen flag (LIBVIRTD_ARGS="--listen")
  vi /etc/sysconfig/libvirtd
Edit /etc/libvirt/libvirtd.conf
  listen_tls = 0
  listen_tcp = 1
  auth_tcp = "none"
Restart libvirtd
  systemctl restart libvirtd

3. Firewall configuration on all nova-compute nodes
vi /etc/sysconfig/iptables
-A INPUT -p tcp -m multiport --ports 16509 -m comment --comment "libvirt" -j ACCEPT
-A INPUT -p tcp -m multiport --ports 49152:49216 -m comment --comment "migration" -j ACCEPT

Restart iptables
  systemctl restart iptables

Create a VM (boot from volume or boot from image)
nova show <VM-ID>, note the current host from OS-EXT-SRV-ATTR:host
nova host-list, note the available hosts and pick a host name
nova live-migration <VM-ID> <new-host-name>
nova show <VM-ID>, OS-EXT-SRV-ATTR:host should show the new host
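
The libvirt part of the steps above can be sanity-checked on each compute node before retrying the migration. A minimal sketch; `check_libvirtd_conf` is a hypothetical helper name, and it assumes each setting sits on its own line with no trailing comment.

```shell
#!/bin/bash
# Hypothetical helper: verify that a libvirtd.conf-style file contains the
# three TCP settings from step 2. Returns 0 if all are present, 1 otherwise.
check_libvirtd_conf() {
    local conf=${1:-/etc/libvirt/libvirtd.conf} rc=0 want
    for want in 'listen_tls=0' 'listen_tcp=1' 'auth_tcp="none"'; do
        # Strip spaces and tabs so "listen_tcp = 1" and "listen_tcp=1" compare equal.
        # Commented-out lines keep their leading "#" and therefore never match.
        if ! tr -d ' \t' < "$conf" | grep -qxF "$want"; then
            echo "missing or different: $want"
            rc=1
        fi
    done
    return $rc
}

# Example:
#   check_libvirtd_conf && echo "libvirtd.conf looks ready for qemu+tcp"
```

Keep in mind that auth_tcp = "none" together with listen_tls = 0 leaves the libvirt TCP socket unauthenticated and unencrypted, so the firewall rules in step 3 are the only access control for that port.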

Comment 13 Kurt Hey 2015-08-18 21:23:27 UTC
Ian Pilcher and I have used the new enable_live_migration.sh script that was built and included in the .tgz on Jenkins, and our initial testing shows that live migrations now function properly. We brought up an instance and, through Horizon, did a Live Migrate and selected a new compute node. Horizon showed the instance being migrated from the original node to the new one. A continuous ping to the floating IP dropped only one packet during the process.

We also validated that, after executing the script, a puppet run did not revert any of the settings to their original values, so all looks good.

Comment 14 arkady kanevsky 2016-11-30 19:20:23 UTC
Rajini or Audra,
is this still an issue in JS-6.0?
Do we close this BZ or retarget it to OSP10 or 11?

Comment 15 Rajini Karthik 2016-11-30 20:45:26 UTC
We had a script in JS 4.0, but in 5.0 and 6.0 it worked out of the box. We don't use the script.

Comment 16 Rajini Karthik 2016-11-30 20:46:18 UTC
This bug can be closed