Bug 1254307 - live migration failed when VM deployed from boot from volume on Ceph back end
Summary: live migration failed when VM deployed from boot from volume on Ceph back end
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 7.0 (Kilo)
Assignee: Eoghan Glynn
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks: 1172300
 
Reported: 2015-08-17 16:35 UTC by Rajini Karthik
Modified: 2019-09-09 16:35 UTC
CC: 28 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-30 20:48:07 UTC
Target Upstream Version:
Embargoed:


Attachments
LiveMigdefect (94.59 KB, application/zip)
2015-08-17 16:35 UTC, Rajini Karthik

Description Rajini Karthik 2015-08-17 16:35:24 UTC
Created attachment 1063978 [details]
LiveMigdefect

Description of problem:
live migration failed when VM deployed from image with ephemeral storage on Ceph back end 

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
Step 1: Created a volume from an image
Step 2: Launched an instance with the "boot from volume" option
Step 3: Live migrated the instance from compute01 to compute02, but after the live migration the instance still stays on compute01.

Actual results:
2015-08-14 13:39:31.073 44429 ERROR nova.virt.libvirt.driver [-] [instance: c0cb269f-635b-4276-a66e-ab851dc9cf31] Live Migration edit failure: operation failed: Failed to connect to remote libvirt URI qemu+tcp://compute02.praveendo720pr.org/system: unable to connect to server at 'compute02.praveendo720pr.org:16509': No route to host

Expected results:
Successful migration

Additional info:

1. Edited /etc/sysconfig/libvirtd and changed LIBVIRTD_ARGS="--listen"
2. In /etc/libvirt/libvirtd.conf set:
   listen_tls = 0
   listen_tcp = 1
3. Restarted libvirtd to make sure it is listening on port 16509
4. Also had to edit iptables to accept TCP port 16509

But got a different error:
2015-08-17 16:10:52.964 6976 ERROR nova.virt.libvirt.driver [-] [instance: 64cf52f3-ca21-4a3a-b305-150170f35d83] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+tcp://r14nova3.r14.rcbd.lab/system: authentication failed: No authentication callback available
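
For context, the "authentication failed: No authentication callback available" message comes from libvirt's TCP authentication layer (auth_tcp defaults to sasl). A minimal sketch of the libvirtd.conf settings the later comments converge on, assuming it is acceptable to disable TCP authentication on an isolated management network:

  # /etc/libvirt/libvirtd.conf (sketch, not a hardened configuration)
  listen_tls = 0       # disable the TLS listener
  listen_tcp = 1       # enable the plain TCP listener on 16509
  auth_tcp = "none"    # skip SASL auth; only reasonable on a trusted network

  # /etc/sysconfig/libvirtd
  LIBVIRTD_ARGS="--listen"

followed by a libvirtd restart (systemctl restart libvirtd).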

Comment 5 Rajini Karthik 2015-08-17 18:00:03 UTC
Basically I guess we need instructions to set up TCP access between the compute nodes. Is this supposed to work out of the box?

Comment 7 Ian Pilcher 2015-08-17 18:04:41 UTC
Why is it using qemu+tcp?  Shouldn't it be using ssh?

Comment 8 Ian Pilcher 2015-08-17 18:15:44 UTC
(In reply to Ian Pilcher from comment #7)
> Why is it using qemu+tcp?  Shouldn't it be using ssh?

Answering my own question, it looks like this is controlled by the
live_migration_uri setting in /etc/nova/nova.conf.  So changing this
setting from qemu+tcp to qemu+ssh and enabling password-less ssh
for the nova user between all of the compute nodes might work.
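
A sketch of where that setting lives in a Kilo-era nova.conf (the value shown is illustrative; %s is replaced by the destination hostname):

  # /etc/nova/nova.conf on each compute node (sketch)
  [libvirt]
  live_migration_uri = qemu+tcp://%s/system   # or qemu+ssh://%s/system for the ssh transport

If this file is changed, openstack-nova-compute needs a restart to pick up the new value.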

Comment 9 Rajini Karthik 2015-08-18 03:30:44 UTC
Just changed to qemu+ssh. Apparently there is more to this; I might have to set up password-less ssh.

2015-08-18 03:28:24.902 62777 ERROR nova.virt.libvirt.driver [-] [instance: 64cf52f3-ca21-4a3a-b305-150170f35d83] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+ssh://r14nova2.r14.rcbd.lab/system: Cannot recv data: Host key verification failed.: Connection reset by peer
2015-08-18 03:28:25.348 62777 ERROR nova.virt.libvirt.driver [-] [instance: 64cf52f3-ca21-4a3a-b305-150170f35d83] Migration operation has aborted

Comment 10 Rajini Karthik 2015-08-18 03:52:01 UTC
Set up password-less ssh.

015-08-18 03:49:18.143 65169 ERROR nova.virt.libvirt.driver [-] [instance: 64cf52f3-ca21-4a3a-b305-150170f35d83] Live Migration failure: unable to connect to server at 'r14nova2.r14.rcbd.lab:49152': No route to host
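
Port 49152 is the start of qemu's live-migration data port range, and the stock RHEL iptables rules typically reject blocked ports with icmp-host-prohibited, which surfaces as "No route to host". A quick check from the source compute node (sketch; the hostname is the one from this report):

  nc -vz r14nova2.r14.rcbd.lab 16509    # libvirtd control channel
  nc -vz r14nova2.r14.rcbd.lab 49152    # first port of the migration range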

Comment 11 Yogev Rabl 2015-08-18 07:11:15 UTC
In order to live migrate using ssh you need to change the nova user's shell in /etc/passwd from /sbin/nologin to /bin/bash and allow the nova user on every compute node to connect to the others. There's an upstream bug about it: https://bugs.launchpad.net/nova/+bug/1428553
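
A rough sketch of that ssh setup, run on each compute node (this is the approach described above rather than what was ultimately used here; hostnames are illustrative):

  # give the nova user a login shell
  usermod -s /bin/bash nova
  # as the nova user, create a key and distribute it to the other compute nodes
  su - nova -c 'ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa'
  su - nova -c 'ssh-copy-id nova@compute02.example.com'
  # accept the host key once so host key verification does not fail during migration
  su - nova -c 'ssh nova@compute02.example.com true'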

Comment 12 Rajini Karthik 2015-08-18 16:41:30 UTC
The ssh setting you recommended is considered a security issue.
We made it work with qemu+tcp, which is the default setting; it came down to network and firewall configuration.

1. Nova configuration on all nova-compute nodes
live_migration_uri in nova.conf is set to qemu+tcp, which is the default in the OSP 7 deployment
Verify that live_migration_flag is set up properly
You don't have to restart openstack-nova-compute if you don't edit the conf file

2. Libvirt configuration on all nova-compute nodes
Enable the libvirt listen flag:
  vi /etc/sysconfig/libvirtd
  LIBVIRTD_ARGS="--listen"
Enable TCP in /etc/libvirt/libvirtd.conf:
  listen_tls = 0
  listen_tcp = 1
  auth_tcp = "none"
Restart libvirt:
  systemctl restart libvirtd

3. Firewall configuration on all nova-compute nodes
vi /etc/sysconfig/iptables
-A INPUT -p tcp -m multiport --ports 16509 -m comment --comment "libvirt" -j ACCEPT
-A INPUT -p tcp -m multiport --ports 49152:49216 -m comment --comment "migration" -j ACCEPT

Restart iptables:
  systemctl restart iptables

4. Test
Create a VM (boot from volume or boot from image)
nova show <VM-ID>, note the current host from OS-EXT-SRV-ATTR:host
nova host-list, note the available hosts and pick a host name
nova live-migration <VM-ID> <new-host-name>
nova show <VM-ID>, OS-EXT-SRV-ATTR:host should show the new host
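
A worked example of step 4 (the instance UUID is the one from this report; compute01/compute02 stand in for the actual host names):

  # confirm libvirtd on the destination is reachable over TCP first
  virsh -c qemu+tcp://compute02/system version

  nova show 64cf52f3-ca21-4a3a-b305-150170f35d83 | grep OS-EXT-SRV-ATTR:host
  nova live-migration 64cf52f3-ca21-4a3a-b305-150170f35d83 compute02
  nova show 64cf52f3-ca21-4a3a-b305-150170f35d83 | grep OS-EXT-SRV-ATTR:host   # should now report compute02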

Comment 13 Kurt Hey 2015-08-18 21:23:27 UTC
Ian Pilcher and I have used the new enable_live_migration.sh script that was built and included in the .tgz on Jenkins, and our initial testing shows that live migrations now function properly. We brought up an instance and, through Horizon, did a Live Migrate and selected a new compute node. It was successfully migrated from the original to the new one. A continuous ping to the floating IP dropped only one packet during the process.

We also validated that after executing the script, a puppet run did not revert any of the settings to their pre-script values, so all looks good.

Comment 14 arkady kanevsky 2016-11-30 19:20:23 UTC
Rajini or Audra,
is this still an issue in JS-6.0?
Do we close this BZ or retarget it to OSP10 or 11?

Comment 15 Rajini Karthik 2016-11-30 20:45:26 UTC
We had a script in JS 4.0, but in 5.0 and 6.0 it worked out of the box. We don't use the script.

Comment 16 Rajini Karthik 2016-11-30 20:46:18 UTC
This bug can be closed

