Bug 1310140

Summary: nova-serialproxy enabled causes live migration to fail
Product: Red Hat OpenStack
Reporter: Jon Jozwiak <jjozwiak>
Component: openstack-nova
Assignee: Lee Yarwood <lyarwood>
Status: CLOSED ERRATA
QA Contact: Prasanth Anbalagan <panbalag>
Severity: unspecified
Priority: urgent
Version: 7.0 (Kilo)
CC: berrange, dasmith, david.costakos, dmaley, eglynn, hmatsumo, jduncan, kchamart, lyarwood, ndipanov, sbauza, sferdjao, sgordon, vromanso, yeylon
Target Milestone: async
Keywords: ZStream
Target Release: 7.0 (Kilo)
Hardware: All
OS: Linux
Fixed In Version: openstack-nova-2015.1.3-2.el7ost
Doc Type: Bug Fix
Last Closed: 2016-03-24 13:55:05 UTC
Type: Bug

Description Jon Jozwiak 2016-02-19 15:01:34 UTC
Description of problem:
When nova-serialproxy is enabled, live migration fails because QEMU cannot bind the instance's serial console socket (see the log below).

Version-Release number of selected component (if applicable):
RHEL OSP 7 y2

python-nova-2015.1.2-13.el7ost.noarch
openstack-nova-console-2015.1.2-13.el7ost.noarch
openstack-nova-scheduler-2015.1.2-13.el7ost.noarch
openstack-nova-serialproxy-2015.1.2-13.el7ost.noarch
openstack-nova-cert-2015.1.2-13.el7ost.noarch
openstack-nova-novncproxy-2015.1.2-13.el7ost.noarch
openstack-nova-api-2015.1.2-13.el7ost.noarch
openstack-nova-compute-2015.1.2-13.el7ost.noarch
python-novaclient-2.23.0-2.el7ost.noarch
openstack-nova-common-2015.1.2-13.el7ost.noarch
openstack-nova-conductor-2015.1.2-13.el7ost.noarch
libvirt-1.2.17-13.el7_2.3.x86_64

How reproducible:
Can reproduce every time

Steps to Reproduce:
1. Deploy an OSP director (7.2) environment with the serial proxy enabled (the serial proxy customization is documented here: https://github.com/jonjozwiak/openstack/tree/master/director-examples/serialproxy)
   In my case I used Cinder backed by NFS. The same issue will occur with any backend.
2. Boot an instance backed by a Cinder volume:
nova boot --flavor 2 --block-device source=image,id=<id of image>,dest=volume,size=10,shutdown=preserve,bootindex=0 \
  myInstanceFromVolume
3. Attempt to live-migrate the instance:
nova list 
nova show myInstanceFromVolume | grep host
nova live-migration <Instance ID> 

Actual results:
The instance does not migrate.

Expected results:
The instance is moved to another hypervisor.

Additional info:
Below is the /var/log/messages output from the compute node, showing a failure to bind a socket. An identical bug has been reported upstream:
     https://bugs.launchpad.net/nova/+bug/1455252 

To validate that the issue is specific to the serial console, I edited /etc/nova/nova.conf on the compute and controller nodes, commented out the serial_console settings, and restarted nova on the compute and controller nodes. I then created a new instance and validated that its standard console was working (nova console-log <instance name> returns output). After that, I ran nova live-migration and the migration worked without problems.

If I revert these changes and repeat the process, the migration fails again.
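Before repeating this test, it can help to confirm which state each node's nova.conf is in. The sketch below reads the [serial_console] section and reports whether serial consoles are enabled and which address is configured. It assumes the Kilo option names `enabled` and `proxyclient_address` (in Kilo this appears to be the address the instance's serial console socket is bound to); `serial_console_settings` is a hypothetical helper, not part of nova, and it only understands plain INI syntax.

```python
import configparser


def serial_console_settings(conf_text):
    """Return (enabled, proxyclient_address) from the text of a nova.conf file.

    Assumes the Kilo-era [serial_console] options `enabled` and
    `proxyclient_address`; missing options fall back to (False, None).
    """
    parser = configparser.ConfigParser(
        interpolation=None,                # nova.conf may contain '%' in log formats
        strict=False,                      # tolerate repeated (multi-valued) options
        default_section="__no_default__",  # keep [DEFAULT] from leaking into other sections
    )
    parser.read_string(conf_text)
    if not parser.has_section("serial_console"):
        return False, None
    section = parser["serial_console"]
    enabled = section.getboolean("enabled", fallback=False)
    address = section.get("proxyclient_address", fallback=None)
    return enabled, address


if __name__ == "__main__":
    sample = """
[DEFAULT]
log_dir = /var/log/nova

[serial_console]
enabled = True
proxyclient_address = 192.168.20.23
"""
    print(serial_console_settings(sample))  # (True, '192.168.20.23')
```

On a real node the helper would be called on the file contents, e.g. `serial_console_settings(open("/etc/nova/nova.conf").read())`. Commented-out settings (as in the test above) report `(False, None)`.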

/var/log/messages error on the compute node:  

Feb 18 17:48:55 overcloud-compute-1 journal: internal error: process exited while connecting to monitor: 2016-02-18T22:48:54.829400Z qemu-kvm: -chardev socket,id=charserial0,host=192.168.20.23,port=10000,server,nowait: Failed to bind socket: Cannot assign requested address
Feb 18 17:48:55 overcloud-compute-1 nova-compute: Traceback (most recent call last):
Feb 18 17:48:55 overcloud-compute-1 nova-compute: File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 457, in fire_timers
Feb 18 17:48:55 overcloud-compute-1 nova-compute: timer()
Feb 18 17:48:55 overcloud-compute-1 nova-compute: File "/usr/lib/python2.7/site-packages/eventlet/hubs/timer.py", line 58, in __call__
Feb 18 17:48:55 overcloud-compute-1 nova-compute: cb(*args, **kw)
Feb 18 17:48:55 overcloud-compute-1 nova-compute: File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 168, in _do_send
Feb 18 17:48:55 overcloud-compute-1 nova-compute: waiter.switch(result)
Feb 18 17:48:55 overcloud-compute-1 nova-compute: File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
Feb 18 17:48:55 overcloud-compute-1 nova-compute: result = function(*args, **kwargs)
Feb 18 17:48:55 overcloud-compute-1 nova-compute: File "/usr/lib/python2.7/site-packages/nova/utils.py", line 997, in context_wrapper
Feb 18 17:48:55 overcloud-compute-1 nova-compute: return func(*args, **kwargs)
Feb 18 17:48:55 overcloud-compute-1 nova-compute: File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5674, in _live_migration_operation
Feb 18 17:48:55 overcloud-compute-1 nova-compute: instance=instance)
Feb 18 17:48:55 overcloud-compute-1 nova-compute: File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
Feb 18 17:48:55 overcloud-compute-1 nova-compute: six.reraise(self.type_, self.value, self.tb)
Feb 18 17:48:55 overcloud-compute-1 nova-compute: File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5643, in _live_migration_operation
Feb 18 17:48:55 overcloud-compute-1 nova-compute: CONF.libvirt.live_migration_bandwidth)
Feb 18 17:48:55 overcloud-compute-1 nova-compute: File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
Feb 18 17:48:55 overcloud-compute-1 nova-compute: result = proxy_call(self._autowrap, f, *args, **kwargs)
Feb 18 17:48:55 overcloud-compute-1 nova-compute: File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
Feb 18 17:48:55 overcloud-compute-1 nova-compute: rv = execute(f, *args, **kwargs)
Feb 18 17:48:55 overcloud-compute-1 nova-compute: File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
Feb 18 17:48:55 overcloud-compute-1 nova-compute: six.reraise(c, e, tb)
Feb 18 17:48:55 overcloud-compute-1 nova-compute: File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
Feb 18 17:48:55 overcloud-compute-1 nova-compute: rv = meth(*args, **kwargs)
Feb 18 17:48:55 overcloud-compute-1 nova-compute: File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1825, in migrateToURI2
Feb 18 17:48:55 overcloud-compute-1 nova-compute: if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
Feb 18 17:48:55 overcloud-compute-1 nova-compute: libvirtError: internal error: process exited while connecting to monitor: 2016-02-18T22:48:54.829400Z qemu-kvm: -chardev socket,id=charserial0,host=192.168.20.23,port=10000,server,nowait: Failed to bind socket: Cannot assign requested address
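For reference, `Cannot assign requested address` is the glibc text for `EADDRNOTAVAIL`, which Linux returns when a process tries to bind to an IP address that is not configured on any local interface. The -chardev option shows QEMU being asked to listen on 192.168.20.23:10000, which suggests the migrating domain asked the destination to bind an address it does not own (presumably the source host's serial console address). The sketch below reproduces the same errno with a plain socket. 192.0.2.1 (an RFC 5737 documentation address) stands in for the other host's IP, and `bind_error` is a hypothetical helper, not part of nova or QEMU.

```python
import errno
import os
import socket


def bind_error(host, port):
    """Try to open a listening TCP socket on (host, port), roughly what QEMU's
    '-chardev socket,host=...,port=...,server,nowait' option asks for.

    Returns None if the bind succeeds, otherwise the errno raised by bind().
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


if __name__ == "__main__":
    # 192.0.2.1 is not configured locally, so unless net.ipv4.ip_nonlocal_bind
    # is enabled, bind() fails with EADDRNOTAVAIL.
    err = bind_error("192.0.2.1", 10000)
    if err is None:
        print("bind succeeded")
    else:
        print(errno.errorcode[err], os.strerror(err))
```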

Comment 2 Lee Yarwood 2016-02-24 11:47:46 UTC
I've backported the following change to enable live migration of instances with serial consoles attached:

libvirt: enable live migration with serial console
https://review.openstack.org/191035
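As a rough illustration of the general idea (an assumption about the approach, not the code from review 191035): before the domain XML is handed to the destination, rewrite the `host` attribute of each listening TCP serial device to an address that is valid on the destination. `rewrite_serial_listen_addr` and the destination address below are hypothetical; the `<serial type='tcp'><source mode='bind' host=... service=.../>` layout is standard libvirt domain XML for a TCP serial device.

```python
import xml.etree.ElementTree as ET


def rewrite_serial_listen_addr(domain_xml, dest_addr):
    """Return domain_xml with every listening TCP serial device bound to dest_addr.

    Hypothetical helper that illustrates the idea only; it is not the code
    from review 191035.
    """
    root = ET.fromstring(domain_xml)
    for serial in root.findall("./devices/serial"):
        if serial.get("type") != "tcp":
            continue
        source = serial.find("source")
        # mode='bind' means QEMU is the listening (server) side of the socket
        if source is not None and source.get("mode") == "bind":
            source.set("host", dest_addr)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    domain = """<domain type='kvm'>
  <name>myInstanceFromVolume</name>
  <devices>
    <serial type='tcp'>
      <source mode='bind' host='192.168.20.23' service='10000'/>
      <protocol type='raw'/>
      <target port='0'/>
    </serial>
  </devices>
</domain>"""
    # 192.168.20.24 is a made-up destination address for the example
    print(rewrite_serial_listen_addr(domain, "192.168.20.24"))
```

Only the address changes; the port (`service`) is carried over as-is, which is why the port collision described below can still occur.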

However, we are still susceptible to serial port collisions if the serial port used by the instance on the source is already in use on the destination. This is still being worked on upstream:

https://review.openstack.org/#/q/topic:refactoring-libvirt

I have opened the following bug to track these changes and look into backporting them; however, I'm not entirely sure this will be possible at present.

Serial port collisions can occur when live migrating instances
https://bugzilla.redhat.com/show_bug.cgi?id=1311514
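The collision itself is ordinary socket behaviour: if another QEMU process on the destination is already listening on the instance's serial port (10000 in the log above), a second bind to that address and port fails with `EADDRINUSE`. A minimal sketch of that check follows, with a local listener standing in for the other instance's console; `port_in_use` is a hypothetical helper, not a nova function.

```python
import errno
import socket


def port_in_use(host, port):
    """Return True if a TCP bind on (host, port) fails with EADDRINUSE."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # any other failure (e.g. EADDRNOTAVAIL) is a different problem
    finally:
        probe.close()
    return False


if __name__ == "__main__":
    # Stand-in for another instance's serial console already listening on the
    # destination; port 0 lets the OS choose a free port for the demo.
    existing = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    existing.bind(("127.0.0.1", 0))
    existing.listen(1)
    taken = existing.getsockname()[1]
    print(port_in_use("127.0.0.1", taken))  # True while `existing` is listening
    existing.close()
```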

Comment 8 errata-xmlrpc 2016-03-24 13:55:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0507.html