Bug 1122457

Summary: Live Migration failure: operation failed: Failed to connect to remote libvirt URI
Product: Red Hat OpenStack
Reporter: Gabriel Szasz <gszasz>
Component: openstack-nova
Assignee: Russell Bryant <rbryant>
Status: CLOSED ERRATA
QA Contact: Gabriel Szasz <gszasz>
Severity: high
Docs Contact:
Priority: urgent
Version: 5.0 (RHEL 6)
CC: gdubreui, gszasz, ichavero, mlopes, ndipanov, rhallise, sclewis, sgordon, yeylon
Target Milestone: rc
Target Release: 5.0 (RHEL 6)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-packstack-2014.1.1-0.37.dev1238.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, the Packstack installer's default for live migration was QEMU's SSH protocol. However, as the Compute (nova) user account was set up with a nologin shell, the SSH connection would fail and result in the libvirt error "operation failed: Failed to connect to remote libvirt URI". Consequently, instances were not successfully live migrated. With this update, instances are instead live migrated using QEMU's TCP protocol, and as a result, live migration is expected to complete successfully.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-09-02 18:22:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  /var/log/nova/compute.log on the source node
  /var/log/nova/compute.log on the target node
  /var/log/audit/audit.log on source compute node
  /var/log/audit/audit.log on target compute node

Description Gabriel Szasz 2014-07-23 10:07:41 UTC
Description of problem:

Nova block live migration fails with an error.

Version-Release number of selected component (if applicable):
openstack-nova-2014.1.1-3.el6ost
openstack-selinux-0.1.4-2.el6ost

How reproducible: 100%

Steps to Reproduce:
1. Deploy RHOS 5 in a three-node setup (1 controller / 2 compute nodes) with Packstack:

      # packstack --install-hosts=${controller_ip},${node1_ip},${node2_ip} \
          --nagios-install=n \
          --os-ceilometer-install=n \
          --os-neutron-install=n \
          --novanetwork-pubif=${controller_if} \
          --novanetwork-privif=${controller_if} \
          --novacompute-privif=${node1_if} \
          --keystone-admin-passwd=redhat \
          --keystone-demo-passwd=redhat \
          --ssh-public-key=/root/.ssh/id_rsa.pub

2. Verify that both compute nodes have identical CPUs and the same network interfaces.

3. Add the following line to /etc/nova/nova.conf on both compute nodes:

live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE

4. Restart openstack-nova-compute service on both compute nodes.

5. Run the following commands:

    # source keystonerc_admin
    # nova keypair-add --pub-key ~/.ssh/id_rsa root
    # nova boot --flavor m1.small --image cirros test --key_name root

6. Check which compute node the instance was spawned on; in the next step, use the other compute node as the target host.

7. Run the following command:

    # nova live-migration --block-migrate test ${target_node}



Actual results:

The CLI displays no error, but the instance is not migrated.


Expected results:

Instance is migrated to the target host.


Additional info:

The following message appears in /var/log/nova/nova-compute.log on the source compute node:

2014-07-23 05:48:00.308 1448 ERROR nova.virt.libvirt.driver [-] [instance: 8c02e375-acb8-4231-80fa-bf36e529c560] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+ssh://nova.lab.bos.redhat.com/system?no_verify=1&keyfile=/etc/nova/ssh/nova_migration_key


Workaround:
-----------
1. Run the following command on both compute nodes:

  # setenforce 0

2. Change the login shell for the 'nova' user to '/bin/bash':

  # cp /etc/passwd /etc/passwd~
  # sed 's/\(^nova.*\)\/sbin\/nologin/\1\/bin\/bash/' /etc/passwd~ > /etc/passwd

Live migration does not work without the second step.
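The substitution in step 2 can be sanity-checked on a scratch copy before touching /etc/passwd (a sketch; the nova passwd entry below is illustrative, the real one may differ):

```shell
# Demo of the shell substitution on a scratch copy (illustrative entry);
# the real workaround redirects the result back to /etc/passwd.
printf 'nova:x:162:162:OpenStack Nova Daemons:/var/lib/nova:/sbin/nologin\n' > /tmp/passwd.demo
sed 's|\(^nova.*\)/sbin/nologin|\1/bin/bash|' /tmp/passwd.demo
```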

Comment 1 Gabriel Szasz 2014-07-23 10:13:39 UTC
Created attachment 920186 [details]
/var/log/nova/compute.log on the source node

Comment 2 Gabriel Szasz 2014-07-23 10:14:35 UTC
Created attachment 920187 [details]
/var/log/nova/compute.log on the target node

Comment 3 Gabriel Szasz 2014-07-23 10:15:18 UTC
Created attachment 920189 [details]
/var/log/audit/audit.log on source compute node

Comment 4 Gabriel Szasz 2014-07-23 10:15:55 UTC
Created attachment 920190 [details]
/var/log/audit/audit.log on target compute node

Comment 5 Gabriel Szasz 2014-07-23 10:24:42 UTC
Workaround:
-----------
1. Run the following command on both compute nodes:

  # setenforce 0

2. Change the login shell for the 'nova' user to '/bin/bash':

  # cp /etc/passwd /etc/passwd~
  # sed 's/\(^nova.*\)\/sbin\/nologin/\1\/bin\/bash/' /etc/passwd~ > /etc/passwd

Tested several times - it really seems that the qemu+ssh connection does not work when the 'nova' user's login shell is set to /sbin/nologin. This issue is a regression in the latest puddle.

Comment 6 Vladan Popovic 2014-07-23 12:19:18 UTC
This issue seems similar to https://bugzilla.redhat.com/show_bug.cgi?id=1117524.

Could you try openstack-packstack-2014.1.1-0.36.dev1220.el7ost and see if you run into this again?

Comment 7 Ryan Hallisey 2014-07-24 11:50:53 UTC
I see some AVCs in the logs. I'll add the following rules to the newest builds:

allow sshd_t nova_var_lib_t:dir { search getattr };
allow sshd_t nova_var_lib_t:file read_file_perms;
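Packaged in a local SELinux policy module, those two rules would look roughly like this (a sketch; the module name is illustrative, and the read_file_perms macro is shown expanded to its approximate concrete permissions, as audit2allow would emit them):

```
# Hypothetical local policy module; name and version are illustrative.
module nova_live_migration 1.0;

require {
    type sshd_t;
    type nova_var_lib_t;
    class dir { search getattr };
    class file { getattr open read lock ioctl };
}

# Let sshd traverse and read files under /var/lib/nova
# (where the migration SSH key and config live).
allow sshd_t nova_var_lib_t:dir { search getattr };
allow sshd_t nova_var_lib_t:file { getattr open read lock ioctl };
```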

Comment 8 Ryan Hallisey 2014-07-24 12:30:39 UTC
openstack-selinux-0.1.5-1.el6ost.src.rpm

This should take care of your AVCs.

Comment 9 Stephen Gordon 2014-08-04 15:20:25 UTC
Can you please retest?

Comment 10 Gilles Dubreuil 2014-08-13 05:53:43 UTC
Please see BZ#1117524 for issue update.

Backport is now available for Icehouse.

Comment 11 Gilles Dubreuil 2014-08-13 12:49:18 UTC
Tested on RHEL6, Packstack Icehouse branch with related patches merged:

Migrated instances without shared storage and using qemu+tcp.
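For reference, a qemu+tcp live-migration setup of the kind Packstack now configures looks roughly like this (a sketch; the exact values Packstack writes are not confirmed here, and auth_tcp = "none" assumes the management network is trusted):

```
# /etc/libvirt/libvirtd.conf (both compute nodes): listen on TCP
listen_tls = 0
listen_tcp = 1
auth_tcp = "none"

# /etc/sysconfig/libvirtd: make libvirtd actually listen
LIBVIRTD_ARGS="--listen"

# /etc/nova/nova.conf: migrate over qemu+tcp instead of qemu+ssh
live_migration_uri = qemu+tcp://%s/system
live_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE
```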

Comment 17 errata-xmlrpc 2014-09-02 18:22:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1126.html