Bug 1403338

Summary: rhel-osp-director: running "nova host-servers-migrate" as part of upgrade fails.
Product: Red Hat OpenStack
Reporter: Alexander Chuzhoy <sasha>
Component: rhosp-director
Assignee: Angus Thomas <athomas>
Status: CLOSED NOTABUG
QA Contact: Omri Hochman <ohochman>
Severity: unspecified
Docs Contact:
Priority: high
Version: 9.0 (Mitaka)
CC: dbecker, emacchi, mburns, morazi, owalsh, rhel-osp-director-maint
Target Milestone: async
Target Release: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-12-16 12:22:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Alexander Chuzhoy 2016-12-09 17:21:48 UTC
rhel-osp-director:  running "nova host-servers-migrate" as part of upgrade fails.

Environment:
openstack-tripleo-heat-templates-5.1.0-7.el7ost.noarch
openstack-nova-scheduler-14.0.2-7.el7ost.noarch
openstack-nova-conductor-14.0.2-7.el7ost.noarch
openstack-nova-compute-14.0.2-7.el7ost.noarch
puppet-nova-9.4.0-1.el7ost.noarch
openstack-puppet-modules-9.3.0-1.el7ost.noarch
instack-undercloud-5.1.0-4.el7ost.noarch
openstack-nova-network-14.0.2-7.el7ost.noarch
openstack-nova-cert-14.0.2-7.el7ost.noarch
python-nova-tests-14.0.2-7.el7ost.noarch
python-nova-14.0.2-7.el7ost.noarch
openstack-nova-placement-api-14.0.2-7.el7ost.noarch
openstack-nova-console-14.0.2-7.el7ost.noarch
openstack-nova-novncproxy-14.0.2-7.el7ost.noarch
python-novaclient-6.0.0-1.el7ost.noarch
openstack-nova-api-14.0.2-7.el7ost.noarch
openstack-tripleo-heat-templates-compat-2.0.0-41.el7ost.noarch
openstack-nova-cells-14.0.2-7.el7ost.noarch
openstack-nova-14.0.2-7.el7ost.noarch
openstack-nova-common-14.0.2-7.el7ost.noarch


Steps to reproduce:
1. Follow the upgrade doc until compute nodes: https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/paged/upgrading-red-hat-openstack-platform/chapter-3-director-based-environments-performing-upgrades-to-major-versions

2. Migrate the VMs off the compute node, as described in:
https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/director-installation-and-usage/#sect-Migrating_VMs_from_an_Overcloud_Compute_Node

3. Run:
nova host-servers-migrate [hostname]
The VM is not migrated: the request is accepted, but the migration ends in the "error" state.

[stack@instack ~]$ nova host-servers-migrate overcloud-compute-0.localdomain
+--------------------------------------+--------------------+---------------+
| Server UUID                          | Migration Accepted | Error Message |
+--------------------------------------+--------------------+---------------+
| 1d145e4b-c6fa-4b91-99cb-921525f8b084 | True               |               |
+--------------------------------------+--------------------+---------------+




[stack@instack ~]$ nova migration-list
+----+---------------------------------+---------------------------------+---------------------------------+---------------------------------+-----------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
| Id | Source Node                     | Dest Node                       | Source Compute                  | Dest Compute                    | Dest Host | Status    | Instance UUID                        | Old Flavor | New Flavor | Created At                 | Updated At                 | Type           |
+----+---------------------------------+---------------------------------+---------------------------------+---------------------------------+-----------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
| 1  | -                               | -                               | overcloud-compute-0.localdomain | overcloud-compute-1.localdomain | -         | completed | 1d145e4b-c6fa-4b91-99cb-921525f8b084 | 4          | 4          | 2016-12-08T15:10:19.000000 | 2016-12-08T15:10:23.000000 | live-migration |
| 4  | -                               | -                               | overcloud-compute-1.localdomain | overcloud-compute-0.localdomain | -         | completed | a6a47318-dab5-4680-8bc3-be48c15568ef | 4          | 4          | 2016-12-08T15:13:47.000000 | 2016-12-08T15:13:58.000000 | live-migration |
| 7  | -                               | -                               | overcloud-compute-1.localdomain | overcloud-compute-0.localdomain | -         | completed | 1d145e4b-c6fa-4b91-99cb-921525f8b084 | 4          | 4          | 2016-12-08T15:13:51.000000 | 2016-12-08T15:14:03.000000 | live-migration |
| 8  | -                               | -                               | overcloud-compute-0.localdomain | overcloud-compute-1.localdomain | -         | completed | a6a47318-dab5-4680-8bc3-be48c15568ef | 4          | 4          | 2016-12-09T14:25:12.000000 | 2016-12-09T14:25:23.000000 | live-migration |
| 11 | overcloud-compute-0.localdomain | overcloud-compute-1.localdomain | overcloud-compute-0.localdomain | overcloud-compute-1.localdomain | 192.0.2.8 | error     | 1d145e4b-c6fa-4b91-99cb-921525f8b084 | 4          | 4          | 2016-12-09T17:08:20.000000 | 2016-12-09T17:08:21.000000 | migration      |
+----+---------------------------------+---------------------------------+---------------------------------+---------------------------------+-----------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
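
With a longer migration history, the failed entries can be picked out with a small filter. The following is only a sketch, assuming the table layout printed by "nova migration-list" in this report (Status in the 7th column, one row per line); failed_migrations is a hypothetical helper name, not part of novaclient.

```shell
# Hypothetical helper: read "nova migration-list" output on stdin and print
# "<Id> <Instance UUID> <Type>" for every row whose Status is "error".
# Assumes the column order shown in this report; each row starts with "|",
# so awk field $1 is empty and the table columns start at $2.
failed_migrations() {
    awk -F'|' '
        {
            status = $8
            gsub(/ /, "", status)   # strip the column padding
        }
        status == "error" {
            id = $2; uuid = $9; type = $14
            gsub(/ /, "", id); gsub(/ /, "", uuid); gsub(/ /, "", type)
            print id, uuid, type
        }'
}
```

Usage: nova migration-list | failed_migrations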


I see the following entry in nova-compute.log on the respective compute:
2016-12-09 17:08:21.911 2641 ERROR oslo_messaging.rpc.dispatcher     self.instance_events.clear_events_for_instance(instance)
2016-12-09 17:08:21.911 2641 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
2016-12-09 17:08:21.911 2641 ERROR oslo_messaging.rpc.dispatcher     self.gen.throw(type, value, traceback)
2016-12-09 17:08:21.911 2641 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6643, in _error_out_instance_on_exception
2016-12-09 17:08:21.911 2641 ERROR oslo_messaging.rpc.dispatcher     raise error.inner_exception
2016-12-09 17:08:21.911 2641 ERROR oslo_messaging.rpc.dispatcher ResizeError: Resize error: not able to execute ssh command: Unexpected error while running command.
2016-12-09 17:08:21.911 2641 ERROR oslo_messaging.rpc.dispatcher Command: ssh -o BatchMode=yes 192.0.2.8 mkdir -p /var/lib/nova/instances/1d145e4b-c6fa-4b91-99cb-921525f8b084
2016-12-09 17:08:21.911 2641 ERROR oslo_messaging.rpc.dispatcher Exit code: 255
2016-12-09 17:08:21.911 2641 ERROR oslo_messaging.rpc.dispatcher Stdout: u''
2016-12-09 17:08:21.911 2641 ERROR oslo_messaging.rpc.dispatcher Stderr: u'Permission denied (publickey,gssapi-keyex,gssapi-with-mic).\r\n'
2016-12-09 17:08:21.911 2641 ERROR oslo_messaging.rpc.dispatcher



Expected result:
Migration should work.

Comment 2 Alexander Chuzhoy 2016-12-09 18:24:55 UTC
The SSH keys weren't set up properly on all compute nodes.
Need to make sure that the nova user is able to SSH between compute nodes, in both directions, using key-based authentication.
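
One way to check this is to repeat the non-interactive SSH call from the traceback as the nova user on every compute node, against every other compute node. The following is a minimal sketch that only prints the commands to run. It assumes that sudo is available on the computes and that nova-compute runs as the "nova" user; the function names are hypothetical.

```shell
# Hypothetical helper: print the SSH check for one target compute node.
# BatchMode=yes mirrors the failing command in nova-compute.log: ssh must
# succeed with key-based authentication and never prompt for a password.
nova_ssh_check_cmd() {
    local target="$1"
    printf 'sudo -u nova ssh -o BatchMode=yes %s true\n' "$target"
}

# Print one "<source>: <command>" line per ordered pair of distinct computes,
# so that both directions between any two nodes are covered.
nova_ssh_check_matrix() {
    local src dst
    for src in "$@"; do
        for dst in "$@"; do
            if [ "$src" != "$dst" ]; then
                printf '%s: %s\n' "$src" "$(nova_ssh_check_cmd "$dst")"
            fi
        done
    done
}
```

Usage: nova_ssh_check_matrix overcloud-compute-0.localdomain overcloud-compute-1.localdomain
Run each printed command on the compute node named before the colon. An exit code of 255 with "Permission denied (publickey,...)" reproduces the failure reported here.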

Comment 3 Ollie Walsh 2016-12-16 12:22:57 UTC
Closing as the ssh keys were not set properly