Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1528858

Summary: Nova Cold Migration uses wrong Network and fails
Product: Red Hat OpenStack
Reporter: Andreas Karis <akaris>
Component: openstack-tripleo-heat-templates
Assignee: Emilien Macchi <emacchi>
Status: CLOSED DUPLICATE
QA Contact: Gurenko Alex <agurenko>
Severity: unspecified
Priority: unspecified
Version: 10.0 (Newton)
CC: akaris, berrange, dasmith, eglynn, j.beisiegel, jhakimra, kchamart, mburns, mirko.schmidt, owalsh, rhel-osp-director-maint, sbauza, sferdjao, sgordon, srevivo, vromanso
Hardware: x86_64
OS: Linux
Clone Of: 1478346
Last Closed: 2018-01-04 17:59:26 UTC
Type: Bug
Bug Depends On: 1478346, 1501564

Description Andreas Karis 2017-12-24 16:09:13 UTC
+++ This bug was initially created as a clone of Bug #1478346 +++

Description of problem:

On the latest OSP11, after a successful deployment, migrating instances using "openstack server migrate <UUID>" fails. Nova uses the wrong network (External) instead of the correct one (internal_api).

Nova log excerpt from the source compute node:
~~~
2017-08-04 12:19:03.337 79081 INFO nova.compute.manager [req-c130e6f6-b261-470a-a2f0-915bb4bd0f73 f76a33bebac24bc5a1cb124afb8788d7 be4f2b9cc82842e28916d07f805cc016 - - -] [instance: 8a4e2b38-e776-44d5-b4a3-af03e6ecf017] Setting instance back to ACTIVE after: Instance rollback performed due to: Resize error: not able to execute ssh command: Unexpected error while running command.
Command: ssh -o BatchMode=yes <ExternalNet-IP> mkdir -p /var/lib/nova/instances/8a4e2b38-e776-44d5-b4a3-af03e6ecf017
Exit code: 255
Stdout: u''
Stderr: u'Permission denied (publickey,gssapi-keyex,gssapi-with-mic).\r\n'
~~~

By default, the sshd_config on every compute node only allows migration access via the internal_api network.

Connecting to the destination node with "ssh nova_migration@<Internal_API_IP> -i identity" works as expected.

Adding the following to network-environment.yaml prior to deployment allows migration via the External network. The parameter defaults to ctlplane, which is also wrong; it should be internal_api (like NovaLibvirtNetwork):
~~~
ServiceNetMap:
    NovaColdMigrationNetwork: external
~~~

This changes the sshd_config to allow migration access via the External network.

Version-Release number of selected component (if applicable):
Linux dkfzospd1.inet.dkfz-heidelberg.de 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
LSB Version:	:core-4.1-amd64:core-4.1-noarch
Distributor ID:	RedHatEnterpriseServer
Description:	Red Hat Enterprise Linux Server release 7.4 (Maipo)
Release:	7.4
Codename:	Maipo


How reproducible:
Migration fails reproducibly.

Steps to Reproduce:
1. Create an instance in a project
2. As admin execute: openstack server migrate <UUID>
3. Watch openstack server list --host <src-host> and/or nova-compute.log on <src-host>

Actual results:
The instance stays on the same host and Nova logs errors.

Expected results:
The instance migrates to a new host.

Additional info:
It does not matter if the Instance is RUNNING or SHUTDOWN.

The deployment consists of 3 controllers and an NFS storage backend, with no Swift or Ceph.

--- Additional comment from Ollie Walsh on 2017-08-10 11:58:32 EDT ---

(In reply to Johannes Beisiegel from comment #0)
> 
> Adding the following the network-environment.yaml prior to Deployment will
> allow Migration via the External Network. This defaults to ctlplane which is
> also wrong, it should be internal_api (like NovaLibvirtNetwork)
> 
> ServiceNetMap:
>     NovaColdMigrationNetwork: external
> 

This should be set to the network that compute nodes use as their default route. I suspect that in this case the default route on the compute nodes has been set to the external network. Could you please include details of the network config that was used to deploy, and any non-default parameters or custom environment files that were used?

Regards,
Ollie Walsh

--- Additional comment from Johannes Beisiegel on 2017-08-14 09:04:23 EDT ---

Indeed, our default route on the controller and compute nodes was the External network (a remnant from our previous OSP10 test). Changing it to ctlplane allows migration via ctlplane.
Is there a reason why the NovaColdMigrationNetwork setting is not honored by Nova live migration? It is applied to the sshd configuration but not to the routing.

--- Additional comment from Ollie Walsh on 2017-08-14 12:52:26 EDT ---

(In reply to Johannes Beisiegel from comment #2)
> Is there a reason why the NovaColdMigrationNetwork Setting is not honored by
> Nova Live migration? It does this with the sshd Configuration but not with
> the Routing.

There is currently no nova config option to directly control the incoming IP for resize/cold-migration.
For live-migration there is the live_migration_inbound_addr option; however, there are limitations when using it with SSH (e.g. https://bugs.launchpad.net/nova/+bug/1671288).

--- Additional comment from Ollie Walsh on 2017-08-17 13:10:43 EDT ---

Closing as not a bug, as this is the expected behavior in OSP11.

--- Additional comment from Andreas Karis on 2017-12-24 10:32:05 EST ---

The same applies in OSP 10 by the way (just for clarity).

--- Additional comment from Andreas Karis on 2017-12-24 11:02:28 EST ---

I take that back: in OSP 10, it's NovaApiNetwork:
~~~
[stack@undercloud-r430 ~]$ grep cold_migration_ssh /usr/share/openstack-tripleo-heat-templates/* -R
/usr/share/openstack-tripleo-heat-templates/puppet/services/nova-compute.yaml:              - "%{hiera('cold_migration_ssh_inbound_addr')}"
/usr/share/openstack-tripleo-heat-templates/puppet/services/nova-compute.yaml:            cold_migration_ssh_inbound_addr: {get_param: [ServiceNetMap, NovaApiNetwork]}
~~~

Comment 1 Andreas Karis 2017-12-24 16:17:31 UTC
### OSP 11 ###
Compute nodes use the network of the default route for cold migration. In order to correctly configure `/etc/ssh/sshd_config`, modify the `ServiceNetMap`, for example in `network-environment.yaml`, and set `NovaColdMigrationNetwork` to the network on the compute nodes with the default route:
~~~
parameter_defaults:
(...)
  ServiceNetMap:
    NovaColdMigrationNetwork: external
~~~

The above should be set to the network that compute nodes use as their default route. In this example, the default route on the compute nodes has been set to the external network.
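To see which network actually carries the default route on a compute node, you can use a generic Linux check; this bug does not prescribe it. Connect a UDP socket to an off-link address and read back the local address the kernel selected. No packet is sent. From a shell, `ip route show default` shows the default route's gateway and interface. The probe address 8.8.8.8 and the function name below are illustrative assumptions. It is also an assumption, not verified here against the Nova source for these releases, that Nova derives its default `my_ip` in a similar way. If so, that would explain why cold migration follows the default route.

```python
import ipaddress
import socket


def default_route_source_ip(probe_host: str = "8.8.8.8", probe_port: int = 80) -> str:
    """Return the local IPv4 address the kernel would use to reach probe_host.

    Connecting a UDP socket sends no traffic; it only makes the kernel pick a
    route and a source address. For an off-link destination such as a public
    IP, that is normally the default route. Falls back to 127.0.0.1 when no
    route exists (for example on an isolated host).
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.connect((probe_host, probe_port))
            return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"


if __name__ == "__main__":
    ip = default_route_source_ip()
    # Compare this with the node's per-network addresses (ctlplane, internal_api,
    # external, ...) to see which network NovaColdMigrationNetwork should name.
    print(ipaddress.ip_address(ip))
```

Run it on each compute node. If the printed address belongs to the external network, then in OSP 11 `NovaColdMigrationNetwork: external` matches the situation described in this bug.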

### OSP 10 ###
In OSP 10, this is slightly more complex, as the parameter `NovaColdMigrationNetwork` does not exist. Instead, `NovaApiNetwork` is used.
~~~
[stack@undercloud-r430 ~]$ grep cold_migration_ssh /usr/share/openstack-tripleo-heat-templates/* -R
/usr/share/openstack-tripleo-heat-templates/puppet/services/nova-compute.yaml:              - "%{hiera('cold_migration_ssh_inbound_addr')}"
/usr/share/openstack-tripleo-heat-templates/puppet/services/nova-compute.yaml:            cold_migration_ssh_inbound_addr: {get_param: [ServiceNetMap, NovaApiNetwork]}
~~~

This network, by default, maps to `internal_api`:
~~~
/usr/share/openstack-tripleo-heat-templates/environments/updates/update-from-keystone-admin-internal-api.yaml:    NovaApiNetwork: internal_api
~~~
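The difference between the two releases can be sketched as a lookup. In both releases, `cold_migration_ssh_inbound_addr` is read from a single `ServiceNetMap` key: OSP 10 uses `NovaApiNetwork` and OSP 11 uses `NovaColdMigrationNetwork`. The sketch below is a hypothetical model, not TripleO code. The default values come from the grep output quoted in this bug, and user overrides are assumed to replace defaults key by key.

```python
from typing import Mapping, Optional

# Hypothetical model: only the defaults quoted in this bug are included.
SERVICE_NET_MAP_DEFAULTS = {
    10: {"NovaApiNetwork": "internal_api"},
    11: {"NovaColdMigrationNetwork": "ctlplane"},
}

# ServiceNetMap key that nova-compute.yaml reads for cold_migration_ssh_inbound_addr.
COLD_MIGRATION_KEY = {
    10: "NovaApiNetwork",
    11: "NovaColdMigrationNetwork",
}


def cold_migration_network(release: int, overrides: Optional[Mapping[str, str]] = None) -> str:
    """Return the network on which sshd accepts cold-migration logins.

    Assumes user-supplied ServiceNetMap entries override the defaults key by key.
    """
    net_map = {**SERVICE_NET_MAP_DEFAULTS[release], **(overrides or {})}
    return net_map[COLD_MIGRATION_KEY[release]]


if __name__ == "__main__":
    print(cold_migration_network(10))  # internal_api
    print(cold_migration_network(11))  # ctlplane
    print(cold_migration_network(11, {"NovaColdMigrationNetwork": "external"}))  # external
    print(cold_migration_network(10, {"NovaApiNetwork": "ctlplane"}))  # ctlplane
```

In OSP 10, overriding `NovaApiNetwork` changes the result just as the OSP 11 override does. However, it also moves every other service mapped to that key, VIPs included, and this model does not capture that side effect.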

A workaround could therefore be to change the `ServiceNetMap` again. However, this also moves VIPs and other services to the network that has the default route, and could therefore introduce a security risk:
~~~
parameter_defaults:
(...)
  ServiceNetMap:
    NovaApiNetwork: ctlplane
~~~

A bug report has been opened to address this issue: [https://bugzilla.redhat.com/show_bug.cgi?id=1528858](https://bugzilla.redhat.com/show_bug.cgi?id=1528858)

Comment 2 Andreas Karis 2017-12-24 16:23:58 UTC
Can we get a backport of NovaColdMigrationNetwork into OSP 10? Or does this have further implications? I don't understand why this is different in OSP 10 and 11:

OSP 11:
~~~
[root@undercloud-8 ~]# grep NovaColdMigrationNetwork /usr/share/openstack-tripleo-heat-templates/* -R
/usr/share/openstack-tripleo-heat-templates/network/service_net_map.j2.yaml:      NovaColdMigrationNetwork: ctlplane
/usr/share/openstack-tripleo-heat-templates/puppet/services/nova-compute.yaml:            cold_migration_ssh_inbound_addr: {get_param: [ServiceNetMap, NovaColdMigrationNetwork]}
~~~

OSP 10:
~~~
[stack@undercloud-r430 ~]$ grep cold_migration_ssh /usr/share/openstack-tripleo-heat-templates/* -R
/usr/share/openstack-tripleo-heat-templates/puppet/services/nova-compute.yaml:              - "%{hiera('cold_migration_ssh_inbound_addr')}"
/usr/share/openstack-tripleo-heat-templates/puppet/services/nova-compute.yaml:            cold_migration_ssh_inbound_addr: {get_param: [ServiceNetMap, NovaApiNetwork]}
~~~

If I'm correct, then this is actually a fairly easy, yet important, backport for OSP 10. I guess we should simply leave the default in OSP 10 at internal_api so as not to break existing deployments?
~~~
/usr/share/openstack-tripleo-heat-templates/network/service_net_map.j2.yaml:      NovaApiNetwork: internal_api
~~~

So:
~~~
NovaColdMigrationNetwork: internal_api
~~~


Though things could then break on an upgrade to OSP 11 ;-)

Comment 3 Andreas Karis 2018-01-04 17:55:10 UTC
Just for completeness:

https://bugzilla.redhat.com/show_bug.cgi?id=1478346#c7

 Ollie Walsh 2018-01-04 12:50:04 EST

Yes, NovaColdMigrationNetwork was removed in https://bugzilla.redhat.com/show_bug.cgi?id=1501564. Nova now uses the internal_api network for cold migration instead of relying on the default route.

*** This bug has been marked as a duplicate of bug 1501564 ***

Resolution: NOTABUG → DUPLICATE

Comment 4 Ollie Walsh 2018-01-04 17:59:26 UTC

*** This bug has been marked as a duplicate of bug 1486948 ***