Bug 1478346 - Nova Cold Migration uses wrong Network and fails
Status: CLOSED NOTABUG
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 11.0 (Ocata)
Hardware: x86_64 Linux
Priority: unspecified
Severity: unspecified
Assigned To: Eoghan Glynn
QA Contact: Joe H. Rahme
Reported: 2017-08-04 07:33 EDT by Johannes Beisiegel
Modified: 2017-08-17 13:10 EDT (History)
Last Closed: 2017-08-17 13:10:43 EDT
Type: Bug
Description Johannes Beisiegel 2017-08-04 07:33:11 EDT
Description of problem:

On the latest OSP11, after a successful deployment, migrating instances with "openstack server migrate <UUID>" fails. Nova uses the wrong network (External) instead of the correct internal_api network.

Nova Log excerpt from the Migrating Compute Node:
 
2017-08-04 12:19:03.337 79081 INFO nova.compute.manager [req-c130e6f6-b261-470a-a2f0-915bb4bd0f73 f76a33bebac24bc5a1cb124afb8788d7 be4f2b9cc82842e28916d07f805cc016 - - -] [instance: 8a4e2b38-e776-44d5-b4a3-af03e6ecf017] Setting instance back to ACTIVE after: Instance rollback performed due to: Resize error: not able to execute ssh command: Unexpected error while running command.
Command: ssh -o BatchMode=yes <ExternalNet-IP> mkdir -p /var/lib/nova/instances/8a4e2b38-e776-44d5-b4a3-af03e6ecf017
Exit code: 255
Stdout: u''
Stderr: u'Permission denied (publickey,gssapi-keyex,gssapi-with-mic).\r\n'

The sshd_config on every Compute node by default only allows access via the internal_api Network for Migration.

Connecting to the Destination Node with: ssh nova_migration@<Internal_API_IP> -i identity works as expected
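
To confirm which deployment network the address in a failing ssh command belongs to, a small check like the following may help. This is only a sketch: the network names match the ones discussed here, but the CIDR ranges are hypothetical placeholders (documentation ranges) and must be replaced with the values from your own network environment file.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical CIDRs for illustration only; substitute your deployment's ranges.
NETWORKS: Dict[str, str] = {
    "external": "192.0.2.0/24",
    "internal_api": "198.51.100.0/24",
    "ctlplane": "203.0.113.0/24",
}


def network_for_ip(ip: str, networks: Dict[str, str]) -> Optional[str]:
    """Return the name of the first network containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    for name, cidr in networks.items():
        if addr in ipaddress.ip_network(cidr):
            return name
    return None


if __name__ == "__main__":
    # Address taken from the "Command: ssh ..." line of the nova-compute log.
    print(network_for_ip("192.0.2.15", NETWORKS))
```

If the result is not the network that sshd accepts nova_migration connections on, the "Permission denied" error above is the expected outcome.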

Adding the following to network-environment.yaml prior to deployment allows migration via the External network. NovaColdMigrationNetwork defaults to ctlplane, which is also wrong; it should be internal_api (like NovaLibvirtNetwork).

ServiceNetMap:
    NovaColdMigrationNetwork: external

This changes the sshd_config to allow migration access via the External network.

Version-Release number of selected component (if applicable):
Linux dkfzospd1.inet.dkfz-heidelberg.de 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
LSB Version:	:core-4.1-amd64:core-4.1-noarch
Distributor ID:	RedHatEnterpriseServer
Description:	Red Hat Enterprise Linux Server release 7.4 (Maipo)
Release:	7.4
Codename:	Maipo


How reproducible:
Always. The migration fails on every attempt.

Steps to Reproduce:
1. Create an Instance on a Project
2. As admin execute: openstack server migrate <UUID>
3. Watch openstack server list --host <src-host> and/or nova-compute.log on <src-host>

Actual results:
Instance will stay on the same Host and Nova throws Errors in the Log.

Expected results:
Instance will migrate to new Host

Additional info:
It does not matter if the Instance is RUNNING or SHUTDOWN.

The deployment consists of 3 Controller nodes and an NFS storage backend, with no Swift or Ceph.
Comment 1 Ollie Walsh 2017-08-10 11:58:32 EDT
(In reply to Johannes Beisiegel from comment #0)
> 
> Adding the following the network-environment.yaml prior to Deployment will
> allow Migration via the External Network. This defaults to ctlplane which is
> also wrong, it should be internal_api (like NovaLibvirtNetwork)
> 
> ServiceNetMap:
>     NovaColdMigrationNetwork: external
> 

This should be set to the network that the compute nodes use for their default route. I suspect that in this case the default route on the compute nodes has been set to the external network. Could you please include details of the network config that was used to deploy, and any non-default parameters or custom environment files that were used?
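
One way to see which interface carries the default route on a compute node is to inspect the output of "ip route show default". The helper below is a minimal sketch that parses such output; the sample routing table, interface names, and addresses are hypothetical and only illustrate the format.

```python
from typing import Optional


def default_route_interface(ip_route_output: str) -> Optional[str]:
    """Return the device of the first default route in `ip route` output."""
    for line in ip_route_output.splitlines():
        fields = line.split()
        # A default route line looks like: default via <gateway> dev <iface> ...
        if fields and fields[0] == "default" and "dev" in fields:
            idx = fields.index("dev")
            if idx + 1 < len(fields):
                return fields[idx + 1]
    return None


# Hypothetical sample output; on a real node, capture it from `ip route`.
SAMPLE = """default via 192.0.2.1 dev vlan10
198.51.100.0/24 dev vlan20 proto kernel scope link src 198.51.100.7
"""

if __name__ == "__main__":
    print(default_route_interface(SAMPLE))
```

The network attached to the reported interface is the one NovaColdMigrationNetwork would need to match.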

Regards,
Ollie Walsh
Comment 2 Johannes Beisiegel 2017-08-14 09:04:23 EDT
Indeed, the default route on our Controller and Compute nodes was the External network (a remnant from our previous OSP10 test). Changing it to ctlplane allows migration via ctlplane.
Is there a reason why the NovaColdMigrationNetwork setting is not honored by Nova Live migration? It is applied to the sshd configuration but not to the routing.
Comment 3 Ollie Walsh 2017-08-14 12:52:26 EDT
(In reply to Johannes Beisiegel from comment #2)
> Is there a reason why the NovaColdMigrationNetwork Setting is not honored by
> Nova Live migration? It does this with the sshd Configuration but not with
> the Routing.

There is currently no nova config option to directly control the incoming IP for resize/cold-migration.
For live-migration there is the live_migration_inbound_addr option; however, there are limitations when using it with SSH (e.g. https://bugs.launchpad.net/nova/+bug/1671288).
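
To check whether live_migration_inbound_addr is set on a given compute node, the nova.conf contents can be read with Python's configparser. This is a sketch that assumes the option lives in the [libvirt] section of nova.conf; the sample file contents and the address are hypothetical.

```python
import configparser
from typing import Optional


def live_migration_inbound_addr(nova_conf_text: str) -> Optional[str]:
    """Return [libvirt]/live_migration_inbound_addr, or None if unset."""
    # Disable interpolation so '%' characters in other values are harmless;
    # strict=False tolerates duplicate sections or options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    return parser.get("libvirt", "live_migration_inbound_addr", fallback=None)


# Hypothetical nova.conf fragment for illustration.
SAMPLE_CONF = """
[DEFAULT]
debug = False

[libvirt]
live_migration_inbound_addr = 198.51.100.7
"""

if __name__ == "__main__":
    print(live_migration_inbound_addr(SAMPLE_CONF))
```

As noted above, this option applies to live migration only; it does not change the address used for resize or cold migration.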
Comment 4 Ollie Walsh 2017-08-17 13:10:43 EDT
Closing as not a bug as this is the expected behavior in OSP11.
