Red Hat Bugzilla – Bug 1478346
Nova Cold Migration uses wrong Network and fails
Last modified: 2017-08-17 13:10:43 EDT
Description of problem:
On the latest OSP11, after a successful deployment, migrating instances with "openstack server migrate <UUID>" fails. Nova uses the wrong network (External) instead of the correct internal_api.
Nova Log excerpt from the Migrating Compute Node:
2017-08-04 12:19:03.337 79081 INFO nova.compute.manager [req-c130e6f6-b261-470a-a2f0-915bb4bd0f73 f76a33bebac24bc5a1cb124afb8788d7 be4f2b9cc82842e28916d07f805cc016 - - -] [instance: 8a4e2b38-e776-44d5-b4a3-af03e6ecf017] Setting instance back to ACTIVE after: Instance rollback performed due to: Resize error: not able to execute ssh command: Unexpected error while running command.
Command: ssh -o BatchMode=yes <ExternalNet-IP> mkdir -p /var/lib/nova/instances/8a4e2b38-e776-44d5-b4a3-af03e6ecf017
Exit code: 255
Stderr: u'Permission denied (publickey,gssapi-keyex,gssapi-with-mic).\r\n'
The sshd_config on every Compute node by default only allows access via the internal_api Network for Migration.
Connecting to the destination node with "ssh nova_migration@<Internal_API_IP> -i identity" works as expected.
Adding the following to network-environment.yaml prior to deployment allows migration via the External network. The parameter defaults to ctlplane, which is also wrong; it should be internal_api (like NovaLibvirtNetwork).
NovaColdMigrationNetwork: external
This changes the sshd_config to allow migration access via the External network.
Version-Release number of selected component (if applicable):
Linux dkfzospd1.inet.dkfz-heidelberg.de 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.4 (Maipo)
How reproducible:
Migration fails reproducibly.
Steps to Reproduce:
1. Create an Instance on a Project
2. As admin execute: openstack server migrate <UUID>
3. Watch openstack server list --host <src-host> and/or nova-compute.log on <src-host>
Actual results:
The instance stays on the same host and Nova logs errors.
Expected results:
The instance migrates to a new host.
Additional info:
It does not matter if the instance is RUNNING or SHUTDOWN.
The deployment consists of 3 Controllers, an NFS storage backend, and no Swift or Ceph.
(In reply to Johannes Beisiegel from comment #0)
> Adding the following the network-environment.yaml prior to Deployment will
> allow Migration via the External Network. This defaults to ctlplane which is
> also wrong, it should be internal_api (like NovaLibvirtNetwork)
> NovaColdMigrationNetwork: external
This should be set to the network that the compute nodes use as their default route. I suspect that in this case the default route on the compute nodes has been set to the external network. Could you please include details of the network config that was used to deploy, and any non-default parameters or custom environment files that were used?
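For reference, one way to check which interface carries the default route on a compute node is sketched below. It only parses the output of `ip -4 route show default`. The helper name `default_route_dev` is made up for illustration, and mapping the printed interface back to a deployment network (external, ctlplane, internal_api) still depends on the NIC config that was deployed.

```shell
#!/bin/sh
# Hypothetical helper (not part of nova or TripleO): reads the output of
# `ip -4 route show default` on stdin and prints the interface name that
# follows the "dev" keyword on the first default route found.
default_route_dev() {
    awk '$1 == "default" {
        for (i = 1; i < NF; i++) {
            if ($i == "dev") { print $(i + 1); exit }
        }
    }'
}

# Usage on a compute node:
#   ip -4 route show default | default_route_dev
```

If the printed interface belongs to the external network, the ssh command in the log excerpt above targeting <ExternalNet-IP> would be consistent with that.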
Indeed our default route on Controller and Compute Nodes was the External Network (a remnant from our previous OSP10 Test). Changing that to ctlplane allows Migration via ctlplane.
Is there a reason why the NovaColdMigrationNetwork Setting is not honored by Nova Live migration? It does this with the sshd Configuration but not with the Routing.
(In reply to Johannes Beisiegel from comment #2)
> Is there a reason why the NovaColdMigrationNetwork Setting is not honored by
> Nova Live migration? It does this with the sshd Configuration but not with
> the Routing.
There currently is no nova config option to directly control the incoming IP for resize/cold-migration.
For live migration there is the live_migration_inbound_addr option; however, there are limitations when using it with SSH (e.g. https://bugs.launchpad.net/nova/+bug/1671288).
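To see whether live_migration_inbound_addr is set on a given compute node, the [libvirt] section of nova.conf can be inspected. The following is only a rough sketch of an awk-based reader: the function name `ini_get` is hypothetical, and the /etc/nova/nova.conf path in the usage line is the usual default location, assumed here rather than taken from this deployment.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY in [SECTION] of an INI-style file.
# Usage: ini_get FILE SECTION KEY
ini_get() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*\[.*\][ \t]*$/ {           # section header, e.g. [libvirt]
            gsub(/^[ \t]*\[|\][ \t]*$/, "")
            current = $0
            next
        }
        /^[ \t]*[#;]/ { next }             # skip comment lines
        current == section {
            eq = index($0, "=")
            if (eq == 0) next              # not a key = value line
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k) # trim whitespace around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v) # trim whitespace around the value
            if (k == key) { print v; exit }
        }
    ' "$1"
}

# Usage on a compute node (path assumed):
#   ini_get /etc/nova/nova.conf libvirt live_migration_inbound_addr
```

An empty result means the option is not set explicitly in that file. This applies to live migration only and does not change the address used for resize/cold migration.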
Closing as not a bug as this is the expected behavior in OSP11.