Description of problem:
On the latest OSP11, after a successful deployment, migrating instances with "openstack server migrate <UUID>" fails. Nova uses the wrong network (External) instead of the correct one (internal_api).

Nova log excerpt from the migrating compute node:

2017-08-04 12:19:03.337 79081 INFO nova.compute.manager [req-c130e6f6-b261-470a-a2f0-915bb4bd0f73 f76a33bebac24bc5a1cb124afb8788d7 be4f2b9cc82842e28916d07f805cc016 - - -] [instance: 8a4e2b38-e776-44d5-b4a3-af03e6ecf017] Setting instance back to ACTIVE after: Instance rollback performed due to: Resize error: not able to execute ssh command: Unexpected error while running command.
Command: ssh -o BatchMode=yes <ExternalNet-IP> mkdir -p /var/lib/nova/instances/8a4e2b38-e776-44d5-b4a3-af03e6ecf017
Exit code: 255
Stdout: u''
Stderr: u'Permission denied (publickey,gssapi-keyex,gssapi-with-mic).\r\n'

By default, the sshd_config on every compute node only allows migration access via the internal_api network. Connecting to the destination node with:

ssh nova_migration@<Internal_API_IP> -i identity

works as expected.

Adding the following to network-environment.yaml prior to deployment allows migration via the External network. NovaColdMigrationNetwork defaults to ctlplane, which is also wrong; it should be internal_api (like NovaLibvirtNetwork).

ServiceNetMap:
  NovaColdMigrationNetwork: external

This changes the sshd_config to allow migration access via the External network.

Version-Release number of selected component (if applicable):
Linux dkfzospd1.inet.dkfz-heidelberg.de 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.4 (Maipo)
Release: 7.4
Codename: Maipo

How reproducible:
Migration fails reproducibly.

Steps to Reproduce:
1. Create an instance in a project.
2. As admin, execute: openstack server migrate <UUID>
3. Watch "openstack server list --host <src-host>" and/or nova-compute.log on <src-host>.

Actual results:
The instance stays on the same host and Nova logs errors.

Expected results:
The instance migrates to a new host.

Additional info:
It does not matter whether the instance is RUNNING or SHUTDOWN. The deployment consists of 3 controllers and an NFS storage backend, with no Swift or Ceph.
(In reply to Johannes Beisiegel from comment #0)
>
> Adding the following the network-environment.yaml prior to Deployment will
> allow Migration via the External Network. This defaults to ctlplane which is
> also wrong, it should be internal_api (like NovaLibvirtNetwork)
>
> ServiceNetMap:
>   NovaColdMigrationNetwork: external
>

This should be set to the network that the compute nodes use as their default route. I suspect that in this case the default route on the compute nodes has been set to the external network.

Could you please include details of the network config that was used for the deployment, and any non-default parameters or custom environment files?

Regards,
Ollie Walsh
Indeed, the default route on our controller and compute nodes was the External network (a remnant from our previous OSP10 test). Changing it to ctlplane allows migration via ctlplane.

Is there a reason why the NovaColdMigrationNetwork setting is not honored by Nova live migration? It is applied to the sshd configuration but not to the routing.
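The default-route explanation above can be illustrated with a small model. What follows is a minimal Python sketch, assuming the compute node's advertised address is auto-detected from the route towards an outside address (my understanding is that Nova's `my_ip` option defaults to an address found this way when it is not set explicitly; treat that as an assumption, not a statement of Nova's code). The interface names, subnets, host addresses and the 8.8.8.8 probe address are all hypothetical, loosely mirroring common TripleO defaults rather than this deployment.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional

# HYPOTHETICAL data model: a routing table is a list of prefix -> interface
# entries; 0.0.0.0/0 is the default route.


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network  # destination prefix
    iface: str                     # network/interface the traffic leaves on


def egress_iface(routes: List[Route], dest: str) -> Optional[str]:
    """Pick the interface for `dest` by longest-prefix match."""
    addr = ipaddress.ip_address(dest)
    matches = [r for r in routes if addr in r.prefix]
    if not matches:
        return None
    return max(matches, key=lambda r: r.prefix.prefixlen).iface


def autodetected_host_ip(routes: List[Route], iface_addrs: Dict[str, str],
                         probe: str = "8.8.8.8") -> Optional[str]:
    """ASSUMPTION: the host advertises the address of whichever interface
    carries traffic to an outside probe address, i.e. the default route."""
    iface = egress_iface(routes, probe)
    return iface_addrs.get(iface) if iface else None


# HYPOTHETICAL directly connected networks and host addresses.
connected = [
    Route(ipaddress.ip_network("192.168.24.0/24"), "ctlplane"),
    Route(ipaddress.ip_network("172.16.2.0/24"), "internal_api"),
    Route(ipaddress.ip_network("10.0.0.0/24"), "external"),
]
addrs = {"ctlplane": "192.168.24.10",
         "internal_api": "172.16.2.10",
         "external": "10.0.0.10"}

default_via_external = connected + [Route(ipaddress.ip_network("0.0.0.0/0"), "external")]
default_via_ctlplane = connected + [Route(ipaddress.ip_network("0.0.0.0/0"), "ctlplane")]

if __name__ == "__main__":
    print(autodetected_host_ip(default_via_external, addrs))  # 10.0.0.10
    print(autodetected_host_ip(default_via_ctlplane, addrs))  # 192.168.24.10
    # Reaching another node's internal_api address still uses internal_api:
    print(egress_iface(default_via_external, "172.16.2.20"))  # internal_api
```

In this model, moving the default route from external to ctlplane flips the detected address from the External IP to the ctlplane IP, which matches what was observed after changing the routing. The routing to internal_api addresses is unaffected either way; the problem is only which address gets chosen as the migration target.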
(In reply to Johannes Beisiegel from comment #2)
> Is there a reason why the NovaColdMigrationNetwork Setting is not honored by
> Nova Live migration? It does this with the sshd Configuration but not with
> the Routing.

There is currently no Nova config option to directly control the incoming IP for resize/cold migration. For live migration there is the live_migration_inbound_addr option; however, it has limitations when used with SSH (e.g. https://bugs.launchpad.net/nova/+bug/1671288).
Closing as not a bug: this is the expected behavior in OSP11.
For clarity, the same applies to OSP 10.
I take that back: in OSP 10 it's NovaApiNetwork:

~~~
[stack@undercloud-r430 ~]$ grep cold_migration_ssh /usr/share/openstack-tripleo-heat-templates/* -R
/usr/share/openstack-tripleo-heat-templates/puppet/services/nova-compute.yaml: - "%{hiera('cold_migration_ssh_inbound_addr')}"
/usr/share/openstack-tripleo-heat-templates/puppet/services/nova-compute.yaml: cold_migration_ssh_inbound_addr: {get_param: [ServiceNetMap, NovaApiNetwork]}
~~~
Yes, NovaColdMigrationNetwork was removed in https://bugzilla.redhat.com/show_bug.cgi?id=1501564. Nova now uses the internal_api network for cold migration instead of relying on the default route.

*** This bug has been marked as a duplicate of bug 1501564 ***