Bug 1536753 - live migration fails when hostnames are configured with "_" (underscore) due to inconsistent naming in /etc/hosts and /etc/ssh/ssh_known_hosts
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: Upstream M1
: ---
Assignee: Emilien Macchi
QA Contact: Gurenko Alex
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-01-20 17:32 UTC by Andreas Karis
Modified: 2021-03-11 16:58 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-23 15:20:08 UTC
Target Upstream Version:
Embargoed:



Description Andreas Karis 2018-01-20 17:32:54 UTC
Description of problem:
live migration fails when hostnames are configured with "_" (underscore) due to inconsistent naming in /etc/hosts and /etc/ssh/authorized_keys

Additional info:
this is easy to reproduce:

/home/stack/templates/network-environment.yaml
~~~
parameter_defaults:
(...)
  ComputeHostnameFormat: '%stackname%-compute_v1-%index%'
~~~

Then, deploy a new stack:
~~~
(...)
 Stack overcloud CREATE_COMPLETE

Host 10.0.0.5 not found in /home/stack/.ssh/known_hosts
Overcloud Endpoint: http://10.0.0.5:5000/v2.0
Overcloud Deployed
~~~

Then, verify:
~~~
[stack@undercloud-7 ~]$ source stackrc;  nova list
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                   | Status | Task State | Power State | Networks            |
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+
| 2cef3460-d931-4b8c-9e9d-d297e0f7b3bc | overcloud-compute_v1-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.16 |
| 358951bc-3f29-4f92-8aad-2f9fecac8f8f | overcloud-compute_v1-1 | ACTIVE | -          | Running     | ctlplane=192.0.2.8  |
| 50261983-3699-4874-aeef-cd2ae52825df | overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.12 |
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+
[stack@undercloud-7 ~]$ . overcloudrc
[stack@undercloud-7 ~]$ nova service-list
+----+------------------+------------------------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary           | Host                               | Zone     | Status  | State | Updated_at                 | Disabled Reason |
+----+------------------+------------------------------------+----------+---------+-------+----------------------------+-----------------+
| 3  | nova-consoleauth | overcloud-controller-0.localdomain | internal | enabled | up    | 2018-01-20T16:34:35.000000 | -               |
| 4  | nova-scheduler   | overcloud-controller-0.localdomain | internal | enabled | up    | 2018-01-20T16:34:34.000000 | -               |
| 5  | nova-conductor   | overcloud-controller-0.localdomain | internal | enabled | up    | 2018-01-20T16:34:38.000000 | -               |
| 6  | nova-compute     | overcloud-compute-v1-0             | nova     | enabled | up    | 2018-01-20T16:34:44.000000 | -               |
| 7  | nova-compute     | overcloud-compute-v1-1             | nova     | enabled | up    | 2018-01-20T16:34:35.000000 | -               |
+----+------------------+------------------------------------+----------+---------+-------+----------------------------+-----------------+
[stack@undercloud-7 ~]$ nova hypervisor-list
+----+------------------------+-------+---------+
| ID | Hypervisor hostname    | State | Status  |
+----+------------------------+-------+---------+
| 1  | overcloud-compute-v1-0 | up    | enabled |
| 2  | overcloud-compute-v1-1 | up    | enabled |
+----+------------------------+-------+---------+
[stack@undercloud-7 ~]$ ssh heat-admin@192.0.2.16 hostname
The authenticity of host '192.0.2.16 (192.0.2.16)' can't be established.
ECDSA key fingerprint is SHA256:+MA5u0VzqiLp+Q3RdHvfcXy9R+xNO6HU8sfvLCFHeo0.
ECDSA key fingerprint is MD5:b3:27:98:11:86:5d:08:32:26:b6:ef:73:00:80:cd:73.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.0.2.16' (ECDSA) to the list of known hosts.
overcloud-compute-v1-0
[stack@undercloud-7 ~]$ ssh heat-admin@192.0.2.16 "sudo grep compute_v1 /etc/ -R"
/etc/cloud/templates/hosts.redhat.tmpl:172.16.2.10 overcloud-compute_v1-0.localdomain overcloud-compute_v1-0
/etc/cloud/templates/hosts.redhat.tmpl:192.0.2.16 overcloud-compute_v1-0.external.localdomain overcloud-compute_v1-0.external
/etc/cloud/templates/hosts.redhat.tmpl:172.16.2.10 overcloud-compute_v1-0.internalapi.localdomain overcloud-compute_v1-0.internalapi
/etc/cloud/templates/hosts.redhat.tmpl:172.18.0.11 overcloud-compute_v1-0.storage.localdomain overcloud-compute_v1-0.storage
/etc/cloud/templates/hosts.redhat.tmpl:192.0.2.16 overcloud-compute_v1-0.storagemgmt.localdomain overcloud-compute_v1-0.storagemgmt
/etc/cloud/templates/hosts.redhat.tmpl:172.16.0.7 overcloud-compute_v1-0.tenant.localdomain overcloud-compute_v1-0.tenant
/etc/cloud/templates/hosts.redhat.tmpl:192.0.2.16 overcloud-compute_v1-0.management.localdomain overcloud-compute_v1-0.management
/etc/cloud/templates/hosts.redhat.tmpl:192.0.2.16 overcloud-compute_v1-0.ctlplane.localdomain overcloud-compute_v1-0.ctlplane
/etc/cloud/templates/hosts.redhat.tmpl:172.16.2.13 overcloud-compute_v1-1.localdomain overcloud-compute_v1-1
/etc/cloud/templates/hosts.redhat.tmpl:192.0.2.8 overcloud-compute_v1-1.external.localdomain overcloud-compute_v1-1.external
/etc/cloud/templates/hosts.redhat.tmpl:172.16.2.13 overcloud-compute_v1-1.internalapi.localdomain overcloud-compute_v1-1.internalapi
/etc/cloud/templates/hosts.redhat.tmpl:172.18.0.15 overcloud-compute_v1-1.storage.localdomain overcloud-compute_v1-1.storage
/etc/cloud/templates/hosts.redhat.tmpl:192.0.2.8 overcloud-compute_v1-1.storagemgmt.localdomain overcloud-compute_v1-1.storagemgmt
/etc/cloud/templates/hosts.redhat.tmpl:172.16.0.9 overcloud-compute_v1-1.tenant.localdomain overcloud-compute_v1-1.tenant
/etc/cloud/templates/hosts.redhat.tmpl:192.0.2.8 overcloud-compute_v1-1.management.localdomain overcloud-compute_v1-1.management
/etc/cloud/templates/hosts.redhat.tmpl:192.0.2.8 overcloud-compute_v1-1.ctlplane.localdomain overcloud-compute_v1-1.ctlplane
/etc/ssh/ssh_known_hosts:172.16.2.10,overcloud-compute_v1-0.localdomain,overcloud-compute_v1-0,192.0.2.16,overcloud-compute_v1-0.external.localdomain,overcloud-compute_v1-0.external,172.16.2.10,overcloud-compute_v1-0.internalapi.localdomain,overcloud-compute_v1-0.internalapi,172.18.0.11,overcloud-compute_v1-0.storage.localdomain,overcloud-compute_v1-0.storage,192.0.2.16,overcloud-compute_v1-0.storagemgmt.localdomain,overcloud-compute_v1-0.storagemgmt,172.16.0.7,overcloud-compute_v1-0.tenant.localdomain,overcloud-compute_v1-0.tenant,192.0.2.16,overcloud-compute_v1-0.management.localdomain,overcloud-compute_v1-0.management,192.0.2.16,overcloud-compute_v1-0.ctlplane.localdomain,overcloud-compute_v1-0.ctlplane ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBJQhCsIK7sAkmHZ4xg72t2wV39gvrjtE6g1dqZLXmSTB5ubMaqFTytzqvt/T+C8oUKPgX1bQfnIz4VyuKCx4qwQ=
/etc/ssh/ssh_known_hosts:172.16.2.13,overcloud-compute_v1-1.localdomain,overcloud-compute_v1-1,192.0.2.8,overcloud-compute_v1-1.external.localdomain,overcloud-compute_v1-1.external,172.16.2.13,overcloud-compute_v1-1.internalapi.localdomain,overcloud-compute_v1-1.internalapi,172.18.0.15,overcloud-compute_v1-1.storage.localdomain,overcloud-compute_v1-1.storage,192.0.2.8,overcloud-compute_v1-1.storagemgmt.localdomain,overcloud-compute_v1-1.storagemgmt,172.16.0.9,overcloud-compute_v1-1.tenant.localdomain,overcloud-compute_v1-1.tenant,192.0.2.8,overcloud-compute_v1-1.management.localdomain,overcloud-compute_v1-1.management,192.0.2.8,overcloud-compute_v1-1.ctlplane.localdomain,overcloud-compute_v1-1.ctlplane ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBNFj5SSzkFnn4e417m0Ut+wOXONtEFslAYnGafSpAxasZGdJEpESzN0OhPj4aJNRomA/t2f6Xm2wjRCEjsX+ZiM=
grep: /etc/grub2-efi.cfg: No such file or directory
grep: /etc/extlinux.conf: No such file or directory
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-0",
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-1"
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-0",
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-1"
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-0",
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-1"
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-0",
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-1"
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-0",
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-1"
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-0",
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-1"
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-0.internalapi.localdomain",
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-1.internalapi.localdomain"
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-0",
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-1"
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-0",
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-1"
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-0",
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-1"
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-0",
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-1"
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-0",
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-1"
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-0",
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-1"
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-0",
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-1"
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-0",
/etc/puppet/hieradata/all_nodes.yaml:  "overcloud-compute_v1-1"
/etc/puppet/hieradata/bootstrap_node.yaml:bootstrap_nodeid: overcloud-compute_v1-0
/etc/hosts:172.16.2.10 overcloud-compute_v1-0.localdomain overcloud-compute_v1-0
/etc/hosts:192.0.2.16 overcloud-compute_v1-0.external.localdomain overcloud-compute_v1-0.external
/etc/hosts:172.16.2.10 overcloud-compute_v1-0.internalapi.localdomain overcloud-compute_v1-0.internalapi
/etc/hosts:172.18.0.11 overcloud-compute_v1-0.storage.localdomain overcloud-compute_v1-0.storage
/etc/hosts:192.0.2.16 overcloud-compute_v1-0.storagemgmt.localdomain overcloud-compute_v1-0.storagemgmt
/etc/hosts:172.16.0.7 overcloud-compute_v1-0.tenant.localdomain overcloud-compute_v1-0.tenant
/etc/hosts:192.0.2.16 overcloud-compute_v1-0.management.localdomain overcloud-compute_v1-0.management
/etc/hosts:192.0.2.16 overcloud-compute_v1-0.ctlplane.localdomain overcloud-compute_v1-0.ctlplane
/etc/hosts:172.16.2.13 overcloud-compute_v1-1.localdomain overcloud-compute_v1-1
/etc/hosts:192.0.2.8 overcloud-compute_v1-1.external.localdomain overcloud-compute_v1-1.external
/etc/hosts:172.16.2.13 overcloud-compute_v1-1.internalapi.localdomain overcloud-compute_v1-1.internalapi
/etc/hosts:172.18.0.15 overcloud-compute_v1-1.storage.localdomain overcloud-compute_v1-1.storage
/etc/hosts:192.0.2.8 overcloud-compute_v1-1.storagemgmt.localdomain overcloud-compute_v1-1.storagemgmt
/etc/hosts:172.16.0.9 overcloud-compute_v1-1.tenant.localdomain overcloud-compute_v1-1.tenant
/etc/hosts:192.0.2.8 overcloud-compute_v1-1.management.localdomain overcloud-compute_v1-1.management
/etc/hosts:192.0.2.8 overcloud-compute_v1-1.ctlplane.localdomain overcloud-compute_v1-1.ctlplane
[stack@undercloud-7 ~]$ 
~~~


Both live migrations with "_" and "-" fail:
~~~
[stack@undercloud-7 ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+----------------------------------------------------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks                                                             |
+--------------------------------------+--------------+--------+------------+-------------+----------------------------------------------------------------------+
| 7ee38f55-2f1e-4c95-b5f7-ddfedf5134a7 | cirros-test1 | ACTIVE | -          | Running     | private=192.168.0.12, 2000:192:168:1:f816:3eff:fec1:226d, 10.0.0.110 |
| 67cbc3cf-b379-4b45-92b4-9a690a5effd3 | rhel-test1   | ACTIVE | -          | Running     | private=192.168.0.3, 2000:192:168:1:f816:3eff:fe11:ca21, 10.0.0.107  |
+--------------------------------------+--------------+--------+------------+-------------+----------------------------------------------------------------------+
[stack@undercloud-7 ~]$ nova list --fields name,host
+--------------------------------------+--------------+------------------------+
| ID                                   | Name         | Host                   |
+--------------------------------------+--------------+------------------------+
| 7ee38f55-2f1e-4c95-b5f7-ddfedf5134a7 | cirros-test1 | overcloud-compute-v1-0 |
| 67cbc3cf-b379-4b45-92b4-9a690a5effd3 | rhel-test1   | overcloud-compute-v1-1 |
+--------------------------------------+--------------+------------------------+
[stack@undercloud-7 ~]$ nova live-migration cirros-test1 overcloud-compute-v1-1
[stack@undercloud-7 ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+----------------------------------------------------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks                                                             |
+--------------------------------------+--------------+--------+------------+-------------+----------------------------------------------------------------------+
| 7ee38f55-2f1e-4c95-b5f7-ddfedf5134a7 | cirros-test1 | ACTIVE | -          | Running     | private=192.168.0.12, 2000:192:168:1:f816:3eff:fec1:226d, 10.0.0.110 |
| 67cbc3cf-b379-4b45-92b4-9a690a5effd3 | rhel-test1   | ACTIVE | -          | Running     | private=192.168.0.3, 2000:192:168:1:f816:3eff:fe11:ca21, 10.0.0.107  |
+--------------------------------------+--------------+--------+------------+-------------+----------------------------------------------------------------------+
[stack@undercloud-7 ~]$ nova list --fields name,host
+--------------------------------------+--------------+------------------------+
| ID                                   | Name         | Host                   |
+--------------------------------------+--------------+------------------------+
| 7ee38f55-2f1e-4c95-b5f7-ddfedf5134a7 | cirros-test1 | overcloud-compute-v1-0 |
| 67cbc3cf-b379-4b45-92b4-9a690a5effd3 | rhel-test1   | overcloud-compute-v1-1 |
+--------------------------------------+--------------+------------------------+
[stack@undercloud-7 ~]$ nova migration-list
+----+-------------+-----------+------------------------+------------------------+-----------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
| Id | Source Node | Dest Node | Source Compute         | Dest Compute           | Dest Host | Status | Instance UUID                        | Old Flavor | New Flavor | Created At                 | Updated At                 | Type           |
+----+-------------+-----------+------------------------+------------------------+-----------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
| 1  | -           | -         | overcloud-compute-v1-0 | overcloud-compute-v1-1 | -         | error  | 7ee38f55-2f1e-4c95-b5f7-ddfedf5134a7 | 1          | 1          | 2018-01-20T16:59:15.000000 | 2018-01-20T16:59:20.000000 | live-migration |
+----+-------------+-----------+------------------------+------------------------+-----------+--------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+
[stack@undercloud-7 ~]$ nova live-migration cirros-test1 overcloud-compute_v1-1
ERROR (ClientException): Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'nova.exception.ComputeHostNotFound'> (HTTP 500) (Request-ID: req-53b96ca4-8bfb-4b51-bc0c-735063b1448c)
[stack@undercloud-7 ~]$
~~~

For the live migration with "-", the ERROR is:
~~~
[root@overcloud-compute-v1-0 ~]# grep ERROR /var/log/nova/nova-compute.log 
2018-01-20 16:07:28.454 16992 ERROR nova.compute.manager [req-9c175cc6-0563-4feb-ba7d-a7081cd21cac - - - - -] No compute node record for host overcloud-compute-v1-0
2018-01-20 16:55:53.479 59019 DEBUG oslo_service.service [req-c7a22b42-71fc-4e37-a2d5-261a3c4e8fa2 - - - - -] logging_exception_prefix       = %(asctime)s.%(msecs)03d %(process)d ERROR %(name)s %(instance)s log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2622
2018-01-20 16:59:20.528 59019 ERROR nova.virt.libvirt.driver [req-a378cae1-71eb-49d3-a3c6-04b7133c1727 354194e670274527bb751e120fdb276b 4b9fb0b405434da6b215bca0bab4e654 - - -] [instance: 7ee38f55-2f1e-4c95-b5f7-ddfedf5134a7] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+ssh://nova_migration@overcloud-compute-v1-1/system?keyfile=/etc/nova/migration/identity: Cannot recv data: ssh: Could not resolve hostname overcloud-compute-v1-1: Name or service not known: Connection reset by peer
2018-01-20 16:59:20.575 59019 ERROR nova.virt.libvirt.driver [req-a378cae1-71eb-49d3-a3c6-04b7133c1727 354194e670274527bb751e120fdb276b 4b9fb0b405434da6b215bca0bab4e654 - - -] [instance: 7ee38f55-2f1e-4c95-b5f7-ddfedf5134a7] Migration operation has aborted
[root@overcloud-compute-v1-0 ~]# 
~~~

I fixed /etc/hosts manually:
~~~
[root@overcloud-compute-v1-0 ~]# diff /etc/hosts{.bck,}
25c25
< 172.16.2.10 overcloud-compute_v1-0.localdomain overcloud-compute_v1-0
---
> 172.16.2.10 overcloud-compute_v1-0.localdomain overcloud-compute_v1-0  overcloud-compute-v1-0.localdomain overcloud-compute-v1-0
34c34
< 172.16.2.13 overcloud-compute_v1-1.localdomain overcloud-compute_v1-1
---
> 172.16.2.13 overcloud-compute_v1-1.localdomain overcloud-compute_v1-1  overcloud-compute-v1-1.localdomain overcloud-compute-v1-1
[root@overcloud-compute-v1-0 ~]# 
~~~
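
The manual /etc/hosts edit above can be sketched as a small helper. This is only an illustration, not what Director does: the function name `add_hyphen_aliases` is hypothetical, and it assumes plain `address name [name ...]` lines without trailing comments.

```python
def add_hyphen_aliases(hosts_line: str) -> str:
    """Append a hyphenated alias for every name that contains an underscore.

    Hypothetical helper: mirrors the manual /etc/hosts workaround, nothing more.
    """
    stripped = hosts_line.strip()
    # Leave blank lines and comment lines untouched.
    if not stripped or stripped.startswith("#"):
        return hosts_line
    fields = stripped.split()
    address, names = fields[0], fields[1:]
    aliases = []
    for name in names:
        alias = name.replace("_", "-")
        # Only add aliases that differ and are not already present.
        if alias != name and alias not in names and alias not in aliases:
            aliases.append(alias)
    if not aliases:
        return hosts_line
    return " ".join([address] + names + aliases)


line = "172.16.2.10 overcloud-compute_v1-0.localdomain overcloud-compute_v1-0"
print(add_hyphen_aliases(line))
```

Unlike the diff above, which only touched the internal API entries, this sketch would add aliases to every line it is given, so it should be applied selectively.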

Which then gets further, but fails on host key validation:
~~~
2018-01-20 17:12:57.116 59019 ERROR nova.virt.libvirt.driver [req-1523be96-5fef-4d20-a275-b08235f206d1 354194e670274527bb751e120fdb276b 4b9fb0b405434da6b215bca0bab4e654 - - -] [instance: 7ee38f55-2f1e-4c95-b5f7-ddfedf5134a7] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+ssh://nova_migration@overcloud-compute-v1-1/system?keyfile=/etc/nova/migration/identity: Cannot recv data: Host key verification failed.: Connection reset by peer
2018-01-20 17:12:57.121 59019 ERROR nova.virt.libvirt.driver [req-1523be96-5fef-4d20-a275-b08235f206d1 354194e670274527bb751e120fdb276b 4b9fb0b405434da6b215bca0bab4e654 - - -] [instance: 7ee38f55-2f1e-4c95-b5f7-ddfedf5134a7] Migration operation has aborted
2018-01-20 17:13:05.980 59019 ERROR nova.virt.libvirt.host [req-06d738da-8383-4516-b681-256c1090688d - - - - -] Hostname has changed from overcloud-compute-v1-0 to overcloud-compute_v1-0.localdomain. A restart is required to take effect.
2018-01-20 17:13:05.999 59019 ERROR nova.virt.libvirt.host [req-06d738da-8383-4516-b681-256c1090688d - - - - -] Hostname has changed from overcloud-compute-v1-0 to overcloud-compute_v1-0.localdomain. A restart is required to take effect.
~~~
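
A quick way to see why host key verification fails is to check whether the name nova connects to is listed in ssh_known_hosts at all. The sketch below is an assumption-laden illustration: the function names are hypothetical, and it ignores hashed entries, `@` markers and wildcard patterns, none of which appear in the file shown in this report.

```python
def names_in_known_hosts(lines):
    """Collect the plain host names and addresses listed in known_hosts lines."""
    names = set()
    for line in lines:
        line = line.strip()
        # Skip blanks, comments, @cert-authority/@revoked markers and hashed hosts.
        if not line or line.startswith(("#", "@", "|")):
            continue
        # The first whitespace-separated field is a comma-separated host list.
        names.update(line.split()[0].split(","))
    return names


def missing_from_known_hosts(lines, wanted):
    """Return the wanted host names that no known_hosts line mentions."""
    known = names_in_known_hosts(lines)
    return [host for host in wanted if host not in known]


sample = [
    "172.16.2.13,overcloud-compute_v1-1.localdomain,overcloud-compute_v1-1 "
    "ecdsa-sha2-nistp256 AAAA-placeholder-key",
]
print(missing_from_known_hosts(sample, ["overcloud-compute-v1-1"]))
```

With the generated file from this deployment, the hyphenated name used in the libvirt URI is reported as missing, which matches the failure in the log.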

So finally, I changed:
~~~
[root@overcloud-compute-v1-0 ~]# diff /etc/ssh/ssh_known_hosts{,.bck}
2,3c2,3
< 172.16.2.10,overcloud-compute_v1-0.localdomain,overcloud-compute_v1-0,overcloud-compute-v1-0.localdomain,overcloud-compute-v1-0,192.0.2.16,overcloud-compute_v1-0.external.localdomain,overcloud-compute_v1-0.external,172.16.2.10,overcloud-compute_v1-0.internalapi.localdomain,overcloud-compute_v1-0.internalapi,172.18.0.11,overcloud-compute_v1-0.storage.localdomain,overcloud-compute_v1-0.storage,192.0.2.16,overcloud-compute_v1-0.storagemgmt.localdomain,overcloud-compute_v1-0.storagemgmt,172.16.0.7,overcloud-compute_v1-0.tenant.localdomain,overcloud-compute_v1-0.tenant,192.0.2.16,overcloud-compute_v1-0.management.localdomain,overcloud-compute_v1-0.management,192.0.2.16,overcloud-compute_v1-0.ctlplane.localdomain,overcloud-compute_v1-0.ctlplane ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBJQhCsIK7sAkmHZ4xg72t2wV39gvrjtE6g1dqZLXmSTB5ubMaqFTytzqvt/T+C8oUKPgX1bQfnIz4VyuKCx4qwQ= 
< 172.16.2.13,overcloud-compute_v1-1.localdomain,overcloud-compute_v1-1,overcloud-compute-v1-1.localdomain,overcloud-compute-v1-1,192.0.2.8,overcloud-compute_v1-1.external.localdomain,overcloud-compute_v1-1.external,172.16.2.13,overcloud-compute_v1-1.internalapi.localdomain,overcloud-compute_v1-1.internalapi,172.18.0.15,overcloud-compute_v1-1.storage.localdomain,overcloud-compute_v1-1.storage,192.0.2.8,overcloud-compute_v1-1.storagemgmt.localdomain,overcloud-compute_v1-1.storagemgmt,172.16.0.9,overcloud-compute_v1-1.tenant.localdomain,overcloud-compute_v1-1.tenant,192.0.2.8,overcloud-compute_v1-1.management.localdomain,overcloud-compute_v1-1.management,192.0.2.8,overcloud-compute_v1-1.ctlplane.localdomain,overcloud-compute_v1-1.ctlplane ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBNFj5SSzkFnn4e417m0Ut+wOXONtEFslAYnGafSpAxasZGdJEpESzN0OhPj4aJNRomA/t2f6Xm2wjRCEjsX+ZiM= 
---
> 172.16.2.10,overcloud-compute_v1-0.localdomain,overcloud-compute_v1-0,192.0.2.16,overcloud-compute_v1-0.external.localdomain,overcloud-compute_v1-0.external,172.16.2.10,overcloud-compute_v1-0.internalapi.localdomain,overcloud-compute_v1-0.internalapi,172.18.0.11,overcloud-compute_v1-0.storage.localdomain,overcloud-compute_v1-0.storage,192.0.2.16,overcloud-compute_v1-0.storagemgmt.localdomain,overcloud-compute_v1-0.storagemgmt,172.16.0.7,overcloud-compute_v1-0.tenant.localdomain,overcloud-compute_v1-0.tenant,192.0.2.16,overcloud-compute_v1-0.management.localdomain,overcloud-compute_v1-0.management,192.0.2.16,overcloud-compute_v1-0.ctlplane.localdomain,overcloud-compute_v1-0.ctlplane ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBJQhCsIK7sAkmHZ4xg72t2wV39gvrjtE6g1dqZLXmSTB5ubMaqFTytzqvt/T+C8oUKPgX1bQfnIz4VyuKCx4qwQ= 
> 172.16.2.13,overcloud-compute_v1-1.localdomain,overcloud-compute_v1-1,192.0.2.8,overcloud-compute_v1-1.external.localdomain,overcloud-compute_v1-1.external,172.16.2.13,overcloud-compute_v1-1.internalapi.localdomain,overcloud-compute_v1-1.internalapi,172.18.0.15,overcloud-compute_v1-1.storage.localdomain,overcloud-compute_v1-1.storage,192.0.2.8,overcloud-compute_v1-1.storagemgmt.localdomain,overcloud-compute_v1-1.storagemgmt,172.16.0.9,overcloud-compute_v1-1.tenant.localdomain,overcloud-compute_v1-1.tenant,192.0.2.8,overcloud-compute_v1-1.management.localdomain,overcloud-compute_v1-1.management,192.0.2.8,overcloud-compute_v1-1.ctlplane.localdomain,overcloud-compute_v1-1.ctlplane ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBNFj5SSzkFnn4e417m0Ut+wOXONtEFslAYnGafSpAxasZGdJEpESzN0OhPj4aJNRomA/t2f6Xm2wjRCEjsX+ZiM= 
~~~

Which then succeeds:
~~~
[stack@undercloud-7 ~]$ nova reset-state cirros-test1 --active
Reset state for server cirros-test1 succeeded; new state is active
(reverse-i-search)`l': nova live-migration cirros-test1 overc^Cud-compute-v1-1
[stack@undercloud-7 ~]$ ^C
[stack@undercloud-7 ~]$ nova live-migration cirros-test1 overcloud-compute-v1-1
[stack@undercloud-7 ~]$ nova list --fields name,host
+--------------------------------------+--------------+------------------------+
| ID                                   | Name         | Host                   |
+--------------------------------------+--------------+------------------------+
| 7ee38f55-2f1e-4c95-b5f7-ddfedf5134a7 | cirros-test1 | overcloud-compute-v1-1 |
| 67cbc3cf-b379-4b45-92b4-9a690a5effd3 | rhel-test1   | overcloud-compute-v1-1 |
+--------------------------------------+--------------+------------------------+
~~~

Comment 1 Andreas Karis 2018-01-20 19:03:40 UTC
Hello,

I applied the following:
~~~
ComputeHostnameFormat: '%stackname%-compute-v1-%index%'
~~~

Then, I reran `openstack overcloud deploy`.

This leads to:
~~~
[root@overcloud-compute-v1-0 ~]# grep '_v1' !$
grep '_v1' /etc/ssh/ssh_known_hosts
[root@overcloud-compute-v1-0 ~]# grep '_v1' /etc/hosts
[root@overcloud-compute-v1-0 ~]# 
~~~

And to a hostname change:
~~~
2018-01-20 18:07:34.066 59047 ERROR nova.virt.libvirt.host [req-57ec6d8c-80fc-414e-8a6f-96e57d593499 - - - - -] Hostname has changed from overcloud-compute-v1-1 to overcloud-compute-v1-1.localdomain. A restart is required to take effect.
~~~

After a restart on all computes:
~~~
[root@overcloud-compute-v1-1 ~]# systemctl restart openstack-nova-compute
[root@overcloud-compute-v1-1 ~]# 
~~~

This will lead to another rename of compute services (note the .localdomain):
~~~
[stack@undercloud-7 ~]$ nova service-list
+----+------------------+------------------------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary           | Host                               | Zone     | Status  | State | Updated_at                 | Disabled Reason |
+----+------------------+------------------------------------+----------+---------+-------+----------------------------+-----------------+
| 3  | nova-consoleauth | overcloud-controller-0.localdomain | internal | enabled | up    | 2018-01-20T19:00:40.000000 | -               |
| 4  | nova-scheduler   | overcloud-controller-0.localdomain | internal | enabled | up    | 2018-01-20T19:00:37.000000 | -               |
| 5  | nova-conductor   | overcloud-controller-0.localdomain | internal | enabled | up    | 2018-01-20T19:00:34.000000 | -               |
| 6  | nova-compute     | overcloud-compute-v1-0             | nova     | enabled | down  | 2018-01-20T18:07:31.000000 | -               |
| 7  | nova-compute     | overcloud-compute-v1-1             | nova     | enabled | down  | 2018-01-20T18:07:37.000000 | -               |
| 8  | nova-compute     | overcloud-compute-v1-0.localdomain | nova     | enabled | down  | 2018-01-20T18:58:45.000000 | -               |
| 9  | nova-compute     | overcloud-compute-v1-1.localdomain | nova     | enabled | up    | 2018-01-20T19:00:39.000000 | -               |
+----+------------------+------------------------------------+----------+---------+-------+----------------------------+-----------------+
~~~

Live migration then fails, partly because v1-0 is down in my environment but, more importantly, because the rename to .localdomain broke other things:
~~~
/var/log/nova/nova-conductor.log:2018-01-20 19:01:31.363 92122 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/conductor/tasks/live_migrate.py", line 49, in _execute
/var/log/nova/nova-conductor.log:2018-01-20 19:01:31.363 92122 ERROR oslo_messaging.rpc.server     self._check_host_is_up(self.source)
/var/log/nova/nova-conductor.log:2018-01-20 19:01:31.363 92122 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/conductor/tasks/live_migrate.py", line 89, in _check_host_is_up
/var/log/nova/nova-conductor.log:2018-01-20 19:01:31.363 92122 ERROR oslo_messaging.rpc.server     raise exception.ComputeServiceUnavailable(host=host)
/var/log/nova/nova-conductor.log:2018-01-20 19:01:31.363 92122 ERROR oslo_messaging.rpc.server ComputeServiceUnavailable: Compute service of overcloud-compute-v1-1 is unavailable at this time.
/var/log/nova/nova-conductor.log:2018-01-20 19:01:31.363 92122 ERROR oslo_messaging.rpc.server 
~~~

- Andreas

Comment 2 Andreas Karis 2018-01-22 15:49:00 UTC
Correction:

it's not authorized_keys, it's /etc/ssh/ssh_known_hosts

Comment 3 Ollie Walsh 2018-01-22 17:14:49 UTC
The hostname has not been set (by cloud-init) because underscore is not a valid hostname character (see https://tools.ietf.org/html/rfc952). The hostname command will not accept this as a hostname, e.g.:

[root@undercloud stack]# hostname foo_
hostname: the specified hostname is invalid
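
For illustration, the character rule can be approximated in a few lines of Python. This is a sketch of the RFC-style rule (letters, digits and hyphens per label, no leading or trailing hyphen), not the actual check performed by the `hostname` command or cloud-init; `is_valid_hostname` is a hypothetical name.

```python
import re

# One label: letters, digits and hyphens, 1-63 characters,
# not starting or ending with a hyphen.
_LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(name: str) -> bool:
    """Return True if every dot-separated label matches the rule above."""
    if not name or len(name) > 253:
        return False
    return all(_LABEL_RE.fullmatch(label) for label in name.split("."))


for candidate in ("overcloud-compute-v1-0", "overcloud-compute_v1-0", "foo_"):
    print(candidate, is_valid_hostname(candidate))
```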


Everything else has been configured expecting this invalid hostname to have been set.

SSH public/private key authentication failing is the most obvious side effect of this, but there are likely to be other issues, e.g. the hostname being reported by nova-compute is also incorrect.

> oslo_messaging.rpc.server ComputeServiceUnavailable: Compute service of overcloud-compute-v1-1 is unavailable at this time.

What is the expectation here? That looks correct. It's now overcloud-compute-v1-1.localdomain. The overcloud-compute-v1-0 and overcloud-compute-v1-1 services should be deleted.

Comment 4 Andreas Karis 2018-01-22 17:21:03 UTC
Hi,

The hostname change does not work. Check the .localdomain.

With the wrong settings, we don't get the .localdomain suffix:
~~~
| 3  | nova-consoleauth | overcloud-controller-0.localdomain | internal | enabled | up    | 2018-01-20T19:00:40.000000 | -               |
| 4  | nova-scheduler   | overcloud-controller-0.localdomain | internal | enabled | up    | 2018-01-20T19:00:37.000000 | -               |
| 5  | nova-conductor   | overcloud-controller-0.localdomain | internal | enabled | up    | 2018-01-20T19:00:34.000000 | -               |
| 6  | nova-compute     | overcloud-compute-v1-0             | nova     | enabled | down  | 2018-01-20T18:07:31.000000 | -               |
| 7  | nova-compute     | overcloud-compute-v1-1             | nova     | enabled | down  | 2018-01-20T18:07:37.000000 | -               |
| 8  | nova-compute     | overcloud-compute-v1-0.localdomain | nova     | enabled | down  | 2018-01-20T18:58:45.000000 | -               |
| 9  | nova-compute     | overcloud-compute-v1-1.localdomain | nova     | enabled | 
~~~

~~~
/var/log/nova/nova-conductor.log:2018-01-20 19:01:31.363 92122 ERROR oslo_messaging.rpc.server ComputeServiceUnavailable: Compute service of overcloud-compute-v1-1 is unavailable at this time.
~~~

The database does not contain entries for `overcloud-compute-v1-1.localdomain`, but only for `overcloud-compute-v1-1`. That means that even after a rename via Director, due to this issue, one cannot migrate the instances off if one does not go into the database and fix this manually.
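
To make the duplicate records easier to spot, here is a minimal sketch that pairs each short-named nova-compute service with its `.localdomain` counterpart. The input is assumed to be `(id, host)` tuples copied by hand from `nova service-list`; the function name is hypothetical, and it does not touch the database or the instance records that still reference the old host name.

```python
def find_renamed_compute_services(services):
    """Pair short-name compute services with their FQDN replacements.

    services: iterable of (service_id, host) tuples for nova-compute rows.
    Returns a list of (old_id, old_host, new_id, new_host) tuples.
    """
    by_host = {host: service_id for service_id, host in services}
    pairs = []
    for host, service_id in sorted(by_host.items()):
        short = host.split(".", 1)[0]
        # A rename shows up as both "name" and "name.<domain>" being registered.
        if short != host and short in by_host:
            pairs.append((by_host[short], short, service_id, host))
    return pairs


rows = [
    (6, "overcloud-compute-v1-0"),
    (7, "overcloud-compute-v1-1"),
    (8, "overcloud-compute-v1-0.localdomain"),
    (9, "overcloud-compute-v1-1.localdomain"),
]
for old_id, old_host, new_id, new_host in find_renamed_compute_services(rows):
    print(f"service {old_id} ({old_host}) superseded by {new_id} ({new_host})")
```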

Actually, I'm not asking for a mitigation here. I'm asking that we do not let customers set flavor names or ComputeHostnameFormat that contain "_". Or, alternatively, that we correctly convert all of them from "_" to "-".

Overall, this is a product bug: we accept invalid input in our templates, and we do not convert "_" to "-" everywhere we should.
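
Either option could look roughly like the sketch below. It is only an illustration of the request, not existing tripleo code: the function names are hypothetical, and it assumes that `%stackname%` and `%index%` are expanded by plain string substitution.

```python
import re


def expand_hostname_format(fmt: str, stackname: str, index: int) -> str:
    """Expand the two placeholders seen in this report (assumed plain substitution)."""
    return fmt.replace("%stackname%", stackname).replace("%index%", str(index))


def check_hostname_format(fmt: str, stackname: str = "overcloud"):
    """Return a sample hostname and the characters that are not letters, digits, '.' or '-'."""
    sample = expand_hostname_format(fmt, stackname, 0)
    bad = sorted(set(re.findall(r"[^A-Za-z0-9.-]", sample)))
    return sample, bad


def normalize_hostname_format(fmt: str) -> str:
    """Alternative: silently convert '_' to '-' in the template."""
    return fmt.replace("_", "-")


sample, bad = check_hostname_format("%stackname%-compute_v1-%index%")
if bad:
    print(f"rejecting {sample!r}: invalid characters {bad}")
print(normalize_hostname_format("%stackname%-compute_v1-%index%"))
```

Rejecting the template up front is the safer of the two, since a silent conversion would still leave the user-visible names differing from what was requested.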

Comment 13 Alex Schultz 2018-07-23 15:20:08 UTC
We added a validation in OSP11 to prevent the use of underscore in stacknames which is where this originally snuck in.  I believe we have a validation in place for FFU as well.  

The RHEL documentation has some additional details around valid hostnames.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/ch-configure_host_names#sec_Understanding_Host_Names

At this point I'm not sure there's much to do in 10 without possibly breaking existing deployments.  If a user has an existing stack deployed, they'll probably need to update the role hostname format to not have a '_' in it and update the node if possible.

