Description of problem:
There is a Bugzilla describing a very similar issue [1][2], but it seems the issue still persists somehow. Note that the customer runs "non-monolithic" controllers, meaning they have separate "networker" nodes.

How reproducible:
1) Deploy RH OSP 10 from Director. The Compute nodes' FQDN was incomplete.
2) Install the puppet-tripleo-5.6.8-17.el7ost package. This patch was created to solve a similar issue (incomplete Compute node FQDN) but, in principle, only had to be applied after a deployment. All FQDNs were then correct.
3) After finding some issues with Ceph (there was an open case for it), we re-ran the Director deploy to update its configuration. Now the Networker nodes' FQDN is wrong.

Actual results:
The FQDN is incomplete.

Expected results:
The FQDN is complete.

Additional info:
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1638303
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1600178
Hi,

"1) Deploy RH OSP 10 from Director. Compute Nodes FQDN was incomplete."

Do you know if the cloud domain name was properly configured? When you say the FQDN was incomplete, do you mean that the /etc/nova/nova.conf [DEFAULT]/host value was the short hostname only, or do you mean something else?

"2) Install the puppet-tripleo-5.6.8-17.el7ost package. This patch was created to solve a similar issue (Compute node FQDN) but, in principle, had only to be applied after a deployment. All FQDNs were now correct."

What do you mean exactly:
- install puppet-tripleo-5.6.8-17 on the undercloud and all overcloud nodes;
- then re-run a deployment;
- and you then got the FQDN in the nova.conf [DEFAULT]/host value?

"3) After having found some issues with Ceph (there was an open case for it), we relaunched the Director deploy, to update its configuration. Now, Networker nodes FQDN were wrong."

You mean they went back to the short host name, right? By the way, https://bugzilla.redhat.com/show_bug.cgi?id=1600178 had a networker role, and the -17 patch was done to ensure the names (in nova and neutron) were not changed.

What would help here is the value of /etc/nova/nova.conf [DEFAULT]/host at each step, or command output (from nova or neutron) showing that the value changed. A precise description of how -17 was applied would be useful as well.

Thanks,
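To capture the [DEFAULT]/host value at each step, one way is to read nova.conf directly. A minimal sketch (the function name is mine; the fallback mimics nova's documented behavior of defaulting `host` to the node's hostname when the option is unset):

```python
import configparser
import socket

def nova_host_value(conf_path="/etc/nova/nova.conf"):
    """Return the effective [DEFAULT]/host value from a nova.conf file.

    nova.conf may contain duplicate keys and '%' characters, so we
    disable strict mode and interpolation. When the option is absent,
    nova falls back to the node's hostname, mimicked here with
    socket.gethostname().
    """
    cfg = configparser.ConfigParser(strict=False, interpolation=None)
    cfg.read(conf_path)  # silently skips missing files
    return cfg.get("DEFAULT", "host", fallback=socket.gethostname())
```

Running this on each node before and after the package update and redeploy would show exactly when the value flips between short name and FQDN.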
Hi,

So it may be that the environment came from an OSP 9 upgrade, which would explain the short names. In any case, we should now focus on unblocking the customer.

Note that the host parameter can be anything as long as it is unique in the OpenStack cluster; FQDN or short name does not really matter for that identifier. What is problematic is a *change* in that parameter, because then half of the agents will appear dead.

Here is an example of the situation after manually changing the host parameter in the nova and neutron configuration:

[stack@undercloud-0 ~]$ cat nova-neutron-after-3
+-----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+
| Id  | Binary           | Host                     | Zone     | Status  | State | Updated_at                 | Disabled Reason |
+-----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+
| 17  | nova-consoleauth | controller-1.localdomain | internal | enabled | down  | 2019-02-05T14:54:59.000000 | -               |
| 26  | nova-scheduler   | controller-1.localdomain | internal | enabled | down  | 2019-02-05T14:55:08.000000 | -               |
| 29  | nova-conductor   | controller-1.localdomain | internal | enabled | down  | 2019-02-05T14:54:57.000000 | -               |
| 32  | nova-compute     | compute-0.localdomain    | nova     | enabled | down  | 2019-02-05T14:54:57.000000 | -               |
| 35  | nova-consoleauth | controller-0.localdomain | internal | enabled | down  | 2019-02-05T14:55:05.000000 | -               |
| 38  | nova-consoleauth | controller-2.localdomain | internal | enabled | down  | 2019-02-05T14:54:52.000000 | -               |
| 53  | nova-scheduler   | controller-0.localdomain | internal | enabled | down  | 2019-02-05T14:55:05.000000 | -               |
| 56  | nova-scheduler   | controller-2.localdomain | internal | enabled | down  | 2019-02-05T14:54:43.000000 | -               |
| 59  | nova-conductor   | controller-0.localdomain | internal | enabled | down  | 2019-02-05T14:55:04.000000 | -               |
| 68  | nova-conductor   | controller-2.localdomain | internal | enabled | down  | 2019-02-05T14:54:51.000000 | -               |
| 71  | nova-compute     | compute-1.localdomain    | nova     | enabled | down  | 2019-02-05T14:55:05.000000 | -               |
| 73  | nova-scheduler   | controller-0             | internal | enabled | up    | 2019-02-07T10:54:02.000000 | -               |
| 76  | nova-scheduler   | controller-1             | internal | enabled | up    | 2019-02-07T10:53:59.000000 | -               |
| 79  | nova-conductor   | controller-2             | internal | enabled | up    | 2019-02-07T10:53:57.000000 | -               |
| 82  | nova-consoleauth | controller-2             | internal | enabled | up    | 2019-02-07T10:54:02.000000 | -               |
| 85  | nova-compute     | compute-0                | nova     | enabled | up    | 2019-02-07T10:54:04.000000 | -               |
| 88  | nova-conductor   | controller-0             | internal | enabled | up    | 2019-02-07T10:54:06.000000 | -               |
| 91  | nova-consoleauth | controller-0             | internal | enabled | up    | 2019-02-07T10:54:02.000000 | -               |
| 94  | nova-scheduler   | controller-2             | internal | enabled | up    | 2019-02-07T10:54:01.000000 | -               |
| 97  | nova-consoleauth | controller-1             | internal | enabled | up    | 2019-02-07T10:54:05.000000 | -               |
| 100 | nova-conductor   | controller-1             | internal | enabled | up    | 2019-02-07T10:54:06.000000 | -               |
| 103 | nova-compute     | compute-1                | nova     | enabled | up    | 2019-02-07T10:54:06.000000 | -               |
+-----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+
+--------------------------------------+--------------------+-------------------------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host                    | availability_zone | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+-------------------------+-------------------+-------+----------------+---------------------------+
| 0fcc8481-ec53-4d00-96ad-e86fc9df5ad2 | L3 agent           | networker-0             | nova              | :-)   | True           | neutron-l3-agent          |
| 2c2cde23-beef-4d02-a9f6-880f133b6cb5 | Metadata agent     | networker-0             |                   | :-)   | True           | neutron-metadata-agent    |
| 42f2ed07-f108-4d56-9db1-6659cf5e6496 | Metadata agent     | networker-0.localdomain |                   | xxx   | True           | neutron-metadata-agent    |
| 44e48755-0724-4d42-af70-25e41085873f | Open vSwitch agent | compute-0.localdomain   |                   | xxx   | True           | neutron-openvswitch-agent |
| 73a3f610-387e-4b5a-acbb-af3ab98e0f11 | Open vSwitch agent | compute-1               |                   | :-)   | True           | neutron-openvswitch-agent |
| 756e3247-0948-41f0-88f0-3efee59fcc8a | Open vSwitch agent | compute-1.localdomain   |                   | xxx   | True           | neutron-openvswitch-agent |
| 8faa6859-e8c9-4de2-814f-5a963bfad1f5 | L3 agent           | networker-0.localdomain | nova              | xxx   | True           | neutron-l3-agent          |
| 98e2b72f-4964-42b2-bff0-2995a4f393f2 | Open vSwitch agent | compute-0               |                   | :-)   | True           | neutron-openvswitch-agent |
| 9e983d8b-c5bf-41c1-b395-35e3852dba72 | Open vSwitch agent | networker-0.localdomain |                   | xxx   | True           | neutron-openvswitch-agent |
| 9f9cf344-e763-41a7-8a3f-60814e714d19 | DHCP agent         | networker-0.localdomain | nova              | xxx   | True           | neutron-dhcp-agent        |
| c982b307-f93f-44ee-b486-970bcfc533b7 | DHCP agent         | networker-0             | nova              | :-)   | True           | neutron-dhcp-agent        |
| eb39760b-e268-462c-a7e9-61c9ec04f63d | Open vSwitch agent | networker-0             |                   | :-)   | True           | neutron-openvswitch-agent |
+--------------------------------------+--------------------+-------------------------+-------------------+-------+----------------+---------------------------+

Here I switched from FQDN to short name. All FQDN agents are seen as dead. Any workload associated with them (compute instances, or FIPs for the L3 agent) will be "lost" (but recoverable).

1. So what is currently not working in the customer environment?
2. Can we have the output of:
   # from the undercloud:
   . overcloudrc
   nova service-list
   neutron agent-list
3. We would need /var/log/yum.log from all the nodes and the output of `rpm -qa` on all nodes. (If that is not enough we may request a sos-report, but currently those two commands should be sufficient.)

After analyzing the feedback we can fix the environment and make sure that no further change in the host parameter happens.
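The duplicated-host symptom above (the same node registered once under its short name and once under its FQDN, with one of the two dead) can be detected mechanically. A small sketch, not part of any OpenStack tooling, that takes (binary, host) pairs as would be parsed from `nova service-list` or `neutron agent-list` output:

```python
from collections import defaultdict

def find_host_drift(agents):
    """Flag hosts registered under both a short name and an FQDN.

    `agents` is an iterable of (binary, host) tuples. Hosts are
    grouped by their short form (everything before the first dot);
    any group with more than one distinct spelling indicates that
    the [DEFAULT]/host value changed at some point.
    """
    by_short = defaultdict(set)
    for _binary, host in agents:
        by_short[host.split(".", 1)[0]].add(host)
    return {short: sorted(hosts)
            for short, hosts in by_short.items()
            if len(hosts) > 1}
```

On the agent list above, this would report networker-0, compute-0, and compute-1 as each appearing in both forms.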
*** Bug 1720005 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3112
*** Bug 1747767 has been marked as a duplicate of this bug. ***