Bug 1657692 - Incomplete networker nodes FQDN after director deploy update
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: z13
Target Release: 10.0 (Newton)
Assignee: RHOS Maint
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Duplicates: 1720005 1747767
Depends On:
Blocks: 1596760
 
Reported: 2018-12-10 09:57 UTC by ojanas
Modified: 2023-09-07 19:34 UTC
CC: 19 users

Fixed In Version: puppet-tripleo-5.6.8-29 openstack-tripleo-heat-templates-5.3.10-31.el7ost
Doc Type: Enhancement
Doc Text:
The host parameters for OpenStack Networking (neutron) and OpenStack Compute (nova) are now set to a fixed value, based on the hostname combined with the CloudDomain. If the Compute or Networking service host parameter would change while performing an update or upgrade, the update/upgrade is stopped and an error message is displayed that points to the relevant documentation for manual intervention.
Clone Of:
Environment:
Last Closed: 2019-10-16 09:40:35 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1600178 0 urgent CLOSED Neutron routers become unavailable after rebooting networker nodes post minor update 2022-08-17 16:33:27 UTC
Red Hat Bugzilla 1638303 0 high CLOSED The newly added compute node has domain part of FQDN missing in OSP services configuration 2022-08-02 17:02:17 UTC
Red Hat Issue Tracker OSP-11763 0 None None None 2021-12-10 18:31:15 UTC
Red Hat Issue Tracker UPG-43 0 None None None 2021-12-10 18:31:05 UTC
Red Hat Product Errata RHBA-2019:3112 0 None None None 2019-10-16 09:40:52 UTC

Description ojanas 2018-12-10 09:57:20 UTC
Description of problem:

There are bugzillas describing a very similar issue [1][2], but it seems the issue still persists.
Note that the customer runs "non-monolithic" controllers, meaning they have separate "networker" nodes.

How reproducible:

1) Deploy RH OSP 10 from Director. The Compute nodes' FQDN was incomplete.
2) Install the puppet-tripleo-5.6.8-17.el7ost package. This patch was created to solve a similar issue (the Compute node FQDN) and, in principle, only had to be applied after a deployment. All FQDNs were then correct.
3) After finding some issues with Ceph (there was an open case for it), we relaunched the Director deploy to update its configuration. Now the Networker nodes' FQDN was wrong.

Actual results:

The FQDN is incomplete.

Expected results:

The FQDN is complete.

Additional info:

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1638303
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1600178

Comment 3 Sofer Athlan-Guyot 2018-12-20 16:58:54 UTC
Hi,

How reproducible:

1) Deploy RH OSP 10 from Director. Compute Nodes FQDN was incomplete.

Do you know if the cloud domain name was properly configured? When you say the FQDN was incomplete, do you mean that the /etc/nova/nova.conf [DEFAULT]/host value was the hostname only, or do you mean something else?
 
2) Install the puppet-tripleo-5.6.8-17.el7ost package. This patch was created to solve a similar issue (the Compute node FQDN) and, in principle, only had to be applied after a deployment. All FQDNs were then correct.

What do you mean exactly:
 - install puppet-tripleo-5.6.8-17 on the undercloud and all overcloud nodes;
 - then re-run a deployment;
 - and you got the FQDN (in the nova.conf [DEFAULT]/host value)?

3) After finding some issues with Ceph (there was an open case for it), we relaunched the Director deploy to update its configuration. Now the Networker nodes' FQDN was wrong.

You mean it then went back to the short host name, right?

By the way, https://bugzilla.redhat.com/show_bug.cgi?id=1600178 had a networker role, and the -17 patch was done to ensure the names (in nova and neutron) were not mangled.

What would help here would be the value of /etc/nova/nova.conf [DEFAULT]/host at each step, or command output (with nova or neutron) that shows that the value has changed.
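For reference, that host value can be read with a one-liner like the following. This is only a sketch: the heredoc stands in for the real /etc/nova/nova.conf on each node, and where crudini is installed, "crudini --get /etc/nova/nova.conf DEFAULT host" does the same job.

```shell
# Sketch: extract the [DEFAULT]/host value from a nova.conf-style file.
# The heredoc is a stand-in for the real /etc/nova/nova.conf on each node.
cat > /tmp/nova.conf <<'EOF'
[DEFAULT]
host = compute-0.localdomain
EOF

# Split on "=" (with optional surrounding spaces) and print the value.
awk -F' *= *' '/^host *=/ {print $2}' /tmp/nova.conf
```

Running this after each deploy/update step and diffing the results would show exactly when the value changes.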

A precise description of how -17 was applied would be useful as well.

Thanks,

Comment 12 Sofer Athlan-Guyot 2019-02-07 14:17:50 UTC
Hi,

so it may be that the env was coming from an OSP 9 upgrade, which would explain the short name.

In any case, we should now focus on unblocking the customer.

Note that the host parameter can be anything as long as it is unique in the OpenStack cluster; FQDN or short name does not really matter for that identifier. What is problematic is a *change* in that parameter, as then half the agents will be dead. Here is an example of the situation after manually changing the host parameter in the nova and neutron configuration:


[stack@undercloud-0 ~]$ cat nova-neutron-after-3                                                                                                                                                                    
+-----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+                                                                                   
| Id  | Binary           | Host                     | Zone     | Status  | State | Updated_at                 | Disabled Reason |                                                                                   
+-----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+                                                                                   
| 17  | nova-consoleauth | controller-1.localdomain | internal | enabled | down  | 2019-02-05T14:54:59.000000 | -               |                                                                                   
| 26  | nova-scheduler   | controller-1.localdomain | internal | enabled | down  | 2019-02-05T14:55:08.000000 | -               |                                                                                   
| 29  | nova-conductor   | controller-1.localdomain | internal | enabled | down  | 2019-02-05T14:54:57.000000 | -               |                                                                                   
| 32  | nova-compute     | compute-0.localdomain    | nova     | enabled | down  | 2019-02-05T14:54:57.000000 | -               |                                                                                   
| 35  | nova-consoleauth | controller-0.localdomain | internal | enabled | down  | 2019-02-05T14:55:05.000000 | -               |                                                                                   
| 38  | nova-consoleauth | controller-2.localdomain | internal | enabled | down  | 2019-02-05T14:54:52.000000 | -               |                                                                                   
| 53  | nova-scheduler   | controller-0.localdomain | internal | enabled | down  | 2019-02-05T14:55:05.000000 | -               |                                                                                   
| 56  | nova-scheduler   | controller-2.localdomain | internal | enabled | down  | 2019-02-05T14:54:43.000000 | -               |                                                                                   
| 59  | nova-conductor   | controller-0.localdomain | internal | enabled | down  | 2019-02-05T14:55:04.000000 | -               |                                                                                   
| 68  | nova-conductor   | controller-2.localdomain | internal | enabled | down  | 2019-02-05T14:54:51.000000 | -               |                                                                                   
| 71  | nova-compute     | compute-1.localdomain    | nova     | enabled | down  | 2019-02-05T14:55:05.000000 | -               |                                                                                   
| 73  | nova-scheduler   | controller-0             | internal | enabled | up    | 2019-02-07T10:54:02.000000 | -               |                                                                                   
| 76  | nova-scheduler   | controller-1             | internal | enabled | up    | 2019-02-07T10:53:59.000000 | -               |                                                                                   
| 79  | nova-conductor   | controller-2             | internal | enabled | up    | 2019-02-07T10:53:57.000000 | -               |                                                                                   
| 82  | nova-consoleauth | controller-2             | internal | enabled | up    | 2019-02-07T10:54:02.000000 | -               |                                                                                   
| 85  | nova-compute     | compute-0                | nova     | enabled | up    | 2019-02-07T10:54:04.000000 | -               |                                                                                   
| 88  | nova-conductor   | controller-0             | internal | enabled | up    | 2019-02-07T10:54:06.000000 | -               |                                                                                   
| 91  | nova-consoleauth | controller-0             | internal | enabled | up    | 2019-02-07T10:54:02.000000 | -               |                                                                                   
| 94  | nova-scheduler   | controller-2             | internal | enabled | up    | 2019-02-07T10:54:01.000000 | -               |                                                                                   
| 97  | nova-consoleauth | controller-1             | internal | enabled | up    | 2019-02-07T10:54:05.000000 | -               |                                                                                   
| 100 | nova-conductor   | controller-1             | internal | enabled | up    | 2019-02-07T10:54:06.000000 | -               |                                                                                   
| 103 | nova-compute     | compute-1                | nova     | enabled | up    | 2019-02-07T10:54:06.000000 | -               |                                                                                   
+-----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+   

+--------------------------------------+--------------------+-------------------------+-------------------+-------+----------------+---------------------------+                                                    
| id                                   | agent_type         | host                    | availability_zone | alive | admin_state_up | binary                    |                                                    
+--------------------------------------+--------------------+-------------------------+-------------------+-------+----------------+---------------------------+                                                    
| 0fcc8481-ec53-4d00-96ad-e86fc9df5ad2 | L3 agent           | networker-0             | nova              | :-)   | True           | neutron-l3-agent          |                                                    
| 2c2cde23-beef-4d02-a9f6-880f133b6cb5 | Metadata agent     | networker-0             |                   | :-)   | True           | neutron-metadata-agent    |                                                    
| 42f2ed07-f108-4d56-9db1-6659cf5e6496 | Metadata agent     | networker-0.localdomain |                   | xxx   | True           | neutron-metadata-agent    |                                                    
| 44e48755-0724-4d42-af70-25e41085873f | Open vSwitch agent | compute-0.localdomain   |                   | xxx   | True           | neutron-openvswitch-agent |                                                    
| 73a3f610-387e-4b5a-acbb-af3ab98e0f11 | Open vSwitch agent | compute-1               |                   | :-)   | True           | neutron-openvswitch-agent |                                                    
| 756e3247-0948-41f0-88f0-3efee59fcc8a | Open vSwitch agent | compute-1.localdomain   |                   | xxx   | True           | neutron-openvswitch-agent |                                                    
| 8faa6859-e8c9-4de2-814f-5a963bfad1f5 | L3 agent           | networker-0.localdomain | nova              | xxx   | True           | neutron-l3-agent          |                                                    
| 98e2b72f-4964-42b2-bff0-2995a4f393f2 | Open vSwitch agent | compute-0               |                   | :-)   | True           | neutron-openvswitch-agent |                                                    
| 9e983d8b-c5bf-41c1-b395-35e3852dba72 | Open vSwitch agent | networker-0.localdomain |                   | xxx   | True           | neutron-openvswitch-agent |                                                    
| 9f9cf344-e763-41a7-8a3f-60814e714d19 | DHCP agent         | networker-0.localdomain | nova              | xxx   | True           | neutron-dhcp-agent        |                                                    
| c982b307-f93f-44ee-b486-970bcfc533b7 | DHCP agent         | networker-0             | nova              | :-)   | True           | neutron-dhcp-agent        |                                                    
| eb39760b-e268-462c-a7e9-61c9ec04f63d | Open vSwitch agent | networker-0             |                   | :-)   | True           | neutron-openvswitch-agent |                                                    
+--------------------------------------+--------------------+-------------------------+-------------------+-------+----------------+---------------------------+ 

Here I switched from FQDN to short name. All FQDN agents are seen as dead. Any workload associated with them (compute instances, or FIPs for the L3 agent) will be "lost" (but is recoverable).
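The dead duplicates can be picked out mechanically. A small sketch, assuming a saved "neutron agent-list" dump in the table format above; the printed IDs would then be fed to "neutron agent-delete <id>":

```shell
# Sketch: print the IDs of dead (xxx) agents from a saved
# "neutron agent-list" dump.  The heredoc holds two sample rows
# taken from the listing above.
cat > /tmp/agents.txt <<'EOF'
| 42f2ed07-f108-4d56-9db1-6659cf5e6496 | Metadata agent | networker-0.localdomain |      | xxx | True | neutron-metadata-agent |
| 2c2cde23-beef-4d02-a9f6-880f133b6cb5 | Metadata agent | networker-0             |      | :-) | True | neutron-metadata-agent |
EOF

# After splitting on "|", field 6 is the "alive" column; field 2 is the id.
awk -F'|' '$6 ~ /xxx/ {gsub(/ /, "", $2); print $2}' /tmp/agents.txt
```

Only the agent on networker-0.localdomain is printed; the live (:-)) one on networker-0 is kept.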

1. So what is currently not working on the customer environment?

2. Can we have the output of:

  #from the undercloud:
  . overcloudrc
  nova service-list
  neutron agent-list

3. We would need /var/log/yum.log from all the nodes and the output of rpm -qa on all nodes. (If that is not enough we may request an sos-report, but currently those two commands should be enough.)

After analyzing the feedback we can fix the env and make sure that any change in the host parameter won't happen again.
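Collecting those files could be scripted along these lines. This is a hypothetical helper: the node names and the heat-admin SSH user are assumptions to be replaced with the real inventory, and it only prints the commands so they can be reviewed before running (pipe to sh to execute).

```shell
# Hypothetical helper: print the collection commands for each overcloud
# node.  The node list and the heat-admin user are assumptions; substitute
# the real inventory.  Review the output, then pipe it to "sh" to execute.
nodes="controller-0 controller-1 controller-2 compute-0 compute-1 networker-0"
for n in $nodes; do
  echo "ssh heat-admin@$n 'sudo cat /var/log/yum.log' > $n-yum.log"
  echo "ssh heat-admin@$n 'rpm -qa' > $n-rpm-qa.txt"
done
```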

Comment 36 Sofer Athlan-Guyot 2019-09-18 15:55:41 UTC
*** Bug 1720005 has been marked as a duplicate of this bug. ***

Comment 57 errata-xmlrpc 2019-10-16 09:40:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3112

Comment 58 Martin Schuppert 2019-10-18 07:00:05 UTC
*** Bug 1747767 has been marked as a duplicate of this bug. ***

