Description of problem:
There is a Bugzilla describing a very similar issue [1][2], but it seems the issue still persists somehow. Note that the customer runs "non-monolithic" controllers, meaning they have separate "networker" nodes.

How reproducible:
1) Deploy RH OSP 10 from Director. The Compute nodes' FQDN was incomplete.
2) Install the puppet-tripleo-5.6.8-17.el7ost package. This patch was created to solve a similar issue (incomplete Compute node FQDN) but, in principle, only had to be applied after a deployment. All FQDNs were then correct.
3) After finding some issues with Ceph (there was an open case for it), we re-ran the Director deploy to update its configuration. Now the Networker nodes' FQDN is wrong.

Actual results:
The FQDN is incomplete.

Expected results:
The FQDN is complete.

Additional info:
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1638303
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1600178
Hi,

"1) Deploy RH OSP 10 from Director. Compute Nodes FQDN was incomplete."

Do you know if the cloud domain name was properly configured? When you say the FQDN was incomplete, do you mean that the /etc/nova/nova.conf [DEFAULT]/host value was the short hostname only, or do you mean something else?

"2) Install the puppet-tripleo-5.6.8-17.el7ost package. This patch was created to solve a similar issue (Compute node FQDN) but, in principle, had only to be applied after a deployment. All FQDNs were now correct."

What do you mean exactly:
- install puppet-tripleo-5.6.8-17 on the undercloud and all overcloud nodes;
- then re-run a deployment;
- and you then got the FQDN in the nova.conf [DEFAULT]/host value?

"3) After having found some issues with Ceph (there was an open case for it), we relaunched the Director deploy, to update its configuration. Now, Networker nodes FQDN were wrong."

You mean they went back to the short host name, right? By the way, https://bugzilla.redhat.com/show_bug.cgi?id=1600178 had a networker role, and the -17 patch was done to ensure the names (in nova and neutron) were not changed.

What would help here is the value of /etc/nova/nova.conf [DEFAULT]/host at each step, or command output (from nova or neutron) showing that the value changed. A precise description of how -17 was applied would be useful as well.

Thanks,
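To capture the [DEFAULT]/host value at each step, one way is to read nova.conf directly. A minimal sketch (the function name is mine; the fallback mimics nova's documented behavior of defaulting `host` to the node's hostname when the option is unset):

```python
import configparser
import socket

def nova_host_value(conf_path="/etc/nova/nova.conf"):
    """Return the effective [DEFAULT]/host value from a nova.conf file.

    nova.conf may contain duplicate keys and '%' characters, so we
    disable strict mode and interpolation. When the option is absent,
    nova falls back to the node's hostname, mimicked here with
    socket.gethostname().
    """
    cfg = configparser.ConfigParser(strict=False, interpolation=None)
    cfg.read(conf_path)  # silently skips missing files
    return cfg.get("DEFAULT", "host", fallback=socket.gethostname())
```

Running this on each node before and after the package update and redeploy would show exactly when the value flips between short name and FQDN.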
Hi,

So it may be that the environment came from an OSP 9 upgrade, which would explain the short names. In any case, we should now focus on unblocking the customer.

Note that the host parameter can be anything as long as it is unique in the OpenStack cluster; FQDN or short name does not really matter for that identifier. What is problematic is a *change* in that parameter, because then half of the agents will appear dead.

Here is an example of the situation after manually changing the host parameter in the nova and neutron configuration:

[stack@undercloud-0 ~]$ cat nova-neutron-after-3
+-----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+
| Id  | Binary           | Host                     | Zone     | Status  | State | Updated_at                 | Disabled Reason |
+-----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+
| 17  | nova-consoleauth | controller-1.localdomain | internal | enabled | down  | 2019-02-05T14:54:59.000000 | -               |
| 26  | nova-scheduler   | controller-1.localdomain | internal | enabled | down  | 2019-02-05T14:55:08.000000 | -               |
| 29  | nova-conductor   | controller-1.localdomain | internal | enabled | down  | 2019-02-05T14:54:57.000000 | -               |
| 32  | nova-compute     | compute-0.localdomain    | nova     | enabled | down  | 2019-02-05T14:54:57.000000 | -               |
| 35  | nova-consoleauth | controller-0.localdomain | internal | enabled | down  | 2019-02-05T14:55:05.000000 | -               |
| 38  | nova-consoleauth | controller-2.localdomain | internal | enabled | down  | 2019-02-05T14:54:52.000000 | -               |
| 53  | nova-scheduler   | controller-0.localdomain | internal | enabled | down  | 2019-02-05T14:55:05.000000 | -               |
| 56  | nova-scheduler   | controller-2.localdomain | internal | enabled | down  | 2019-02-05T14:54:43.000000 | -               |
| 59  | nova-conductor   | controller-0.localdomain | internal | enabled | down  | 2019-02-05T14:55:04.000000 | -               |
| 68  | nova-conductor   | controller-2.localdomain | internal | enabled | down  | 2019-02-05T14:54:51.000000 | -               |
| 71  | nova-compute     | compute-1.localdomain    | nova     | enabled | down  | 2019-02-05T14:55:05.000000 | -               |
| 73  | nova-scheduler   | controller-0             | internal | enabled | up    | 2019-02-07T10:54:02.000000 | -               |
| 76  | nova-scheduler   | controller-1             | internal | enabled | up    | 2019-02-07T10:53:59.000000 | -               |
| 79  | nova-conductor   | controller-2             | internal | enabled | up    | 2019-02-07T10:53:57.000000 | -               |
| 82  | nova-consoleauth | controller-2             | internal | enabled | up    | 2019-02-07T10:54:02.000000 | -               |
| 85  | nova-compute     | compute-0                | nova     | enabled | up    | 2019-02-07T10:54:04.000000 | -               |
| 88  | nova-conductor   | controller-0             | internal | enabled | up    | 2019-02-07T10:54:06.000000 | -               |
| 91  | nova-consoleauth | controller-0             | internal | enabled | up    | 2019-02-07T10:54:02.000000 | -               |
| 94  | nova-scheduler   | controller-2             | internal | enabled | up    | 2019-02-07T10:54:01.000000 | -               |
| 97  | nova-consoleauth | controller-1             | internal | enabled | up    | 2019-02-07T10:54:05.000000 | -               |
| 100 | nova-conductor   | controller-1             | internal | enabled | up    | 2019-02-07T10:54:06.000000 | -               |
| 103 | nova-compute     | compute-1                | nova     | enabled | up    | 2019-02-07T10:54:06.000000 | -               |
+-----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+
+--------------------------------------+--------------------+-------------------------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host                    | availability_zone | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+-------------------------+-------------------+-------+----------------+---------------------------+
| 0fcc8481-ec53-4d00-96ad-e86fc9df5ad2 | L3 agent           | networker-0             | nova              | :-)   | True           | neutron-l3-agent          |
| 2c2cde23-beef-4d02-a9f6-880f133b6cb5 | Metadata agent     | networker-0             |                   | :-)   | True           | neutron-metadata-agent    |
| 42f2ed07-f108-4d56-9db1-6659cf5e6496 | Metadata agent     | networker-0.localdomain |                   | xxx   | True           | neutron-metadata-agent    |
| 44e48755-0724-4d42-af70-25e41085873f | Open vSwitch agent | compute-0.localdomain   |                   | xxx   | True           | neutron-openvswitch-agent |
| 73a3f610-387e-4b5a-acbb-af3ab98e0f11 | Open vSwitch agent | compute-1               |                   | :-)   | True           | neutron-openvswitch-agent |
| 756e3247-0948-41f0-88f0-3efee59fcc8a | Open vSwitch agent | compute-1.localdomain   |                   | xxx   | True           | neutron-openvswitch-agent |
| 8faa6859-e8c9-4de2-814f-5a963bfad1f5 | L3 agent           | networker-0.localdomain | nova              | xxx   | True           | neutron-l3-agent          |
| 98e2b72f-4964-42b2-bff0-2995a4f393f2 | Open vSwitch agent | compute-0               |                   | :-)   | True           | neutron-openvswitch-agent |
| 9e983d8b-c5bf-41c1-b395-35e3852dba72 | Open vSwitch agent | networker-0.localdomain |                   | xxx   | True           | neutron-openvswitch-agent |
| 9f9cf344-e763-41a7-8a3f-60814e714d19 | DHCP agent         | networker-0.localdomain | nova              | xxx   | True           | neutron-dhcp-agent        |
| c982b307-f93f-44ee-b486-970bcfc533b7 | DHCP agent         | networker-0             | nova              | :-)   | True           | neutron-dhcp-agent        |
| eb39760b-e268-462c-a7e9-61c9ec04f63d | Open vSwitch agent | networker-0             |                   | :-)   | True           | neutron-openvswitch-agent |
+--------------------------------------+--------------------+-------------------------+-------------------+-------+----------------+---------------------------+

Here I switched from FQDN to short name. All FQDN agents are seen as dead. Any workload associated with them (compute instances, or FIPs for the L3 agent) will be "lost" (but recoverable).

1. So what is currently not working in the customer environment?
2. Can we have the output of:
   # from the undercloud:
   . overcloudrc
   nova service-list
   neutron agent-list
3. We would need /var/log/yum.log from all the nodes and the output of `rpm -qa` on all nodes. (If that is not enough we may request a sos-report, but currently those two commands should be sufficient.)

After analyzing the feedback we can fix the environment and make sure that no further change in the host parameter happens.
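The duplicated-host symptom above (the same node registered once under its short name and once under its FQDN, with one of the two dead) can be detected mechanically. A small sketch, not part of any OpenStack tooling, that takes (binary, host) pairs as would be parsed from `nova service-list` or `neutron agent-list` output:

```python
from collections import defaultdict

def find_host_drift(agents):
    """Flag hosts registered under both a short name and an FQDN.

    `agents` is an iterable of (binary, host) tuples. Hosts are
    grouped by their short form (everything before the first dot);
    any group with more than one distinct spelling indicates that
    the [DEFAULT]/host value changed at some point.
    """
    by_short = defaultdict(set)
    for _binary, host in agents:
        by_short[host.split(".", 1)[0]].add(host)
    return {short: sorted(hosts)
            for short, hosts in by_short.items()
            if len(hosts) > 1}
```

On the agent list above, this would report networker-0, compute-0, and compute-1 as each appearing in both forms.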
*** Bug 1720005 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3112
*** Bug 1747767 has been marked as a duplicate of this bug. ***