Bug 1657692

Summary: Incomplete networker nodes FQDN after director deploy update
Product: Red Hat OpenStack
Reporter: ojanas
Component: openstack-tripleo-heat-templates
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED ERRATA
QA Contact: Sasha Smolyak <ssmolyak>
Severity: high
Priority: urgent
Version: 10.0 (Newton)
CC: astupnik, dbecker, dprince, emilien, igallagh, jfrancoa, jraju, lbezdick, mburns, mflusche, morazi, nchandek, sandyada, sathlang, sgolovat, shdunne, slinaber, ssmolyak, tvignaud
Target Milestone: z13
Keywords: Regression, Triaged, ZStream
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: puppet-tripleo-5.6.8-29 openstack-tripleo-heat-templates-5.3.10-31.el7ost
Doc Type: Enhancement
Doc Text:
The host parameters for OpenStack Networking (neutron) and OpenStack Compute (nova) are now pinned to a fixed value, based on the hostname combined with the CloudDomain. If an update or upgrade would change the Compute or Networking service host parameter, the update/upgrade is stopped and an error message is displayed that points to the relevant documentation for manual intervention.
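
For reference, a minimal sketch of what the pinned host parameter looks like on a node (the hostname and domain below are illustrative, not taken from this environment):

  # /etc/nova/nova.conf
  [DEFAULT]
  host = networker-0.localdomain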
Last Closed: 2019-10-16 09:40:35 UTC
Type: Bug
Bug Blocks: 1596760    

Description ojanas 2018-12-10 09:57:20 UTC
Description of problem:

There are earlier bugzillas describing a very similar issue [1][2], but it seems the issue somehow still persists.
Note that the customer runs "non-monolithic" controllers, meaning they have separate "networker" nodes.

How reproducible:

1) Deploy RH OSP 10 from Director. The Compute nodes' FQDNs were incomplete.
2) Install the puppet-tripleo-5.6.8-17.el7ost package. This package was created to solve a similar issue (Compute node FQDN) but, in principle, only had to be applied after a deployment. All FQDNs were then correct.
3) After finding some issues with Ceph (there was an open case for it), we relaunched the Director deploy to update its configuration. Now the Networker nodes' FQDNs were wrong.

Actual results:

The FQDN is incomplete.
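
For illustration, the symptom typically looks like this on an affected node (a sketch with made-up values, not captured output, assuming "incomplete FQDN" refers to the nova/neutron host parameter as discussed in the comments below):

  $ hostname -f
  networker-0.localdomain
  $ grep '^host' /etc/nova/nova.conf
  host=networker-0

That is, the host parameter ends up holding only the short hostname instead of the FQDN.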

Expected results:

The FQDN is complete.

Additional info:

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1638303
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1600178

Comment 3 Sofer Athlan-Guyot 2018-12-20 16:58:54 UTC
Hi,

How reproducible:

1) Deploy RH OSP 10 from Director. Compute Nodes FQDN was incomplete.

Do you know if the cloud domain name was properly configured? When you say the FQDN was incomplete, do you mean that the /etc/nova/nova.conf [DEFAULT]/host value was the hostname only, or do you mean something else?
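
One quick way to check that value on a node (crudini is usually present on OSP nodes; grepping the [DEFAULT] section works just as well):

  crudini --get /etc/nova/nova.conf DEFAULT host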
 
2) Install the puppet-tripleo-5.6.8-17.el7ost package. This package was created to solve a similar issue (Compute node FQDN) but, in principle, only had to be applied after a deployment. All FQDNs were then correct.

What do you mean exactly:
 - you installed puppet-tripleo-5.6.8-17 on the undercloud and all overcloud nodes;
 - then re-ran a deployment;
 - and you got the FQDN in the nova.conf [DEFAULT]/host value?

3) After finding some issues with Ceph (there was an open case for it), we relaunched the Director deploy to update its configuration. Now the Networker nodes' FQDNs were wrong.

You mean they then went back to the short host name, right?

By the way, https://bugzilla.redhat.com/show_bug.cgi?id=1600178 had a networker role, and the -17 patch was done to ensure the names (in nova and neutron) were not mangled.

What would help here would be the value of /etc/nova/nova.conf [DEFAULT]/host at each step, or command output (from nova or neutron) that shows that the value has changed.
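
For example, something along these lines, run after each step, would capture it (illustrative; heat-admin is the default overcloud user, and <node> is a placeholder):

  ssh heat-admin@<node> 'sudo crudini --get /etc/nova/nova.conf DEFAULT host'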

A precise description of how -17 was applied would be useful as well.

Thanks,

Comment 12 Sofer Athlan-Guyot 2019-02-07 14:17:50 UTC
Hi,

so it may be that the environment came from an OSP 9 upgrade, which would explain the short name.

In any case, we should now focus on unblocking the customer.

Note that the host parameter can be anything as long as it is unique in the OpenStack cluster; FQDN or short name does not really matter for that identifier. What is problematic is a *change* in that parameter, as half the agents will then be dead. Here is an example of the situation after manually changing the host parameter in the nova and neutron configuration:


[stack@undercloud-0 ~]$ cat nova-neutron-after-3                                                                                                                                                                    
+-----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+                                                                                   
| Id  | Binary           | Host                     | Zone     | Status  | State | Updated_at                 | Disabled Reason |                                                                                   
+-----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+                                                                                   
| 17  | nova-consoleauth | controller-1.localdomain | internal | enabled | down  | 2019-02-05T14:54:59.000000 | -               |                                                                                   
| 26  | nova-scheduler   | controller-1.localdomain | internal | enabled | down  | 2019-02-05T14:55:08.000000 | -               |                                                                                   
| 29  | nova-conductor   | controller-1.localdomain | internal | enabled | down  | 2019-02-05T14:54:57.000000 | -               |                                                                                   
| 32  | nova-compute     | compute-0.localdomain    | nova     | enabled | down  | 2019-02-05T14:54:57.000000 | -               |                                                                                   
| 35  | nova-consoleauth | controller-0.localdomain | internal | enabled | down  | 2019-02-05T14:55:05.000000 | -               |                                                                                   
| 38  | nova-consoleauth | controller-2.localdomain | internal | enabled | down  | 2019-02-05T14:54:52.000000 | -               |                                                                                   
| 53  | nova-scheduler   | controller-0.localdomain | internal | enabled | down  | 2019-02-05T14:55:05.000000 | -               |                                                                                   
| 56  | nova-scheduler   | controller-2.localdomain | internal | enabled | down  | 2019-02-05T14:54:43.000000 | -               |                                                                                   
| 59  | nova-conductor   | controller-0.localdomain | internal | enabled | down  | 2019-02-05T14:55:04.000000 | -               |                                                                                   
| 68  | nova-conductor   | controller-2.localdomain | internal | enabled | down  | 2019-02-05T14:54:51.000000 | -               |                                                                                   
| 71  | nova-compute     | compute-1.localdomain    | nova     | enabled | down  | 2019-02-05T14:55:05.000000 | -               |                                                                                   
| 73  | nova-scheduler   | controller-0             | internal | enabled | up    | 2019-02-07T10:54:02.000000 | -               |                                                                                   
| 76  | nova-scheduler   | controller-1             | internal | enabled | up    | 2019-02-07T10:53:59.000000 | -               |                                                                                   
| 79  | nova-conductor   | controller-2             | internal | enabled | up    | 2019-02-07T10:53:57.000000 | -               |                                                                                   
| 82  | nova-consoleauth | controller-2             | internal | enabled | up    | 2019-02-07T10:54:02.000000 | -               |                                                                                   
| 85  | nova-compute     | compute-0                | nova     | enabled | up    | 2019-02-07T10:54:04.000000 | -               |                                                                                   
| 88  | nova-conductor   | controller-0             | internal | enabled | up    | 2019-02-07T10:54:06.000000 | -               |                                                                                   
| 91  | nova-consoleauth | controller-0             | internal | enabled | up    | 2019-02-07T10:54:02.000000 | -               |                                                                                   
| 94  | nova-scheduler   | controller-2             | internal | enabled | up    | 2019-02-07T10:54:01.000000 | -               |                                                                                   
| 97  | nova-consoleauth | controller-1             | internal | enabled | up    | 2019-02-07T10:54:05.000000 | -               |                                                                                   
| 100 | nova-conductor   | controller-1             | internal | enabled | up    | 2019-02-07T10:54:06.000000 | -               |                                                                                   
| 103 | nova-compute     | compute-1                | nova     | enabled | up    | 2019-02-07T10:54:06.000000 | -               |                                                                                   
+-----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+   

+--------------------------------------+--------------------+-------------------------+-------------------+-------+----------------+---------------------------+                                                    
| id                                   | agent_type         | host                    | availability_zone | alive | admin_state_up | binary                    |                                                    
+--------------------------------------+--------------------+-------------------------+-------------------+-------+----------------+---------------------------+                                                    
| 0fcc8481-ec53-4d00-96ad-e86fc9df5ad2 | L3 agent           | networker-0             | nova              | :-)   | True           | neutron-l3-agent          |                                                    
| 2c2cde23-beef-4d02-a9f6-880f133b6cb5 | Metadata agent     | networker-0             |                   | :-)   | True           | neutron-metadata-agent    |                                                    
| 42f2ed07-f108-4d56-9db1-6659cf5e6496 | Metadata agent     | networker-0.localdomain |                   | xxx   | True           | neutron-metadata-agent    |                                                    
| 44e48755-0724-4d42-af70-25e41085873f | Open vSwitch agent | compute-0.localdomain   |                   | xxx   | True           | neutron-openvswitch-agent |                                                    
| 73a3f610-387e-4b5a-acbb-af3ab98e0f11 | Open vSwitch agent | compute-1               |                   | :-)   | True           | neutron-openvswitch-agent |                                                    
| 756e3247-0948-41f0-88f0-3efee59fcc8a | Open vSwitch agent | compute-1.localdomain   |                   | xxx   | True           | neutron-openvswitch-agent |                                                    
| 8faa6859-e8c9-4de2-814f-5a963bfad1f5 | L3 agent           | networker-0.localdomain | nova              | xxx   | True           | neutron-l3-agent          |                                                    
| 98e2b72f-4964-42b2-bff0-2995a4f393f2 | Open vSwitch agent | compute-0               |                   | :-)   | True           | neutron-openvswitch-agent |                                                    
| 9e983d8b-c5bf-41c1-b395-35e3852dba72 | Open vSwitch agent | networker-0.localdomain |                   | xxx   | True           | neutron-openvswitch-agent |                                                    
| 9f9cf344-e763-41a7-8a3f-60814e714d19 | DHCP agent         | networker-0.localdomain | nova              | xxx   | True           | neutron-dhcp-agent        |                                                    
| c982b307-f93f-44ee-b486-970bcfc533b7 | DHCP agent         | networker-0             | nova              | :-)   | True           | neutron-dhcp-agent        |                                                    
| eb39760b-e268-462c-a7e9-61c9ec04f63d | Open vSwitch agent | networker-0             |                   | :-)   | True           | neutron-openvswitch-agent |                                                    
+--------------------------------------+--------------------+-------------------------+-------------------+-------+----------------+---------------------------+ 

Here I switched from FQDN to short name. All FQDN agents are seen as dead. Any workload associated with them (a compute instance, or a FIP for the L3 agent) will be "lost" (but recoverable).
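
Once the host parameter is settled, the usual cleanup is to delete the dead duplicate rows, e.g. using the IDs from the listings above (a sketch only; not something to run on the customer environment before the analysis below):

  nova service-delete 17
  neutron agent-delete 42f2ed07-f108-4d56-9db1-6659cf5e6496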

1. So what is currently not working in the customer environment?

2. Can we have the output of:

  #from the undercloud:
  . overcloudrc
  nova service-list
  neutron agent-list

3. We would need the /var/log/yum.log of all the nodes and the output of rpm -qa on all nodes. (If that is not enough we may request sosreports, but currently those two should be enough.)
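
For example, something like this from the undercloud should collect both (illustrative only; heat-admin is the default overcloud user, and the IP parsing assumes the ctlplane addresses shown by nova list):

  for ip in $(nova list | awk -F'ctlplane=' '/ctlplane/ {print $2}' | tr -d ' |'); do
    ssh heat-admin@$ip 'sudo cat /var/log/yum.log; rpm -qa' > node-$ip.txt
  done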

After analyzing the feedback, we can fix the environment and make sure that the host parameter does not change again.

Comment 36 Sofer Athlan-Guyot 2019-09-18 15:55:41 UTC
*** Bug 1720005 has been marked as a duplicate of this bug. ***

Comment 57 errata-xmlrpc 2019-10-16 09:40:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3112

Comment 58 Martin Schuppert 2019-10-18 07:00:05 UTC
*** Bug 1747767 has been marked as a duplicate of this bug. ***