Bug 1933528

Summary: Overcloud host entries in /etc/hosts on the undercloud were removed after re-executing `openstack undercloud install`
Product: Red Hat OpenStack
Reporter: Masayuki Igawa <migawa>
Component: tripleo-ansible
Assignee: Brendan Shephard <bshephar>
Status: CLOSED ERRATA
QA Contact: Joe H. Rahme <jhakimra>
Severity: medium
Priority: medium
Version: 16.1 (Train)
CC: aschultz, astupnik, bdobreli, bshephar, cjeanner, hbrock, jhajyahy, jschluet, jslagle, marjones, mburns
Target Milestone: z7
Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: tripleo-ansible-0.5.1-1.20210520163301.902c3c8.el8ost
Last Closed: 2021-12-09 20:18:00 UTC
Type: Bug

Description Masayuki Igawa 2021-03-01 00:11:37 UTC
Description of problem:

The overcloud host entries in /etc/hosts on the undercloud were removed after re-executing `openstack undercloud install`, as shown below:
~~~
(undercloud) [stack@undercloud-0 ~]$ cat /etc/hosts
# BEGIN ANSIBLE MANAGED BLOCK
192.168.24.1 undercloud-0.redhat.local undercloud-0
192.168.24.1 undercloud-0.external.redhat.local undercloud-0.external
192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane

# END ANSIBLE MANAGED BLOCK
127.0.0.1   undercloud-0.redhat.local undercloud-0
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
~~~
All the overcloud host entries were deleted.


Version-Release number of selected component (if applicable):
 - python3-tripleoclient-12.3.2-1.20200914164930.el8ost.noarch

How reproducible:

100%

Steps to Reproduce:
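1. Execute `openstack undercloud install`
2. Deploy the overcloud, or add overcloud host entries to the Ansible managed block in /etc/hosts
3. Execute `openstack undercloud install` again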

Actual results:

The overcloud host entries in the /etc/hosts file on the undercloud node are deleted.

Expected results:

The overcloud host entries in the /etc/hosts file on the undercloud node remain after executing `openstack undercloud install`.


Additional info:

I added some overcloud host entries and executed `openstack undercloud install` again, and the issue reproduced.
Here is the diff between the current /etc/hosts and a backup taken before the re-run:
~~~
(undercloud) [stack@undercloud-0 ~]$ diff -Npu /etc/hosts hosts.backup-20210225-1300|less
--- /etc/hosts  2021-02-25 04:06:04.187073777 +0000
+++ hosts.backup-20210225-1300  2021-02-25 04:00:44.986120333 +0000
@@ -3,6 +3,29 @@
 192.168.24.1 undercloud-0.external.redhat.local undercloud-0.external
 192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane
 
+172.17.1.78 controller-0.redhat.local controller-0
+172.17.3.135 controller-0.storage.redhat.local controller-0.storage
+172.17.4.126 controller-0.storagemgmt.redhat.local controller-0.storagemgmt
+172.17.1.78 controller-0.internalapi.redhat.local controller-0.internalapi
+172.17.2.25 controller-0.tenant.redhat.local controller-0.tenant
+10.0.0.127 controller-0.external.redhat.local controller-0.external
+192.168.24.34 controller-0.ctlplane.redhat.local controller-0.ctlplane
+172.17.1.18 controller-1.redhat.local controller-1
+172.17.3.25 controller-1.storage.redhat.local controller-1.storage
+172.17.4.103 controller-1.storagemgmt.redhat.local controller-1.storagemgmt
+172.17.1.18 controller-1.internalapi.redhat.local controller-1.internalapi
+172.17.2.67 controller-1.tenant.redhat.local controller-1.tenant
+10.0.0.129 controller-1.external.redhat.local controller-1.external
+192.168.24.14 controller-1.ctlplane.redhat.local controller-1.ctlplane
+172.17.1.68 controller-2.redhat.local controller-2
+172.17.3.81 controller-2.storage.redhat.local controller-2.storage
+172.17.4.76 controller-2.storagemgmt.redhat.local controller-2.storagemgmt
+172.17.1.68 controller-2.internalapi.redhat.local controller-2.internalapi
+172.17.2.47 controller-2.tenant.redhat.local controller-2.tenant
+10.0.0.106 controller-2.external.redhat.local controller-2.external
+192.168.24.11 controller-2.ctlplane.redhat.local controller-2.ctlplane
+
+
 # END ANSIBLE MANAGED BLOCK
 127.0.0.1   undercloud-0.redhat.local undercloud-0
 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4


(undercloud) [stack@undercloud-0 ~]$ cat /etc/hosts
# BEGIN ANSIBLE MANAGED BLOCK
192.168.24.1 undercloud-0.redhat.local undercloud-0
192.168.24.1 undercloud-0.external.redhat.local undercloud-0.external
192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane

# END ANSIBLE MANAGED BLOCK
127.0.0.1   undercloud-0.redhat.local undercloud-0
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
~~~
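
The comparison above can be scripted. The following is a minimal POSIX shell sketch, assuming the marker lines are exactly `# BEGIN ANSIBLE MANAGED BLOCK` and `# END ANSIBLE MANAGED BLOCK` as in the files shown here. The function names `managed_block` and `dropped_entries` are hypothetical helpers, not part of TripleO. The sketch prints the entries that were inside the managed block of a backup copy but are missing from the current file.

~~~
# Print the non-blank lines between the Ansible managed block markers
# (the marker lines themselves are not printed).
managed_block() {
    awk '/^# BEGIN ANSIBLE MANAGED BLOCK$/ { inside = 1; next }
         /^# END ANSIBLE MANAGED BLOCK$/   { inside = 0; next }
         inside && NF' "$1"
}

# Print managed-block entries present in the backup ($1) but missing from
# the current file ($2), i.e. the entries dropped by the re-run.
dropped_entries() {
    _old=$(mktemp) && _new=$(mktemp) || return 1
    managed_block "$1" | sort -u > "$_old"
    managed_block "$2" | sort -u > "$_new"
    comm -23 "$_old" "$_new"    # lines only in the backup
    rm -f "$_old" "$_new"
}
~~~

For example, copy /etc/hosts to a backup such as hosts.backup-20210225-1300 before re-running `openstack undercloud install`, then run `dropped_entries hosts.backup-20210225-1300 /etc/hosts`. With the files above, it would list the 21 controller entries.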

Comment 1 Brendan Shephard 2021-04-13 11:41:35 UTC
I assume this is because the undercloud install and the overcloud deployments use different Ansible inventory files. So when this playbook runs during the undercloud install, the entries in the managed block are replaced and the overcloud hosts are dropped:
https://github.com/openstack/tripleo-ansible/blob/stable/train/tripleo_ansible/roles/tripleo-hosts-entries/tasks/main.yml#L38-L90
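
To illustrate the effect described above, here is a rough POSIX shell simulation. It assumes, as this comment suggests, that the whole managed block is regenerated from whatever inventory the run uses. `replace_managed_block` is a hypothetical helper and is not how the role is actually implemented (the linked main.yml is the real logic). Lines between the markers are replaced by the new content, while lines outside them are kept.

~~~
# Hypothetical simulation: print a hosts file ($1) with the content of its
# Ansible managed block replaced by the lines of another file ($2).
# Lines outside the BEGIN/END markers are passed through unchanged.
replace_managed_block() {
    awk -v blockfile="$2" '
        /^# BEGIN ANSIBLE MANAGED BLOCK$/ {
            print                                   # keep the BEGIN marker
            while ((getline line < blockfile) > 0)  # emit the new block body
                print line
            close(blockfile)
            skip = 1                                # drop the old block body
            next
        }
        /^# END ANSIBLE MANAGED BLOCK$/ { skip = 0 } # END marker is printed
        !skip
    ' "$1"
}
~~~

Feeding it a block file that contains only the three undercloud-0 lines reproduces the shape of the file in the description: the controller entries are gone, while the 127.0.0.1 and ::1 lines below the END marker survive.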

Comment 5 Brendan Shephard 2021-04-14 09:35:20 UTC
Making this public for anyone else who runs into this and wants their host entries back. You can run a normal overcloud deploy, which will restore them. Alternatively, if you don't want to do that, you can run the tripleo-hosts-entries Ansible role directly, like so:

Playbook (populate_hosts.yaml):
~~~
# Re-populate the Ansible managed block of /etc/hosts on the undercloud
- hosts: undercloud
  roles:
    - role: 'tripleo-hosts-entries'
  # Variables from the latest config-download run
  vars_files:
    - "/var/lib/mistral/config-download-latest/global_vars.yaml"
~~~

Command:
~~~
ansible-playbook -i /var/lib/mistral/config-download-latest/tripleo-ansible-inventory.yaml populate_hosts.yaml
~~~

Outcome:
~~~
TASK [tripleo-hosts-entries : Prepare new /etc/hosts] ****************************************************************************************************************************************************************************************
changed: [undercloud]

TASK [tripleo-hosts-entries : Update /etc/hosts contents (if changed)] ***********************************************************************************************************************************************************************
changed: [undercloud]

TASK [tripleo-hosts-entries : Clean up temporary hosts file] *********************************************************************************************************************************************************************************
changed: [undercloud]

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
undercloud                 : ok=8    changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

[root@instack stack]# head -n 10 /etc/hosts
# BEGIN ANSIBLE MANAGED BLOCK
172.24.10.71 overcloud-novacompute-0.localdomain overcloud-novacompute-0
172.24.20.131 overcloud-novacompute-0.storage.localdomain overcloud-novacompute-0.storage
172.24.10.71 overcloud-novacompute-0.internalapi.localdomain overcloud-novacompute-0.internalapi
172.24.40.182 overcloud-novacompute-0.tenant.localdomain overcloud-novacompute-0.tenant
192.168.24.6 overcloud-novacompute-0.ctlplane.localdomain overcloud-novacompute-0.ctlplane
172.24.10.158 overcloud-novacompute-1.localdomain overcloud-novacompute-1
172.24.20.176 overcloud-novacompute-1.storage.localdomain overcloud-novacompute-1.storage
172.24.10.158 overcloud-novacompute-1.internalapi.localdomain overcloud-novacompute-1.internalapi
172.24.40.180 overcloud-novacompute-1.tenant.localdomain overcloud-novacompute-1.tenant
~~~

Comment 9 Brendan Shephard 2021-04-23 13:29:06 UTC
Hi,

These fixes are now merged upstream to address this issue:
https://review.opendev.org/c/openstack/tripleo-ansible/+/786623
https://review.opendev.org/c/openstack/tripleo-heat-templates/+/787327

I'll start backporting them to Train to resolve this issue going forward. There are some slight differences in OSP 16 that I will need to test, but they shouldn't be a problem.

I'll let you know how it goes.

Comment 10 Masayuki Igawa 2021-04-25 23:18:17 UTC
Hi Brendan,

Thank you very much for taking care of this and the updates!

Best Regards,
-- Masayuki

Comment 23 Alex Stupnikov 2021-08-27 09:16:32 UTC
Hello.

I found this bug while troubleshooting a slightly different issue for a customer running RHOSP 16.1.6: one of the host entries on the undercloud [1] was removed, likely during overcloud deployment. This triggered a RabbitMQ failure. The issue was resolved when the customer re-added the entry manually and restarted the affected service.

Could you confirm whether this bug is a good match for the customer's issue, or do we need to report a new one?

[1]
1.1.1.1 dirhost.example.com dirhost

Comment 24 Brendan Shephard 2021-08-27 09:26:46 UTC
Hey,

For this bug, only the host entries within the `# BEGIN ANSIBLE MANAGED BLOCK` / `# END ANSIBLE MANAGED BLOCK` markers were changed during overcloud and undercloud deployments.

If the host entry you're referring to was outside that block and was removed, then it is a different issue.

Hope that helps to clarify.
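
A quick way to tell which case applies is to check where the name sits relative to the markers. This is a minimal POSIX shell/awk sketch. `entry_location` is a hypothetical helper name. It reports only the first line on which the name appears, so a name listed both inside and outside the block (as undercloud-0 is in the files above) is reported by its first occurrence.

~~~
# Print "inside", "outside" or "absent" for hostname $2 in hosts file $1,
# depending on whether its first occurrence is within the Ansible managed block.
entry_location() {
    awk -v name="$2" '
        /^# BEGIN ANSIBLE MANAGED BLOCK$/ { inside = 1; next }
        /^# END ANSIBLE MANAGED BLOCK$/   { inside = 0; next }
        /^[ \t]*#/ { next }                       # skip other comments
        {
            for (i = 2; i <= NF; i++)             # fields after the IP
                if ($i == name) { found = inside ? "inside" : "outside"; exit }
        }
        END { print (found ? found : "absent") }  # exit still runs END
    ' "$1"
}
~~~

Since a removed entry is no longer in the current file, run it against a backup copy of /etc/hosts taken before the deployment.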

Comment 25 Alex Stupnikov 2021-08-27 10:36:48 UTC
Thank you, Brendan. Indeed, on a second look this bug does not seem to be a good match, and we are likely dealing with a RabbitMQ-specific problem.

That said, I found a possible problem with the fix for this bug: it adds an entry like [1] for the undercloud, which was not added before. As a result, there are two different entries for the undercloud: one pointing to 127.0.0.1 as before, and a new one pointing to a non-loopback IP. I am not sure whether this is fine, since I see the same behavior in RHOSP 16.2.

Thanks again, Alex.

[1]
192.168.24.1 undercloud-0.redhat.local undercloud-0
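
To see which names end up with more than one address, as the undercloud-0 names do here, a short awk sketch can list every hostname or alias mapped to multiple IPs. `duplicate_names` is a hypothetical helper. Localhost-style names mapped to both 127.0.0.1 and ::1 are listed too, which is normal for /etc/hosts. Whether the undercloud duplicate is harmful is not settled in this report.

~~~
# Print each hostname or alias in hosts file $1 that maps to more than one
# IP address, followed by the addresses in the order they appear.
duplicate_names() {
    awk '
        /^[ \t]*#/ || NF < 2 { next }          # skip comments and blank lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break           # stop at a trailing comment
                if (($i, $1) in seen) continue # same name and IP seen already
                seen[$i, $1] = 1
                count[$i]++
                ips[$i] = (count[$i] > 1) ? ips[$i] " " $1 : $1
            }
        }
        END { for (n in count) if (count[n] > 1) print n ": " ips[n] }
    ' "$1" | sort
}
~~~

Run against the undercloud /etc/hosts from the description, it would report `undercloud-0` and `undercloud-0.redhat.local` with both 192.168.24.1 and 127.0.0.1.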

Comment 37 errata-xmlrpc 2021-12-09 20:18:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762