Bug 1933528 - Overcloud host entries of /etc/hosts on the undercloud were removed after re-executing openstack undercloud install
Keywords:
Status: ON_QA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z7
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Brendan Shephard
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-01 00:11 UTC by Masayuki Igawa
Modified: 2021-10-21 19:35 UTC
CC List: 11 users

Fixed In Version: tripleo-ansible-0.5.1-1.20210520163301.902c3c8.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:




Links
System: Red Hat Knowledge Base (Solution) | ID: 5966711 | Private: 0 | Priority: None | Status: None | Summary: None | Last Updated: 2021-04-14 22:45:14 UTC

Description Masayuki Igawa 2021-03-01 00:11:37 UTC
Description of problem:

After re-running `openstack undercloud install`, the overcloud host entries in /etc/hosts on the undercloud were removed, as shown below:
~~~
(undercloud) [stack@undercloud-0 ~]$ cat /etc/hosts
# BEGIN ANSIBLE MANAGED BLOCK
192.168.24.1 undercloud-0.redhat.local undercloud-0
192.168.24.1 undercloud-0.external.redhat.local undercloud-0.external
192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane

# END ANSIBLE MANAGED BLOCK
127.0.0.1   undercloud-0.redhat.local undercloud-0
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
~~~
All the overcloud host entries were deleted.


Version-Release number of selected component (if applicable):
 - python3-tripleoclient-12.3.2-1.20200914164930.el8ost.noarch

How reproducible:

100%

Steps to Reproduce:
1. Execute `openstack undercloud install`
2. Execute `openstack undercloud install` again

Actual results:

The overcloud host entries in the /etc/hosts file on the undercloud node are deleted.

Expected results:

The overcloud host entries in the /etc/hosts file on the undercloud node remain after re-running `openstack undercloud install`.


Additional info:

I added some overcloud host entries and ran `openstack undercloud install` again, and the issue reproduced.
Here is the diff between the current /etc/hosts and a backup taken before the re-run:
~~~
(undercloud) [stack@undercloud-0 ~]$ diff -Npu /etc/hosts hosts.backup-20210225-1300|less
--- /etc/hosts  2021-02-25 04:06:04.187073777 +0000
+++ hosts.backup-20210225-1300  2021-02-25 04:00:44.986120333 +0000
@@ -3,6 +3,29 @@
 192.168.24.1 undercloud-0.external.redhat.local undercloud-0.external
 192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane
 
+172.17.1.78 controller-0.redhat.local controller-0
+172.17.3.135 controller-0.storage.redhat.local controller-0.storage
+172.17.4.126 controller-0.storagemgmt.redhat.local controller-0.storagemgmt
+172.17.1.78 controller-0.internalapi.redhat.local controller-0.internalapi
+172.17.2.25 controller-0.tenant.redhat.local controller-0.tenant
+10.0.0.127 controller-0.external.redhat.local controller-0.external
+192.168.24.34 controller-0.ctlplane.redhat.local controller-0.ctlplane
+172.17.1.18 controller-1.redhat.local controller-1
+172.17.3.25 controller-1.storage.redhat.local controller-1.storage
+172.17.4.103 controller-1.storagemgmt.redhat.local controller-1.storagemgmt
+172.17.1.18 controller-1.internalapi.redhat.local controller-1.internalapi
+172.17.2.67 controller-1.tenant.redhat.local controller-1.tenant
+10.0.0.129 controller-1.external.redhat.local controller-1.external
+192.168.24.14 controller-1.ctlplane.redhat.local controller-1.ctlplane
+172.17.1.68 controller-2.redhat.local controller-2
+172.17.3.81 controller-2.storage.redhat.local controller-2.storage
+172.17.4.76 controller-2.storagemgmt.redhat.local controller-2.storagemgmt
+172.17.1.68 controller-2.internalapi.redhat.local controller-2.internalapi
+172.17.2.47 controller-2.tenant.redhat.local controller-2.tenant
+10.0.0.106 controller-2.external.redhat.local controller-2.external
+192.168.24.11 controller-2.ctlplane.redhat.local controller-2.ctlplane
+
+
 # END ANSIBLE MANAGED BLOCK
 127.0.0.1   undercloud-0.redhat.local undercloud-0
 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4


(undercloud) [stack@undercloud-0 ~]$ cat /etc/hosts
# BEGIN ANSIBLE MANAGED BLOCK
192.168.24.1 undercloud-0.redhat.local undercloud-0
192.168.24.1 undercloud-0.external.redhat.local undercloud-0.external
192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane

# END ANSIBLE MANAGED BLOCK
127.0.0.1   undercloud-0.redhat.local undercloud-0
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
~~~

Comment 1 Brendan Shephard 2021-04-13 11:41:35 UTC
I assume this is because the undercloud install and the overcloud deployment use different Ansible inventory files. So when this playbook runs during the undercloud install, the hosts entries are replaced:
https://github.com/openstack/tripleo-ansible/blob/stable/train/tripleo_ansible/roles/tripleo-hosts-entries/tasks/main.yml#L38-L90
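A minimal Python sketch of the suspected mechanism, under the assumption stated in this comment: everything between the `# BEGIN ANSIBLE MANAGED BLOCK` and `# END ANSIBLE MANAGED BLOCK` markers is regenerated from the hosts in the current inventory, so a run with an undercloud-only inventory produces a block with no overcloud hosts. `replace_managed_block` is a hypothetical helper for illustration, not the role's code; the real tasks are in the linked main.yml.

~~~python
from __future__ import annotations

BEGIN = "# BEGIN ANSIBLE MANAGED BLOCK"
END = "# END ANSIBLE MANAGED BLOCK"


def replace_managed_block(hosts_text: str, entries: list[str]) -> str:
    """Hypothetical helper: rebuild the managed block from `entries` only.

    Assumes both marker lines exist. Lines outside the markers are kept;
    every line inside is discarded, so anything not in `entries` is lost.
    """
    lines = hosts_text.splitlines()
    start = lines.index(BEGIN)
    end = lines.index(END)
    new_block = [BEGIN, *entries, END]
    return "\n".join(lines[:start] + new_block + lines[end + 1:]) + "\n"


before = "\n".join([
    BEGIN,
    "192.168.24.1 undercloud-0.redhat.local undercloud-0",
    "172.17.1.78 controller-0.redhat.local controller-0",
    END,
    "127.0.0.1   localhost localhost.localdomain",
]) + "\n"

# Undercloud install: the inventory in use only knows about the undercloud.
undercloud_only = ["192.168.24.1 undercloud-0.redhat.local undercloud-0"]
after = replace_managed_block(before, undercloud_only)

print("controller-0" in after)  # False: the overcloud entry is gone
print("localhost" in after)     # True: lines outside the block survive
~~~

Because the block is rewritten as a whole rather than merged, the result depends only on which inventory the run uses, which matches the symptom in the description.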

Comment 5 Brendan Shephard 2021-04-14 09:35:20 UTC
Making this public for anyone else who runs into this and wants their hosts entries back. Running a normal overcloud deploy will bring them back. Alternatively, if you don't want to do that, you can apply just the tripleo-hosts-entries Ansible role, like so:

Playbook:
~~~
[root@instack stack]# cat populate_hosts.yaml
- hosts: undercloud
  roles:
      - role: 'tripleo-hosts-entries'

  vars_files:
      - "/var/lib/mistral/config-download-latest/global_vars.yaml"
~~~

Command:
~~~
ansible-playbook -i /var/lib/mistral/config-download-latest/tripleo-ansible-inventory.yaml populate_hosts.yaml
~~~

Outcome:
~~~
TASK [tripleo-hosts-entries : Prepare new /etc/hosts] ****************************************************************************************************************************************************************************************
changed: [undercloud]

TASK [tripleo-hosts-entries : Update /etc/hosts contents (if changed)] ***********************************************************************************************************************************************************************
changed: [undercloud]

TASK [tripleo-hosts-entries : Clean up temporary hosts file] *********************************************************************************************************************************************************************************
changed: [undercloud]

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
undercloud                 : ok=8    changed=5    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

[root@instack stack]# head -n 10 /etc/hosts
# BEGIN ANSIBLE MANAGED BLOCK
172.24.10.71 overcloud-novacompute-0.localdomain overcloud-novacompute-0
172.24.20.131 overcloud-novacompute-0.storage.localdomain overcloud-novacompute-0.storage
172.24.10.71 overcloud-novacompute-0.internalapi.localdomain overcloud-novacompute-0.internalapi
172.24.40.182 overcloud-novacompute-0.tenant.localdomain overcloud-novacompute-0.tenant
192.168.24.6 overcloud-novacompute-0.ctlplane.localdomain overcloud-novacompute-0.ctlplane
172.24.10.158 overcloud-novacompute-1.localdomain overcloud-novacompute-1
172.24.20.176 overcloud-novacompute-1.storage.localdomain overcloud-novacompute-1.storage
172.24.10.158 overcloud-novacompute-1.internalapi.localdomain overcloud-novacompute-1.internalapi
172.24.40.180 overcloud-novacompute-1.tenant.localdomain overcloud-novacompute-1.tenant
~~~

Comment 9 Brendan Shephard 2021-04-23 13:29:06 UTC
Hi,

These fixes are now merged upstream to address this issue:
https://review.opendev.org/c/openstack/tripleo-ansible/+/786623
https://review.opendev.org/c/openstack/tripleo-heat-templates/+/787327

I'll start backporting them to Train to resolve this issue going forward. There are some slight differences in OSP16 that I will need to test, but they shouldn't be a problem.

I'll let you know how it goes.

Comment 10 Masayuki Igawa 2021-04-25 23:18:17 UTC
Hi Brendan,

Thank you very much for taking care of this and the updates!

Best Regards,
-- Masayuki

Comment 23 Alex Stupnikov 2021-08-27 09:16:32 UTC
Hello.

I found this bug while troubleshooting a slightly different issue for a customer running RHOSP 16.1.6: one of the host entries [1] on the undercloud was removed, likely during the overcloud deployment. As a result, a RabbitMQ failure was triggered. The issue was resolved when the customer added this entry back manually and restarted the affected service.

Could you confirm whether this bug is a good match for the customer's issue, or whether we need to report a new one?

[1]
1.1.1.1 dirhost.example.com dirhost

Comment 24 Brendan Shephard 2021-08-27 09:26:46 UTC
Hey,

For this bug, only the hosts entries within the `# BEGIN ANSIBLE MANAGED BLOCK` were changed during overcloud and undercloud deployments.

If the host entry you're referring to was outside of that block and was removed, that would be a different issue.

Hope that helps to clarify.
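To decide which case applies, it helps to know whether the missing hostname sits inside or outside the managed block in a backup of /etc/hosts. A minimal Python sketch, assuming the standard BEGIN/END marker lines shown in the description: `entry_location` is a hypothetical helper written for this note, and placing the `dirhost` entry from [1] outside the block in the sample is an assumption for illustration only.

~~~python
from __future__ import annotations

BEGIN = "# BEGIN ANSIBLE MANAGED BLOCK"
END = "# END ANSIBLE MANAGED BLOCK"


def entry_location(hosts_text: str, hostname: str) -> str | None:
    """Hypothetical helper: return "inside" or "outside" the managed block
    for the first line listing `hostname`, or None if no line lists it."""
    inside = False
    for raw in hosts_text.splitlines():
        line = raw.strip()
        if line == BEGIN:
            inside = True
            continue
        if line == END:
            inside = False
            continue
        # Drop trailing comments, then split into IP address + names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            return "inside" if inside else "outside"
    return None


# Sample input; the dirhost placement is assumed, not taken from the case.
hosts = "\n".join([
    BEGIN,
    "192.168.24.1 undercloud-0.redhat.local undercloud-0",
    END,
    "127.0.0.1   localhost localhost.localdomain",
    "1.1.1.1 dirhost.example.com dirhost",
])

print(entry_location(hosts, "dirhost"))       # outside
print(entry_location(hosts, "undercloud-0"))  # inside
print(entry_location(hosts, "controller-0"))  # None
~~~

An "outside" result for the removed hostname points to a different cause than this bug, since only the managed block is rewritten here.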

Comment 25 Alex Stupnikov 2021-08-27 10:36:48 UTC
Thank you, Brendan. Indeed, after a second look, this bug does not seem to be a good match; we are likely dealing with a RabbitMQ-specific problem.

At the same time, I found a possible problem with the fix for this bug: it adds an entry like [1] for the undercloud, which was not added before. As a result, we would have two different entries for the undercloud (one pointing to 127.0.0.1 as before, and a new one pointing to a non-loopback IP). I am not sure whether this is fine, because I see the same behavior in RHOSP 16.2.

Thanks again, Alex.

[1]
192.168.24.1 undercloud-0.redhat.local undercloud-0
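The duplicate can be spotted mechanically. A minimal Python sketch, assuming plain /etc/hosts syntax (an IP address followed by one or more names, with `#` starting a comment): `hostnames_with_multiple_ips` is a hypothetical helper, not part of tripleo-ansible, and the sample input reuses the undercloud lines from the original description.

~~~python
from __future__ import annotations

from collections import defaultdict


def hostnames_with_multiple_ips(hosts_text: str) -> dict[str, set[str]]:
    """Hypothetical helper: map every hostname listed with more than one
    IP address to the set of those addresses."""
    ips_by_name: dict[str, set[str]] = defaultdict(set)
    for raw in hosts_text.splitlines():
        # Drop comments (including the BEGIN/END marker lines), then split.
        fields = raw.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line or comment-only line
        ip, *names = fields
        for name in names:
            ips_by_name[name].add(ip)
    return {name: ips for name, ips in ips_by_name.items() if len(ips) > 1}


hosts = "\n".join([
    "# BEGIN ANSIBLE MANAGED BLOCK",
    "192.168.24.1 undercloud-0.redhat.local undercloud-0",
    "# END ANSIBLE MANAGED BLOCK",
    "127.0.0.1   undercloud-0.redhat.local undercloud-0",
    "127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4",
])

dupes = hostnames_with_multiple_ips(hosts)
for name in sorted(dupes):
    print(name, sorted(dupes[name]))
~~~

On that sample it reports both `undercloud-0` and `undercloud-0.redhat.local` as mapped to 127.0.0.1 and 192.168.24.1, which is the pair of undercloud entries described above; `localhost` is not reported because it has only one address.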

