Description of problem:
Overcloud host entries in /etc/hosts on the undercloud were removed after re-running `openstack undercloud install`, as shown below.

~~~
(undercloud) [stack@undercloud-0 ~]$ cat /etc/hosts
# BEGIN ANSIBLE MANAGED BLOCK
192.168.24.1 undercloud-0.redhat.local undercloud-0
192.168.24.1 undercloud-0.external.redhat.local undercloud-0.external
192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane
# END ANSIBLE MANAGED BLOCK
127.0.0.1 undercloud-0.redhat.local undercloud-0
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
~~~

All the overcloud host entries were deleted.

Version-Release number of selected component (if applicable):
- python3-tripleoclient-12.3.2-1.20200914164930.el8ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Execute `openstack undercloud install`
2. Execute `openstack undercloud install` again

Actual results:
Overcloud host entries in the /etc/hosts file on the undercloud node are deleted.

Expected results:
Overcloud host entries in the /etc/hosts file on the undercloud node remain after executing `openstack undercloud install`.

Additional info:
I added some overcloud host entries and executed `openstack undercloud install` again, and the problem was reproduced. Here is the diff between the current /etc/hosts and the backup.
~~~
(undercloud) [stack@undercloud-0 ~]$ diff -Npu /etc/hosts hosts.backup-20210225-1300|less
--- /etc/hosts 2021-02-25 04:06:04.187073777 +0000
+++ hosts.backup-20210225-1300 2021-02-25 04:00:44.986120333 +0000
@@ -3,6 +3,29 @@
 192.168.24.1 undercloud-0.external.redhat.local undercloud-0.external
 192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane
+172.17.1.78 controller-0.redhat.local controller-0
+172.17.3.135 controller-0.storage.redhat.local controller-0.storage
+172.17.4.126 controller-0.storagemgmt.redhat.local controller-0.storagemgmt
+172.17.1.78 controller-0.internalapi.redhat.local controller-0.internalapi
+172.17.2.25 controller-0.tenant.redhat.local controller-0.tenant
+10.0.0.127 controller-0.external.redhat.local controller-0.external
+192.168.24.34 controller-0.ctlplane.redhat.local controller-0.ctlplane
+172.17.1.18 controller-1.redhat.local controller-1
+172.17.3.25 controller-1.storage.redhat.local controller-1.storage
+172.17.4.103 controller-1.storagemgmt.redhat.local controller-1.storagemgmt
+172.17.1.18 controller-1.internalapi.redhat.local controller-1.internalapi
+172.17.2.67 controller-1.tenant.redhat.local controller-1.tenant
+10.0.0.129 controller-1.external.redhat.local controller-1.external
+192.168.24.14 controller-1.ctlplane.redhat.local controller-1.ctlplane
+172.17.1.68 controller-2.redhat.local controller-2
+172.17.3.81 controller-2.storage.redhat.local controller-2.storage
+172.17.4.76 controller-2.storagemgmt.redhat.local controller-2.storagemgmt
+172.17.1.68 controller-2.internalapi.redhat.local controller-2.internalapi
+172.17.2.47 controller-2.tenant.redhat.local controller-2.tenant
+10.0.0.106 controller-2.external.redhat.local controller-2.external
+192.168.24.11 controller-2.ctlplane.redhat.local controller-2.ctlplane
+
+
 # END ANSIBLE MANAGED BLOCK
 127.0.0.1 undercloud-0.redhat.local undercloud-0
 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
(undercloud) [stack@undercloud-0 ~]$ cat /etc/hosts
# BEGIN ANSIBLE MANAGED BLOCK
192.168.24.1 undercloud-0.redhat.local undercloud-0
192.168.24.1 undercloud-0.external.redhat.local undercloud-0.external
192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane
# END ANSIBLE MANAGED BLOCK
127.0.0.1 undercloud-0.redhat.local undercloud-0
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
~~~
I assume this is because the undercloud install uses a different Ansible inventory file than overcloud deployments do. So when the undercloud install runs these tasks, the hosts entries in the managed block are replaced with only the hosts from its own inventory: https://github.com/openstack/tripleo-ansible/blob/stable/train/tripleo_ansible/roles/tripleo-hosts-entries/tasks/main.yml#L38-L90
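To illustrate the suspected mechanism, here is a minimal Python sketch. It is not the role's actual implementation; it only models a marker-delimited block being regenerated from whatever hosts the current inventory knows about. The helper name `replace_managed_block` and the shortened sample file are hypothetical, and the addresses are copied from the report above.

```python
BEGIN = "# BEGIN ANSIBLE MANAGED BLOCK"
END = "# END ANSIBLE MANAGED BLOCK"


def replace_managed_block(hosts_text, new_entries):
    """Hypothetical helper: swap everything between the markers for new_entries."""
    lines = hosts_text.splitlines()
    start = lines.index(BEGIN)
    end = lines.index(END)
    # Keep the text up to and including BEGIN, then the new entries,
    # then END and everything after it. Old block contents are dropped.
    return "\n".join(lines[:start + 1] + list(new_entries) + lines[end:]) + "\n"


# State after an overcloud deploy: undercloud and overcloud entries in the block.
before = "\n".join([
    BEGIN,
    "192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane",
    "192.168.24.34 controller-0.ctlplane.redhat.local controller-0.ctlplane",
    END,
    "127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4",
]) + "\n"

# Assumption: the undercloud install inventory only contains the undercloud.
undercloud_only = [
    "192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane",
]

after = replace_managed_block(before, undercloud_only)
print(after)
```

In this model the controller entry disappears while lines outside the markers are untouched, which matches the diff in the description.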
Just making this public for anyone else who might run into this and wants their hosts entries back. You can just run a normal overcloud deploy, which will bring them back. Or, if you don't want to do that, it's possible to use the tripleo-hosts-entries Ansible role on its own, like so:

Playbook:

[root@instack stack]# cat populate_hosts.yaml
- hosts: undercloud
  roles:
    - role: 'tripleo-hosts-entries'
  vars_files:
    - "/var/lib/mistral/config-download-latest/global_vars.yaml"

Command:

ansible-playbook -i /var/lib/mistral/config-download-latest/tripleo-ansible-inventory.yaml populate_hosts.yaml

Outcome:

TASK [tripleo-hosts-entries : Prepare new /etc/hosts] ********************
changed: [undercloud]

TASK [tripleo-hosts-entries : Update /etc/hosts contents (if changed)] ********************
changed: [undercloud]

TASK [tripleo-hosts-entries : Clean up temporary hosts file] ********************
changed: [undercloud]

PLAY RECAP ********************
undercloud : ok=8 changed=5 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

[root@instack stack]# head -n 10 /etc/hosts
# BEGIN ANSIBLE MANAGED BLOCK
172.24.10.71 overcloud-novacompute-0.localdomain overcloud-novacompute-0
172.24.20.131 overcloud-novacompute-0.storage.localdomain overcloud-novacompute-0.storage
172.24.10.71 overcloud-novacompute-0.internalapi.localdomain overcloud-novacompute-0.internalapi
172.24.40.182 overcloud-novacompute-0.tenant.localdomain overcloud-novacompute-0.tenant
192.168.24.6 overcloud-novacompute-0.ctlplane.localdomain overcloud-novacompute-0.ctlplane
172.24.10.158 overcloud-novacompute-1.localdomain overcloud-novacompute-1
172.24.20.176 overcloud-novacompute-1.storage.localdomain overcloud-novacompute-1.storage
172.24.10.158 overcloud-novacompute-1.internalapi.localdomain overcloud-novacompute-1.internalapi
172.24.40.180 overcloud-novacompute-1.tenant.localdomain overcloud-novacompute-1.tenant
Hi,

These fixes are now merged upstream to address this issue:

https://review.opendev.org/c/openstack/tripleo-ansible/+/786623
https://review.opendev.org/c/openstack/tripleo-heat-templates/+/787327

I'll start working on backporting them to Train to resolve this issue moving forward. There are some slight differences with OSP16 that I will need to test, but it shouldn't be an issue. I'll let you know how it goes.
Hi Brendan, Thank you very much for taking care of this and the updates! Best Regards, -- Masayuki
Hello. I found this bug while troubleshooting a slightly different issue for our customer running RHOSP 16.1.6: one of the entries for the undercloud host [1] was removed, likely during overcloud deployment. As a result, a rabbitmq failure was triggered. The issue was resolved when the customer added this entry manually and restarted the affected service. I am looking for confirmation that this bug is a good match for the customer's issue. Is this bug a match, or do we need to report a new one?

[1] 1.1.1.1 dirhost.example.com dirhost
Hey,

For this bug, only the hosts entries within the # BEGIN ANSIBLE MANAGED BLOCK section were changed during overcloud and undercloud deployments. If the host entry you're referring to was outside of that block and was removed, then it would be a different issue. Hope that helps to clarify.
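One way to check which case applies is to see whether a given entry sits inside the managed block. The following is a minimal Python sketch, assuming the marker strings shown in this report. `split_hosts` is a hypothetical helper, and the sample places the dirhost entry outside the block purely for illustration; where it actually sat in the customer's file is not known from this thread.

```python
BEGIN = "# BEGIN ANSIBLE MANAGED BLOCK"
END = "# END ANSIBLE MANAGED BLOCK"


def split_hosts(hosts_text):
    """Hypothetical helper: return (managed, unmanaged) lists of host entries."""
    managed, unmanaged = [], []
    inside = False
    for line in hosts_text.splitlines():
        stripped = line.strip()
        if stripped == BEGIN:
            inside = True
            continue
        if stripped == END:
            inside = False
            continue
        # Skip blank lines and comments; only real entries are classified.
        if not stripped or stripped.startswith("#"):
            continue
        (managed if inside else unmanaged).append(stripped)
    return managed, unmanaged


# Illustrative sample only, not the customer's real file.
sample = """\
# BEGIN ANSIBLE MANAGED BLOCK
192.168.24.1 undercloud-0.ctlplane.redhat.local undercloud-0.ctlplane
# END ANSIBLE MANAGED BLOCK
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
1.1.1.1 dirhost.example.com dirhost
"""

managed, unmanaged = split_hosts(sample)
print("managed:", managed)
print("unmanaged:", unmanaged)
```

To run it against a real node, the sample string could be swapped for the contents of /etc/hosts. Entries that land in `unmanaged` would not be expected to be touched by the behavior described in this bug.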
Thank you Brendan. Indeed, after taking a second look it seems that this bug is not a good match, and we are likely dealing with some rabbitmq-specific problem.

At the same time, I have found a possible problem in the fix for this bug: it adds an entry like [1] for the undercloud, which wasn't added before. As a result, we would have two different entries for the undercloud (one points to 127.0.0.1 as before and the new one points to a non-loopback IP). I am not sure whether this is fine; I can see the same behavior in RHOSP 16.2.

Thanks again,
Alex.

[1] 192.168.24.1 undercloud-0.redhat.local undercloud-0
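To make the duplicate-entry observation concrete, here is a minimal Python sketch that lists names mapped to more than one address of the same IP family. `names_with_multiple_addresses` is a hypothetical helper, and the sample lines are taken from the /etc/hosts output in the description. The sketch only reports the duplicates; it does not say whether they are harmful or which address a resolver would return.

```python
import ipaddress
from collections import defaultdict


def names_with_multiple_addresses(hosts_text):
    """Hypothetical helper: map each name to its addresses if it has more than one."""
    seen = defaultdict(list)  # (name, ip_version) -> addresses in file order
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        address, names = fields[0], fields[1:]
        # Track IPv4 and IPv6 separately so localhost (127.0.0.1 and ::1)
        # is not reported as a duplicate.
        version = ipaddress.ip_address(address).version
        for name in names:
            if address not in seen[(name, version)]:
                seen[(name, version)].append(address)
    return {name: addrs for (name, _version), addrs in seen.items() if len(addrs) > 1}


sample = """\
# BEGIN ANSIBLE MANAGED BLOCK
192.168.24.1 undercloud-0.redhat.local undercloud-0
192.168.24.1 undercloud-0.external.redhat.local undercloud-0.external
# END ANSIBLE MANAGED BLOCK
127.0.0.1 undercloud-0.redhat.local undercloud-0
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
"""

duplicates = names_with_multiple_addresses(sample)
for name, addresses in duplicates.items():
    print(name, "->", ", ".join(addresses))
```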
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3762