Bug 1337465 - Overcloud nodes' /etc/hosts file contains an entry pointing to the loopback address for the node's hostname
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director-images
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ga
Target Release: 9.0 (Mitaka)
Assignee: Thierry Vignaud
QA Contact: Omri Hochman
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-05-19 09:24 UTC by Marius Cornea
Modified: 2016-08-18 22:58 UTC (History)
11 users

Fixed In Version: rhosp-director-images-9.0-20160526.1.el7ost
Doc Type: Bug Fix
Doc Text:
This update resolves an issue where the image building process did not set DIB_CLOUD_INIT_ETC_HOSTS=false. As a result, the '/etc/hosts' file contained entries mapping the node name and FQDN to the loopback address, which prevented the cluster from starting.
Clone Of:
Environment:
Last Closed: 2016-08-18 22:58:02 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2016:1598 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 9 images Release Candidate Advisory 2016-08-19 02:57:50 UTC

Description Marius Cornea 2016-05-19 09:24:30 UTC
Description of problem:
The overcloud nodes' /etc/hosts file contains an entry pointing to 127.0.0.1 for the node's hostname:

127.0.0.1      overcloud-controller-0.localdomain      overcloud-controller-0
192.0.2.22     overcloud-controller-0.localdomain      overcloud-controller-0

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-2.0.0-5.el7ost.noarch

How reproducible:
100%

Expected results:
No entry pointing to 127.0.0.1.

Actual results:
The loopback entry makes corosync unable to join the node to the cluster:

systemd[1]: Starting Corosync Cluster Engine...
corosync[14022]:  [TOTEM ] Initializing transport (UDP/IP Unicast).                                       
corosync[14022]:  [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none           
corosync[14022]:  [TOTEM ] The network interface [127.0.0.1] is now up.                                   
corosync[14022]:  [SERV  ] Service engine loaded: corosync configuration map access [0]                   
corosync[14022]:  [QB    ] server name: cmap                                                              
corosync[14022]:  [SERV  ] Service engine loaded: corosync configuration service [1]                      
corosync[14022]:  [QB    ] server name: cfg                                                               
corosync[14022]:  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2] 
corosync[14022]:  [QB    ] server name: cpg                                                               
corosync[14022]:  [SERV  ] Service engine loaded: corosync profile loading service [4]                    
corosync[14022]:  [QUORUM] Using quorum provider corosync_votequorum                                      
corosync[14022]:  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]                   
corosync[14022]:  [QB    ] server name: votequorum                                                        
corosync[14022]:  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]                
corosync[14022]:  [QB    ] server name: quorum                                                            
corosync[14022]:  [TOTEM ] adding new UDPU member {127.0.0.1}                                             
corosync[14022]:  [TOTEM ] adding new UDPU member {192.0.2.19}                                            
corosync[14022]:  [TOTEM ] adding new UDPU member {192.0.2.23}                                            
corosync[14022]:  [TOTEM ] A new membership (127.0.0.1:8) was formed. Members joined: 1                   
corosync[14022]:  [QUORUM] Members[1]: 1                                                                  
corosync[14022]:  [MAIN  ] Completed service synchronization, ready to provide service.
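To check whether a node is affected, a quick grep like the following can be used (a sketch; it assumes the node's short hostname appears on the loopback line, as in the example above):

```shell
# Report any loopback entry for this node's own hostname.
# An affected node prints the offending line; a healthy node prints the fallback.
grep "^127\.0\.0\.1[[:space:]].*$(hostname -s)" /etc/hosts \
  || echo "no loopback entry for $(hostname -s)"
```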

Comment 2 Omri Hochman 2016-05-20 14:40:00 UTC
Deployment failed with the RHEL-OSP director 9.0 puddle (2016-05-17.1).

Even after applying the workaround in https://bugzilla.redhat.com/show_bug.cgi?id=1337537, this issue (#1337465) blocks the deployment.

Comment 3 Marius Cornea 2016-05-23 08:03:07 UTC
A dirty workaround: during the early stage of the deployment (right after the nodes boot), ssh to each controller node and comment out the /etc/hosts entry pointing to the loopback address.
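That workaround can be sketched as a one-liner per node (the controller IPs and the heat-admin user here are assumptions for illustration; the sed pattern targets 127.0.0.1 lines naming an overcloud node):

```shell
# Comment out the loopback entry cloud-init wrote for the node's own hostname.
# The controller IPs below are hypothetical; take the real ones from `nova list`.
for node in 192.0.2.19 192.0.2.22 192.0.2.23; do
  ssh heat-admin@"$node" \
    "sudo sed -i '/^127\.0\.0\.1[[:space:]].*overcloud-/ s/^/#/' /etc/hosts" \
    || echo "ssh to $node failed"  # keep going if a node is unreachable
done
```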

Comment 4 Marius Cornea 2016-05-23 09:10:22 UTC
Checking this issue further, I think it's caused by cloud-init:

[root@overcloud-controller-0 ~]# grep hosts /var/lib/cloud/instances/9c06be1a-ebd7-4280-970f-907085714f01/obj.pkl
aS'update_etc_hosts'
asS'manage_etc_hosts'

Comparing it to a node on a RHOS 8 deployment:

[root@overcloud-controller-0 heat-admin]# grep hosts /var/lib/cloud/instances/85ba7bc7-04ab-4e83-b3e5-2dcdff7974f7/obj.pkl 
aS'update_etc_hosts'

The hosts template actually matches the format of the /etc/hosts file:

[root@overcloud-controller-0 ~]# cat /etc/cloud/templates/hosts.redhat.tmpl 
## template:jinja
{#
This file /etc/cloud/templates/hosts.redhat.tmpl is only utilized
if enabled in cloud-config.  Specifically, in order to enable it
you need to add the following to config:
  manage_etc_hosts: True
-#}
# Your system has configured 'manage_etc_hosts' as True.
# As a result, if you wish for changes to this file to persist
# then you will need to either
# a.) make changes to the master file in /etc/cloud/templates/hosts.redhat.tmpl
# b.) change or remove the value of 'manage_etc_hosts' in
#     /etc/cloud/cloud.cfg or cloud-config from user-data
# 
# The following lines are desirable for IPv4 capable hosts
127.0.0.1 {{fqdn}} {{hostname}}
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4

# The following lines are desirable for IPv6 capable hosts
::1 {{fqdn}} {{hostname}}
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6


# HEAT_HOSTS_START - Do not edit manually within this section!
10.0.0.12 overcloud-novacompute-0.localdomain overcloud-novacompute-0
192.168.0.17 overcloud-novacompute-0-external
10.0.0.12 overcloud-novacompute-0-internalapi
10.0.0.140 overcloud-novacompute-0-storage
192.168.0.17 overcloud-novacompute-0-storagemgmt
10.0.1.138 overcloud-novacompute-0-tenant
172.16.17.161 overcloud-novacompute-0-management

10.0.0.15 overcloud-controller-0.localdomain overcloud-controller-0
172.16.18.28 overcloud-controller-0-external
10.0.0.15 overcloud-controller-0-internalapi
10.0.0.143 overcloud-controller-0-storage
10.0.1.14 overcloud-controller-0-storagemgmt
10.0.1.141 overcloud-controller-0-tenant
172.16.17.164 overcloud-controller-0-management

10.0.0.13 overcloud-controller-1.localdomain overcloud-controller-1
172.16.18.26 overcloud-controller-1-external
10.0.0.13 overcloud-controller-1-internalapi
10.0.0.142 overcloud-controller-1-storage
10.0.1.12 overcloud-controller-1-storagemgmt
10.0.1.139 overcloud-controller-1-tenant
172.16.17.163 overcloud-controller-1-management

10.0.0.14 overcloud-controller-2.localdomain overcloud-controller-2
172.16.18.27 overcloud-controller-2-external
10.0.0.14 overcloud-controller-2-internalapi
10.0.0.141 overcloud-controller-2-storage
10.0.1.13 overcloud-controller-2-storagemgmt
10.0.1.140 overcloud-controller-2-tenant
172.16.17.162 overcloud-controller-2-management



10.0.0.139 overcloud-cephstorage-0.localdomain overcloud-cephstorage-0
192.168.0.16 overcloud-cephstorage-0-external
192.168.0.16 overcloud-cephstorage-0-internalapi
10.0.0.139 overcloud-cephstorage-0-storage
10.0.1.11 overcloud-cephstorage-0-storagemgmt
192.168.0.16 overcloud-cephstorage-0-tenant
172.16.17.160 overcloud-cephstorage-0-management
# HEAT_HOSTS_END

Comment 5 Marius Cornea 2016-05-23 10:17:26 UTC
I checked further with Giulio and this is caused by:

/etc/cloud/cloud.cfg.d/10_etc_hosts.cfg 
manage_etc_hosts: localhost

which comes prepackaged inside the image.

This can be controlled by setting DIB_CLOUD_INIT_ETC_HOSTS="" or DIB_CLOUD_INIT_ETC_HOSTS=false before building the images.

This was also added to the tripleoclient in https://review.openstack.org/#/c/222539/

We need to figure out if the variable was set during the image build process.
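For reference, disabling this at build time is just an environment variable exported before the build (a sketch; the exact build command form is assumed from tripleoclient):

```shell
# Export before building so diskimage-builder omits the
# /etc/cloud/cloud.cfg.d/10_etc_hosts.cfg snippet from the image.
export DIB_CLOUD_INIT_ETC_HOSTS=false
# Then build as usual, e.g. (assumed command form):
#   openstack overcloud image build --all
```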

Comment 6 Thierry Vignaud 2016-05-23 13:53:21 UTC
DIB_CLOUD_INIT_ETC_HOSTS is not set as part of the image building process (KS, ...)

Comment 7 Thierry Vignaud 2016-05-24 09:03:51 UTC
A test puddle (2016-05-23.1) with an image built with DIB_CLOUD_INIT_ETC_HOSTS=false has been provided to Omri Hochman yesterday.

Comment 8 Thierry Vignaud 2016-05-25 10:32:59 UTC
Omri, did you get any success with the images I provided you?

Comment 9 Omri Hochman 2016-05-27 18:38:25 UTC
I've managed to pass deployment with the latest images and the problem didn't reproduce. We need pm_ack so we can switch this BZ to ON_QA and then to VERIFIED.

Comment 11 Omri Hochman 2016-05-27 19:28:39 UTC
Unable to reproduce with the latest images: new RHEL-OSP director 9.0 puddle - 2016-05-26.1

Comment 13 errata-xmlrpc 2016-08-18 22:58:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1598.html

