Bug 1692671 - RHHI Gluster deployment fails for static hostnames
Summary: RHHI Gluster deployment fails for static hostnames
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rhhi
Version: rhhiv-1.6
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: RHHI-V 1.6.z Async Update
Assignee: Sahina Bose
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On: 1692662
Blocks:
Reported: 2019-03-26 08:12 UTC by SATHEESARAN
Modified: 2019-10-03 12:24 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
During deployment, hostnames were previously verified using the dig command. This command did not validate hostnames specified in the /etc/hosts file, so deployment failed. Hostnames are now validated using the getent command, which is able to validate hostnames set in the /etc/hosts file.
Clone Of: 1692662
Environment:
Last Closed: 2019-10-03 12:23:57 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:2963 0 None None None 2019-10-03 12:24:06 UTC

Description SATHEESARAN 2019-03-26 08:12:28 UTC
Description of problem:
-----------------------
Some user environments use static IP configuration, with hostnames added only to the local /etc/hosts file on each host. In this case, the RHHI-V deployment is expected to work.

However, the deployment checks introduced with gluster-ansible use the 'dig' command to validate FQDNs. 'dig' queries DNS only, so a hostname that is available only in the local /etc/hosts file is not accepted.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
gluster-ansible-repositories-1.0-1.el7rhgs.noarch
gluster-ansible-maintenance-1.0.1-1.el7rhgs.noarch
gluster-ansible-features-1.0.4-5.el7rhgs.noarch
gluster-ansible-cluster-1.0-1.el7rhgs.noarch
gluster-ansible-roles-1.0.4-4.el7rhgs.noarch
gluster-ansible-infra-1.0.3-3.el7rhgs.noarch

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Add static hostnames to /etc/hosts on each host
2. Use these hostnames for the RHHI-V gluster deployment

Actual results:
---------------
Gluster deployment fails

Expected results:
----------------
Gluster deployment should succeed as the hostnames are available in /etc/hosts

Comment 1 SATHEESARAN 2019-03-26 08:12:45 UTC
Content of /etc/hosts file
---------------------------
[root@rhhihost1 ~]# cat /etc/hosts
10.70.37.83 rhhihost1.lab.eng.blr.redhat.com
10.70.37.218 rhhihost2.lab.eng.blr.redhat.com
10.70.37.217 rhhihost3.lab.eng.blr.redhat.com
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

Check whether all hosts are reachable with these hostnames
----------------------------------------------------------
[root@rhhihost1 ~]# ping -c2 rhhihost1.lab.eng.blr.redhat.com
PING rhhihost1.lab.eng.blr.redhat.com (10.70.37.83) 56(84) bytes of data.
64 bytes from rhhihost1.lab.eng.blr.redhat.com (10.70.37.83): icmp_seq=1 ttl=64 time=0.041 ms
64 bytes from rhhihost1.lab.eng.blr.redhat.com (10.70.37.83): icmp_seq=2 ttl=64 time=0.030 ms

--- rhhihost1.lab.eng.blr.redhat.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.030/0.035/0.041/0.008 ms
[root@rhhihost1 ~]# ping -c2 rhhihost2.lab.eng.blr.redhat.com
PING rhhihost2.lab.eng.blr.redhat.com (10.70.37.218) 56(84) bytes of data.
64 bytes from rhhihost2.lab.eng.blr.redhat.com (10.70.37.218): icmp_seq=1 ttl=64 time=0.357 ms
64 bytes from rhhihost2.lab.eng.blr.redhat.com (10.70.37.218): icmp_seq=2 ttl=64 time=0.372 ms

--- rhhihost2.lab.eng.blr.redhat.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.357/0.364/0.372/0.020 ms
[root@rhhihost1 ~]# ping -c2 rhhihost3.lab.eng.blr.redhat.com
PING rhhihost3.lab.eng.blr.redhat.com (10.70.37.217) 56(84) bytes of data.
64 bytes from rhhihost3.lab.eng.blr.redhat.com (10.70.37.217): icmp_seq=1 ttl=64 time=1.09 ms
64 bytes from rhhihost3.lab.eng.blr.redhat.com (10.70.37.217): icmp_seq=2 ttl=64 time=0.309 ms

--- rhhihost3.lab.eng.blr.redhat.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.309/0.699/1.090/0.391 ms
[root@rhhihost1 ~]# 

dig command usage on these hostnames
--------------------------------------
[root@rhhihost1 ~]# dig rhhihost1.lab.eng.blr.redhat.com +short
[root@rhhihost1 ~]# echo $?
0


Error message on the console
-----------------------------
<snip>
TASK [gluster.features/roles/gluster_hci : Check if valid FQDN is provided] ****
failed: [rhhihost3.lab.eng.blr.redhat.com -> localhost] (item=rhhihost3.lab.eng.blr.redhat.com) => {"changed": true, "cmd": ["dig", "rhhihost3.lab.eng.blr.redhat.com", "+short"], "delta": "0:00:00.048777", "end": "2019-03-26 13:01:53.083083", "failed_when_result": true, "item": "rhhihost3.lab.eng.blr.redhat.com", "rc": 0, "start": "2019-03-26 13:01:53.034306", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
failed: [rhhihost3.lab.eng.blr.redhat.com -> localhost] (item=rhhihost1.lab.eng.blr.redhat.com) => {"changed": true, "cmd": ["dig", "rhhihost1.lab.eng.blr.redhat.com", "+short"], "delta": "0:00:00.045785", "end": "2019-03-26 13:01:53.388902", "failed_when_result": true, "item": "rhhihost1.lab.eng.blr.redhat.com", "rc": 0, "start": "2019-03-26 13:01:53.343117", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
failed: [rhhihost3.lab.eng.blr.redhat.com -> localhost] (item=rhhihost2.lab.eng.blr.redhat.com) => {"changed": true, "cmd": ["dig", "rhhihost2.lab.eng.blr.redhat.com", "+short"], "delta": "0:00:00.046347", "end": "2019-03-26 13:01:53.690238", "failed_when_result": true, "item": "rhhihost2.lab.eng.blr.redhat.com", "rc": 0, "start": "2019-03-26 13:01:53.643891", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
</snip>
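
In the output above, each dig invocation exits with status 0 but prints nothing, and the task is still marked as failed ("failed_when_result": true). A minimal Python sketch of that kind of check follows. The rule used here (fail on a nonzero exit status or on empty standard output) is an assumption made for illustration; the actual failed_when expression of the role is not shown in this report, and the function name fqdn_check_failed is hypothetical.

```python
def fqdn_check_failed(result):
    """Return True when a lookup result should count as a failed FQDN check.

    Assumed rule (not taken from the role source): fail when the command
    exits nonzero or prints no addresses on standard output.
    """
    return result.get("rc", 1) != 0 or not result.get("stdout", "").strip()


# Values copied from the logged result for rhhihost3: rc is 0, stdout is empty.
logged = {"rc": 0, "stdout": "", "stderr": ""}
print(fqdn_check_failed(logged))  # prints True
```

Under this assumed rule, a check based on the exit status alone would pass, which is why the empty output matters here.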


I feel 'getent' could be used instead of 'dig' to validate that the FQDNs are resolvable.
I have seen 'getent' used by oVirt for FQDN checks.


[root@rhhihost1 ~]# dig rhhihost1.lab.eng.blr.redhat.com +short
[root@rhhihost1 ~]# echo $?
0
[root@rhhihost1 ~]# getent ahosts rhhihost1.lab.eng.blr.redhat.com
10.70.37.83     STREAM rhhihost1.lab.eng.blr.redhat.com
10.70.37.83     DGRAM  
10.70.37.83     RAW
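
The two commands differ in their lookup path: getent goes through the system resolver, which typically consults /etc/hosts, while dig queries DNS servers directly. As a rough sketch of a resolver-based check, the Python snippet below uses socket.getaddrinfo, which goes through the same system resolver as 'getent ahosts'. The function name unresolved_hostnames and its injectable resolve parameter are illustrative assumptions, not part of gluster-ansible.

```python
import socket


def unresolved_hostnames(hostnames, resolve=socket.getaddrinfo):
    """Return the hostnames that the system resolver cannot resolve.

    socket.getaddrinfo goes through the system resolver, so names defined
    only in /etc/hosts normally resolve here even though dig shows nothing.
    The resolve parameter exists only so the lookup can be replaced in tests.
    """
    missing = []
    for name in hostnames:
        try:
            resolve(name, None)
        except socket.gaierror:
            # The resolver reported that the name could not be resolved.
            missing.append(name)
    return missing
```

On a host with the /etc/hosts file shown in this comment, calling unresolved_hostnames with the three rhhihost FQDNs would be expected to return an empty list, although that depends on the host's resolver configuration.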

Comment 2 SATHEESARAN 2019-03-26 08:17:08 UTC
Marking this bug as a known issue for RHHI-V 1.6.

If the user has to use hostnames from /etc/hosts, they need to add the
parameter 'gluster_features_fqdn_check: false' under each host in the generated ansible vars file.

Example:
<snip>
hosts:
    rhhihost1.example.com:
      gluster_features_fqdn_check: false  # <-- add this parameter
      gluster_infra_volume_groups:
        - vgname: gluster_vg_sdb
          pvname: /dev/sdb
        - vgname: gluster_vg_sdc
          pvname: /dev/mapper/vdo_sdc
</snip>

Comment 4 Sahina Bose 2019-03-29 13:49:17 UTC
LGTM

Comment 7 SATHEESARAN 2019-06-26 07:13:19 UTC
Tested with RHVH 4.3.5 + RHEL 7.7 + RHGS 3.4.4 (interim build glusterfs-6.0-6) and ansible 2.8.1-1,
with:
gluster-ansible-features-1.0.5-2.el7rhgs.noarch
gluster-ansible-roles-1.0.5-2.el7rhgs.noarch
gluster-ansible-infra-1.0.4-3.el7rhgs.noarch

Static hostnames in the /etc/hosts file are now accepted as valid hostnames, and deployment proceeds with these hostnames in place.

Comment 9 errata-xmlrpc 2019-10-03 12:23:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2963

