1692662 – RHHI Gluster deployment fails for static hostnames

Summary: RHHI Gluster deployment fails for static hostnames

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	gluster-ansible
Sub Component:
Version:	rhgs-3.4
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.4.z Async Update
Assignee:	Sachidananda Urs
QA Contact:	SATHEESARAN
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1692671

Reported:	2019-03-26 07:49 UTC by SATHEESARAN
Modified:	2019-10-03 07:58 UTC
CC List:	6 users
Fixed In Version:	gluster-ansible-features-1.0.5-1
Doc Type:	Bug Fix
Doc Text:	During deployment, hostnames were previously verified using the dig command. dig queries DNS directly and does not consult the /etc/hosts file, so hostnames defined only in /etc/hosts failed validation and deployment failed. Hostnames are now validated using the getent command, which resolves names through the system's name service configuration and therefore accepts hostnames set in the /etc/hosts file.
Clone Of:
Clones:	1692671
Environment:
Last Closed:	2019-10-03 07:58:12 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2019:2557	0	None	None	None	2019-10-03 07:58:34 UTC

Description SATHEESARAN 2019-03-26 07:49:44 UTC

Description of problem:
-----------------------
Some user environments use a static IP configuration, with hostnames added only to the local /etc/hosts file on each host. In such environments, the RHHI-V deployment is expected to work.

However, the deployment checks introduced with gluster-ansible now use the 'dig' command to validate FQDNs. dig queries DNS only, so a hostname that is defined only in the local /etc/hosts file is not accepted.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
gluster-ansible-repositories-1.0-1.el7rhgs.noarch
gluster-ansible-maintenance-1.0.1-1.el7rhgs.noarch
gluster-ansible-features-1.0.4-5.el7rhgs.noarch
gluster-ansible-cluster-1.0-1.el7rhgs.noarch
gluster-ansible-roles-1.0.4-4.el7rhgs.noarch
gluster-ansible-infra-1.0.3-3.el7rhgs.noarch

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Add static hostname entries to /etc/hosts on each host
2. Use these hostnames for the RHHI-V gluster deployment

Actual results:
---------------
Gluster deployment fails

Expected results:
-----------------
Gluster deployment should succeed, since the hostnames resolve through /etc/hosts

Comment 2 SATHEESARAN 2019-03-26 08:09:28 UTC

Setup details, followed by the error message from the console output:

Content of /etc/hosts file
---------------------------
[root@rhhihost1 ~]# cat /etc/hosts
10.70.37.83 rhhihost1.lab.eng.blr.redhat.com
10.70.37.218 rhhihost2.lab.eng.blr.redhat.com
10.70.37.217 rhhihost3.lab.eng.blr.redhat.com
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
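Each line of the file above follows the hosts(5) layout: an IP address, a canonical hostname, then optional aliases, with `#` starting a comment. As a minimal sketch of how such local entries map names to addresses without any DNS query, the hypothetical Python helper `parse_hosts` below (not part of gluster-ansible) builds a name-to-address table from that text. For simplicity it keeps only the first address listed for each name.

```python
def parse_hosts(text: str) -> dict:
    """Map every hostname and alias in hosts(5)-style text to an IP address.

    Hypothetical helper: if a name appears on several lines, only the
    first address is kept.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), ip)  # hostnames are case-insensitive
    return mapping


# The /etc/hosts content from this report.
HOSTS = """\
10.70.37.83 rhhihost1.lab.eng.blr.redhat.com
10.70.37.218 rhhihost2.lab.eng.blr.redhat.com
10.70.37.217 rhhihost3.lab.eng.blr.redhat.com
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
"""

if __name__ == "__main__":
    table = parse_hosts(HOSTS)
    print(table["rhhihost2.lab.eng.blr.redhat.com"])  # 10.70.37.218
    print(table["localhost6"])  # ::1
```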

Check that all hosts are reachable by these hostnames
-----------------------------------------------------
[root@rhhihost1 ~]# ping -c2 rhhihost1.lab.eng.blr.redhat.com
PING rhhihost1.lab.eng.blr.redhat.com (10.70.37.83) 56(84) bytes of data.
64 bytes from rhhihost1.lab.eng.blr.redhat.com (10.70.37.83): icmp_seq=1 ttl=64 time=0.041 ms
64 bytes from rhhihost1.lab.eng.blr.redhat.com (10.70.37.83): icmp_seq=2 ttl=64 time=0.030 ms

--- rhhihost1.lab.eng.blr.redhat.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.030/0.035/0.041/0.008 ms
[root@rhhihost1 ~]# ping -c2 rhhihost2.lab.eng.blr.redhat.com
PING rhhihost2.lab.eng.blr.redhat.com (10.70.37.218) 56(84) bytes of data.
64 bytes from rhhihost2.lab.eng.blr.redhat.com (10.70.37.218): icmp_seq=1 ttl=64 time=0.357 ms
64 bytes from rhhihost2.lab.eng.blr.redhat.com (10.70.37.218): icmp_seq=2 ttl=64 time=0.372 ms

--- rhhihost2.lab.eng.blr.redhat.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.357/0.364/0.372/0.020 ms
[root@rhhihost1 ~]# ping -c2 rhhihost3.lab.eng.blr.redhat.com
PING rhhihost3.lab.eng.blr.redhat.com (10.70.37.217) 56(84) bytes of data.
64 bytes from rhhihost3.lab.eng.blr.redhat.com (10.70.37.217): icmp_seq=1 ttl=64 time=1.09 ms
64 bytes from rhhihost3.lab.eng.blr.redhat.com (10.70.37.217): icmp_seq=2 ttl=64 time=0.309 ms

--- rhhihost3.lab.eng.blr.redhat.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.309/0.699/1.090/0.391 ms
[root@rhhihost1 ~]# 

dig returns no answer for these hostnames
-----------------------------------------
[root@rhhihost1 ~]# dig rhhihost1.lab.eng.blr.redhat.com +short
[root@rhhihost1 ~]# echo $?
0


Error message on the console
-----------------------------
<snip>
TASK [gluster.features/roles/gluster_hci : Check if valid FQDN is provided] ****
failed: [rhhihost3.lab.eng.blr.redhat.com -> localhost] (item=rhhihost3.lab.eng.blr.redhat.com) => {"changed": true, "cmd": ["dig", "rhhihost3.lab.eng.blr.redhat.com", "+short"], "delta": "0:00:00.048777", "end": "2019-03-26 13:01:53.083083", "failed_when_result": true, "item": "rhhihost3.lab.eng.blr.redhat.com", "rc": 0, "start": "2019-03-26 13:01:53.034306", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
failed: [rhhihost3.lab.eng.blr.redhat.com -> localhost] (item=rhhihost1.lab.eng.blr.redhat.com) => {"changed": true, "cmd": ["dig", "rhhihost1.lab.eng.blr.redhat.com", "+short"], "delta": "0:00:00.045785", "end": "2019-03-26 13:01:53.388902", "failed_when_result": true, "item": "rhhihost1.lab.eng.blr.redhat.com", "rc": 0, "start": "2019-03-26 13:01:53.343117", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
failed: [rhhihost3.lab.eng.blr.redhat.com -> localhost] (item=rhhihost2.lab.eng.blr.redhat.com) => {"changed": true, "cmd": ["dig", "rhhihost2.lab.eng.blr.redhat.com", "+short"], "delta": "0:00:00.046347", "end": "2019-03-26 13:01:53.690238", "failed_when_result": true, "item": "rhhihost2.lab.eng.blr.redhat.com", "rc": 0, "start": "2019-03-26 13:01:53.643891", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
</snip>
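Every failed item above has `rc` 0 and an empty `stdout`. dig sends its query straight to the DNS servers configured in /etc/resolv.conf and never reads /etc/hosts; it exits 0 whenever the query completes, including an NXDOMAIN answer, so `+short` simply prints nothing. The role's actual `failed_when` expression is not shown in this report. The Python sketch below assumes it fails on a non-zero exit code or on empty output, which is consistent with the log; both function names are hypothetical.

```python
import subprocess


def dig_check_failed(rc: int, stdout: str) -> bool:
    """Assumed failed_when logic: fail on a non-zero rc or on empty `+short` output."""
    return rc != 0 or not stdout.strip()


def run_dig_check(host: str) -> bool:
    """Run `dig <host> +short` as the task did; True means the check would fail.

    Requires the dig binary (bind-utils package) on the machine running it.
    """
    proc = subprocess.run(["dig", host, "+short"], capture_output=True, text=True)
    return dig_check_failed(proc.returncode, proc.stdout)


# rc/stdout pairs copied from the three failed items in the log above.
LOGGED = {
    "rhhihost3.lab.eng.blr.redhat.com": (0, ""),
    "rhhihost1.lab.eng.blr.redhat.com": (0, ""),
    "rhhihost2.lab.eng.blr.redhat.com": (0, ""),
}

if __name__ == "__main__":
    for host, (rc, out) in LOGGED.items():
        print(host, "FAILED" if dig_check_failed(rc, out) else "ok")
```

Under this assumption a zero exit code alone proves nothing: a name that exists only in /etc/hosts always yields empty output from dig and is rejected.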

Comment 3 SATHEESARAN 2019-03-26 08:11:02 UTC

I think 'getent' could be used instead of 'dig' to validate that FQDNs are resolvable.
I have seen oVirt use 'getent' for its FQDN checks.


[root@rhhihost1 ~]# dig rhhihost1.lab.eng.blr.redhat.com +short
[root@rhhihost1 ~]# echo $?
0
[root@rhhihost1 ~]# getent ahosts rhhihost1.lab.eng.blr.redhat.com
10.70.37.83     STREAM rhhihost1.lab.eng.blr.redhat.com
10.70.37.83     DGRAM  
10.70.37.83     RAW
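`getent ahosts` resolves the name through glibc's `getaddrinfo()`, which follows the name service switch (`/etc/nsswitch.conf`, typically `hosts: files dns`), so an /etc/hosts entry is found even when no DNS record exists. The three lines per address are one result each for the stream, datagram and raw socket types. A rough Python equivalent, using `socket.getaddrinfo` (which calls the same system resolver), is sketched below; the helper names `ahosts` and `is_resolvable` are hypothetical.

```python
import socket
from typing import List, Tuple

# Socket-type labels as printed by `getent ahosts`.
_SOCKTYPE_NAMES = {
    socket.SOCK_STREAM: "STREAM",
    socket.SOCK_DGRAM: "DGRAM",
    socket.SOCK_RAW: "RAW",
}


def ahosts(name: str) -> List[Tuple[str, str]]:
    """Return (address, socket type) pairs for `name`, similar to `getent ahosts`.

    Raises socket.gaierror if the name cannot be resolved.
    """
    results = []
    for _family, socktype, _proto, _canon, sockaddr in socket.getaddrinfo(name, None):
        results.append((sockaddr[0], _SOCKTYPE_NAMES.get(socktype, str(socktype))))
    return results


def is_resolvable(name: str) -> bool:
    """True if the system resolver (files, DNS, ...) returns any address."""
    try:
        return bool(socket.getaddrinfo(name, None))
    except socket.gaierror:
        return False


if __name__ == "__main__":
    for addr, kind in ahosts("localhost"):
        print(f"{addr:<15} {kind}")
```

Because the lookup goes through NSS rather than straight to DNS, `is_resolvable("rhhihost1.lab.eng.blr.redhat.com")` would succeed on the hosts in this report, where the dig-based check failed.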

Comment 4 Sachidananda Urs 2019-03-26 08:46:11 UTC

We cannot use getent to check whether a given hostname is an FQDN. For example, suppose /etc/hosts contains an arbitrary short name:

cat /etc/hosts:

10.70.42.133 foonode1

getent ahosts foonode1
10.70.42.133    STREAM foonode1
10.70.42.133    DGRAM  
10.70.42.133    RAW    

Even though foonode1 is not an FQDN, the results look the same, so there is no way to tell from this output whether a name is an FQDN.
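The point above separates two questions: does a name resolve, and is it fully qualified? getent (like `getaddrinfo`) answers only the first. A purely syntactic check can approximate the second. The Python sketch below is a heuristic, not the check gluster-ansible implements: it accepts a name with at least two dot-separated labels that each follow the RFC 1123 hostname rules, and rejects an all-numeric last label so that dotted IPv4 addresses do not pass. `looks_like_fqdn` is a hypothetical name.

```python
import re

# One hostname label (RFC 1123): 1-63 letters, digits or hyphens,
# not starting or ending with a hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def looks_like_fqdn(name: str) -> bool:
    """Heuristic shape check; says nothing about whether the name resolves."""
    name = name.rstrip(".")  # allow the trailing dot of an absolute name
    if not name or len(name) > 253:
        return False
    labels = name.split(".")
    if len(labels) < 2 or labels[-1].isdigit():  # short name or dotted IPv4
        return False
    return all(_LABEL.fullmatch(label) for label in labels)


if __name__ == "__main__":
    for candidate in ("rhhihost1.lab.eng.blr.redhat.com", "foonode1", "10.70.42.133"):
        print(candidate, looks_like_fqdn(candidate))
```

A shape check like this still cannot tell whether a dotted name is qualified relative to the real DNS hierarchy, which is one reason resolvability and FQDN validation are easier to treat as separate checks.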

Comment 5 SATHEESARAN 2019-03-26 09:48:40 UTC

(In reply to Sachidananda Urs from comment #4)
> We cannot use getent to check if a given host is FQDN or not. For example,
> if you have some random name in your /etc/hosts for example:
> 
> cat /etc/hosts:
> 
> 10.70.42.133 foonode1
> 
> getent ahosts foonode1
> 10.70.42.133    STREAM foonode1
> 10.70.42.133    DGRAM  
> 10.70.42.133    RAW    
> 
> Even though foonode1 is not FQDN, I see similar results. And there is no way
> I could make out if it is a FQDN.

This was just a thought about using 'getent', based on how oVirt validates hostnames.
The crux of the problem is that RHHI-V deployment cannot proceed with static
hostnames/FQDNs that have no DNS entries.

Comment 6 Sachidananda Urs 2019-03-26 09:52:18 UTC

> This is just the thought I had about the usage of 'getent' which I could
> observed from
> ovirt way of validating the hostnames. But the crux of the problem is that
> RHHI-V deployment,
> couldn't proceed with static hostnames/FQDNs , with no DNS entries.

If we have to support static hostnames, we can disable the FQDN check in gluster-ansible,
since getent cannot validate whether a given hostname is an FQDN. It only returns the IP address the hostname resolves to.

Comment 8 Sachidananda Urs 2019-04-04 14:12:10 UTC

PR: https://github.com/gluster/gluster-ansible-features/pull/24

Comment 10 SATHEESARAN 2019-06-26 07:12:01 UTC

Tested with RHVH 4.3.5 + RHEL 7.7 + RHGS 3.4.4 (interim build: glusterfs-6.0-6) and ansible 2.8.1-1,
with the following packages:
gluster-ansible-features-1.0.5-2.el7rhgs.noarch
gluster-ansible-roles-1.0.5-2.el7rhgs.noarch
gluster-ansible-infra-1.0.4-3.el7rhgs.noarch

Static hostnames defined in the /etc/hosts file are now accepted as valid, and the deployment proceeds with these hostnames.

Comment 13 errata-xmlrpc 2019-10-03 07:58:12 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2557
