Description of problem:
-----------------------
Some user environments use a static IP configuration, with hostnames added only to the local /etc/hosts file on each host. In this case, RHHI-V deployment is expected to work. However, the deployment checks introduced with gluster-ansible now use the 'dig' command to validate the FQDNs, and a hostname that is only available locally in /etc/hosts is not accepted.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
gluster-ansible-repositories-1.0-1.el7rhgs.noarch
gluster-ansible-maintenance-1.0.1-1.el7rhgs.noarch
gluster-ansible-features-1.0.4-5.el7rhgs.noarch
gluster-ansible-cluster-1.0-1.el7rhgs.noarch
gluster-ansible-roles-1.0.4-4.el7rhgs.noarch
gluster-ansible-infra-1.0.3-3.el7rhgs.noarch

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Add static hostnames to /etc/hosts
2. Use these hostnames for RHHI-V gluster deployment

Actual results:
---------------
Gluster deployment fails

Expected results:
-----------------
Gluster deployment should succeed, as the hostnames are available in /etc/hosts
Error message from the output console:

Content of /etc/hosts file
---------------------------
[root@rhhihost1 ~]# cat /etc/hosts
10.70.37.83 rhhihost1.lab.eng.blr.redhat.com
10.70.37.218 rhhihost2.lab.eng.blr.redhat.com
10.70.37.217 rhhihost3.lab.eng.blr.redhat.com
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

Check whether all hosts are reachable with these hostnames
-----------------------------------------------------------
[root@rhhihost1 ~]# ping -c2 rhhihost1.lab.eng.blr.redhat.com
PING rhhihost1.lab.eng.blr.redhat.com (10.70.37.83) 56(84) bytes of data.
64 bytes from rhhihost1.lab.eng.blr.redhat.com (10.70.37.83): icmp_seq=1 ttl=64 time=0.041 ms
64 bytes from rhhihost1.lab.eng.blr.redhat.com (10.70.37.83): icmp_seq=2 ttl=64 time=0.030 ms

--- rhhihost1.lab.eng.blr.redhat.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.030/0.035/0.041/0.008 ms
[root@rhhihost1 ~]# ping -c2 rhhihost2.lab.eng.blr.redhat.com
PING rhhihost2.lab.eng.blr.redhat.com (10.70.37.218) 56(84) bytes of data.
64 bytes from rhhihost2.lab.eng.blr.redhat.com (10.70.37.218): icmp_seq=1 ttl=64 time=0.357 ms
64 bytes from rhhihost2.lab.eng.blr.redhat.com (10.70.37.218): icmp_seq=2 ttl=64 time=0.372 ms

--- rhhihost2.lab.eng.blr.redhat.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.357/0.364/0.372/0.020 ms
[root@rhhihost1 ~]# ping -c2 rhhihost3.lab.eng.blr.redhat.com
PING rhhihost3.lab.eng.blr.redhat.com (10.70.37.217) 56(84) bytes of data.
64 bytes from rhhihost3.lab.eng.blr.redhat.com (10.70.37.217): icmp_seq=1 ttl=64 time=1.09 ms
64 bytes from rhhihost3.lab.eng.blr.redhat.com (10.70.37.217): icmp_seq=2 ttl=64 time=0.309 ms

--- rhhihost3.lab.eng.blr.redhat.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.309/0.699/1.090/0.391 ms
[root@rhhihost1 ~]#

dig command usage on these hostnames
--------------------------------------
[root@rhhihost1 ~]# dig rhhihost1.lab.eng.blr.redhat.com +short
[root@rhhihost1 ~]# echo $?
0

Error message on the console
-----------------------------
<snip>
TASK [gluster.features/roles/gluster_hci : Check if valid FQDN is provided] ****
failed: [rhhihost3.lab.eng.blr.redhat.com -> localhost] (item=rhhihost3.lab.eng.blr.redhat.com) => {"changed": true, "cmd": ["dig", "rhhihost3.lab.eng.blr.redhat.com", "+short"], "delta": "0:00:00.048777", "end": "2019-03-26 13:01:53.083083", "failed_when_result": true, "item": "rhhihost3.lab.eng.blr.redhat.com", "rc": 0, "start": "2019-03-26 13:01:53.034306", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
failed: [rhhihost3.lab.eng.blr.redhat.com -> localhost] (item=rhhihost1.lab.eng.blr.redhat.com) => {"changed": true, "cmd": ["dig", "rhhihost1.lab.eng.blr.redhat.com", "+short"], "delta": "0:00:00.045785", "end": "2019-03-26 13:01:53.388902", "failed_when_result": true, "item": "rhhihost1.lab.eng.blr.redhat.com", "rc": 0, "start": "2019-03-26 13:01:53.343117", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
failed: [rhhihost3.lab.eng.blr.redhat.com -> localhost] (item=rhhihost2.lab.eng.blr.redhat.com) => {"changed": true, "cmd": ["dig", "rhhihost2.lab.eng.blr.redhat.com", "+short"], "delta": "0:00:00.046347", "end": "2019-03-26 13:01:53.690238", "failed_when_result": true, "item": "rhhihost2.lab.eng.blr.redhat.com", "rc": 0, "start": "2019-03-26 13:01:53.643891", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
</snip>
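The log shows why the task fails even though 'dig' exits with "rc": 0. 'dig' sends its query straight to the configured DNS servers and does not consult /etc/hosts; with +short it prints nothing when DNS has no record, yet still exits 0. Judging from "failed_when_result": true next to an empty "stdout", the task appears to fail on empty output rather than on the exit code. The following is a minimal shell sketch of that assumed logic; the function names are hypothetical and are not part of gluster-ansible.

```shell
#!/usr/bin/env bash

# Hypothetical helper: mirrors the assumed failed_when condition of the
# "Check if valid FQDN is provided" task, which treats empty `dig +short`
# output as a failure regardless of dig's exit status.
dig_output_is_valid() {
    local output="$1"
    # dig +short prints nothing when DNS has no record for the name.
    [ -n "$output" ]
}

# Hypothetical wrapper: dig queries DNS servers directly and never reads
# /etc/hosts, so a name defined only there yields empty output here.
check_fqdn_with_dig() {
    local host="$1" output
    output=$(dig "$host" +short)
    dig_output_is_valid "$output"
}
```

On the hosts above, check_fqdn_with_dig rhhihost1.lab.eng.blr.redhat.com would return non-zero even though ping resolves the same name, which matches the failures in the log.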
I think 'getent' could be used instead of 'dig' to validate that the FQDNs are resolvable. I have seen 'getent' used by oVirt for its FQDN checks.

[root@rhhihost1 ~]# dig rhhihost1.lab.eng.blr.redhat.com +short
[root@rhhihost1 ~]# echo $?
0
[root@rhhihost1 ~]# getent ahosts rhhihost1.lab.eng.blr.redhat.com
10.70.37.83 STREAM rhhihost1.lab.eng.blr.redhat.com
10.70.37.83 DGRAM
10.70.37.83 RAW
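Unlike 'dig', 'getent ahosts' resolves names through the system's NSS configuration (the 'hosts:' line in /etc/nsswitch.conf, typically 'files dns'), so entries that exist only in /etc/hosts are found. A minimal sketch of a resolvability check built on it, assuming such an NSS setup; the function names are hypothetical:

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the name resolves through NSS, which on a
# typical "hosts: files dns" configuration checks /etc/hosts before DNS.
resolves_via_nss() {
    local host="$1"
    # getent exits 0 when the key is found and 2 when it cannot be found.
    getent ahosts "$host" > /dev/null
}

# Hypothetical helper: print only the first address getent returns
# (the first column of its first output line).
first_address() {
    getent ahosts "$1" | awk 'NR == 1 { print $1 }'
}
```

With the /etc/hosts file shown earlier, first_address rhhihost1.lab.eng.blr.redhat.com would be expected to print 10.70.37.83, consistent with the getent output above.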
We cannot use getent to check whether a given host is an FQDN or not. For example, if you have some random name in your /etc/hosts:

cat /etc/hosts:

10.70.42.133 foonode1

getent ahosts foonode1
10.70.42.133 STREAM foonode1
10.70.42.133 DGRAM
10.70.42.133 RAW

Even though foonode1 is not an FQDN, the output looks the same, so there is no way to tell from it whether a name is an FQDN.
(In reply to Sachidananda Urs from comment #4)
> We cannot use getent to check if a given host is FQDN or not. For example,
> if you have some random name in your /etc/hosts for example:
>
> cat /etc/hosts:
>
> 10.70.42.133 foonode1
>
> getent ahosts foonode1
> 10.70.42.133 STREAM foonode1
> 10.70.42.133 DGRAM
> 10.70.42.133 RAW
>
> Even though foonode1 is not FQDN, I see similar results. And there is no way
> I could make out if it is a FQDN.

This was just a thought, based on how oVirt uses 'getent' to validate hostnames. The crux of the problem is that RHHI-V deployment cannot proceed with static hostnames/FQDNs that have no DNS entries.
> This is just the thought I had about the usage of 'getent' which I could
> observed from
> ovirt way of validating the hostnames. But the crux of the problem is that
> RHHI-V deployment,
> couldn't proceed with static hostnames/FQDNs , with no DNS entries.

If static hostnames have to be used, we can disable the FQDN check in gluster-ansible, since getent does not validate whether a given hostname is an FQDN; it only returns the IP address of the given hostname.
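As discussed above, resolving a name says nothing about whether it is fully qualified. If some shape check were still wanted alongside static hostnames, one option is a heuristic that requires at least two dot-separated labels and then resolves the name with 'getent'. This is a hypothetical sketch, not the change made in gluster-ansible, and the function names are illustrative. The heuristic is weak: localhost.localdomain, or even a dotted IPv4 address, passes the shape test, so it only rejects bare short names such as foonode1.

```shell
#!/usr/bin/env bash

# Hypothetical heuristic: a name "looks like" an FQDN if it has at least two
# dot-separated labels, each 1-63 letters, digits or hyphens, not starting or
# ending with a hyphen, with a total length of at most 253 characters.
# One optional trailing dot is accepted. This is a shape check only; it
# cannot prove that the name is registered in DNS.
looks_like_fqdn() {
    local name="${1%.}"   # strip one optional trailing dot
    local label='[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?'
    local re="^${label}(\\.${label})+\$"
    [ "${#name}" -le 253 ] && [[ "$name" =~ $re ]]
}

# Hypothetical combined check: FQDN-shaped AND resolvable through NSS, so
# names defined only in /etc/hosts are still accepted.
valid_static_fqdn() {
    looks_like_fqdn "$1" && getent ahosts "$1" > /dev/null
}
```

With this sketch, valid_static_fqdn rhhihost1.lab.eng.blr.redhat.com would succeed on the hosts in this report, while valid_static_fqdn foonode1 is rejected by the shape test before any lookup happens.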
PR: https://github.com/gluster/gluster-ansible-features/pull/24
Tested with RHVH 4.3.5 + RHEL 7.7 + RHGS 3.4.4 (interim build - glusterfs-6.0-6) with ansible 2.8.1-1 and:
gluster-ansible-features-1.0.5-2.el7rhgs.noarch
gluster-ansible-roles-1.0.5-2.el7rhgs.noarch
gluster-ansible-infra-1.0.4-3.el7rhgs.noarch

Static hostnames in the /etc/hosts file are now accepted as valid hostnames, and the deployment proceeds with these hostnames in place.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2557