Description of problem:
-----------------------
Some user environments use static IP configuration, with hostnames added only to the local /etc/hosts file on each host rather than to DNS. RHHI-V deployment is expected to work in this case. However, the deployment checks introduced with gluster-ansible now use the 'dig' command to validate FQDNs. Because 'dig' queries DNS directly and does not consult /etc/hosts, a hostname that is available only in /etc/hosts is rejected.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
gluster-ansible-repositories-1.0-1.el7rhgs.noarch
gluster-ansible-maintenance-1.0.1-1.el7rhgs.noarch
gluster-ansible-features-1.0.4-5.el7rhgs.noarch
gluster-ansible-cluster-1.0-1.el7rhgs.noarch
gluster-ansible-roles-1.0.4-4.el7rhgs.noarch
gluster-ansible-infra-1.0.3-3.el7rhgs.noarch

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Define static hostnames in /etc/hosts
2. Use these hostnames for RHHI-V gluster deployment

Actual results:
---------------
Gluster deployment fails

Expected results:
----------------
Gluster deployment should succeed, as the hostnames are available in /etc/hosts
Content of /etc/hosts file
---------------------------
[root@rhhihost1 ~]# cat /etc/hosts
10.70.37.83 rhhihost1.lab.eng.blr.redhat.com
10.70.37.218 rhhihost2.lab.eng.blr.redhat.com
10.70.37.217 rhhihost3.lab.eng.blr.redhat.com
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

Check whether all hosts are reachable with this hostname
---------------------------------------------------------
[root@rhhihost1 ~]# ping -c2 rhhihost1.lab.eng.blr.redhat.com
PING rhhihost1.lab.eng.blr.redhat.com (10.70.37.83) 56(84) bytes of data.
64 bytes from rhhihost1.lab.eng.blr.redhat.com (10.70.37.83): icmp_seq=1 ttl=64 time=0.041 ms
64 bytes from rhhihost1.lab.eng.blr.redhat.com (10.70.37.83): icmp_seq=2 ttl=64 time=0.030 ms

--- rhhihost1.lab.eng.blr.redhat.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.030/0.035/0.041/0.008 ms

[root@rhhihost1 ~]# ping -c2 rhhihost2.lab.eng.blr.redhat.com
PING rhhihost2.lab.eng.blr.redhat.com (10.70.37.218) 56(84) bytes of data.
64 bytes from rhhihost2.lab.eng.blr.redhat.com (10.70.37.218): icmp_seq=1 ttl=64 time=0.357 ms
64 bytes from rhhihost2.lab.eng.blr.redhat.com (10.70.37.218): icmp_seq=2 ttl=64 time=0.372 ms

--- rhhihost2.lab.eng.blr.redhat.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.357/0.364/0.372/0.020 ms

[root@rhhihost1 ~]# ping -c2 rhhihost3.lab.eng.blr.redhat.com
PING rhhihost3.lab.eng.blr.redhat.com (10.70.37.217) 56(84) bytes of data.
64 bytes from rhhihost3.lab.eng.blr.redhat.com (10.70.37.217): icmp_seq=1 ttl=64 time=1.09 ms
64 bytes from rhhihost3.lab.eng.blr.redhat.com (10.70.37.217): icmp_seq=2 ttl=64 time=0.309 ms

--- rhhihost3.lab.eng.blr.redhat.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.309/0.699/1.090/0.391 ms
[root@rhhihost1 ~]#

dig command usage on these hostnames
--------------------------------------
[root@rhhihost1 ~]# dig rhhihost1.lab.eng.blr.redhat.com +short
[root@rhhihost1 ~]# echo $?
0

Error message on the console
-----------------------------
<snip>
TASK [gluster.features/roles/gluster_hci : Check if valid FQDN is provided] ****
failed: [rhhihost3.lab.eng.blr.redhat.com -> localhost] (item=rhhihost3.lab.eng.blr.redhat.com) => {"changed": true, "cmd": ["dig", "rhhihost3.lab.eng.blr.redhat.com", "+short"], "delta": "0:00:00.048777", "end": "2019-03-26 13:01:53.083083", "failed_when_result": true, "item": "rhhihost3.lab.eng.blr.redhat.com", "rc": 0, "start": "2019-03-26 13:01:53.034306", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
failed: [rhhihost3.lab.eng.blr.redhat.com -> localhost] (item=rhhihost1.lab.eng.blr.redhat.com) => {"changed": true, "cmd": ["dig", "rhhihost1.lab.eng.blr.redhat.com", "+short"], "delta": "0:00:00.045785", "end": "2019-03-26 13:01:53.388902", "failed_when_result": true, "item": "rhhihost1.lab.eng.blr.redhat.com", "rc": 0, "start": "2019-03-26 13:01:53.343117", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
failed: [rhhihost3.lab.eng.blr.redhat.com -> localhost] (item=rhhihost2.lab.eng.blr.redhat.com) => {"changed": true, "cmd": ["dig", "rhhihost2.lab.eng.blr.redhat.com", "+short"], "delta": "0:00:00.046347", "end": "2019-03-26 13:01:53.690238", "failed_when_result": true, "item": "rhhihost2.lab.eng.blr.redhat.com", "rc": 0, "start": "2019-03-26 13:01:53.643891", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
</snip>

I think 'getent' could be used instead of 'dig' to check that the FQDNs are resolvable. As shown below, 'dig' prints nothing and still exits with 0 for a name that exists only in /etc/hosts, while 'getent' resolves it. I have seen 'getent' used by oVirt for FQDN checks.

[root@rhhihost1 ~]# dig rhhihost1.lab.eng.blr.redhat.com +short
[root@rhhihost1 ~]# echo $?
0
[root@rhhihost1 ~]# getent ahosts rhhihost1.lab.eng.blr.redhat.com
10.70.37.83     STREAM rhhihost1.lab.eng.blr.redhat.com
10.70.37.83     DGRAM
10.70.37.83     RAW
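
For illustration only, a getent-based check might look like the sketch below. It is not the change that went into gluster-ansible; the function name fqdn_resolves is hypothetical. It relies on 'getent ahosts' resolving through NSS (so /etc/hosts is consulted according to /etc/nsswitch.conf) and on getent returning a non-zero exit status when the name cannot be found.

```shell
#!/bin/sh
# Hypothetical helper (not part of gluster-ansible): succeeds if the given
# name resolves through NSS, i.e. via /etc/hosts or DNS as configured in
# /etc/nsswitch.conf. Unlike 'dig +short', the exit status reflects the result.
fqdn_resolves() {
    # Reject an empty argument up front.
    [ -n "$1" ] || return 1
    getent ahosts "$1" >/dev/null 2>&1
}

# Check every hostname passed on the command line.
for host in "$@"; do
    if fqdn_resolves "$host"; then
        echo "OK   $host"
    else
        echo "FAIL $host"
    fi
done
```

Saved as a script, it could be run with the three hostnames from /etc/hosts as arguments; each name that resolves locally or through DNS is reported as OK.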
Marking this bug as a known_issue for RHHI-V 1.6.

If the user has to use hostnames from /etc/hosts, the parameter 'gluster_features_fqdn_check: false' must be added under each host in the generated ansible vars file.

Example:
<snip>
  hosts:
    rhhihost1.example.com:
      gluster_features_fqdn_check: false     <----------------
      gluster_infra_volume_groups:
        - vgname: gluster_vg_sdb
          pvname: /dev/sdb
        - vgname: gluster_vg_sdc
          pvname: /dev/mapper/vdo_sdc
</snip>
LGTM
Tested with RHVH 4.3.5 + RHEL 7.7 + RHGS 3.4.4 (interim build - glusterfs-6.0-6) with ansible 2.8.1-1 and:

gluster-ansible-features-1.0.5-2.el7rhgs.noarch
gluster-ansible-roles-1.0.5-2.el7rhgs.noarch
gluster-ansible-infra-1.0.4-3.el7rhgs.noarch

Static hostnames in the /etc/hosts file are now accepted as valid hostnames, and the deployment proceeds with these hostnames in place.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2963