Bug 1405447 - [RFE] Provide the configuration check for Grafton deployment
Summary: [RFE] Provide the configuration check for Grafton deployment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gdeploy
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Sachidananda Urs
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: Gluster-HC-2 1351503
 
Reported: 2016-12-16 14:37 UTC by SATHEESARAN
Modified: 2017-03-23 05:09 UTC
CC List: 10 users

Fixed In Version: gdeploy-2.0.1-11
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-23 05:09:00 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1386443 0 unspecified CLOSED Installation of gluster gives Failed to execute stage 'Misc configuration': Cannot acquire host id: ... SanlockException... 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1425767 0 high CLOSED grafton-sanity-check doesn't fail when a given disk doesn't exist or disk is not blank 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHEA-2017:0482 0 normal SHIPPED_LIVE Red Hat Gluster Storage 3.2.0 gdeploy bug fix and enhancement update 2017-03-23 09:06:28 UTC

Internal Links: 1386443 1425767

Description SATHEESARAN 2016-12-16 14:37:29 UTC
Description of problem:
-----------------------
Provide a pre-flight configuration check for Grafton deployments

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
gdeploy-2.0.1-5

How reproducible:
-----------------
NA

Steps to Reproduce:
--------------------
NA

Actual results:
---------------
Currently there is no configuration check available

Expected results:
-----------------
A configuration check helps the customer understand that something is configured incorrectly, and serves as a warning before Grafton is installed

Comment 1 SATHEESARAN 2016-12-16 14:49:43 UTC
The following checks are required:

1. Disks (RAID volumes) should be blank and should not contain any partition table. If MBR/GPT partition information is present, gdeploy should throw a proper warning and stop proceeding.

2. Network configuration.
The hostname of each server should resolve to the same IP address on all the servers.

In this particular customer case, we found that the hostname resolved to a different IP on one particular host. This should be caught well in advance of installation, and gdeploy should stop proceeding when it finds a problem.

Hint: 'nslookup' should help in getting the hostnames and IPs of the hosts
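
As an illustration only (this is not the shipped grafton-sanity-check.sh; the device path is a placeholder), a minimal pre-flight sketch covering both checks could look like:

#!/bin/bash
# Hypothetical pre-flight sketch; the device path below is a placeholder.

# Check 1: the disk/RAID volume must be blank (no filesystem or MBR/GPT signature).
DISK=/dev/sdb
if blkid -p "$DISK" >/dev/null 2>&1; then
    echo "ERROR: $DISK already contains a partition table or filesystem signature" >&2
    exit 1
fi

# Check 2: the local hostname must resolve to an address configured on this host.
HOST_IP=$(getent ahostsv4 "$(hostname -f)" | awk 'NR==1 {print $1}')
if [ -z "$HOST_IP" ] || ! ip -4 addr show | grep -qw "$HOST_IP"; then
    echo "ERROR: $(hostname -f) resolves to '$HOST_IP', which is not configured on this host" >&2
    exit 1
fi
exit 0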

Comment 3 Sahina Bose 2016-12-21 07:08:14 UTC
We should be able to invoke a script before the deployment, and gdeploy should fail if the script returns a non-zero return value.

Can this script be included in extras in gdeploy rpm for 3.2?

Comment 4 Sachidananda Urs 2016-12-21 07:15:25 UTC
Sahina, I think we can't put this in the 3.2.0 release; however, we can make another release before Grafton GA.

Comment 8 Sahina Bose 2016-12-22 08:54:13 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1386443 - is there a way to provide check to avoid this too?

Comment 9 Sachidananda Urs 2016-12-22 10:02:39 UTC
This happens during hosted engine setup, and there is no way to know whether it will fail unless we run the `setup', or unless the setup process provides a --dry-run option or something similar to detect errors.

Comment 11 Sachidananda Urs 2017-01-05 10:10:41 UTC
(In reply to Sahina Bose from comment #3)
> We should be able to invoke a script before the deployment, and gdeploy
> should fail if script returns a non-zero return value.

We can now include ignore_script_errors=no, and if the script returns a non-zero exit status, gdeploy will exit.
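
For illustration, a configuration stanza along these lines could run the check before deployment (section syntax assumed from gdeploy's script module; the section name is arbitrary and the script path is an example):

[script1]
action=execute
ignore_script_errors=no
file=/usr/share/ansible/gdeploy/scripts/grafton-sanity-check.sh

With ignore_script_errors=no, a non-zero return from the script stops gdeploy instead of continuing with the remaining sections.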

Comment 12 RamaKasturi 2017-01-10 14:50:58 UTC
Proposing this as a blocker because this script is very much required for Grafton deployments to avoid installation/setup failures.

Comment 13 Sahina Bose 2017-01-13 09:21:33 UTC
Requirements are as per Comment 1.

For the network validation, if a host has multiple IP addresses, ensure:
1. Each IP address resolves to a separate FQDN.
2. Any host FQDN resolves consistently to the same IP address from all hosts, i.e. if x.domain.com resolves to ip1 from host1, it should resolve to ip1 from host2 as well (a rough check is sketched below).
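
A rough sketch of how item 2 could be checked from the deployment node (hostnames are placeholders and passwordless ssh is assumed; this is not the shipped check):

#!/bin/bash
# Hypothetical cross-host name-resolution consistency check.
HOSTS="host1.example.com host2.example.com host3.example.com"   # placeholders

for fqdn in $HOSTS; do
    expected=$(getent ahostsv4 "$fqdn" | awk 'NR==1 {print $1}')   # resolution seen locally
    for node in $HOSTS; do
        seen=$(ssh "$node" getent ahostsv4 "$fqdn" | awk 'NR==1 {print $1}')
        if [ "$seen" != "$expected" ]; then
            echo "ERROR: $fqdn resolves to '$seen' on $node but to '$expected' locally" >&2
            exit 1
        fi
    done
done
echo "Name resolution is consistent across all hosts"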

Comment 15 Sachidananda Urs 2017-01-23 07:07:37 UTC
Commits:

https://github.com/gluster/gdeploy/commit/0433977fe
https://github.com/gluster/gdeploy/commit/63703e7bd

Should resolve the issues.

Comment 17 SATHEESARAN 2017-02-06 15:39:32 UTC
When executed there was an error.

PLAY [gluster_servers] *********************************************************

TASK [Run a shell script] ******************************************************
fatal: [dhcp37-196.lab.eng.blr.redhat.com]: FAILED! => {"failed": true, "msg": "The conditional check 'result.rc != 0' failed. The error was: error while evaluating conditional (result.rc != 0): 'dict object' has no attribute 'rc'"}
fatal: [dhcp37-188.lab.eng.blr.redhat.com]: FAILED! => {"failed": true, "msg": "The conditional check 'result.rc != 0' failed. The error was: error while evaluating conditional (result.rc != 0): 'dict object' has no attribute 'rc'"}
fatal: [dhcp37-169.lab.eng.blr.redhat.com]: FAILED! => {"failed": true, "msg": "The conditional check 'result.rc != 0' failed. The error was: error while evaluating conditional (result.rc != 0): 'dict object' has no attribute 'rc'"}
	to retry, use: --limit @/tmp/tmpYjqYOD/run-script.retry

PLAY RECAP *********************************************************************
dhcp37-169.lab.eng.blr.redhat.com : ok=0    changed=0    unreachable=0    failed=1   
dhcp37-188.lab.eng.blr.redhat.com : ok=0    changed=0    unreachable=0    failed=1   
dhcp37-196.lab.eng.blr.redhat.com : ok=0    changed=0    unreachable=0    failed=1  

Also, the script - /usr/share/ansible/gdeploy/scripts/grafton-sanity-check.sh - uses the 'ifconfig' command, which has been obsoleted in favor of the 'ip' command. The 'ifconfig' command is not available in RHEL 7.3.

So replace the 'ifconfig' command with 'ip addr show'.
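
For example, an address listing that a script might do with 'ifconfig' can be done with iproute2 as follows (the interface name is a placeholder; the exact usage inside grafton-sanity-check.sh may differ):

# Old style (needs net-tools, which is not installed on RHEL 7.3 by default):
#   ifconfig | grep 'inet '
# iproute2 equivalent:
ip addr show | grep 'inet '
ip -4 addr show dev em1 | awk '/inet / {print $2}'   # IPv4 address/prefix of one interface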

Based on the above observations this bug failed verification.

Comment 18 Sachidananda Urs 2017-02-11 09:54:27 UTC
https://github.com/gluster/gdeploy/commit/a2227 should fix the issue

Comment 19 Sachidananda Urs 2017-02-11 09:55:10 UTC
The error:

TASK [Run a shell script] ******************************************************
fatal: [dhcp37-196.lab.eng.blr.redhat.com]: FAILED! => {"failed": true, "msg": "The conditional check 'result.rc != 0' failed. The error was: error while evaluating 

Was due to the missing `ifconfig' command.

Comment 20 SATHEESARAN 2017-02-11 17:04:01 UTC
(In reply to Sachidananda Urs from comment #19)
> The error:
> 
> TASK [Run a shell script]
> ******************************************************
> fatal: [dhcp37-196.lab.eng.blr.redhat.com]: FAILED! => {"failed": true,
> "msg": "The conditional check 'result.rc != 0' failed. The error was: error
> while evaluating 
> 
> Was due to the missing `ifconfig' command.

True. The 'ifconfig' command in RHEL 7 is deprecated in favor of the 'ip' command, as mentioned in comment 17. Please make the script use the 'ip addr show' command.

Comment 21 SATHEESARAN 2017-02-17 13:42:08 UTC
Tested with gdeploy-2.0.1-11.el7rhgs

1. When the disks are partitioned with GPT, the gdeploy check detects this and, with ignore_script_errors set, gdeploy stops and does not proceed further.

2. When one of the hosts is not reachable from the master node (the node from which gdeploy is run), the gdeploy check detects this and, with ignore_script_errors set, gdeploy stops and does not proceed further.

3. Created a situation where, from host2, host1's hostname resolves to a different IP which is not reachable (via entries in /etc/hosts); here too the gdeploy check detects this and, with ignore_script_errors set, gdeploy stops and does not proceed further.

Based on these observations, moving this bug to the verified state

Comment 23 errata-xmlrpc 2017-03-23 05:09:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0482.html

