Bug 1518525

Summary: Tendrl-ansible setup script fails if the server has 2 IP addresses
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Sweta Anandpara <sanandpa>
Component: web-admin-tendrl-ansible Assignee: Timothy Asir <tjeyasin>
Status: CLOSED ERRATA QA Contact: Daniel Horák <dahorak>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.3 CC: asriram, dahorak, mbukatov, nthomas, rghatvis, rhinduja, rhs-bugs, sanandpa, srmukher, tjeyasin
Target Milestone: ---   
Target Release: RHGS 3.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tendrl-ansible-1.6.1-2.el7rhgs.noarch.rpm, tendrl-api-1.6.1-1.el7rhgs.noarch.rpm, tendrl-commons-1.6.1-1.el7rhgs.noarch.rpm, tendrl-monitoring-integration-1.6.1-1.el7rhgs.noarch.rpm, tendrl-node-agent-1.6.1-1.el7, tendrl-ui-1.6.1-1.el7rhgs.noarch.rpm Doc Type: Known Issue
Doc Text:
Previously, during Red Hat Gluster Storage Web Administration installation, if the server had multiple active IP addresses, tendrl-ansible failed to automatically choose the correct one, causing installation failure. In this version, the user has to set all the required variables for tendrl-ansible as per the installation instructions.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-09-04 06:59:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1503134    
Attachments:
Description Flags
console logs none

Description Sweta Anandpara 2017-11-29 06:57:06 UTC
Created attachment 1360185 [details]
console logs

Description of problem:
=======================
My setup consisted of beaker (perf) machines, for the storage nodes as well as the tendrl server. 'ip a' on the tendrl server showed the output below. Even after specifying the hostname in the site.yml file, 'etcd_ip_address' was getting updated to a different IP, and that resulted in a failure while creating the Tendrl admin user. Pasted below is the error output seen on the console. I suppose we need better resolution of the IP address in the line: etcd_ip_address: "{{ lookup('dig', ansible_fqdn) }}"

TASK [tendrl-ansible.tendrl-server : Create Tendrl admin user] ********************
fatal: [gqas012.sbu.lab.eng.bos.redhat.com]: FAILED! => {"changed": true, "cmd": "RACK_ENV=production rake etcd:load_admin", "delta": "0:00:00.606007", "end": "2017-11-28 20:49:48.948206", "failed": true, "failed_when_result": true, "rc": 1, "start": "2017-11-28 20:49:48.342199", "stderr": "rake aborted!\nConnection refused - connect(2)\n\nTasks: TOP => etcd:load_admin\n(See full trace by running task with --trace)", "stderr_lines": ["rake aborted!", "Connection refused - connect(2)", "", "Tasks: TOP => etcd:load_admin", "(See full trace by running task with --trace)"], "stdout": "\"Generating default Tendrl admin\"", "stdout_lines": ["\"Generating default Tendrl admin\""]}



1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 78:2b:cb:74:46:ac brd ff:ff:ff:ff:ff:ff
    inet 10.16.156.33/21 brd 10.16.159.255 scope global dynamic em1
       valid_lft 57514sec preferred_lft 57514sec
    inet6 fe80::7a2b:cbff:fe74:46ac/64 scope link 
       valid_lft forever preferred_lft forever
3: em2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 78:2b:cb:74:46:ad brd ff:ff:ff:ff:ff:ff
4: p2p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000
    link/ether 90:e2:ba:07:a2:84 brd ff:ff:ff:ff:ff:ff
    inet 192.168.96.143/24 brd 192.168.96.255 scope global p2p1
       valid_lft forever preferred_lft forever
    inet6 fe80::92e2:baff:fe07:a284/64 scope link 
       valid_lft forever preferred_lft forever
5: p2p2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 90:e2:ba:07:a2:85 brd ff:ff:ff:ff:ff:ff
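
For illustration only, the candidate addresses can be pulled out of the 'ip a' output above with a short script. This is a minimal sketch and not part of tendrl-ansible; the function name global_ipv4_addresses is hypothetical, and the sample text is trimmed from the listing above.

```python
# Trimmed copy of the 'ip a' listing above (only the IPv4 address lines matter).
SAMPLE_IP_A = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    inet 127.0.0.1/8 scope host lo
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.16.156.33/21 brd 10.16.159.255 scope global dynamic em1
4: p2p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000
    inet 192.168.96.143/24 brd 192.168.96.255 scope global p2p1
"""


def global_ipv4_addresses(ip_a_output):
    """Return (interface, address) pairs for every global-scope IPv4 address."""
    found = []
    for line in ip_a_output.splitlines():
        fields = line.split()
        # IPv4 lines look like: inet ADDR/PREFIX [brd ADDR] scope SCOPE [flags] IFACE
        if len(fields) < 4 or fields[0] != "inet" or "scope" not in fields:
            continue
        scope_index = fields.index("scope")
        if scope_index + 1 >= len(fields):
            continue
        if fields[scope_index + 1] == "global":
            address = fields[1].split("/")[0]  # drop the prefix length
            found.append((fields[-1], address))  # interface name is the last field
    return found


if __name__ == "__main__":
    for interface, address in global_ipv4_addresses(SAMPLE_IP_A):
        print(interface, address)
```

On this host the sketch lists both em1 and p2p1, so any automatic choice of 'etcd_ip_address' has to pick between two global addresses.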




Version-Release number of selected component (if applicable):
============================================================
tendrl-ansible-1.5.4-2.el7rhgs.noarch

How reproducible:
=================
1:1


Steps to Reproduce:
===================
1. Have storage nodes and a rhel7 machine for web-admin server, which has 2 IP addresses.
2. Run 'ansible-playbook -i hosts site.yml' from the tendrl-server

Actual results:
===============
Step 2 fails in the middle


Expected results:
================
tendrl-ansible should succeed in resolving and picking the right IP address.


Additional info:
================
PFA the console logs.

Comment 2 Martin Bukatovic 2017-11-30 07:12:13 UTC
I need to review this. Reply is expected no sooner than on Monday.

Comment 4 Nishanth Thomas 2017-12-04 16:56:53 UTC
Since workaround available, moving this out of this release.

Comment 6 Martin Bukatovic 2017-12-11 17:12:43 UTC
Sorry, I didn't have the time to try to replicate this.

That said, is this an issue caused by (previously) unclear downstream
documentation, or an actual issue which one would hit even when following
upstream installation docs and README file from rpm package?

Context: The docs state that one needs to update variables in the example playbook
file when needed. The defaults there are not expected to just work in all
circumstances.

Comment 7 Martin Bukatovic 2017-12-11 17:13:35 UTC
Fixing typo:

(In reply to Martin Bukatovic from comment #6)
> upstream installation docs and README file from rpm package?

s/upstream/downstream/

Comment 10 Nishanth Thomas 2017-12-15 02:55:00 UTC
Remove 'and not in the tendrl-ansible configuration file'

Comment 13 Nishanth Thomas 2017-12-15 12:18:05 UTC
Updated, please check now.

Comment 15 Timothy Asir 2018-01-29 04:50:18 UTC
The IP address of a machine can change at any time unless it has a static IP.
If the IP changes, we may need to update every place where the system has been configured to use it. So it's better to use a static IP or a host FQDN, which can be configured to resolve to the IP at the top level.

@Sweta Anandpara
Could you please post the output of the following commands:

$ ip a
$ nslookup <hostname>
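
The same comparison can also be scripted. The following is a rough sketch, assuming Python is available on the server; the helper names resolve_ipv4 and classify_resolution are hypothetical and are not part of tendrl-ansible.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


def classify_resolution(resolved, local_addresses):
    """Split resolved addresses into those configured on this host and the rest."""
    local = set(local_addresses)
    matching = [address for address in resolved if address in local]
    foreign = [address for address in resolved if address not in local]
    return matching, foreign
```

For example, classify_resolution(resolve_ipv4("<hostname>"), ["10.16.156.33", "192.168.96.143"]) would show whether the name resolves to an address that 'ip a' actually reports on this host, using the two global addresses from the description.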

Comment 17 Timothy Asir 2018-01-29 08:17:39 UTC
OK, that means even ping <hostname/fqdn> might return an incorrect IP. I hope this is not an ansible or tendrl-ansible issue. It could be a DNS issue, where for some reason (DNS cache) the host is unable to provide the correct IP address for the given FQDN. I will try to reproduce this, update here, and come up with a possible solution.

@Sweta, did you install etcd on a different server, or on the same server where you installed tendrl-api and tendrl-ui?

Comment 20 Daniel Horák 2018-07-25 13:56:28 UTC
I'm verifying this bug based on the following points:

* All mandatory variables have to be configured manually by the user; no
  automated guess is performed, so it is up to the user to properly
  configure the variables.

* If some mandatory variable is not configured, execution of the tendrl-ansible
  site.yml playbook fails with a proper error message:

    You need to define all mandatory ansible variables to run this
    playbook, see README file for guidance.

* The required mandatory variables are described in the README and will be
  described in documentation (Bug 1584737).
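
The behaviour described in the points above can be pictured roughly as follows. This is a Python sketch of the idea only, not the actual tendrl-ansible implementation (which is an Ansible playbook). The function name require_mandatory_variables is hypothetical, and the only variable name used is etcd_ip_address from the description of this bug; the full list of mandatory variables is in the README.

```python
# Error text quoted from the verification comment above.
ERROR_MESSAGE = (
    "You need to define all mandatory ansible variables to run this "
    "playbook, see README file for guidance."
)


def require_mandatory_variables(variables, mandatory):
    """Raise ValueError unless every mandatory name has a non-empty value.

    No value is guessed: a missing or empty variable stops the run.
    """
    missing = [name for name in mandatory if not variables.get(name)]
    if missing:
        raise ValueError(ERROR_MESSAGE)
    return True


if __name__ == "__main__":
    # Hypothetical input: the user has set the address explicitly.
    user_variables = {"etcd_ip_address": "10.16.156.33"}
    require_mandatory_variables(user_variables, ["etcd_ip_address"])
    print("all mandatory variables are set")
```

The design point is that the user states the address explicitly, so a host with several active addresses no longer depends on which one a lookup happens to return.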

Also, this bug doesn't cover the case when the Gluster cluster is using a different
network than the monitoring (a similar scenario is covered by Bug 1573075).

Tested with:
  tendrl-ansible-1.6.3-5.el7rhgs.noarch

>> VERIFIED

Comment 25 errata-xmlrpc 2018-09-04 06:59:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616

Comment 26 Red Hat Bugzilla 2023-09-14 04:12:44 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days