Bug 1053330

Summary: [RHEVM-RHS] RHSS Node doesn't come up after reinstalling it using RHEVM UI
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: SATHEESARAN <sasundar>
Component: vdsmAssignee: Timothy Asir <tjeyasin>
Status: CLOSED WONTFIX QA Contact: Sudhir D <sdharane>
Severity: high Docs Contact:
Priority: unspecified    
Version: 2.1CC: acathrow, ecohen, gklein, grajaiya, iheim, nlevinki, Rhev-m-bugs, sabose, tjeyasin, yeylon
Target Milestone: ---Keywords: ZStream
Target Release: RHGS 2.1.2   
Hardware: x86_64   
OS: Linux   
Whiteboard: gluster
Fixed In Version: 4.13.0-24 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-01-16 10:46:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
ovirt host deploy log from RHEVM
none
RHEVM Screenshot showing "re-install" option none

Description SATHEESARAN 2014-01-15 03:27:05 UTC
Description of problem:
-----------------------
Added an RHSS node to a gluster-enabled cluster and it came online in the RHEVM UI.
After moving it to MAINTENANCE state and re-installing it using the RHEVM UI, the node does not come back online.


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHSS  - glusterfs-3.4.0.57rhs-1.el6rhs
RHEVM - IS32 (3.3.0-0.45.el6ev)

How reproducible:
----------------
Happened twice out of 2 attempts


Steps to Reproduce:
------------------
1. Add the RHSS node to a gluster-enabled cluster
2. Once the RHSS node comes up in the RHEVM UI, put it into Maintenance state
3. Use the re-install option in the RHEVM UI


Actual results:
--------------
The RHSS node never comes up in the RHEVM UI. It gives the error message: "Host 10.70.37.10 installation failed. Command returned failure code 1 during SSH session 'root.37.10'"


Expected results:
-----------------
The RHSS node should come back online in the RHEVM UI


Additional info:
----------------
1. I hit this case in the following scenario:
a. Importing an existing gluster configuration doesn't set up iptables on the RHSS nodes
https://bugzilla.redhat.com/show_bug.cgi?id=1051019
b. So I brought one of the RHSS nodes to MAINTENANCE state using the RHEVM UI
c. On that RHSS node's "General" tab there is the message: "Host is in maintenance mode, you can Activate it by pressing the Activate button. If you wish to upgrade or reinstall it click here."
d. Pressing the "here" link in the previous step starts the re-installation

But finally the node does not come up and remains in DOWN state.

Comment 1 SATHEESARAN 2014-01-15 03:28:26 UTC
The VDSM host-deploy logs show no error, and I see the iptables rules are in place, though the state of the RHSS node was shown as down.

This means bootstrapping has taken place again, adding those rules.

Comment 2 SATHEESARAN 2014-01-15 03:34:45 UTC
Created attachment 850329 [details]
ovirt host deploy log from RHEVM

Comment 3 SATHEESARAN 2014-01-15 03:54:19 UTC
Created attachment 850331 [details]
RHEVM Screenshot showing "re-install" option

Comment 4 SATHEESARAN 2014-01-15 04:59:49 UTC
This bug is a manifestation of bug 1038038, https://bugzilla.redhat.com/show_bug.cgi?id=1038038

Since the gateway was not configured correctly, DNS resolution for the bricks was failing when glusterd was restarted.

This is evident from the glusterd logs:

<snip>
[2014-01-15 09:52:43.444950] I [glusterd.c:140:glusterd_uuid_init] 0-management: retrieved UUID: 1650fc10-5365-40e3-8fea-1e87908a9f55
[2014-01-15 09:52:43.445290] E [glusterd-store.c:2600:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2014-01-15 09:52:43.445318] E [xlator.c:423:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2014-01-15 09:52:43.445335] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed
[2014-01-15 09:52:43.445345] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
[2014-01-15 09:52:43.445768] W [glusterfsd.c:1099:cleanup_and_exit] (-->/usr/sbin/glusterd(main+0x6b1) [0x4069c1] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xb7) [0x405177] (-->/usr/sbin/glusterd(glusterfs_process_volfp+0x106) [0x405086]))) 0-: received signum (0), shutting down
[2014-01-15 10:12:06.572837] I [glusterfsd.c:2026:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.4.0.57rhs (/usr/sbin/glusterd --pid-file=/var/run/glusterd.pid)

</snip>
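The "resolve brick failed in restore" error above means glusterd could not resolve the brick hostnames stored under /var/lib/glusterd when it restarted. A minimal sketch of the pre-restart check (the hostname list is a placeholder; a real run would iterate over the brick hosts reported by `gluster volume info`):

```shell
#!/bin/sh
# Sketch: confirm each brick host resolves before restarting glusterd.
# "localhost" stands in for the real brick hostnames.
for host in localhost; do
    if getent hosts "$host" >/dev/null 2>&1; then
        echo "$host: resolves"
    else
        echo "$host: DNS resolution FAILED"
    fi
done
```

If any host fails to resolve, glusterd's restore step fails exactly as in the log above and the daemon shuts down.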

I changed DEFROUTE to yes in '/etc/sysconfig/network-scripts/ifcfg-rhevm' and restarted the network, and that solved the problem.

After this change, the RHSS node re-installed using the RHEVM UI comes back online.
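For reference, the change described above amounts to the following in the interface config file (a sketch; the other lines of the real ifcfg-rhevm file are omitted):

```shell
# /etc/sysconfig/network-scripts/ifcfg-rhevm  (relevant line only)
DEFROUTE=yes    # was previously not the default route; without it, brick DNS lookups fail

# Then restart networking so the default route is installed:
#   service network restart
```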

Comment 6 Timothy Asir 2014-01-15 11:21:24 UTC
Patch sent to downstream: https://code.engineering.redhat.com/gerrit/#/c/18372/

Comment 7 Gowrishankar Rajaiyan 2014-01-15 12:54:25 UTC
Isn't this a dupe of bug 1038038? I do not see a fix specifically for this bug; did I miss anything obvious?

Comment 8 Sahina Bose 2014-01-16 06:09:12 UTC
Yes, the fix for bug 1038038 fixes this as well.
Though the fix is the same, the test scenarios of the two bugs are different.

Comment 9 Gowrishankar Rajaiyan 2014-01-16 10:28:50 UTC
Thanks for confirming Sahina. This test scenario is covered at https://tcms.engineering.redhat.com/run/107332/#caserun_4176587 which will be executed as part of regression cycle.

Giving qa_ack- since there is no separate fix for this case.

Comment 10 RHEL Program Management 2014-01-16 10:46:43 UTC
Quality Engineering Management has reviewed and declined this request.
You may appeal this decision by reopening this request.