Bug 1053330

Summary: [RHEVM-RHS] RHSS Node doesn't come up after reinstalling it using RHEVM UI
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: SATHEESARAN <sasundar>
Component: vdsmAssignee: Timothy Asir <tjeyasin>
Status: CLOSED WONTFIX QA Contact: Sudhir D <sdharane>
Severity: high Docs Contact:
Priority: unspecified    
Version: 2.1CC: acathrow, ecohen, gklein, grajaiya, iheim, nlevinki, Rhev-m-bugs, sabose, tjeyasin, yeylon
Target Milestone: ---Keywords: ZStream
Target Release: RHGS 2.1.2   
Hardware: x86_64   
OS: Linux   
Whiteboard: gluster
Fixed In Version: 4.13.0-24 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-01-16 10:46:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
ovirt host deploy log from RHEVM
none
RHEVM Screenshot showing "re-install" option none

Description SATHEESARAN 2014-01-15 03:27:05 UTC
Description of problem:
-----------------------
Added an RHSS node to a gluster-enabled cluster and it came online in the RHEVM UI.
After moving it to MAINTENANCE state and re-installing it using the RHEVM UI, the node does not come back online.


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHSS  - glusterfs-3.4.0.57rhs-1.el6rhs
RHEVM - IS32 (3.3.0-0.45.el6ev)

How reproducible:
----------------
Happened twice out of 2 attempts


Steps to Reproduce:
------------------
1. Add the RHSS node to a gluster-enabled cluster
2. Once the RHSS node comes up in the RHEVM UI, put it into Maintenance state
3. Use the re-install option in the RHEVM UI


Actual results:
--------------
The RHSS node never comes up in the RHEVM UI. It gives the error message: "Host 10.70.37.10 installation failed. Command returned failure code 1 during SSH session 'root.37.10'"


Expected results:
-----------------
The RHSS node should come back online in the RHEVM UI


Additional info:
----------------
1. I hit this case in the following scenario:
a. Importing an existing gluster configuration doesn't set up iptables on the RHSS nodes
https://bugzilla.redhat.com/show_bug.cgi?id=1051019
b. So I brought one of the RHSS nodes to MAINTENANCE state using the RHEVM UI
c. On that RHSS node's "General" tab there is the message: "Host is in maintenance mode, you can Activate it by pressing the Activate button. If you wish to upgrade or reinstall it click here."
d. Pressing the "here" link in the previous step starts the re-installation

But finally the node does not come up and remains in DOWN state.

Comment 1 SATHEESARAN 2014-01-15 03:28:26 UTC
The VDSM host-deploy logs show no error, and I see the iptables rules are in place, though the state of the RHSS node was shown as down.

This means bootstrapping has taken place again, adding those rules.

Comment 2 SATHEESARAN 2014-01-15 03:34:45 UTC
Created attachment 850329 [details]
ovirt host deploy log from RHEVM

Comment 3 SATHEESARAN 2014-01-15 03:54:19 UTC
Created attachment 850331 [details]
RHEVM Screenshot showing "re-install" option

Comment 4 SATHEESARAN 2014-01-15 04:59:49 UTC
This bug is a manifestation of bug 1038038, https://bugzilla.redhat.com/show_bug.cgi?id=1038038

Since the gateway was not configured correctly, DNS resolution for the bricks was failing when glusterd was restarted.

This is evident from the glusterd logs:

<snip>
[2014-01-15 09:52:43.444950] I [glusterd.c:140:glusterd_uuid_init] 0-management: retrieved UUID: 1650fc10-5365-40e3-8fea-1e87908a9f55
[2014-01-15 09:52:43.445290] E [glusterd-store.c:2600:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2014-01-15 09:52:43.445318] E [xlator.c:423:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2014-01-15 09:52:43.445335] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed
[2014-01-15 09:52:43.445345] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
[2014-01-15 09:52:43.445768] W [glusterfsd.c:1099:cleanup_and_exit] (-->/usr/sbin/glusterd(main+0x6b1) [0x4069c1] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xb7) [0x405177] (-->/usr/sbin/glusterd(glusterfs_process_volfp+0x106) [0x405086]))) 0-: received signum (0), shutting down
[2014-01-15 10:12:06.572837] I [glusterfsd.c:2026:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.4.0.57rhs (/usr/sbin/glusterd --pid-file=/var/run/glusterd.pid)

</snip>
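The "resolve brick failed in restore" error above means glusterd could not resolve the brick hostnames stored under /var/lib/glusterd when it restarted. A minimal sketch of the pre-restart check (the hostname list is a placeholder; a real run would iterate over the brick hosts reported by `gluster volume info`):

```shell
#!/bin/sh
# Sketch: confirm each brick host resolves before restarting glusterd.
# "localhost" stands in for the real brick hostnames.
for host in localhost; do
    if getent hosts "$host" >/dev/null 2>&1; then
        echo "$host: resolves"
    else
        echo "$host: DNS resolution FAILED"
    fi
done
```

If any host fails to resolve, glusterd's restore step fails exactly as in the log above and the daemon shuts down.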

I changed DEFROUTE to yes in '/etc/sysconfig/network-scripts/ifcfg-rhevm' and restarted the network, and that solved the problem.

After this change, the RHSS node re-installed using the RHEVM UI comes back online.
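For reference, the change described above amounts to the following in the interface config file (a sketch; the other lines of the real ifcfg-rhevm file are omitted):

```shell
# /etc/sysconfig/network-scripts/ifcfg-rhevm  (relevant line only)
DEFROUTE=yes    # was previously not the default route; without it, brick DNS lookups fail

# Then restart networking so the default route is installed:
#   service network restart
```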

Comment 6 Timothy Asir 2014-01-15 11:21:24 UTC
Patch sent to downstream: https://code.engineering.redhat.com/gerrit/#/c/18372/

Comment 7 Gowrishankar Rajaiyan 2014-01-15 12:54:25 UTC
Isn't this a dupe of bug 1038038? I do not see a fix specifically for this bug; did I miss anything obvious?

Comment 8 Sahina Bose 2014-01-16 06:09:12 UTC
Yes, the fix for bug 1038038 fixes this as well.
Though the fix is the same, the test scenarios of the two bugs are different.

Comment 9 Gowrishankar Rajaiyan 2014-01-16 10:28:50 UTC
Thanks for confirming Sahina. This test scenario is covered at https://tcms.engineering.redhat.com/run/107332/#caserun_4176587 which will be executed as part of regression cycle.

Giving qa_ack- since there is no separate fix for this case.

Comment 10 RHEL Program Management 2014-01-16 10:46:43 UTC
Quality Engineering Management has reviewed and declined this request.
You may appeal this decision by reopening this request.