Bug 1840003

Summary: Replace host fails with gluster-maintenance ansible role
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: SATHEESARAN <sasundar>
Component: gluster-ansible
Assignee: Prajith <pkesavap>
Status: CLOSED ERRATA
QA Contact: SATHEESARAN <sasundar>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.5
CC: godas, pkesavap, pprakash, puebele, rhs-bugs, sabose, sasundar
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.5.z Batch Update 2
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: gluster-ansible-maintenance-1.0.1-4.el8rhgs
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 1839998
Environment:
Last Closed: 2020-06-16 05:57:32 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1641431, 1839998

Description SATHEESARAN 2020-05-26 08:31:02 UTC
Description of problem:
------------------------
Replacing a host with the same host fails at the volume reset task. As I understand it, this is likely caused by the way the brick path names are extracted, using grep on the 'gluster volume info' output, as illustrated below.
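
For illustration, a hedged sketch of where the extraction can go wrong. The
volume name, hostnames and brick paths below are hypothetical and not taken
from the affected setup:

# 'gluster volume info vmstore' prints one "BrickN:" line per brick, e.g.
#
#   Brick1: host1.lab.example.com:/gluster_bricks/brick1/b1
#   Brick2: host2.lab.example.com:/gluster_bricks/brick1/b1
#   Brick3: host3.lab.example.com:/gluster_bricks/brick1/b1
#
# With an asymmetric brick path (the directory name does not contain the
# volume name), filtering on the volume name matches nothing:
gluster volume info vmstore | grep "Brick.*vmstore:" | awk -F: '{ print $3 }'

# Filtering on the FQDN of the host being replaced returns exactly the
# bricks hosted on that node:
gluster volume info vmstore | grep "host1.lab.example.com:" | awk -F: '{ print $3 }'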

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
gluster-ansible-maintenance-1.0.1-2.el8rhgs

How reproducible:
-----------------
Always

Steps to Reproduce:
---------------------
1. Execute the playbook to replace the host with itself

Actual results:
----------------
Failure to do volume reset

Expected results:
------------------
volume reset command invocation should be successful

Comment 1 SATHEESARAN 2020-05-26 08:31:25 UTC
The fix should do 2 things:

1. Fix the existing logic that extracts the brick_path from the 'gluster volume info <vol>' output.

In this case, there is a chance of asymmetric brick paths, i.e. the brick directory name may not contain the volume name.

<existing_logic>
---
# Set up the volume management
- name: Fetch the directory and volume details
  block:
    - name: Get the list of volumes on the machine
      shell: ls "{{ glusterd_libdir }}/vols"
      register: dir_list

    - set_fact:
        volumes: "{{ dir_list.stdout.split() }}"

    # Find the list of bricks on the machine
    - name: Get the list of bricks corresponding to volume
      shell: >
        gluster vol info {{ item }} | grep "Brick.*{{ item }}:" |              <-------- greps for the volume name, not the hostname
        awk -F: '{ print $3 }'
      with_items: "{{ volumes }}"
      register: brick_list
</existing_logic>

So the new logic should be along the lines of:

 -----> gluster volume info vmstore | grep {{gluster_maintenance_cluster_node}}  | awk -F: '{ print $3 }'

2. The above logic should also consider the possibility of an nx3 replicate volume, where there could be more than
one brick entry per volume to be 'volume reset', as sketched below.
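
A minimal sketch of a task along these lines (illustration only, not the
actual role change; 'volumes' and 'gluster_maintenance_cluster_node' come
from this report, everything else is an assumption):

<proposed_logic_sketch>
---
# Filter the Brick lines by the host being replaced and keep every match, so
# that an nx3 replicate volume resets all of its bricks on that node.
- name: Get the list of bricks on the replaced host for each volume
  shell: >
    gluster volume info {{ item }} |
    grep "^Brick.*{{ gluster_maintenance_cluster_node }}:" |
    awk -F: '{ print $3 }'
  with_items: "{{ volumes }}"
  register: brick_list

# Each result carries the volume name (item.item) and one brick path per line
# of output (item.stdout_lines), ready for the per-brick volume reset step.
- debug:
    msg: "{{ item.item }} -> {{ item.stdout_lines }}"
  with_items: "{{ brick_list.results }}"
</proposed_logic_sketch>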

Comment 2 Gobinda Das 2020-05-26 11:29:52 UTC
PR: https://github.com/gluster/gluster-ansible/pull/108

Comment 4 SATHEESARAN 2020-05-27 05:49:23 UTC
(In reply to Gobinda Das from comment #2)
> PR: https://github.com/gluster/gluster-ansible/pull/108

Gobinda,

Let's track the replace-host playbook with this bug - https://bugzilla.redhat.com/show_bug.cgi?id=1641431

This bug is specifically to fix the gluster-ansible-maintenance task that does volume restoration for the
replace-host workflow.

Comment 5 SATHEESARAN 2020-05-27 08:11:39 UTC
*** Bug 1840540 has been marked as a duplicate of this bug. ***

Comment 8 SATHEESARAN 2020-06-08 10:44:03 UTC
Tested with RHV 4.4.1 and gluster-ansible-maintenance-1.0.3.el8rhgs

There are 2 failures:

1. A syntax error caused by an unremoved "-block" left in the role (an illustrative sketch follows below)
2. A semantic error, where 'gluster_ansible_cluster_node' is used in the replace-host workflow
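
For context, one plausible shape of such a leftover wrapper (an assumed
illustration only, not the actual content of the role): a '- block:' whose
tasks were moved out but which itself was left behind is empty and fails when
the role is loaded.

# Fails to load: the tasks were hoisted out, leaving an empty block.
- name: Fetch the directory and volume details
  block:

# Loads fine: drop the empty wrapper and keep the tasks at the top level.
- name: Get the list of volumes on the machine
  shell: ls "{{ glusterd_libdir }}/vols"
  register: dir_list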

Comment 10 SATHEESARAN 2020-06-11 18:42:45 UTC
Verified with gluster-ansible-maintenance-1.0.1-4.el8rhgs

1. Once the host was replaced by the reinstalled host with the same FQDN,
replace-brick worked well and the replace-brick command was successful

2. Healing was triggered after this operation (see the command sketch below)
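
For reference, the generic command forms behind these two checks (syntax
only; no values from the verified setup are implied):

# Brick replacement as driven by the replace-host workflow:
gluster volume replace-brick <VOLNAME> <OLD-BRICK> <NEW-BRICK> commit force

# Pending heal entries per brick; the counts drain to zero as self-heal
# completes:
gluster volume heal <VOLNAME> info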

Comment 12 errata-xmlrpc 2020-06-16 05:57:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2575