Description of problem:
------------------------
Replacing a host with the same host fails at the volume reset task. As I understand it, this is likely caused by the way the brick path names are extracted (using grep) from the volume info output.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
gluster-ansible-maintenance-1.0.1-2.el8rhgs

How reproducible:
-----------------
Always

Steps to Reproduce:
---------------------
1. Execute the playbook to replace the host with itself

Actual results:
----------------
Failure to do volume reset

Expected results:
------------------
volume reset command invocation should be successful
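For context, the restoration step amounts to a reset-brick sequence along these lines (a sketch only; the hostname and brick paths are illustrative, not taken from this setup):

    # Illustrative only: reset a brick back onto the reinstalled host.
    # host1.example.com and the vmstore paths are made-up values.
    gluster volume reset-brick vmstore host1.example.com:/gluster_bricks/vmstore/vmstore start
    gluster volume reset-brick vmstore host1.example.com:/gluster_bricks/vmstore/vmstore \
        host1.example.com:/gluster_bricks/vmstore/vmstore commit force

If the brick path extracted by the role is wrong, the reset-brick invocation fails, which matches the failure seen here.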
The fix should do 2 things:

1. Fix the existing logic that extracts the brick_path from the 'gluster volume info <vol>' output. In this case, there is a chance of asymmetric brick paths across hosts.

<existing_logic>
---
# Set up the volume management
- name: Fetch the directory and volume details
  block:
    - name: Get the list of volumes on the machine
      shell: ls "{{ glusterd_libdir }}/vols"
      register: dir_list

    - set_fact:
        volumes: "{{ dir_list.stdout.split() }}"

    # Find the list of bricks on the machine
    - name: Get the list of bricks corresponding to volume
      shell: >
        gluster vol info {{ item }} |
        grep "Brick.*{{ item }}:" |     <-------- does not grep for the hostname
        awk -F: '{ print $3 }'
      with_items: "{{ volumes }}"
      register: brick_list
</existing_logic>

So the new logic should be:

-----> gluster volume info vmstore | grep {{ gluster_maintenance_cluster_node }} | awk -F: '{ print $3 }'

2. The new logic should also consider the possibility of an nx3 replicate volume, where there can be more than one brick entry per volume to be 'volume reset' (see the sketch below).
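A minimal sketch of what the corrected tasks could look like, assuming the role's existing variables (glusterd_libdir, gluster_maintenance_cluster_node) are in scope. This is not the exact PR change; the second task and its message are hypothetical, shown only to illustrate looping over all bricks of an nx3 volume:

<proposed_logic>
---
# Grep on the cluster node's hostname instead of the volume name, so
# asymmetric brick paths are still found.
- name: Get the list of bricks corresponding to volume
  shell: >
    gluster volume info {{ item }} |
    grep "{{ gluster_maintenance_cluster_node }}" |
    awk -F: '{ print $3 }'
  with_items: "{{ volumes }}"
  register: brick_list

# brick_list.results[*].stdout_lines now holds *all* bricks of each
# volume on this host (an nx3 volume can have more than one), so the
# reset task must loop over stdout_lines rather than assume one line.
- name: Reset the bricks one by one (hypothetical follow-up task)
  debug:
    msg: "would reset brick {{ item.1 }} of volume {{ item.0.item }}"
  with_subelements:
    - "{{ brick_list.results }}"
    - stdout_lines
</proposed_logic>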
PR: https://github.com/gluster/gluster-ansible/pull/108
(In reply to Gobinda Das from comment #2)
> PR: https://github.com/gluster/gluster-ansible/pull/108

Gobinda, let's track the replace-host playbook with this bug - https://bugzilla.redhat.com/show_bug.cgi?id=1641431

This bug is specifically to fix the gluster-ansible-maintenance task that does volume restoration for the replace-host workflow.
*** Bug 1840540 has been marked as a duplicate of this bug. ***
PR: https://github.com/gluster/gluster-ansible-maintenance/pull/9
Tested with RHV 4.4.1 and gluster-ansible-maintenance-1.0.3.el8rhgs.

There are 2 failures:
1. Syntax error related to an unremoved "-block" in the role
2. Semantic error, where 'gluster_ansible_cluster_node' is used in the replace-host workflow
PR:- https://github.com/gluster/gluster-ansible-maintenance/pull/14
Verified with gluster-ansible-maintenance-1.0.1-4.el8rhgs:
1. Once the host is replaced by the reinstalled version of the host with the same FQDN, replace-brick worked well and the replace-brick command was successful
2. Healing got triggered post this operation
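For reference, verification along these lines can confirm the outcome (a sketch; 'vmstore' here is just the example volume from the description, not necessarily the volume used in this test):

    # Confirm the brick is listed again under the reinstalled host
    gluster volume info vmstore

    # Confirm self-heal was triggered and track its progress
    gluster volume heal vmstore info
    gluster volume heal vmstore info summary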
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2575