Bug 1475475

Summary: [geo-rep]: Improve the output message to reflect the real failure with schedule_georep script
Product: Red Hat Gluster Storage
Reporter: Rahul Hinduja <rhinduja>
Component: geo-replication
Assignee: Aravinda VK <avishwan>
Status: CLOSED ERRATA
QA Contact: Rahul Hinduja <rhinduja>
Severity: medium
Docs Contact:
Priority: unspecified
Version: rhgs-3.3
CC: csaba, khiremat, rallan, rhs-bugs, sheggodu, storage-qa-internal
Target Milestone: ---
Target Release: RHGS 3.4.0
Hardware: x86_64
OS: Linux
Whiteboard: rebase
Fixed In Version: glusterfs-3.12.2-1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Cloned As: 1499159
Environment:
Last Closed: 2018-09-04 06:34:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 1499159, 1499392, 1503134    

Description Rahul Hinduja 2017-07-26 18:31:18 UTC
Description of problem:
=======================

Currently, if we manually check geo-rep status or stop a session with an invalid slave host or slave volume, the gluster CLI throws the right warning:

[root@dhcp42-79 MASTER]# gluster volume geo-replication MASTER 10.70.41.209::SLAV status 
No active geo-replication sessions between MASTER and 10.70.41.209::SLAV
[root@dhcp42-79 MASTER]# gluster volume geo-replication MASTER 10.70.41.209::SLAV stop
Geo-replication session between MASTER and 10.70.41.209::SLAV does not exist.
geo-replication command failed
[root@dhcp42-79 MASTER]#

But if the schedule_georep script is passed invalid slave host and volume information, it fails with a generic "Commit failed on localhost" message:

[root@dhcp42-79 MASTER]# time python /usr/share/glusterfs/scripts/schedule_georep.py MASTER 10.70.41.29 SLAVE
[NOT OK] 
Commit failed on localhost. Please check the log file for more details.


The problem with the above output is that it gives no picture of whether something is down at the slave (gsyncd, the slave volume) or whether wrong slave information was provided. It also does not tell the user which log file to look into.

If the geo-replication stop/status command has failed, the script should print the same messages the CLI prints when the command is executed manually.
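
For illustration, here is a minimal sketch of how a wrapper script can surface the CLI's own error text on failure instead of a generic "Commit failed" line. This is an assumption-level sketch, not the actual schedule_georep.py code; the execute() helper and the hard-coded volume names are illustrative only.

#!/usr/bin/env python
# Sketch (illustrative, not the actual schedule_georep.py code): run a
# gluster geo-rep command and, on failure, print the CLI's own
# stderr/stdout instead of a generic failure message.
import subprocess
import sys

def execute(cmd):
    # Run the command and return (returncode, stdout, stderr).
    proc = subprocess.Popen(cmd,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)
    out, err = proc.communicate()
    return proc.returncode, out, err

rc, out, err = execute(["gluster", "volume", "geo-replication",
                        "MASTER", "10.70.41.209::SLAVE", "status"])
if rc != 0:
    # Propagate whatever the CLI printed, e.g. "No active
    # geo-replication sessions between MASTER and ...::SLAVE".
    sys.stderr.write("[NOT OK]\n%s\n" % (err.strip() or out.strip()))
    sys.exit(rc)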

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-geo-replication-3.8.4-35.el7rhgs.x86_64

Comment 3 Kotresh HR 2017-10-06 09:48:08 UTC
Upstream Patch:

https://review.gluster.org/18442 (master)

Comment 6 Rahul Hinduja 2018-04-18 08:49:16 UTC
Verified with build: glusterfs-geo-replication-3.12.2-7.el7rhgs.x86_64

If incorrect master volume information is provided, the script fails and prints the correct output:

[root@dhcp41-226 scripts]# python /usr/share/glusterfs/scripts/schedule_georep.py MASTER 10.70.41.29 SLAVE
[NOT OK] 
Volume MASTER does not exist

[root@dhcp41-226 scripts]# 


If an incorrect slave host or slave volume is provided, the script fails and prints the correct output:

[root@dhcp41-226 scripts]# python /usr/share/glusterfs/scripts/schedule_georep.py vol0 10.70.41.29 vol1
[NOT OK] 
No active geo-replication sessions between vol0 and 10.70.41.29::vol1
geo-replication command failed

[root@dhcp41-226 scripts]# python /usr/share/glusterfs/scripts/schedule_georep.py vol0 10.70.42.9 vol2
[NOT OK] 
No active geo-replication sessions between vol0 and 10.70.42.9::vol2
geo-replication command failed

[root@dhcp41-226 scripts]# 

This fix addresses the concerns raised for incorrect master volume, slave host, and slave volume inputs. Moving this bug to the verified state.
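
The verified behavior above is consistent with the script validating its inputs up front and relaying the CLI's own messages. A minimal sketch of such a pre-flight check follows, under that assumption; the cli() helper and the volume names are hypothetical, not taken from the actual patch:

# Sketch (hypothetical, not the actual patch): fail fast with the
# gluster CLI's own message when the master volume or the geo-rep
# session does not exist.
import subprocess
import sys

def cli(*args):
    # Run "gluster <args>" and return (returncode, combined message).
    proc = subprocess.Popen(("gluster",) + args,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)
    out, err = proc.communicate()
    return proc.returncode, (err or out).strip()

rc, msg = cli("volume", "info", "MASTER")
if rc != 0:
    # e.g. "Volume MASTER does not exist"
    sys.exit("[NOT OK]\n" + msg)

rc, msg = cli("volume", "geo-replication", "MASTER",
              "10.70.41.29::SLAVE", "status")
if rc != 0:
    # e.g. "No active geo-replication sessions between MASTER and ..."
    sys.exit("[NOT OK]\n" + msg)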

Comment 7 errata-xmlrpc 2018-09-04 06:34:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607