Bug 1541122

Summary: Improve geo-rep pre-validation logs
Product: Red Hat Gluster Storage
Component: geo-replication
Version: rhhi-1.1
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: urgent
Reporter: Dana Lane <dlane>
Assignee: Kotresh HR <khiremat>
QA Contact: Rahul Hinduja <rhinduja>
Docs Contact:
CC: amukherj, annair, ascerra, csaba, dfitzpat, dlane, khiremat, rallan, rcyriac, rhinduja, rhs-bugs, sheggodu, storage-qa-internal, vdas
Target Milestone: ---
Target Release: RHGS 3.4.0
Keywords: Reopened
Whiteboard:
Fixed In Version: glusterfs-3.12.2-5
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-04 06:42:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:
Bug Blocks: 1503137
Attachments: Slave.log from the SOURCE system

Description Dana Lane 2018-02-01 18:36:08 UTC
Description of problem: Starting with 2 RHHI pods, I attempted to follow the documentation to configure geo-replication and got the following error: Unable to fetch slave volume details. Please check the slave cluster and slave volume.

The exact steps followed, starting with Maintaining Red Hat Hyperconverged Infrastructure, section 3.1 Configuring geo-replication for disaster recovery:

1 - [On pod I want to replicate FROM] #gluster volume set all cluster.enable-shared-storage enable
2 - [On the pod I want to replicate TO] #gluster volume set data features.shard enable  (Where data is the name of the destination volume)
3 - Pointed to the Gluster documentation, section 10.3.4.1 Setting up your environment for a geo-replication session
4 - [On pod I want to replicate FROM] #gluster system:: execute gsec_create
5 - [On pod I want to replicate FROM] #gluster volume geo-replication data 192.168.50.36::data create push-pem  (The volume name on both the source and target systems is called 'data' and 192.168.50.36 is the IP of the master node on the target pod.) 
6 - Resulting error:
Unable to fetch slave volume details. Please check the slave cluster and slave volume.
geo-replication command failed


Passwordless ssh is configured.
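A failing create step like the one in step 6 is easier to triage when the failure immediately surfaces the relevant log file. The following is a minimal sketch of that wrapper pattern; `false` and the mktemp file are stand-ins for the real `gluster volume geo-replication ... create push-pem` command and for /var/log/glusterfs/geo-replication-slaves/slave.log on an actual RHGS node, and the log line is fabricated for illustration:

```shell
#!/bin/sh
# Stand-in for /var/log/glusterfs/geo-replication-slaves/slave.log,
# pre-seeded with an illustrative (fabricated) error line.
LOG=$(mktemp)
printf '%s\n' "E [fuse-bridge] mount of slave volume failed (illustrative sample line)" > "$LOG"

# Run the create command passed as arguments; on failure, show the log
# location and its last lines instead of only the terse CLI error.
run_create() {
    if "$@"; then
        echo "geo-replication session created"
    else
        echo "geo-replication create failed; check $LOG:"
        tail -n 20 "$LOG"
        return 1
    fi
}

run_create false || true   # `false` stands in for the failing gluster command
```

On a real node the wrapped command would be the one from step 5; the point is only that the log path, not just the one-line CLI error, is what identifies the root cause.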

Version-Release number of selected component (if applicable):


How reproducible:
I've run these exact commands multiple times in both directions, and they fail the same way every time. 100% reproducible in my configuration.


Steps to Reproduce:
(See above)

Actual results:
Unable to fetch slave volume details. Please check the slave cluster and slave volume.
geo-replication command failed


Expected results:
Successfully create a geo-replication session

Additional info:

Comment 2 Atin Mukherjee 2018-02-05 14:19:35 UTC
Anoop - It's better to have this mentioned as [CSS] followed by the title, similar to the method the GSS group follows. I say this because people might otherwise read CSS as part of the problem title itself.

Comment 3 Kotresh HR 2018-02-08 10:21:40 UTC
Hi,

That just indicates the master node could not mount the slave volume. We need the following log file to find out what exactly the issue is.

/var/log/glusterfs/geo-replication-slaves/slave.log

I agree that the CLI output should have pointed you to this log file. This improvement has already been merged upstream [1]. With the patch, the CLI correctly displays the log file to look at.

[1] https://review.gluster.org/#/c/19242/

Comment 4 Dana Lane 2018-02-08 15:33:37 UTC
Created attachment 1393239 [details]
Slave.log from the SOURCE system

Attaching the slave.log from the source system, i.e. the system I'm attempting to start the geo-replication process from.

Comment 10 Dana Lane 2018-02-16 16:24:58 UTC
Log attached previously

Comment 12 Dana Lane 2018-02-23 16:10:35 UTC
Following a re-install of the destination pod, we were able to successfully create and start geo-replication.

We'd still like to know what was the root cause of this.

Comment 14 Dana Lane 2018-02-26 14:17:52 UTC
Not sure why this was flagged as needs info. What information are you looking for?

Comment 17 Rahul Hinduja 2018-05-06 10:16:46 UTC
Verified the bug against the log improvement, which shows the log location in case of a wrong slave volume or a stopped slave volume.

Use Case 1: Wrong Slave volume

3.3.1

[root@dhcp47-167 ~]# gluster volume geo-replication master 10.70.47.17::slave1 create push-pem
Unable to fetch slave volume details. Please check the slave cluster and slave volume.
geo-replication command failed
[root@dhcp47-167 ~]# 


3.4

[root@dhcp42-53 ~]# gluster volume geo-replication master 10.70.41.221::slave1 create push-pem
Unable to mount and fetch slave volume details. Please check the log: /var/log/glusterfs/geo-replication/gverify-slavemnt.log
geo-replication command failed
[root@dhcp42-53 ~]# 


Use Case 2: Slave volume is stopped

3.3.1

[root@dhcp47-167 ~]# gluster volume geo-replication master 10.70.47.17::slave create push-pem
Unable to fetch slave volume details. Please check the slave cluster and slave volume.
geo-replication command failed
[root@dhcp47-167 ~]# 

3.4

[root@dhcp42-53 ~]# gluster volume geo-replication master 10.70.41.221::slave create push-pem
Unable to mount and fetch slave volume details. Please check the log: /var/log/glusterfs/geo-replication/gverify-slavemnt.log
geo-replication command failed
[root@dhcp42-53 ~]# 


Additionally, the patch brings clarity to the log locations:

3.3.1 => Single log location that points to these errors:

/var/log/glusterfs/geo-replication-slaves/slave.log

3.4 => Specific logs representing master and slave

[root@dhcp42-53 ~]# ls /var/log/glusterfs/geo-replication/
gverify-mastermnt.log  gverify-slavemnt.log  master
[root@dhcp42-53 ~]# ls /var/log/glusterfs/geo-replication-slaves/
mbr
[root@dhcp42-53 ~]#
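The split layout above lends itself to a quick triage scan: grep both gverify logs for error-level entries to see which side of the mount failed. A minimal sketch against a throwaway mock of the 3.4 layout (the directory and the error line are fabricated for illustration; on a real node the path would be /var/log/glusterfs/geo-replication/):

```shell
#!/bin/sh
# Build a mock of the 3.4 per-mount log layout and scan it for errors.
logdir=$(mktemp -d)
printf '%s\n' "[2018-05-06] E [fuse-bridge] mount failed (illustrative)" \
    > "$logdir/gverify-slavemnt.log"
: > "$logdir/gverify-mastermnt.log"   # clean master-side log

# Print only the log files that contain error-level ("E [") entries;
# here that singles out the slave mount as the failing side.
grep -l "E \[" "$logdir"/gverify-*.log
```

Because each mount now has its own log, a single `grep -l` run points straight at the failing side instead of one combined slave.log.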

Moving this bug to the verified state against the fix. Any other enhancements to the logs will be tracked in a separate bug.

Comment 18 Dana Lane 2018-05-21 18:23:08 UTC
We no longer see this issue, closing this bug.

Comment 22 errata-xmlrpc 2018-09-04 06:42:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607