Bug 1541122 - Improve geo-rep pre-validation logs
Summary: Improve geo-rep pre-validation logs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhhi-1.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Kotresh HR
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On:
Blocks: 1503137
 
Reported: 2018-02-01 18:36 UTC by Dana Lane
Modified: 2018-09-14 06:13 UTC

Fixed In Version: glusterfs-3.12.2-5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-04 06:42:04 UTC
Embargoed:


Attachments
Slave.log from the SOURCE system (316.23 KB, text/plain) - 2018-02-08 15:33 UTC, Dana Lane


Links
Red Hat Product Errata RHSA-2018:2607 - Last Updated: 2018-09-04 06:43:39 UTC

Description Dana Lane 2018-02-01 18:36:08 UTC
Description of problem: Starting with 2 RHHI pods, I attempted to follow the documentation to configure geo-replication and got the following error: Unable to fetch slave volume details. Please check the slave cluster and slave volume.

The exact steps followed, starting with Maintaining Red Hat Hyperconverged Infrastructure, section 3.1 Configuring geo-replication for disaster recovery:

1 - [On pod I want to replicate FROM] #gluster volume set all cluster.enable-shared-storage enable
2 - [On the pod I want to replicate TO] #gluster volume set data features.shard enable  (Where data is the name of the destination volume)
3 - Followed the pointer to the Gluster documentation, section 10.3.4.1 Setting up your environment for a Geo-replication session
4 - [On pod I want to replicate FROM] #gluster system:: execute gsec_create
5 - [On pod I want to replicate FROM] #gluster volume geo-replication data 192.168.50.36::data create push-pem  (The volume name on both the source and target systems is called 'data' and 192.168.50.36 is the IP of the master node on the target pod.) 
6 - Resulting error:
Unable to fetch slave volume details. Please check the slave cluster and slave volume.
geo-replication command failed


Passwordless SSH is configured; a consolidated sketch of the command sequence follows below.
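
For reference, a minimal consolidated sketch of the sequence above, assuming the volume name 'data' on both pods and the slave-node IP 192.168.50.36 from this report; the info/status checks on the destination are only suggested sanity checks, not part of the documented procedure:

# [source pod] enable shared storage and generate the geo-rep pem keys
gluster volume set all cluster.enable-shared-storage enable
gluster system:: execute gsec_create

# [destination pod] enable sharding on the slave volume and sanity-check that it exists and is started
gluster volume set data features.shard enable
gluster volume info data
gluster volume status data

# [source pod] create, start, and check the geo-replication session
gluster volume geo-replication data 192.168.50.36::data create push-pem
gluster volume geo-replication data 192.168.50.36::data start
gluster volume geo-replication data 192.168.50.36::data status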

Version-Release number of selected component (if applicable):


How reproducible:
I've run these commands exactly as shown multiple times, in both directions, and they fail the same way each time. The issue is 100% reproducible in my configuration.


Steps to Reproduce:
(See above)

Actual results:
Unable to fetch slave volume details. Please check the slave cluster and slave volume.
geo-replication command failed


Expected results:
Successfully create a geo-replication session

Additional info:

Comment 2 Atin Mukherjee 2018-02-05 14:19:35 UTC
Anoop - It's better to have this prefixed as [CSS], followed by the title. This is similar to the method the GSS group follows as well. I say this because otherwise people might read CSS as a term in the problem title itself.

Comment 3 Kotresh HR 2018-02-08 10:21:40 UTC
Hi,

That error just indicates that the master node could not mount the slave volume. We need the following log file to find out what exactly the issue is.

/var/log/glusterfs/geo-replication-slaves/slave.log

I agree that the CLI output should have pointed you to this log file. This improvement is already merged upstream [1]. With the patch, the CLI correctly displays the log file to be checked.

[1] https://review.gluster.org/#/c/19242/
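
Until the patched build is available, a minimal sketch of how to inspect that log on the master node where the create command was run (the grep pattern below is only a suggestion, not a fixed log format):

# show the most recent slave mount attempts
tail -n 100 /var/log/glusterfs/geo-replication-slaves/slave.log

# look for mount or connection failures around the time of the failed create
grep -iE 'error|failed' /var/log/glusterfs/geo-replication-slaves/slave.log | tail -n 20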

Comment 4 Dana Lane 2018-02-08 15:33:37 UTC
Created attachment 1393239 [details]
Slave.log from the SOURCE system

Attaching the slave.log from the source system, i.e. the system I'm attempting to start the geo-replication process from.

Comment 10 Dana Lane 2018-02-16 16:24:58 UTC
Log attached previously

Comment 12 Dana Lane 2018-02-23 16:10:35 UTC
Following a re-install of the destination pod, we were able to successfully create and start geo-replication.

We'd still like to know what the root cause of this was.

Comment 14 Dana Lane 2018-02-26 14:17:52 UTC
Not sure why this was flagged as needs info. What information are you looking for?

Comment 17 Rahul Hinduja 2018-05-06 10:16:46 UTC
Verified the bug against the log improvement, which now shows the log location in case of a wrong slave volume or a stopped slave volume.

Use Case 1: Wrong Slave volume

RHGS 3.3.1:

[root@dhcp47-167 ~]# gluster volume geo-replication master 10.70.47.17::slave1 create push-pem
Unable to fetch slave volume details. Please check the slave cluster and slave volume.
geo-replication command failed
[root@dhcp47-167 ~]# 


RHGS 3.4:

[root@dhcp42-53 ~]# gluster volume geo-replication master 10.70.41.221::slave1 create push-pem
Unable to mount and fetch slave volume details. Please check the log: /var/log/glusterfs/geo-replication/gverify-slavemnt.log
geo-replication command failed
[root@dhcp42-53 ~]# 


Use Case 2: Slave volume is stopped

RHGS 3.3.1:

[root@dhcp47-167 ~]# gluster volume geo-replication master 10.70.47.17::slave create push-pem
Unable to fetch slave volume details. Please check the slave cluster and slave volume.
geo-replication command failed
[root@dhcp47-167 ~]# 

RHGS 3.4:

[root@dhcp42-53 ~]# gluster volume geo-replication master 10.70.41.221::slave create push-pem
Unable to mount and fetch slave volume details. Please check the log: /var/log/glusterfs/geo-replication/gverify-slavemnt.log
geo-replication command failed
[root@dhcp42-53 ~]# 


Additionally, the patch brings clarity to the log locations:

RHGS 3.3.1 => log location that points to these errors:

/var/log/glusterfs/geo-replication-slaves/slave.log

RHGS 3.4 => specific logs for the master and the slave:

[root@dhcp42-53 ~]# ls /var/log/glusterfs/geo-replication/
gverify-mastermnt.log  gverify-slavemnt.log  master
[root@dhcp42-53 ~]# ls /var/log/glusterfs/geo-replication-slaves/
mbr
[root@dhcp42-53 ~]#
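
With the 3.4 build, the CLI error points directly at the gverify mount logs; a minimal sketch of following that pointer on the master node (paths taken from the listing above):

# slave-side mount attempt made by the pre-validation (gverify) step
tail -n 50 /var/log/glusterfs/geo-replication/gverify-slavemnt.log

# master-side mount attempt, if the slave log looks clean
tail -n 50 /var/log/glusterfs/geo-replication/gverify-mastermnt.log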

Moving this bug to the verified state against the fix. Any further enhancements to the logs will be tracked in a separate bug.

Comment 18 Dana Lane 2018-05-21 18:23:08 UTC
We no longer see this issue, closing this bug.

Comment 22 errata-xmlrpc 2018-09-04 06:42:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

