Bug 1541122

Summary: Improve geo-rep pre-validation logs
Product: Red Hat Gluster Storage
Component: geo-replication
Version: rhhi-1.1
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: urgent
Reporter: Dana Lane <dlane>
Assignee: Kotresh HR <khiremat>
QA Contact: Rahul Hinduja <rhinduja>
Docs Contact:
CC: amukherj, annair, ascerra, csaba, dfitzpat, dlane, khiremat, rallan, rcyriac, rhinduja, rhs-bugs, sheggodu, storage-qa-internal, vdas
Target Milestone: ---
Target Release: RHGS 3.4.0
Keywords: Reopened
Whiteboard:
Fixed In Version: glusterfs-3.12.2-5
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-04 06:42:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:
Bug Blocks: 1503137
Attachments: Slave.log from the SOURCE system

Description Dana Lane 2018-02-01 18:36:08 UTC
Description of problem: Starting with 2 RHHI pods, I attempted to follow the documentation to configure geo-replication and got the following error: Unable to fetch slave volume details. Please check the slave cluster and slave volume.

The exact steps followed, starting with Maintaining Red Hat Hyperconverged Infrastructure, section 3.1 Configuring geo-replication for disaster recovery:

1 - [On pod I want to replicate FROM] #gluster volume set all cluster.enable-shared-storage enable
2 - [On the pod I want to replicate TO] #gluster volume set data features.shard enable  (Where data is the name of the destination volume)
3 - Pointed to the Gluster documentation, section 10.3.4.1 Setting up your environment for a geo-replication session
4 - [On pod I want to replicate FROM] #gluster system:: execute gsec_create
5 - [On pod I want to replicate FROM] #gluster volume geo-replication data 192.168.50.36::data create push-pem  (The volume name on both the source and target systems is called 'data' and 192.168.50.36 is the IP of the master node on the target pod.) 
6 - Resulting error:
Unable to fetch slave volume details. Please check the slave cluster and slave volume.
geo-replication command failed


Passwordless ssh is configured.
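A failing create step like the one in step 6 is easier to triage when the failure immediately surfaces the relevant log file. The following is a minimal sketch of that wrapper pattern; `false` and the mktemp file are stand-ins for the real `gluster volume geo-replication ... create push-pem` command and for /var/log/glusterfs/geo-replication-slaves/slave.log on an actual RHGS node, and the log line is fabricated for illustration:

```shell
#!/bin/sh
# Stand-in for /var/log/glusterfs/geo-replication-slaves/slave.log,
# pre-seeded with an illustrative (fabricated) error line.
LOG=$(mktemp)
printf '%s\n' "E [fuse-bridge] mount of slave volume failed (illustrative sample line)" > "$LOG"

# Run the create command passed as arguments; on failure, show the log
# location and its last lines instead of only the terse CLI error.
run_create() {
    if "$@"; then
        echo "geo-replication session created"
    else
        echo "geo-replication create failed; check $LOG:"
        tail -n 20 "$LOG"
        return 1
    fi
}

run_create false || true   # `false` stands in for the failing gluster command
```

On a real node the wrapped command would be the one from step 5; the point is only that the log path, not just the one-line CLI error, is what identifies the root cause.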

Version-Release number of selected component (if applicable):


How reproducible:
I've run these exact commands multiple times in both directions, and they fail the same way every time. 100% reproducible in my configuration.


Steps to Reproduce:
(See above)

Actual results:
Unable to fetch slave volume details. Please check the slave cluster and slave volume.
geo-replication command failed


Expected results:
Successfully create a geo-replication session

Additional info:

Comment 2 Atin Mukherjee 2018-02-05 14:19:35 UTC
Anoop - It's better to have this mentioned as [CSS] followed by the title, similar to the method the GSS group follows. I say this because people might otherwise read CSS as part of the problem title itself.

Comment 3 Kotresh HR 2018-02-08 10:21:40 UTC
Hi,

That just indicates the master node could not mount the slave volume. We need the following log file to find out what exactly the issue is.

/var/log/glusterfs/geo-replication-slaves/slave.log

I agree that the CLI output should have pointed you to this log file. This improvement has already been merged upstream [1]. With the patch, the CLI correctly displays the log file to look at.

[1] https://review.gluster.org/#/c/19242/

Comment 4 Dana Lane 2018-02-08 15:33:37 UTC
Created attachment 1393239 [details]
Slave.log from the SOURCE system

Attaching the slave.log from the source system, i.e. the system I'm attempting to start the geo-replication process from.

Comment 10 Dana Lane 2018-02-16 16:24:58 UTC
Log attached previously

Comment 12 Dana Lane 2018-02-23 16:10:35 UTC
Following a re-install of the destination pod, we were able to successfully create and start geo-replication.

We'd still like to know what was the root cause of this.

Comment 14 Dana Lane 2018-02-26 14:17:52 UTC
Not sure why this was flagged as needs info. What information are you looking for?

Comment 17 Rahul Hinduja 2018-05-06 10:16:46 UTC
Verified the bug against the log improvement, which shows the log location in case of a wrong slave volume or a stopped slave volume.

Use Case 1: Wrong Slave volume

3.3.1

[root@dhcp47-167 ~]# gluster volume geo-replication master 10.70.47.17::slave1 create push-pem
Unable to fetch slave volume details. Please check the slave cluster and slave volume.
geo-replication command failed
[root@dhcp47-167 ~]# 


3.4

[root@dhcp42-53 ~]# gluster volume geo-replication master 10.70.41.221::slave1 create push-pem
Unable to mount and fetch slave volume details. Please check the log: /var/log/glusterfs/geo-replication/gverify-slavemnt.log
geo-replication command failed
[root@dhcp42-53 ~]# 


Use Case 2: Slave volume is stopped

3.3.1

[root@dhcp47-167 ~]# gluster volume geo-replication master 10.70.47.17::slave create push-pem
Unable to fetch slave volume details. Please check the slave cluster and slave volume.
geo-replication command failed
[root@dhcp47-167 ~]# 

3.4

[root@dhcp42-53 ~]# gluster volume geo-replication master 10.70.41.221::slave create push-pem
Unable to mount and fetch slave volume details. Please check the log: /var/log/glusterfs/geo-replication/gverify-slavemnt.log
geo-replication command failed
[root@dhcp42-53 ~]# 


Additionally, the patch brings clarity to the log locations:

3.3.1 => Single log location that points to these errors:

/var/log/glusterfs/geo-replication-slaves/slave.log

3.4 => Specific logs representing master and slave

[root@dhcp42-53 ~]# ls /var/log/glusterfs/geo-replication/
gverify-mastermnt.log  gverify-slavemnt.log  master
[root@dhcp42-53 ~]# ls /var/log/glusterfs/geo-replication-slaves/
mbr
[root@dhcp42-53 ~]#
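The split layout above lends itself to a quick triage scan: grep both gverify logs for error-level entries to see which side of the mount failed. A minimal sketch against a throwaway mock of the 3.4 layout (the directory and the error line are fabricated for illustration; on a real node the path would be /var/log/glusterfs/geo-replication/):

```shell
#!/bin/sh
# Build a mock of the 3.4 per-mount log layout and scan it for errors.
logdir=$(mktemp -d)
printf '%s\n' "[2018-05-06] E [fuse-bridge] mount failed (illustrative)" \
    > "$logdir/gverify-slavemnt.log"
: > "$logdir/gverify-mastermnt.log"   # clean master-side log

# Print only the log files that contain error-level ("E [") entries;
# here that singles out the slave mount as the failing side.
grep -l "E \[" "$logdir"/gverify-*.log
```

Because each mount now has its own log, a single `grep -l` run points straight at the failing side instead of one combined slave.log.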

Moving this bug to the verified state against the fix. Any other enhancements to the logs will be tracked in a separate bug.

Comment 18 Dana Lane 2018-05-21 18:23:08 UTC
We no longer see this issue, closing this bug.

Comment 22 errata-xmlrpc 2018-09-04 06:42:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607