Bug 1457976

Summary: Geo-replication status goes Faulty after rebooting one source node
Product: [Community] GlusterFS Reporter: Mark <deligatedgeek>
Component: geo-replication    Assignee: bugs <bugs>
Status: CLOSED EOL QA Contact:
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.8    CC: avishwan, bugs
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-07 10:40:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Attachments:
Logs from georeplication volume (flags: none)

Description Mark 2017-06-01 16:23:26 UTC
Description of problem:

Environment: 4 CentOS 7.2 servers, 2 in the UK and one each in New York and Sydney.

A replica 2 GlusterFS source volume was created using 1 brick from each of the 2 UK servers; this worked great.

Geo-replication was configured from this source volume to a destination volume in NY and another in Sydney; this also worked great.

The status output showed the first source node as Active and the second source node as Passive.
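Roughly, the setup was created along these lines (a sketch only, using hypothetical host and volume names uk1, uk2, ny1, srcvol and nyvol rather than the exact commands and brick paths used):

# On a UK node: create and start the replica 2 source volume, one brick per UK server
gluster volume create srcvol replica 2 uk1:/bricks/srcvol/brick uk2:/bricks/srcvol/brick
gluster volume start srcvol

# On the New York server: create and start the destination volume (Sydney set up the same way)
gluster volume create nyvol ny1:/bricks/nyvol/brick
gluster volume start nyvol

# On a UK node: create and start the geo-replication session to New York
gluster volume geo-replication srcvol ny1::nyvol create push-pem
gluster volume geo-replication srcvol ny1::nyvol start

# Status shows one source node as Active and the other as Passive
gluster volume geo-replication srcvol ny1::nyvol status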

The first source node was shut down, and after a short time the second source node became Active; replication continued.

The first source node was started and added its brick back into the source volume, but the geo-replication status for the first node became Faulty while the second node showed Passive. Upon shutting the first node down again, the second node's geo-replication status became Active, but when the first node was started again the status became Faulty once more.

I searched for the error and found that it may be related to the index being rotated, meaning the first node had lost track of where it was, but there were no instructions on how to fix this.

I had to delete the geo-replication session and the destination volume, then recreate both, to fix the issue.
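For reference, the recovery was essentially the following (again only a sketch, reusing the hypothetical names from above; the recreation of the destination volume itself is not shown):

gluster volume geo-replication srcvol ny1::nyvol stop
gluster volume geo-replication srcvol ny1::nyvol delete
# ... recreate the destination volume, then recreate and restart the session
gluster volume geo-replication srcvol ny1::nyvol create push-pem force
gluster volume geo-replication srcvol ny1::nyvol start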

Version-Release number of selected component (if applicable):
3.8.5

How reproducible:
twice so far

Steps to Reproduce:
1. Create the environment described above.
2. Shut down one source node.
3. Start that source node again.

Actual results:
Geo-replication status for the restarted node goes Faulty.


Expected results:
The restarted source node goes Active again and replication continues.

Additional info:

Comment 1 Aravinda VK 2017-06-12 06:11:26 UTC
Please upload the geo-replication logs of the Faulty node from the /var/log/glusterfs/geo-replication directory.
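(If it helps, something like the following on the Faulty node should bundle everything under that directory for attaching here, assuming a standard install layout:)

tar czf geo-rep-logs.tar.gz /var/log/glusterfs/geo-replication/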

Comment 2 Mark 2017-06-12 09:18:01 UTC
Created attachment 1286989 [details]
Logs from georeplication volume

This file contains the logs from the geo-replication directory for this volume.

Comment 3 Mark 2017-06-12 09:21:03 UTC
I know the resolution may take time; is there a workaround if this occurs again?

Comment 4 Niels de Vos 2017-11-07 10:40:08 UTC
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.