Bug 1457976 - Georeplication status goes faulty after reboot 1 source node
Summary: Georeplication status goes faulty after reboot 1 source node
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 3.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-06-01 16:23 UTC by Mark
Modified: 2017-11-07 10:40 UTC
2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-07 10:40:08 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Logs from georeplication volume (488.39 KB, application/octet-stream)
2017-06-12 09:18 UTC, Mark

Description Mark 2017-06-01 16:23:26 UTC
Description of problem:

Environment: 4 CentOS 7.2 servers, 2 in the UK and one each in New York and Sydney.

A replica 2 GlusterFS source (src) volume was created using one brick from each of the two UK servers; this worked great.
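
For reference, the source volume was created roughly along these lines (the host and brick names are placeholders for illustration, not the actual ones used):

  # on one of the UK nodes; "uk1"/"uk2" and the brick paths are example names
  gluster volume create srcvol replica 2 uk1:/bricks/srcvol/brick uk2:/bricks/srcvol/brick
  gluster volume start srcvol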

Geo-replication was then configured from this source volume to a destination volume in NY and another in Sydney; this also worked great.
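
The sessions were set up with the standard geo-replication CLI workflow, roughly as follows (the slave host/volume names "nygeo1::dstvol" are again placeholders):

  # assumes passwordless SSH from this node to the slave host is already set up
  # run once on a source node to generate the pem keys for all nodes
  gluster system:: execute gsec_create
  # create and start a session per remote site
  gluster volume geo-replication srcvol nygeo1::dstvol create push-pem
  gluster volume geo-replication srcvol nygeo1::dstvol start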

The geo-replication status output showed the first src node as Active and the second src node as Passive.
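
The Active/Passive states were read from the session status, e.g.:

  # per-session status (same placeholder names as above)
  gluster volume geo-replication srcvol nygeo1::dstvol status
  # or all sessions at once
  gluster volume geo-replication status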

The first src node was shut down; after a short time the second src node became Active and replication continued.

The first src node was then started and added its brick back into the src volume, but its geo-replication status became Faulty while the second node showed Passive. Shutting the first node down again made the second node's geo-replication status go Active, but as soon as the first node was started again the status returned to Faulty.

Searching for the error suggested it may be related to the index being rotated, leaving the first node unable to track changes, but I found no instructions on how to fix this.

To fix the issue, I had to delete the geo-replication session and the destination volume, then recreate both.
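
The recovery amounted to tearing the session down and rebuilding it, roughly as below (placeholder names again); this is simply what worked here, not necessarily the intended fix:

  gluster volume geo-replication srcvol nygeo1::dstvol stop
  gluster volume geo-replication srcvol nygeo1::dstvol delete
  # recreate the destination volume on the slave side, then:
  gluster volume geo-replication srcvol nygeo1::dstvol create push-pem force
  gluster volume geo-replication srcvol nygeo1::dstvol start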

Version-Release number of selected component (if applicable):
3.8.5

How reproducible:
twice so far

Steps to Reproduce:
1. Create the environment described above
2. Shut down one src node
3. Start that src node again (see the command sketch below)
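
A minimal command sketch of steps 2-3, using the placeholder names from the description:

  # on the first src node (step 2)
  shutdown -h now
  # power the node back on (step 3), wait for glusterd and the brick to come up, then check:
  gluster volume geo-replication srcvol nygeo1::dstvol status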

Actual results:
Geo-replication status of the restarted src node goes Faulty.


Expected results:
The restarted src node goes Active (or Passive) and replication continues.

Additional info:

Comment 1 Aravinda VK 2017-06-12 06:11:26 UTC
Please upload the geo-rep logs of the Faulty node from the /var/log/glusterfs/geo-replication directory.
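
For example, one way to bundle them up (assuming the default log location):

  tar czf /tmp/georep-logs.tar.gz -C /var/log/glusterfs geo-replication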

Comment 2 Mark 2017-06-12 09:18:01 UTC
Created attachment 1286989 [details]
Logs from georeplication volume

This file contains the logs from the georeplication directory for this volume

Comment 3 Mark 2017-06-12 09:21:03 UTC
I know the resolution may take time; is there a workaround if this occurs again?

Comment 4 Niels de Vos 2017-11-07 10:40:08 UTC
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.

