Description of problem:
With a dist-rep master volume in a geo-rep session, the status of one of the replica pairs intermittently goes to faulty, after which that gsyncd worker sits idle and no syncing happens through it. Excerpt from the idle gsyncd log file:

[2013-07-01 18:24:03.26717] D [master(/bricks/brick2):757:volinfo_state_machine] <top>: (None, f92305f7) << (None, f92305f7) -> (None, f92305f7)
[2013-07-01 18:24:06.538655] E [syncdutils(/bricks/brick2):189:log_raise_exception] <top>: connection to peer is broken
[2013-07-01 18:24:06.541389] E [syncdutils(/bricks/brick2):206:log_raise_exception] <top>: FULL EXCEPTION TRACE:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 232, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 157, in listen
    rid, exc, res = recv(self.inf)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 48, in recv
    return pickle.load(inf)
EOFError
[2013-07-01 18:24:06.543339] I [syncdutils(/bricks/brick2):158:finalize] <top>: exiting.
[2013-07-01 18:24:06.551038] I [monitor(monitor):81:set_state] Monitor: new state: faulty

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.12rhs.beta1-1.el6rhs.x86_64

How reproducible:
Intermittent

Steps to Reproduce:
1. Create and start a geo-rep session with a dist-rep master and slave.
2. Create a lot of data on the master, e.g. untar a kernel source tree.
3. Check the status of the geo-rep session.

Actual results:
The geo-rep status of one of the replica pairs sometimes goes to faulty.

Expected results:
The status should remain stable.

Additional info:
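The EOFError in the trace above comes out of recv() in repce.py, which reads pickled RPC messages from the peer over a stream; when the peer dies or closes its end mid-session, pickle.load() hits end-of-stream and raises EOFError, which syncdutils logs as "connection to peer is broken" before the monitor marks the worker faulty. A minimal standalone sketch of that failure mode (plain Python, not the glusterfs sources; the pipe here just stands in for the worker's connection to its peer):

    # Demonstrates why pickle.load() raises EOFError when the peer
    # closes the connection before a complete pickled object arrives.
    import os
    import pickle

    rfd, wfd = os.pipe()
    os.close(wfd)                    # peer goes away without writing anything
    inf = os.fdopen(rfd, 'rb')
    try:
        pickle.load(inf)             # same call as recv() in repce.py
    except EOFError:
        print("connection to peer is broken")   # what syncdutils logs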
I have a fix for this. Will send out the patch soon.
*** Bug 980734 has been marked as a duplicate of this bug. ***
Verified on glusterfs-3.4.0.15rhs-1.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html