Bug 980117 - Dist-geo-rep: Geo-rep status of replica pairs goes to faulty intermittently.
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: 2.1
Hardware: x86_64 Linux
Priority: unspecified
Severity: urgent
Assigned To: Venky Shankar
QA Contact: Vijaykumar Koppad
Duplicates: 980734
Depends On:
Blocks:
Reported: 2013-07-01 09:07 EDT by Vijaykumar Koppad
Modified: 2014-08-24 20:50 EDT
CC List: 7 users

See Also:
Fixed In Version: glusterfs-3.4.0.15rhs-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-23 18:38:41 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Vijaykumar Koppad 2013-07-01 09:07:49 EDT
Description of problem: If you start a geo-rep session with a dist-rep master volume, the status of the idle member of a replica pair (the gsync worker that is not actively syncing) intermittently goes to faulty.

This is an excerpt from the idle gsync worker's log file:

[2013-07-01 18:24:03.26717] D [master(/bricks/brick2):757:volinfo_state_machine] <top>: (None, f92305f7) << (None, f92305f7) -> (None, f92305f7)
[2013-07-01 18:24:06.538655] E [syncdutils(/bricks/brick2):189:log_raise_exception] <top>: connection to peer is broken
[2013-07-01 18:24:06.541389] E [syncdutils(/bricks/brick2):206:log_raise_exception] <top>: FULL EXCEPTION TRACE: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 232, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 157, in listen
    rid, exc, res = recv(self.inf)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 48, in recv
    return pickle.load(inf)
EOFError
[2013-07-01 18:24:06.543339] I [syncdutils(/bricks/brick2):158:finalize] <top>: exiting.
[2013-07-01 18:24:06.551038] I [monitor(monitor):81:set_state] Monitor: new state: faulty,
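
The traceback comes from repce.py's recv(), which unpickles RPC messages from the peer gsyncd process; when the peer closes its end of the connection, pickle.load() runs out of input and raises EOFError, which syncdutils logs as "connection to peer is broken" and the monitor then marks the worker faulty. A minimal sketch (illustrative only, not the gsyncd code) of that failure mode of pickle.load() on an exhausted stream:

# Minimal sketch (not gsyncd code): mimics a repce.recv()-style message read
# to show why the peer closing its end surfaces as EOFError.
import io
import pickle

def recv(inf):
    # repce.recv() unpickles one message from the peer's stream; once the
    # peer has gone away there are no bytes left and pickle.load() raises
    # EOFError.
    return pickle.load(inf)

# Simulate a peer that wrote one reply and then closed the connection.
buf = io.BytesIO()
pickle.dump((1, None, "ok"), buf)
buf.seek(0)

print(recv(buf))   # first reply arrives fine: (1, None, 'ok')
try:
    recv(buf)      # stream exhausted -> EOFError ("connection to peer is broken")
except EOFError:
    print("peer closed the connection; the monitor would mark this worker faulty")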


Version-Release number of selected component (if applicable): glusterfs-3.4.0.12rhs.beta1-1.el6rhs.x86_64


How reproducible: Intermittent


Steps to Reproduce:
1. Create and start a geo-rep session with a dist-rep master and slave.
2. Create a lot of data on the master, e.g. untar a kernel source tarball.
3. Check the geo-rep status (for example by polling it, as sketched below).
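
One way to catch the intermittent transition while the data is being created is to poll the status in a loop. A rough sketch, assuming a master volume named "master", a slave of the form "slavehost::slavevol", and the gluster CLI on the PATH (these names are placeholders, not taken from the setup above):

# Hypothetical status poller; volume and slave names are placeholders.
import subprocess
import time

MASTER = "master"              # assumed master volume name
SLAVE = "slavehost::slavevol"  # assumed slave specification

def geo_rep_status():
    out = subprocess.check_output(
        ["gluster", "volume", "geo-replication", MASTER, SLAVE, "status"])
    return out.decode("utf-8", "replace")

for _ in range(60):            # poll once a second for a minute
    status = geo_rep_status()
    if "faulty" in status.lower():
        print("faulty state observed:")
        print(status)
    time.sleep(1)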

Actual results: The geo-rep status of one of the replica pairs sometimes goes to faulty.


Expected results: The geo-rep status of all replica pairs should remain stable, with no spurious faulty transitions.


Additional info:
Comment 2 Venky Shankar 2013-07-03 10:41:17 EDT
I have a fix for this. Will send out the patch soon.
Comment 3 Venky Shankar 2013-07-03 13:31:22 EDT
*** Bug 980734 has been marked as a duplicate of this bug. ***
Comment 5 Vijaykumar Koppad 2013-08-05 02:50:40 EDT
Verified on glusterfs-3.4.0.15rhs-1
Comment 6 Scott Haines 2013-09-23 18:38:41 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
