Bug 989378 - Dist-geo-rep: data is not synced to slave (more than 2 days); it keeps starting a gsyncd worker and xsync crawl, and the worker dies before data is synced
Status: CLOSED CURRENTRELEASE
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Assigned To: Bug Updates Notification Mailing List
QA Contact: shilpa
Keywords: ZStream
Depends On:
Blocks:
Reported: 2013-07-29 03:40 EDT by Rachana Patel
Modified: 2016-06-23 05:51 EDT
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-08-06 10:10:05 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Rachana Patel 2013-07-29 03:40:16 EDT
Description of problem:
Dist-geo-rep: data is not synced to the slave for more than 2 days. The session keeps starting a gsyncd worker and an xsync crawl, and the worker dies before any data is synced.

Version-Release number of selected component (if applicable):
3.4.0.12rhs.beta6-1.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
[root@DVM4 ssh%3A%2F%2Froot%4010.70.36.248%3Agluster%3A%2F%2F127.0.0.1%3Aslave2]# gluster volume geo master2 rhsauto018.lab.eng.blr.redhat.com::slave2 status
NODE                           MASTER     SLAVE                                        HEALTH    UPTIME
-------------------------------------------------------------------------------------------------------------------
DVM4.lab.eng.blr.redhat.com    master2    rhsauto018.lab.eng.blr.redhat.com::slave2    Stable    08:17:45          
DVM1.lab.eng.blr.redhat.com    master2    rhsauto018.lab.eng.blr.redhat.com::slave2    Stable    08:17:00          
DVM2.lab.eng.blr.redhat.com    master2    rhsauto018.lab.eng.blr.redhat.com::slave2    Stable    2 days 22:30:07   
DVM5.lab.eng.blr.redhat.com    master2    rhsauto018.lab.eng.blr.redhat.com::slave2    Stable    2 days 22:30:07   
DVM6.lab.eng.blr.redhat.com    master2    rhsauto018.lab.eng.blr.redhat.com::slave2    Stable    08:17:31          
DVM3.lab.eng.blr.redhat.com    master2    rhsauto018.lab.eng.blr.redhat.com::slave2    Stable    2 days 22:30:07

Nothing is synced to the slave, even after more than two days.
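As a cross-check, the absence of syncing can be confirmed by comparing client mounts of the two volumes. This is only a sketch: the mount points /mnt/master2 and /mnt/slave2 are hypothetical, and any reachable node can serve the mounts.

# Mount both volumes on a client (hypothetical mount points)
mount -t glusterfs DVM4.lab.eng.blr.redhat.com:/master2 /mnt/master2
mount -t glusterfs rhsauto018.lab.eng.blr.redhat.com:/slave2 /mnt/slave2

# The file count on the master keeps growing while the slave stays empty
find /mnt/master2 -type f | wc -l
find /mnt/slave2 -type f | wc -l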

log:
[2013-07-28 11:43:39.891654] E [syncdutils(/rhs/brick2):189:log_raise_exception] <top>: connection to peer is broken
[2013-07-28 11:43:39.916518] E [syncdutils(/rhs/brick2):189:log_raise_exception] <top>: connection to peer is broken
[2013-07-28 11:43:39.918039] E [syncdutils(/rhs/brick2):189:log_raise_exception] <top>: connection to peer is broken
[2013-07-28 11:43:39.965405] I [syncdutils(/rhs/brick2):158:finalize] <top>: exiting.
[2013-07-28 11:43:40.497952] I [monitor(monitor):81:set_state] Monitor: new state: faulty
[2013-07-28 11:43:50.677898] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2013-07-28 11:43:50.678414] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2013-07-28 11:43:52.324848] I [gsyncd(/rhs/brick2):501:main_i] <top>: syncing: gluster://localhost:master2 -> ssh://root@rhsauto018.lab.eng.blr.redhat.com:gluster://localhost:slave2
[2013-07-28 11:43:55.441731] I [master(/rhs/brick2):60:gmaster_builder] <top>: setting up xsync change detection mode
[2013-07-28 11:43:55.448071] I [master(/rhs/brick2):60:gmaster_builder] <top>: setting up changelog change detection mode
[2013-07-28 11:43:55.452831] I [master(/rhs/brick2):980:register] _GMaster: xsync temp directory: /var/run/gluster/master2/ssh%3A%2F%2Froot%4010.70.36.248%3Agluster%3A%2F%2F127.0.0.1%3Aslave2/9a61fa9bb01f4231281842deec4b3b03/xsync
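The restart loop can be quantified from the geo-replication log on a master node. A minimal sketch, assuming the default log directory /var/log/glusterfs/geo-replication/master2/ and a log file named after the URL-encoded slave (the exact file name may differ):

# Each match is one monitor-driven worker restart; a steadily growing
# count means the xsync crawl never completes before the worker dies.
grep -c "starting gsyncd worker" \
  /var/log/glusterfs/geo-replication/master2/ssh%3A%2F%2Froot%4010.70.36.248%3Agluster%3A%2F%2F127.0.0.1%3Aslave2.log

# Errors logged shortly before each restart ("connection to peer is broken")
grep -B 5 "starting gsyncd worker" \
  /var/log/glusterfs/geo-replication/master2/ssh%3A%2F%2Froot%4010.70.36.248%3Agluster%3A%2F%2F127.0.0.1%3Aslave2.log | grep " E "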


Actual results:
Data is not synced to the slave; the gsyncd workers repeatedly die during the xsync crawl and are restarted by the monitor.

Expected results:
Data created on the master volume should be synced to the slave volume.

Additional info:


[root@rhsauto018 ~]# gluster v info slave2
 
Volume Name: slave2
Type: Distribute
Volume ID: a2dcf3f2-1526-4258-ab4a-6894db73a9fd
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick1
Brick2: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick1
Brick3: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick1
Brick4: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick2



[root@DVM4 ssh%3A%2F%2Froot%4010.70.36.248%3Agluster%3A%2F%2F127.0.0.1%3Aslave2]# gluster v info master2
 
Volume Name: master2
Type: Distributed-Replicate
Volume ID: a55594f5-724c-435b-a514-0eb6b7af6b77
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.37.47:/rhs/brick2
Brick2: 10.70.37.62:/rhs/brick2
Brick3: 10.70.37.100:/rhs/brick2
Brick4: 10.70.37.95:/rhs/brick2
Brick5: 10.70.37.85:/rhs/brick2
Brick6: 10.70.37.141:/rhs/brick2
Options Reconfigured:
changelog.fsync-interval: 3
changelog.rollover-time: 15
changelog.encoding: ascii
geo-replication.indexing: on
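For reference, the reconfigured changelog options above correspond to volume-set commands of this form (a sketch only; the values are copied from the output above, and geo-replication.indexing is typically enabled by the geo-rep session itself rather than set by hand):

gluster volume set master2 changelog.fsync-interval 3
gluster volume set master2 changelog.rollover-time 15
gluster volume set master2 changelog.encoding ascii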
Comment 5 Amar Tumballi 2013-09-11 09:38:25 EDT
Is this about preventing the master from getting into a corrupt state?
Comment 6 Amar Tumballi 2013-11-02 06:01:53 EDT
There are similar issues fixed in the latest RHS builds. Can we verify this is no longer happening?

I feel the patch https://code.engineering.redhat.com/gerrit/14100 should fix it...
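One possible way to verify this on a fixed build (a sketch; it reuses the volume names from this report and the hypothetical /mnt/master2 and /mnt/slave2 mounts from the earlier sketch):

# The session should stay Stable and its uptime should keep increasing
gluster volume geo-replication master2 rhsauto018.lab.eng.blr.redhat.com::slave2 status

# After the initial crawl finishes, master and slave contents should match
diff <(cd /mnt/master2 && find . | sort) <(cd /mnt/slave2 && find . | sort)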
Comment 8 shilpa 2015-01-29 00:55:54 EST
I don't see this issue occurring on recent builds. This can be closed.
Comment 10 Aravinda VK 2015-08-06 10:10:05 EDT
This issue was observed only in the initial builds of RHS 2.1 and is resolved as per comment 6. It has not been observed again in the latest versions of RHGS 2.1, 3.0.4, and 3.1. Closing this bug. Please reopen if the issue is found again.
