Bug 846569 - geo-rep session goes to faulty with logging "connection to peer is broken"
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Venky Shankar
Keywords: Triaged
Duplicates: 851951
Blocks: 849304
Reported: 2012-08-08 03:11 EDT by Vijaykumar Koppad
Modified: 2014-08-24 20:49 EDT
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Clones: 849304
Last Closed: 2013-07-24 13:57:35 EDT
Type: Bug
Description Vijaykumar Koppad 2012-08-08 03:11:33 EDT
Description of problem: If you start a geo-rep session, the status says faulty
and the log says "connection to peer is broken". If you run the gsyncd binary by hand,
i.e. /usr/local/libexec/gsyncd, it prints
"Segmentation fault (core dumped)", but no core is actually dumped.

Version-Release number of selected component (if applicable): GlusterFS 3.3 master [bfac66f129646bc78f1ed3a7dccb3010114e57aa]



How reproducible: Consistently


Steps to Reproduce:
1. Start a geo-rep session between master and slave.
2. Check the status.

  
Actual results: The status is faulty.


Expected results: The status should be OK.


Additional info:
Logs- 

[2012-08-07 19:19:59.37766] I [syncdutils:148:finalize] <top>: exiting.
[2012-08-07 19:20:09.49307] I [monitor(monitor):80:monitor] Monitor: ------------------------------------------------------------
[2012-08-07 19:20:09.49647] I [monitor(monitor):81:monitor] Monitor: starting gsyncd worker
[2012-08-07 19:20:09.93501] I [gsyncd:388:main_i] <top>: syncing: gluster://localhost:master -> file:///root/geo
[2012-08-07 19:20:09.115250] E [syncdutils:179:log_raise_exception] <top>: connection to peer is broken
[2012-08-07 19:20:09.115509] E [resource:191:errlog] Popen: command "/usr/local/libexec/glusterfs/gsyncd --session-owner a9afadf5-c9d1-452c-883b-fd16f9f7a686 -N --listen --timeout 120 file:///root/geo" returned with -11
Comment 1 Venky Shankar 2012-08-09 01:10:20 EDT
Initial analysis:

The gsyncd slave process fails while starting. The gsyncd wrapper (binary) relies on GF_* macros. Here is the tail of the backtrace (in ascending order of frame number), obtained by attaching gdb to the gsyncd wrapper.

----------------------------------------------------------------------------
function=0x7fbdbde4a3c0 "__glusterfs_this_location", line=124, level=GF_LOG_WARNING, 
    fmt=0x7fbdbde4a2b4 "pthread setspecific failed") at logging.c:442
#24358 0x00007fbdbde21809 in __glusterfs_this_location () at globals.c:124
#24359 0x00007fbdbddee4cc in _gf_log (domain=0x7fbdbde4a10b "", file=0x7fbdbde4a299 "globals.c", function=0x7fbdbde4a3c0 "__glusterfs_this_location", line=124, level=GF_LOG_WARNING, 
    fmt=0x7fbdbde4a2b4 "pthread setspecific failed") at logging.c:442
#24360 0x00007fbdbde21809 in __glusterfs_this_location () at globals.c:124
#24361 0x00007fbdbddee4cc in _gf_log (domain=0x7fbdbde4a10b "", file=0x7fbdbde4a299 "globals.c", function=0x7fbdbde4a3c0 "__glusterfs_this_location", line=124, level=GF_LOG_WARNING, 
    fmt=0x7fbdbde4a2b4 "pthread setspecific failed") at logging.c:442
#24362 0x00007fbdbde21809 in __glusterfs_this_location () at globals.c:124
#24363 0x00007fbdbde1d09c in __gf_calloc (nmemb=64, size=8, type=82) at mem-pool.c:112
#24364 0x00007fbdbde373a6 in runinit (runner=0x7fffc82abf40) at run.c:54
#24365 0x00000000004017f4 in invoke_gsyncd (argc=8, argv=0x7fffc82ad0e8) at gsyncd.c:114
#24366 0x0000000000402312 in main (argc=8, argv=0x7fffc82ad0e8) at gsyncd.c:331

-------------------------------------------------------------------------------

There's a recursive sequence of _gf_log and __glusterfs_this_location calls. It is triggered by __gf_calloc trying to access THIS, which is not valid in the gsyncd context. __glusterfs_this_location tries to log this failure (using gf_log), and gf_log in turn tries to access THIS again. Hence the recursive sequence of these function calls, which continues (more than 24,000 frames in the backtrace above) until the process segfaults.
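The mutual recursion described above can be modelled with a small, self-contained C sketch. Everything here is an assumption for illustration: sketch_this_location and sketch_log are hypothetical stand-ins for __glusterfs_this_location and _gf_log, the bodies are not taken from the libglusterfs sources, and MAX_DEPTH is an artificial cap so that the sketch terminates instead of overflowing the stack. Without such a cap the real process keeps recursing until it dies, which is consistent with the "returned with -11" in the log (a negative Popen return code denotes death by signal, and signal 11 is SIGSEGV).

```c
#include <assert.h>
#include <stddef.h>

/* Artificial cap so this sketch stops; the real call chain has no such limit. */
#define MAX_DEPTH 1000

static int this_is_valid = 0;   /* models whether THIS has been set up */
static int depth = 0;           /* current nesting of sketch_log calls */
static int max_depth_seen = 0;  /* deepest nesting reached */

static void sketch_log(const char *msg);

/* Hypothetical stand-in for __glusterfs_this_location:
 * when THIS cannot be obtained, it logs a warning. */
static void *sketch_this_location(void)
{
    static int dummy_this;

    if (!this_is_valid) {
        sketch_log("pthread setspecific failed");
        return NULL;
    }
    return &dummy_this;
}

/* Hypothetical stand-in for _gf_log:
 * it consults THIS before it can emit the message. */
static void sketch_log(const char *msg)
{
    (void)msg;
    if (++depth > max_depth_seen)
        max_depth_seen = depth;
    if (depth < MAX_DEPTH)
        (void)sketch_this_location();   /* re-enters the failing path */
    depth--;
}

/* Returns how deep the log/THIS cycle went for a given THIS state. */
int sketch_recursion_depth(int valid)
{
    this_is_valid = valid;
    depth = 0;
    max_depth_seen = 0;
    (void)sketch_this_location();
    assert(depth == 0);   /* every nested call has unwound */
    return max_depth_seen;
}
```

With a valid THIS, sketch_recursion_depth(1) never enters the logging path; with an invalid one, sketch_recursion_depth(0) runs into the cap, which is the behaviour the backtrace shows in unbounded form.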

Looks like commit ed4b76ba introduced some references to THIS in __gf_calloc.

As a quick workaround we could pass -DRUN_STANDALONE while compiling the gsyncd wrapper (Makefile changes). This would make GF_CALLOC expand to calloc() and the other GF_* macros expand to their relevant non-glusterfs calls.
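A minimal sketch of what such a standalone build amounts to is shown below. The macro bodies are assumptions modelled on the description in this comment, not copied from the glusterfs tree, and sketch_alloc_argv / sketch_free_argv are hypothetical helpers loosely inspired by the runinit() frame in the backtrace (which allocates 64 slots of 8 bytes). In a real build the define would come from the compiler command line rather than from the source file.

```c
#include <assert.h>
#include <stdlib.h>

/* Normally supplied as -DRUN_STANDALONE by the Makefile; defined here
 * only so that the sketch compiles on its own. */
#ifndef RUN_STANDALONE
#define RUN_STANDALONE
#endif

#ifdef RUN_STANDALONE
/* Assumed mappings: plain libc calls that never touch THIS or gf_log. */
#define GF_CALLOC(nmemb, size, type) calloc((nmemb), (size))
#define GF_FREE(ptr)                 free(ptr)
#define GF_ASSERT(cond)              assert(cond)
#endif

/* Hypothetical helper: allocate a zeroed argument vector. */
char **sketch_alloc_argv(size_t slots)
{
    char **argv;

    GF_ASSERT(slots > 0);
    /* The third argument (memory type tag) is ignored when standalone. */
    argv = GF_CALLOC(slots, sizeof(*argv), 0);
    return argv;   /* may be NULL if calloc fails */
}

/* Hypothetical helper: release the vector again. */
void sketch_free_argv(char **argv)
{
    GF_FREE(argv);
}
```

Because the allocation goes straight to calloc(), nothing on this path looks up THIS, so the logging cycle cannot start. The trade-off is that the wrapper loses glusterfs memory accounting and logging for these calls.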

The other fix would be to make THIS valid in the gsyncd context, which would involve some initialization steps (similar to what is done in the cli).
Comment 2 Csaba Henk 2012-08-27 10:20:26 EDT
*** Bug 851951 has been marked as a duplicate of this bug. ***
