Bug 849304 - geo-rep session goes to faulty with logging "connection to peer is broken"
Status: CLOSED WONTFIX
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: 2.0
Hardware: x86_64 Linux
Priority: high
Severity: high
Assigned To: Venky Shankar
QA Contact: Sudhir D
Depends On: 846569
Reported: 2012-08-17 22:40 EDT by Vidya Sakar
Modified: 2013-03-03 21:06 EST
CC List: 6 users

Doc Type: Bug Fix
Clone Of: 846569
Last Closed: 2012-10-17 08:05:55 EDT
Type: Bug


Attachments: None
Description Vidya Sakar 2012-08-17 22:40:21 EDT
+++ This bug was initially created as a clone of Bug #846569 +++

Description of problem: After a geo-rep session is started, its status is reported as faulty
and the log says "connection to peer is broken". Running the gsyncd binary by hand
(/usr/local/libexec/gsyncd) prints "Segmentation fault (core dumped)", but no core
file is actually produced.

Version-Release number of selected component (if applicable): GlusterFS 3.3 master [bfac66f129646bc78f1ed3a7dccb3010114e57aa]



How reproducible: Consistently


Steps to Reproduce:
1. Start a geo-rep session between master and slave.
2. Check the status.

  
Actual results: The status is faulty.


Expected results: The status should be OK.


Additional info:
Logs:

[2012-08-07 19:19:59.37766] I [syncdutils:148:finalize] <top>: exiting.
[2012-08-07 19:20:09.49307] I [monitor(monitor):80:monitor] Monitor: ------------------------------------------------------------
[2012-08-07 19:20:09.49647] I [monitor(monitor):81:monitor] Monitor: starting gsyncd worker
[2012-08-07 19:20:09.93501] I [gsyncd:388:main_i] <top>: syncing: gluster://localhost:master -> file:///root/geo
[2012-08-07 19:20:09.115250] E [syncdutils:179:log_raise_exception] <top>: connection to peer is broken
[2012-08-07 19:20:09.115509] E [resource:191:errlog] Popen: command "/usr/local/libexec/glusterfs/gsyncd --session-owner a9afadf5-c9d1-452c-883b-fd16f9f7a686 -N --listen --timeout 120 file:///root/geo" returned with -11
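The "returned with -11" above most likely follows the convention of Python's subprocess.Popen (gsyncd itself is Python, and the log line comes from its Popen wrapper). Under that convention a negative return code -N means the child was killed by signal N, and signal 11 on x86_64 Linux is SIGSEGV. That matches the segfault seen when the wrapper is run by hand. The minimal sketch below, assuming a POSIX system, shows how that status is derived from waitpid(). The function name popen_style_returncode and the SPAWN_FAILED sentinel are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SPAWN_FAILED 1000 /* hypothetical sentinel: fork or waitpid failed */

/* Spawn a child that dies from SIGSEGV, then report its status the way
   Python's subprocess does: -N if killed by signal N, else the exit code. */
static int popen_style_returncode(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return SPAWN_FAILED;
    if (pid == 0) {
        signal(SIGSEGV, SIG_DFL); /* make sure the default action applies */
        raise(SIGSEGV);           /* stand-in for the crashing gsyncd wrapper */
        _exit(0);                 /* not reached */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return SPAWN_FAILED;
    if (WIFSIGNALED(status))
        return -WTERMSIG(status); /* e.g. -11 for SIGSEGV */
    return WEXITSTATUS(status);
}
```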

--- Additional comment from vshankar@redhat.com on 2012-08-09 01:10:20 EDT ---

Initial analysis:

The gsyncd slave process fails during startup. The gsyncd wrapper (a C binary) relies on the GF_* macros. Below is the backtrace (frames in ascending order of frame number), obtained by attaching gdb to the gsyncd wrapper.

----------------------------------------------------------------------------
function=0x7fbdbde4a3c0 "__glusterfs_this_location", line=124, level=GF_LOG_WARNING, 
    fmt=0x7fbdbde4a2b4 "pthread setspecific failed") at logging.c:442
#24358 0x00007fbdbde21809 in __glusterfs_this_location () at globals.c:124
#24359 0x00007fbdbddee4cc in _gf_log (domain=0x7fbdbde4a10b "", file=0x7fbdbde4a299 "globals.c", function=0x7fbdbde4a3c0 "__glusterfs_this_location", line=124, level=GF_LOG_WARNING, 
    fmt=0x7fbdbde4a2b4 "pthread setspecific failed") at logging.c:442
#24360 0x00007fbdbde21809 in __glusterfs_this_location () at globals.c:124
#24361 0x00007fbdbddee4cc in _gf_log (domain=0x7fbdbde4a10b "", file=0x7fbdbde4a299 "globals.c", function=0x7fbdbde4a3c0 "__glusterfs_this_location", line=124, level=GF_LOG_WARNING, 
    fmt=0x7fbdbde4a2b4 "pthread setspecific failed") at logging.c:442
#24362 0x00007fbdbde21809 in __glusterfs_this_location () at globals.c:124
#24363 0x00007fbdbde1d09c in __gf_calloc (nmemb=64, size=8, type=82) at mem-pool.c:112
#24364 0x00007fbdbde373a6 in runinit (runner=0x7fffc82abf40) at run.c:54
#24365 0x00000000004017f4 in invoke_gsyncd (argc=8, argv=0x7fffc82ad0e8) at gsyncd.c:114
#24366 0x0000000000402312 in main (argc=8, argv=0x7fffc82ad0e8) at gsyncd.c:331

-------------------------------------------------------------------------------

There is a recursive sequence of _gf_log and __glusterfs_this_location calls. It is triggered by __gf_calloc trying to access THIS, which is not valid in the gsyncd context. __glusterfs_this_location tries to log this failure (using gf_log), and gf_log in turn accesses THIS again. The two functions therefore keep calling each other, and the recursion eventually exhausts the stack (note the frame numbers above 24,000), which results in the segmentation fault.
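The cycle described above can be modelled in a few lines. This is a hypothetical sketch, not the glusterfs source: this_ctx stands in for THIS, log_warning for gf_log, and this_location for __glusterfs_this_location. To keep the model terminating, it adds a per-thread reentrancy guard. That guard is a swapped-in technique for illustration only and is not one of the fixes proposed in this bug.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; not the real glusterfs code. */
static void *this_ctx = NULL;     /* models THIS, never set up in the wrapper */
static int warnings_logged = 0;   /* counts how often the "logger" ran */

/* Per-thread flag that breaks the lookup -> log -> lookup cycle (C11). */
static _Thread_local int in_lookup = 0;

static void *this_location(void);

static void log_warning(const char *msg)
{
    (void)this_location();        /* like gf_log, the logger consults THIS */
    warnings_logged++;
    fprintf(stderr, "W: %s\n", msg);
}

static void *this_location(void)
{
    if (this_ctx != NULL)
        return this_ctx;
    if (in_lookup)                /* without this check the two functions   */
        return NULL;              /* recurse until the stack overflows      */
    in_lookup = 1;
    log_warning("pthread setspecific failed");
    in_lookup = 0;
    return NULL;
}
```

With the guard, a lookup against an unset context logs exactly one warning and returns NULL instead of recursing.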

It looks like commit ed4b76ba introduced references to THIS in __gf_calloc.

As a quick workaround, we could compile the gsyncd wrapper with -DRUN_STANDALONE (a Makefile change). GF_CALLOC would then expand to calloc(), and the other GF_* macros to their corresponding non-glusterfs calls.
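The -DRUN_STANDALONE workaround relies on conditional compilation. The sketch below is hypothetical and only shows the shape of the idea: MY_CALLOC, MY_FREE, accounted_calloc and this_ctx are illustrative names, not the real GF_CALLOC/__gf_calloc definitions. For simplicity, the default branch fails the allocation when the context is missing instead of reproducing the recursion. The call MY_CALLOC(64, 8, 82) mirrors the arguments in the __gf_calloc frame of the backtrace.

```c
#include <stdlib.h>

#ifdef RUN_STANDALONE
/* Standalone build: map straight to libc; THIS is never touched. */
#define MY_CALLOC(nmemb, size, type) calloc((nmemb), (size))
#else
/* Default build: allocations go through an accounting layer that needs
   the THIS context (modelled by this_ctx, which the wrapper never sets). */
static void *this_ctx = NULL;

static void *accounted_calloc(size_t nmemb, size_t size, int type)
{
    (void)type;           /* a real accounting layer would track per-type usage */
    if (this_ctx == NULL)
        return NULL;      /* no valid context: cannot account the allocation */
    return calloc(nmemb, size);
}
#define MY_CALLOC(nmemb, size, type) accounted_calloc((nmemb), (size), (type))
#endif

#define MY_FREE(p) free(p)
```

Building with `-DRUN_STANDALONE` makes MY_CALLOC(64, 8, 82) a plain calloc(64, 8), so the wrapper no longer depends on THIS.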

The other fix would be to make THIS valid in the gsyncd context, which would involve some initialization steps (similar to what is done in the cli).
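The "pthread setspecific failed" message in the backtrace suggests that THIS is kept as a thread-specific value. Assuming that model, the sketch below shows one way to initialize such a context once, before the wrapper allocates anything. It is a hypothetical illustration: struct hypothetical_ctx, ctx_init and ctx_get are invented names, and the sketch does not reproduce what the cli actually does.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-process context standing in for THIS. */
struct hypothetical_ctx { const char *name; };

static pthread_key_t ctx_key;
static pthread_once_t ctx_once = PTHREAD_ONCE_INIT;
static int ctx_key_err = 0;

/* Create the thread-specific key exactly once. */
static void make_key(void)
{
    ctx_key_err = pthread_key_create(&ctx_key, NULL);
}

/* Install ctx as the current thread's context; 0 on success, -1 on error. */
static int ctx_init(struct hypothetical_ctx *ctx)
{
    pthread_once(&ctx_once, make_key);
    if (ctx_key_err != 0)
        return -1;
    return pthread_setspecific(ctx_key, ctx) == 0 ? 0 : -1;
}

/* Return the current thread's context, or NULL if none was installed. */
static struct hypothetical_ctx *ctx_get(void)
{
    pthread_once(&ctx_once, make_key);
    if (ctx_key_err != 0)
        return NULL;
    return pthread_getspecific(ctx_key);
}
```

Calling ctx_init early in main(), before any allocation, would let later lookups succeed instead of falling into the logging path.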
