Description of problem:
If you start a geo-rep session, the status says faulty, and the log says "connection to peer is broken". If you run the gsyncd binary by hand, i.e. /usr/local/libexec/gsyncd, it prints "Segmentation fault (core dumped)", but no core is dumped.

Version-Release number of selected component (if applicable):
Glusterfs-3.3 master [bfac66f129646bc78f1ed3a7dccb3010114e57aa]

How reproducible:
Consistently

Steps to Reproduce:
1. Start a geo-rep session between master and slave.
2. Check the status.

Actual results:
The status is faulty.

Expected results:
The status should be OK.

Additional info:
Logs:
[2012-08-07 19:19:59.37766] I [syncdutils:148:finalize] <top>: exiting.
[2012-08-07 19:20:09.49307] I [monitor(monitor):80:monitor] Monitor: ------------------------------------------------------------
[2012-08-07 19:20:09.49647] I [monitor(monitor):81:monitor] Monitor: starting gsyncd worker
[2012-08-07 19:20:09.93501] I [gsyncd:388:main_i] <top>: syncing: gluster://localhost:master -> file:///root/geo
[2012-08-07 19:20:09.115250] E [syncdutils:179:log_raise_exception] <top>: connection to peer is broken
[2012-08-07 19:20:09.115509] E [resource:191:errlog] Popen: command "/usr/local/libexec/glusterfs/gsyncd --session-owner a9afadf5-c9d1-452c-883b-fd16f9f7a686 -N --listen --timeout 120 file:///root/geo" returned with -11
Initial analysis:
The gsyncd slave process ran into problems while starting. The gsyncd wrapper (binary) relies on GF_* macros. Here is the tail of the backtrace (in ascending order of frame number), obtained by attaching gdb to the gsyncd wrapper.

----------------------------------------------------------------------------
function=0x7fbdbde4a3c0 "__glusterfs_this_location", line=124, level=GF_LOG_WARNING, fmt=0x7fbdbde4a2b4 "pthread setspecific failed") at logging.c:442
#24358 0x00007fbdbde21809 in __glusterfs_this_location () at globals.c:124
#24359 0x00007fbdbddee4cc in _gf_log (domain=0x7fbdbde4a10b "", file=0x7fbdbde4a299 "globals.c", function=0x7fbdbde4a3c0 "__glusterfs_this_location", line=124, level=GF_LOG_WARNING, fmt=0x7fbdbde4a2b4 "pthread setspecific failed") at logging.c:442
#24360 0x00007fbdbde21809 in __glusterfs_this_location () at globals.c:124
#24361 0x00007fbdbddee4cc in _gf_log (domain=0x7fbdbde4a10b "", file=0x7fbdbde4a299 "globals.c", function=0x7fbdbde4a3c0 "__glusterfs_this_location", line=124, level=GF_LOG_WARNING, fmt=0x7fbdbde4a2b4 "pthread setspecific failed") at logging.c:442
#24362 0x00007fbdbde21809 in __glusterfs_this_location () at globals.c:124
#24363 0x00007fbdbde1d09c in __gf_calloc (nmemb=64, size=8, type=82) at mem-pool.c:112
#24364 0x00007fbdbde373a6 in runinit (runner=0x7fffc82abf40) at run.c:54
#24365 0x00000000004017f4 in invoke_gsyncd (argc=8, argv=0x7fffc82ad0e8) at gsyncd.c:114
#24366 0x0000000000402312 in main (argc=8, argv=0x7fffc82ad0e8) at gsyncd.c:331
-------------------------------------------------------------------------------

There is a recursive sequence of _gf_log and __glusterfs_this_location calls. It is triggered by __gf_calloc trying to access THIS, which is not valid in the gsyncd context. __glusterfs_this_location tries to log this failure (using gf_log), and gf_log in turn tries to access THIS again. Hence the recursive sequence of these function calls, which keeps nesting (more than 24,000 frames in the trace above) until the process segfaults. It looks like commit ed4b76ba introduced some references to THIS in __gf_calloc.
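
The cycle described above can be illustrated with a small toy model. This is only a sketch and not glusterfs code: every demo_* name is hypothetical, the "THIS is not valid" condition is reduced to a plain flag, and an artificial depth cap is added so the demo terminates instead of overflowing the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: all demo_* names are hypothetical, not glusterfs symbols. */
enum { DEMO_MAX_DEPTH = 1000 };   /* artificial cap so the demo terminates */

static int demo_depth;            /* number of nested logger calls so far */
static int demo_this_valid;       /* 0 models "THIS has not been set up" */
static int demo_this_slot;        /* stands in for the per-thread THIS pointer */

int *demo_this_location(void);

/* Logger that needs THIS before it can emit a message. */
void demo_log(const char *msg)
{
    (void)msg;
    if (++demo_depth >= DEMO_MAX_DEPTH)
        return;                   /* without this cap the stack would overflow */
    (void)demo_this_location();   /* logging looks up THIS again */
}

/* Lookup that logs a warning whenever THIS cannot be established. */
int *demo_this_location(void)
{
    if (!demo_this_valid) {
        demo_log("pthread setspecific failed");
        return NULL;
    }
    return &demo_this_slot;
}

/* Models an allocator that touches THIS once; returns the nesting depth. */
int demo_calloc_depth(int this_valid)
{
    demo_depth = 0;
    demo_this_valid = this_valid;
    (void)demo_this_location();
    return demo_depth;
}
```

With THIS valid, the lookup returns immediately and the depth stays at 0. With THIS invalid, the lookup and the logger keep calling each other until the cap is hit, which mirrors the alternating frames in the backtrace.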
As a quick workaround, we could pass -DRUN_STANDALONE when compiling the gsyncd wrapper (Makefile changes). This would make GF_CALLOC expand to calloc() and the other GF_* macros expand to their relevant non-glusterfs calls. The other fix would be to make THIS valid in the gsyncd context, which would involve some initialization steps (similar to what is done in the cli).
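
As a rough illustration of the compile-time switch idea, the sketch below shows how a macro can fall back to plain calloc() when a standalone flag is defined. It is not the actual glusterfs source: DEMO_STANDALONE, DEMO_CALLOC, demo_tracked_calloc and demo_alloc_argv are hypothetical names, and the #define at the top stands in for passing -DDEMO_STANDALONE on the compiler command line.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for passing -DDEMO_STANDALONE to the compiler (hypothetical flag). */
#ifndef DEMO_STANDALONE
#define DEMO_STANDALONE 1
#endif

/* Counts calls into the "framework" allocator, so we can see it is bypassed. */
int demo_tracked_calls;

/* Hypothetical allocator that would depend on framework state such as THIS. */
void *demo_tracked_calloc(size_t nmemb, size_t size)
{
    demo_tracked_calls++;
    return calloc(nmemb, size);
}

#ifdef DEMO_STANDALONE
/* Standalone build: expand straight to the libc call. */
#define DEMO_CALLOC(nmemb, size) calloc((nmemb), (size))
#else
/* Normal build: go through the framework-aware allocator. */
#define DEMO_CALLOC(nmemb, size) demo_tracked_calloc((nmemb), (size))
#endif

/* Allocates a zeroed argument vector, as a runner-style helper might. */
char **demo_alloc_argv(size_t slots)
{
    return DEMO_CALLOC(slots, sizeof(char *));
}
```

In the standalone configuration the helper never enters the framework-aware allocator, so nothing on that path needs THIS. The trade-off is that any accounting done by the framework allocator is lost for the wrapper.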
*** Bug 851951 has been marked as a duplicate of this bug. ***