Bug 905871 - Geo-rep status says OK, doesn't sync even a single file from the master.
Summary: Geo-rep status says OK, doesn't sync even a single file from the master.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Venky Shankar
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 918917
 
Reported: 2013-01-30 11:06 UTC by paisat007
Modified: 2013-07-24 17:47 UTC (History)
CC: 5 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:47:13 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description paisat007 2013-01-30 11:06:26 UTC
Description of problem: Geo-rep doesn't sync any file even though the status says OK.
If you do a getfattr on the mount with client-pid=-1, it hangs, but if you do a getfattr on a normal mount, it works. The gsyncd process also goes into the 'D' (uninterruptible sleep) state.


Version-Release number of selected component (if applicable): master [c950d3f0e104fc7b78e493ad7ca0005a600b00f9]


How reproducible: Consistently


Steps to Reproduce:
1. Start a geo-rep session between master and slave 
2. Put some data on the master mount point 
3. Check geo-rep status and also if data is synced on the slave 
  
Actual results: Data is not synced to the slave.


Expected results: Data should be synced to the slave.


Additional info:

Comment 1 Venky Shankar 2013-01-31 09:12:20 UTC
I am unable to reproduce this on my setup. Syncing is perfectly fine - no hangs in {get,list}xattr.

That said, I did witness this issue when it was hit. A quick check of the kernel stack trace for the master process showed it stuck in sys_listxattr() for a long time, and only for the client-pid = -1 mount; regular mounts functioned fine.

[This was reproduced in a VM setup which I don't have access to now.]

Comment 2 Venky Shankar 2013-02-14 05:35:37 UTC
Closing this as it was probably a setup issue. I followed up with the reporter of this bug and learned that geo-replication now works perfectly fine in that setup.

Comment 3 Venky Shankar 2013-03-21 08:18:09 UTC
This issue came up again in one of the setups. This time it was possible to get a backtrace of the client process:

(gdb) bt
#0  0x00000033b94c1dc5 in internal_fnmatch (pattern=<optimized out>, string=string@entry=0x1a24710 "security.selinux", string_end=0x1a24720 "",
    no_leading_period=no_leading_period@entry=4, flags=flags@entry=4, ends=ends@entry=0x0, alloca_used=alloca_used@entry=0) at fnmatch_loop.c:183
#1  0x00000033b94c306e in __fnmatch (pattern=0x1a24710 "security.selinux", string=0x0, flags=4) at fnmatch.c:449
#2  0x00007f11b353ca3c in fuse_filter_xattr (key=0x1a24710 "security.selinux") at fuse-bridge.c:3015
#3  fuse_filter_xattr (key=0x1a24710 "security.selinux") at fuse-bridge.c:3009
#4  0x00007f11b51534e2 in dict_keys_join (value=value@entry=0x0, size=size@entry=0, dict=dict@entry=0x7f11b3942a90,
    filter_fn=filter_fn@entry=0x7f11b353ca00 <fuse_filter_xattr>) at dict.c:1183
#5  0x00007f11b35432ab in fuse_xattr_cbk (frame=0x7f11b3d7f148, cookie=<optimized out>, this=0x1937b30, op_ret=0, op_errno=0, dict=0x7f11b3942a90,
    xdata=0x0) at fuse-bridge.c:3064
#6  0x00007f11ab5cb43b in io_stats_getxattr_cbk (frame=0x7f11b3f8b02c, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0,
    dict=0x7f11b3942a90, xdata=0x0) at io-stats.c:1640
#7  0x00007f11ab7dcc06 in mdc_getxattr_cbk (frame=frame@entry=0x7f11b3f8b0d8, cookie=<optimized out>, this=<optimized out>, op_ret=0,
    op_errno=op_errno@entry=0, xattr=<optimized out>, xdata=xdata@entry=0x0) at md-cache.c:1658
#8  0x00007f11b0618eb2 in dht_getxattr_cbk (frame=0x7f11b3f8b184, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=0,
    xattr=<optimized out>, xdata=0x0) at dht-common.c:2041
#9  0x00007f11b0860363 in afr_getxattr_cbk (frame=0x7f11b3f8b230, cookie=<optimized out>, this=<optimized out>, op_ret=0, op_errno=0,
    dict=<optimized out>, xdata=0x0) at afr-inode-read.c:618
#10 0x00007f11b0acce1f in client3_3_getxattr_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f11b3f8b2dc)
    at client-rpc-fops.c:1115
#11 0x00007f11b4f36714 in rpc_clnt_handle_reply (clnt=clnt@entry=0x19ae0a0, pollin=0x1954d50) at rpc-clnt.c:771
#12 0x00007f11b4f36a7d in rpc_clnt_notify (trans=<optimized out>, mydata=0x19ae0d0, event=<optimized out>, data=<optimized out>) at rpc-clnt.c:890
#13 0x00007f11b4f332f3 in rpc_transport_notify (this=this@entry=0x19bdad0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=<optimized out>)
    at rpc-transport.c:495
#14 0x00007f11b1b25564 in socket_event_poll_in (this=this@entry=0x19bdad0) at socket.c:2118
#15 0x00007f11b1b25cdc in socket_event_handler (fd=<optimized out>, idx=<optimized out>, data=0x19bdad0, poll_in=1, poll_out=0, poll_err=0)
    at socket.c:2230
#16 0x00007f11b5198d7a in event_dispatch_epoll_handler (i=<optimized out>, events=0x19536e0, event_pool=0x1936ea0) at event-epoll.c:384
#17 event_dispatch_epoll (event_pool=0x1936ea0) at event-epoll.c:445
#18 0x0000000000404926 in main (argc=5, argv=0x7fff14bf2dc8) at glusterfsd.c:1902 

... looks like it has been stuck in fnmatch() for a long time.
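
For context on frames #0-#4 above: the trace shows fuse_xattr_cbk() assembling the listxattr reply via dict_keys_join(), which runs each key of the reply dict through fuse_filter_xattr(); that filter in turn calls fnmatch(). A single fnmatch() call on a short key such as "security.selinux" is cheap, so a process that keeps getting caught inside it suggests the surrounding per-key loop is not terminating. A minimal sketch of that filter shape (the function name and pattern below are illustrative, not the actual fuse-bridge.c code):

#include <fnmatch.h>

/* Illustrative stand-in for the filter callback seen in frames #2-#4:
 * return non-zero if the xattr key should be dropped from the
 * listxattr reply.  The real pattern set lives in fuse-bridge.c. */
static int
filter_xattr_key(const char *key)
{
        /* The trace shows fnmatch() being run against keys such as
         * "security.selinux". */
        return fnmatch("security.*", key, FNM_PERIOD) == 0;
}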

Comment 4 Vijay Bellur 2013-03-26 12:29:25 UTC
REVIEW: http://review.gluster.org/4723 (libglusterfs/dict: fix infinite loop in dict_keys_join()) posted (#1) for review on master by Vijaykumar Koppad (vijaykumar.koppad)

Comment 5 Vijay Bellur 2013-03-26 12:40:02 UTC
REVIEW: http://review.gluster.org/4723 (libglusterfs/dict: fix infinite loop in dict_keys_join()) posted (#2) for review on master by Vijaykumar Koppad (vijaykumar.koppad)

Comment 6 Vijay Bellur 2013-03-27 10:44:49 UTC
REVIEW: http://review.gluster.org/4728 (libglusterfs/dict: fix infinite loop in dict_keys_join()) posted (#1) for review on release-3.4 by Vijaykumar Koppad (vkoppad)

Comment 7 Vijay Bellur 2013-03-27 18:03:59 UTC
COMMIT: http://review.gluster.org/4723 committed in master by Anand Avati (avati) 
------
commit 1f7dadccd45863ebea8f60339f297ac551e89899
Author: Vijaykumar koppad <vijaykumar.koppad>
Date:   Tue Mar 26 17:42:32 2013 +0530

    libglusterfs/dict: fix infinite loop in dict_keys_join()
    
    	 - missing "pairs = next" caused infinite loop
    
    Change-Id: I9171be5bec051de6095e135d616534ab49cd4797
    BUG: 905871
    Signed-off-by: Vijaykumar Koppad <vijaykumar.koppad>
    Reviewed-on: http://review.gluster.org/4723
    Reviewed-by: Venky Shankar <vshankar>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>
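
The commit message pins down the defect: the pair-list walk inside dict_keys_join() never advanced to the next key/value pair, so the same key kept being handed to the filter (and hence to fnmatch()) forever, which matches the backtrace in comment 3. A minimal, self-contained sketch of that bug class, using a simplified pair list and an illustrative filter rather than the real dict_t internals:

#include <stdio.h>
#include <string.h>

/* Simplified stand-in for a dict's list of key/value pairs. */
struct pair {
        const char  *key;
        struct pair *next;
};

/* Illustrative filter: drop the SELinux key, keep everything else. */
static int
drop_selinux_key(const char *key)
{
        return strcmp(key, "security.selinux") == 0;
}

/* Count the keys accepted by filter_fn.  Without the final
 * "pairs = next" the loop re-tests the same pair forever -- the
 * failure mode the commit above fixes in dict_keys_join(). */
static int
count_keys(struct pair *pairs, int (*filter_fn)(const char *key))
{
        int kept = 0;

        while (pairs) {
                struct pair *next = pairs->next;

                if (!filter_fn || !filter_fn(pairs->key))
                        kept++;

                pairs = next;   /* the one-line fix: advance the cursor */
        }

        return kept;
}

int
main(void)
{
        struct pair c = { "user.foo", NULL };
        struct pair b = { "security.selinux", &c };
        struct pair a = { "trusted.gfid", &b };

        /* Expect 2: "security.selinux" is filtered out. */
        printf("kept %d keys\n", count_keys(&a, drop_selinux_key));
        return 0;
}

In the real dict_keys_join() the loop additionally accounts for and copies the accepted keys into the reply buffer (frame #4 above appears to be the sizing pass, with value=0x0 and size=0); per the commit message, the essential change is restoring the "pairs = next" advance.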

Comment 8 Vijay Bellur 2013-03-27 18:07:17 UTC
COMMIT: http://review.gluster.org/4728 committed in release-3.4 by Anand Avati (avati) 
------
commit 1f7dadccd45863ebea8f60339f297ac551e89899
Author: Vijaykumar koppad <vijaykumar.koppad>
Date:   Tue Mar 26 17:42:32 2013 +0530

    libglusterfs/dict: fix infinite loop in dict_keys_join()
    
    	 - missing "pairs = next" caused infinite loop
    
    Change-Id: I9171be5bec051de6095e135d616534ab49cd4797
    BUG: 905871
    Signed-off-by: Vijaykumar Koppad <vijaykumar.koppad>
    Reviewed-on: http://review.gluster.org/4723
    Reviewed-by: Venky Shankar <vshankar>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>

