Bug 802403 - nfs-nlm: (crash)add-brick while locks are getting held
Summary: nfs-nlm: (crash)add-brick while locks are getting held
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Amar Tumballi
Duplicates: 804489
Blocks: 817967
Reported: 2012-03-12 13:36 UTC by Saurabh
Modified: 2016-01-19 06:09 UTC
CC List: 5 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 18:01:10 UTC
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions: 3.3.0qa33


Description Saurabh 2012-03-12 13:36:41 UTC
Description of problem:

Core was generated by `/root/330/inst/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glust'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f5d051453c7 in __list_splice (list=0x7f5ce0014908, head=0x7f5ce00148a8) at ../../../libglusterfs/src/list.h:106
106		(head->next)->prev = (list->prev);
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64 libgcc-4.4.6-3.el6.x86_64
(gdb) bt
#0  0x00007f5d051453c7 in __list_splice (list=0x7f5ce0014908, head=0x7f5ce00148a8) at ../../../libglusterfs/src/list.h:106
#1  0x00007f5d0514541d in list_splice_init (list=0x7f5ce0014908, head=0x7f5ce00148a8) at ../../../libglusterfs/src/list.h:130
#2  0x00007f5d051467ae in saved_frames_unwind (saved_frames=0x7f5ce00148a0) at rpc-clnt.c:360
#3  0x00007f5d05146aa3 in saved_frames_destroy (frames=0x7f5ce00148a0) at rpc-clnt.c:405
#4  0x00007f5d051495d5 in rpc_clnt_destroy (rpc=0x7f5cdc000fd0) at rpc-clnt.c:1578
#5  0x00007f5d05149698 in rpc_clnt_unref (rpc=0x7f5cdc000fd0) at rpc-clnt.c:1604
#6  0x00007f5d00caebee in nlm_set_rpc_clnt (rpc_clnt=0x7f5ce0000fd0, caller_name=0x1cf4930 "RHSSA1") at nlm4.c:319
#7  0x00007f5d00cb0e6d in nlm4_establish_callback (csarg=0x7f5cff1085d4) at nlm4.c:945
#8  0x0000003c5be077f1 in start_thread () from /lib64/libpthread.so.0
#9  0x0000003c5bae592d in clone () from /lib64/libc.so.6
(gdb) quit



Version-Release number of selected component (if applicable):

3.3.0qa27

How reproducible:
Happened once.

Steps to Reproduce:
1. Create a distributed-replicate volume.
2. Mount the volume over NFS.
3. Create 100 files on the mount.
4. Start taking shared locks on them.
5. Meanwhile, run add-brick and start a rebalance.
6. Also try to take an exclusive lock on a file that is already share-locked.
  
Actual results:
The NFS server process crashes (signal 11) during add-brick/rebalance.


Expected results:
1. The crash should not happen.
2. The exclusive lock requested in step 6 should not be granted until the lock already held on that file is released.
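Expected result 2 can be checked with a small C sketch of steps 4 and 6 using POSIX fcntl() record locks; on an NFSv3 mount the Linux client normally forwards these to the server's NLM service. The report does not say which tool was used to take the locks, so everything here is an assumption: the helper names `try_lock()` and `exclusive_refused_while_shared_held()` are hypothetical, and a forked child stands in for the second lock owner. Pointing it at a file on the NFS mount exercises NLM; on a local file system it shows the semantics the server is expected to honour.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: take a whole-file POSIX record lock of the given
 * type (F_RDLCK or F_WRLCK) without blocking.
 * Returns 0 on success, -1 with errno set on failure. */
static int try_lock(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                 /* 0 means "to end of file": whole file */
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical helper: hold a shared lock on `path` in this process (step 4),
 * then have a child process request an exclusive lock on it (step 6).
 * Returns 1 if the child was refused (correct), 0 if the exclusive lock
 * was granted (incorrect), -1 on a setup error. */
static int exclusive_refused_while_shared_held(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (try_lock(fd, F_RDLCK) != 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: record locks are not inherited across fork(), so the
         * child is a separate lock owner. */
        int cfd = open(path, O_RDWR);
        if (cfd < 0)
            _exit(2);
        if (try_lock(cfd, F_WRLCK) == 0)
            _exit(0);             /* exclusive lock granted: wrong */
        _exit((errno == EACCES || errno == EAGAIN) ? 1 : 2);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        close(fd);
        return -1;
    }
    close(fd);                    /* closing releases this process's lock */
    if (!WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 1:  return 1;
    case 0:  return 0;
    default: return -1;
    }
}
```

On a server that honours lock semantics the function returns 1: the child's non-blocking F_WRLCK request fails with EACCES or EAGAIN while the parent still holds F_RDLCK. A blocking variant would use F_SETLKW instead and simply wait until the shared lock is released.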


Additional info:

nfs.log information

[2012-03-12 09:26:56.641761] I [afr-common.c:1850:afr_set_root_inode_on_first_lookup] 0-dist-rep-replicate-0: added root inode
[2012-03-12 09:26:56.642783] I [afr-common.c:1850:afr_set_root_inode_on_first_lookup] 0-dist-rep-replicate-2: added root inode
[2012-03-12 09:26:56.642861] I [afr-common.c:1850:afr_set_root_inode_on_first_lookup] 0-dist-rep-replicate-1: added root inode
[2012-03-12 09:31:32.800115] E [rpc-clnt.c:382:saved_frames_unwind] (-->/root/330/inst/lib/libgfrpc.so.0(+0x135b2) [0x7f8e5b4bc5b2] (-->/root/330/inst/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7f8e5b4ba05d] (-->/root/330/inst/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f8e5b4b9aa3]))) 0-NLM-client: forced unwinding frame type(NLMv4) op(GRANTED(5)) called at 2012-03-12 09:31:32.799004 (xid=0x4x)
[2012-03-12 09:31:32.800206] I [mem-pool.c:578:mem_pool_destroy] 0-nfs-server: size=2236 max=2 total=4
[2012-03-12 09:31:32.800227] I [mem-pool.c:578:mem_pool_destroy] 0-nfs-server: size=124 max=2 total=4
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-03-12 09:31:32
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0qa27
/lib64/libc.so.6[0x3c5ba32900]
/root/330/inst/lib/libgfrpc.so.0(+0xf3c7)[0x7f8e5b4b83c7]
/root/330/inst/lib/libgfrpc.so.0(+0xf41d)[0x7f8e5b4b841d]
/root/330/inst/lib/libgfrpc.so.0(saved_frames_unwind+0x88)[0x7f8e5b4b97ae]
/root/330/inst/lib/libgfrpc.so.0(saved_frames_destroy+0x1f)[0x7f8e5b4b9aa3]
/root/330/inst/lib/libgfrpc.so.0(+0x135d5)[0x7f8e5b4bc5d5]
/root/330/inst/lib/libgfrpc.so.0(rpc_clnt_unref+0x6f)[0x7f8e5b4bc698]
/root/330/inst/lib/glusterfs/3.3.0qa27/xlator/nfs/server.so(nlm_set_rpc_clnt+0x221)[0x7f8e57021bee]
/root/330/inst/lib/glusterfs/3.3.0qa27/xlator/nfs/server.so(nlm4_establish_callback+0x5a0)[0x7f8e57023e6d]
/lib64/libpthread.so.0[0x3c5be077f1]
/lib64/libc.so.6(clone+0x6d)[0x3c5bae592d]
---------

Comment 1 Krishna Srinivas 2012-03-14 10:44:06 UTC
rpc_clnt_connection_cleanup(), called during unref(), runs saved_frames_destroy(), which in turn does a ref() and an unref() on the rpc_clnt. Because of this, the rpc_clnt gets destroyed a second time, causing memory corruption.
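The re-entry described above can be modelled with a toy reference-counted object. This is a minimal sketch, not GlusterFS code: `struct toy_clnt` and the `toy_*` functions are hypothetical stand-ins for `rpc_clnt`, `rpc_clnt_unref()`, `rpc_clnt_destroy()` and `saved_frames_destroy()`, and destroys are counted rather than freed so the double destroy is observable without crashing.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-in for struct rpc_clnt. */
struct toy_clnt {
    int refcount;
    int destroy_calls;   /* destroys are counted instead of freeing memory,
                            so the double destroy is visible without a crash */
};

static void toy_clnt_unref(struct toy_clnt *c);

static void toy_clnt_ref(struct toy_clnt *c)
{
    c->refcount++;
}

/* Stand-in for saved_frames_destroy(): unwinding the saved frames does a
 * ref() and an unref() on the client. */
static void toy_saved_frames_destroy(struct toy_clnt *c)
{
    toy_clnt_ref(c);     /* 0 -> 1: a dying object is revived */
    /* ... each pending frame's callback would be invoked with an error ... */
    toy_clnt_unref(c);   /* 1 -> 0 again: destroy is entered a second time */
}

/* Buggy ordering: connection cleanup (frame unwinding) runs from inside
 * destroy, after the reference count has already reached zero. */
static void toy_clnt_destroy(struct toy_clnt *c)
{
    c->destroy_calls++;
    if (c->destroy_calls > 1)
        return;          /* the toy stops here; the real client would be
                            tearing down the same object a second time */
    toy_saved_frames_destroy(c);
    /* the real client would free the object here */
}

static void toy_clnt_unref(struct toy_clnt *c)
{
    assert(c->refcount > 0);
    if (--c->refcount == 0)
        toy_clnt_destroy(c);
}
```

Starting from a single reference, one `toy_clnt_unref()` call ends with `destroy_calls == 2`: the ref()/unref() pair inside frame unwinding drives the count from 0 back to 1 and down to 0 again, so destroy runs twice on the same object.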

I will send a patch that calls rpc_clnt_connection_cleanup() before unref()ing the rpc_clnt. This prevents the memory corruption/crash, but it is still a kludgy approach.
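The reordering proposed here, and the change in Comment 3 that separates connection_cleanup() from destroy(), can be sketched with a toy reference-counted object. This is a minimal sketch under stated assumptions, not the actual patch: the `toy_*` names and the `frames_pending` counter are hypothetical. Connection cleanup runs while the caller still holds its reference, so the ref()/unref() done while unwinding frames can never drive the count to zero, and destroy only releases the object.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for struct rpc_clnt. */
struct toy_clnt {
    int refcount;
    int frames_pending;   /* outstanding calls still awaiting a reply */
    int destroy_calls;
};

static void toy_clnt_unref(struct toy_clnt *c);

static void toy_clnt_ref(struct toy_clnt *c)
{
    c->refcount++;
}

/* Connection cleanup, separate from destroy: fail every pending frame.
 * Each unwind still takes and drops a reference, but the caller's own
 * reference keeps the count above zero throughout. */
static void toy_clnt_connection_cleanup(struct toy_clnt *c)
{
    while (c->frames_pending > 0) {
        toy_clnt_ref(c);
        c->frames_pending--;   /* frame callback invoked with an error */
        toy_clnt_unref(c);
    }
}

/* Destroy no longer unwinds frames; it only releases the object. */
static void toy_clnt_destroy(struct toy_clnt *c)
{
    c->destroy_calls++;
    /* the real client would free the object here */
}

static void toy_clnt_unref(struct toy_clnt *c)
{
    assert(c->refcount > 0);
    if (--c->refcount == 0)
        toy_clnt_destroy(c);
}

/* Caller-side ordering: clean up the connection first, then drop the
 * caller's reference. */
static void toy_release_clnt(struct toy_clnt *c)
{
    assert(c->refcount > 0);   /* caller must still own a reference */
    toy_clnt_connection_cleanup(c);
    toy_clnt_unref(c);
}
```

With one reference and three pending frames, `toy_release_clnt()` unwinds all three frames and destroy runs exactly once, because destroy no longer re-enters through the reference count.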

Comment 2 Shwetha Panduranga 2012-03-19 12:16:16 UTC
*** Bug 804489 has been marked as a duplicate of this bug. ***

Comment 3 Anand Avati 2012-03-19 16:14:46 UTC
CHANGE: http://review.gluster.com/2979 (rpc-clnt: separate out connection_cleanup() from destroy()) merged in master by Vijay Bellur (vijay)

