Bug 802403 - nfs-nlm: (crash)add-brick while locks are getting held
Product: GlusterFS
Classification: Community
Component: nfs
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Amar Tumballi
Duplicates: 804489
Depends On:
Blocks: 817967
Reported: 2012-03-12 09:36 EDT by Saurabh
Modified: 2016-01-19 01:09 EST
See Also:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2013-07-24 14:01:10 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions: 3.3.0qa33
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None

Description Saurabh 2012-03-12 09:36:41 EDT
Description of problem:

Core was generated by `/root/330/inst/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glust'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f5d051453c7 in __list_splice (list=0x7f5ce0014908, head=0x7f5ce00148a8) at ../../../libglusterfs/src/list.h:106
106		(head->next)->prev = (list->prev);
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64 libgcc-4.4.6-3.el6.x86_64
(gdb) bt
#0  0x00007f5d051453c7 in __list_splice (list=0x7f5ce0014908, head=0x7f5ce00148a8) at ../../../libglusterfs/src/list.h:106
#1  0x00007f5d0514541d in list_splice_init (list=0x7f5ce0014908, head=0x7f5ce00148a8) at ../../../libglusterfs/src/list.h:130
#2  0x00007f5d051467ae in saved_frames_unwind (saved_frames=0x7f5ce00148a0) at rpc-clnt.c:360
#3  0x00007f5d05146aa3 in saved_frames_destroy (frames=0x7f5ce00148a0) at rpc-clnt.c:405
#4  0x00007f5d051495d5 in rpc_clnt_destroy (rpc=0x7f5cdc000fd0) at rpc-clnt.c:1578
#5  0x00007f5d05149698 in rpc_clnt_unref (rpc=0x7f5cdc000fd0) at rpc-clnt.c:1604
#6  0x00007f5d00caebee in nlm_set_rpc_clnt (rpc_clnt=0x7f5ce0000fd0, caller_name=0x1cf4930 "RHSSA1") at nlm4.c:319
#7  0x00007f5d00cb0e6d in nlm4_establish_callback (csarg=0x7f5cff1085d4) at nlm4.c:945
#8  0x0000003c5be077f1 in start_thread () from /lib64/libpthread.so.0
#9  0x0000003c5bae592d in clone () from /lib64/libc.so.6
(gdb) quit

Version-Release number of selected component (if applicable):


How reproducible:
happened once

Steps to Reproduce:
1. Create a distribute-replicate volume.
2. Mount it over NFS.
3. Create 100 files.
4. Start taking shared locks on them.
5. In the meantime, start add-brick and rebalance.
6. Also try to take an exclusive lock on one of those files.

Actual results:
A crash is seen during add-brick/rebalance.

Expected results:
1. The crash should not happen.
2. The exclusive lock in step 6 should not be granted until the lock already held on that file is released.
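
The locking part of the steps and the second expected result can be sketched in C as follows. This is only an illustration under assumptions: the report does not say which tool took the locks, so POSIX fcntl() record locks are assumed here, and try_lock() and other_process_gets_exclusive() are hypothetical helper names. On an NFS mount, such lock requests are the ones that reach the server's NLM code.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try a non-blocking whole-file lock.
 * type is F_RDLCK (shared), F_WRLCK (exclusive) or F_UNLCK (release).
 * Returns 0 on success, -1 if the request is refused or fails. */
static int try_lock(int fd, short type)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0; /* 0 means "up to end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/* fcntl() locks are per process, so the conflicting request has to come
 * from a second process. Returns 1 if the exclusive lock was granted,
 * 0 if it was refused, -1 on error. */
static int other_process_gets_exclusive(const char *path)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            _exit(2);
        _exit(try_lock(fd, F_WRLCK) == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    if (WEXITSTATUS(status) == 1)
        return 0;
    return -1;
}
```

With a shared lock held through try_lock(fd, F_RDLCK), the exclusive request from the second process should be refused, and it should succeed only after try_lock(fd, F_UNLCK) releases the shared lock.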

Additional info:

nfs.log information

[2012-03-12 09:26:56.641761] I [afr-common.c:1850:afr_set_root_inode_on_first_lookup] 0-dist-rep-replicate-0: added root inode
[2012-03-12 09:26:56.642783] I [afr-common.c:1850:afr_set_root_inode_on_first_lookup] 0-dist-rep-replicate-2: added root inode
[2012-03-12 09:26:56.642861] I [afr-common.c:1850:afr_set_root_inode_on_first_lookup] 0-dist-rep-replicate-1: added root inode
[2012-03-12 09:31:32.800115] E [rpc-clnt.c:382:saved_frames_unwind] (-->/root/330/inst/lib/libgfrpc.so.0(+0x135b2) [0x7f8e5b4bc5b2] (-->/root/330/inst/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7f8e5b4ba05d] (-->/root/330/inst/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f8e5b4b9aa3]))) 0-NLM-client: forced unwinding frame type(NLMv4) op(GRANTED(5)) called at 2012-03-12 09:31:32.799004 (xid=0x4x)
[2012-03-12 09:31:32.800206] I [mem-pool.c:578:mem_pool_destroy] 0-nfs-server: size=2236 max=2 total=4
[2012-03-12 09:31:32.800227] I [mem-pool.c:578:mem_pool_destroy] 0-nfs-server: size=124 max=2 total=4
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-03-12 09:31:32
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0qa27
Comment 1 Krishna Srinivas 2012-03-14 06:44:06 EDT
rpc_clnt_connection_cleanup(), which is called during unref(), runs saved_frames_destroy(), which in turn does a ref() and an unref() on the rpc_clnt. Because of this, the rpc_clnt gets destroyed a second time, causing memory corruption.

I will send a patch that calls rpc_clnt_connection_cleanup() before unref()ing the rpc_clnt. That prevents the memory corruption and the crash, but it is still a kludgy approach.
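
The re-entrancy described in this comment can be modelled with a small stand-alone sketch. It is not the GlusterFS code: struct client and every function name below are hypothetical stand-ins for rpc_clnt and its ref/unref/destroy paths, and the "destructor" only counts calls instead of freeing memory, so the double destroy can be observed safely.

```c
#include <assert.h>

/* Hypothetical stand-in for rpc_clnt; not the real GlusterFS structure. */
struct client {
    int refcount;
    int pending_frames; /* saved frames still waiting for a reply */
    int destroy_calls;  /* how many times the destructor ran */
};

typedef void (*unref_fn)(struct client *);

/* Unwinding each saved frame takes and drops one reference on the client. */
static void unwind_frames(struct client *c, unref_fn unref)
{
    while (c->pending_frames > 0) {
        c->pending_frames--;
        c->refcount++;
        unref(c);
    }
}

/* Buggy shape: cleanup runs inside destroy, after the count reached 0. */
static void unref_buggy(struct client *c);

static void destroy_buggy(struct client *c)
{
    c->destroy_calls++;
    unwind_frames(c, unref_buggy); /* count goes 0 -> 1 -> 0: destroy again */
}

static void unref_buggy(struct client *c)
{
    if (--c->refcount == 0)
        destroy_buggy(c);
}

/* Fixed shape: destroy only tears down; cleanup runs before the last unref. */
static void destroy_fixed(struct client *c)
{
    assert(c->destroy_calls == 0); /* would catch a second destroy */
    c->destroy_calls++;
}

static void unref_fixed(struct client *c)
{
    if (--c->refcount == 0)
        destroy_fixed(c);
}

static void release_fixed(struct client *c)
{
    unwind_frames(c, unref_fixed); /* caller's reference keeps count >= 1 */
    unref_fixed(c);
}

/* Each helper starts with one reference and one pending frame. */
static int count_destroys_buggy(void)
{
    struct client c = { 1, 1, 0 };
    unref_buggy(&c);
    return c.destroy_calls;
}

static int count_destroys_fixed(void)
{
    struct client c = { 1, 1, 0 };
    release_fixed(&c);
    return c.destroy_calls;
}
```

In this model count_destroys_buggy() reports two destructor runs for a single client, while count_destroys_fixed() reports one, because the frames are unwound while the caller still holds its reference.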
Comment 2 Shwetha Panduranga 2012-03-19 08:16:16 EDT
*** Bug 804489 has been marked as a duplicate of this bug. ***
Comment 3 Anand Avati 2012-03-19 12:14:46 EDT
CHANGE: http://review.gluster.com/2979 (rpc-clnt: separate out connection_cleanup() from destroy()) merged in master by Vijay Bellur (vijay@gluster.com)