
Bug 802403

Summary:	nfs-nlm: (crash)add-brick while locks are getting held
Product:	[Community] GlusterFS	Reporter:	Saurabh <saujain>
Component:	nfs	Assignee:	Amar Tumballi <amarts>
Status:	CLOSED CURRENTRELEASE	QA Contact:
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	pre-release	CC:	gluster-bugs, mzywusko, shwetha.h.panduranga, vbellur, vraman
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	glusterfs-3.4.0	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2013-07-24 18:01:10 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:	3.3.0qa33	Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	817967

Description Saurabh 2012-03-12 13:36:41 UTC

Description of problem:

Core was generated by `/root/330/inst/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glust'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f5d051453c7 in __list_splice (list=0x7f5ce0014908, head=0x7f5ce00148a8) at ../../../libglusterfs/src/list.h:106
106		(head->next)->prev = (list->prev);
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64 libgcc-4.4.6-3.el6.x86_64
(gdb) bt
#0  0x00007f5d051453c7 in __list_splice (list=0x7f5ce0014908, head=0x7f5ce00148a8) at ../../../libglusterfs/src/list.h:106
#1  0x00007f5d0514541d in list_splice_init (list=0x7f5ce0014908, head=0x7f5ce00148a8) at ../../../libglusterfs/src/list.h:130
#2  0x00007f5d051467ae in saved_frames_unwind (saved_frames=0x7f5ce00148a0) at rpc-clnt.c:360
#3  0x00007f5d05146aa3 in saved_frames_destroy (frames=0x7f5ce00148a0) at rpc-clnt.c:405
#4  0x00007f5d051495d5 in rpc_clnt_destroy (rpc=0x7f5cdc000fd0) at rpc-clnt.c:1578
#5  0x00007f5d05149698 in rpc_clnt_unref (rpc=0x7f5cdc000fd0) at rpc-clnt.c:1604
#6  0x00007f5d00caebee in nlm_set_rpc_clnt (rpc_clnt=0x7f5ce0000fd0, caller_name=0x1cf4930 "RHSSA1") at nlm4.c:319
#7  0x00007f5d00cb0e6d in nlm4_establish_callback (csarg=0x7f5cff1085d4) at nlm4.c:945
#8  0x0000003c5be077f1 in start_thread () from /lib64/libpthread.so.0
#9  0x0000003c5bae592d in clone () from /lib64/libc.so.6
(gdb) quit



Version-Release number of selected component (if applicable):

3.3.0qa27

How reproducible:
happened once

Steps to Reproduce:
1. Create a distributed-replicate volume.
2. Mount the volume over NFS.
3. Create 100 files.
4. Start taking shared locks on them.
5. Meanwhile, run add-brick and start a rebalance.
6. Also try to take an exclusive lock on one of the files that already holds a shared lock.
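The locking part of these steps (3, 4 and 6) can be sketched in C with POSIX fcntl() record locks. On an NFSv3 mount, such locks are normally forwarded to the server's NLM service. This is a minimal sketch under stated assumptions, not the reporter's actual test tool. The helper names, the `file-<i>` naming scheme, and the use of a forked child for the step 6 request are all hypothetical. Volume creation, the NFS mount, add-brick and rebalance (steps 1, 2 and 5) are assumed to be done separately from the gluster CLI.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking whole-file lock of the given type
 * (F_RDLCK = shared, F_WRLCK = exclusive). Returns 0, or -1 with errno set. */
static int lock_whole_file(int fd, short type)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                 /* 0 means "through end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/* Steps 3-4 (hypothetical helper): create n files named file-<i> in dir
 * (assumed to be the NFS mount point) and take a shared lock on each.
 * The descriptors must stay open: closing any descriptor for a file drops
 * this process's locks on that file. */
static int create_and_share_lock(const char *dir, int n, int *fds)
{
    char path[4096];
    for (int i = 0; i < n; i++) {
        snprintf(path, sizeof path, "%s/file-%d", dir, i);
        fds[i] = open(path, O_RDWR | O_CREAT, 0644);
        if (fds[i] < 0 || lock_whole_file(fds[i], F_RDLCK) < 0)
            return -1;
    }
    return 0;
}

/* Step 6 (hypothetical helper): request an exclusive lock on path from a
 * separate process, because fcntl() locks never conflict within one process.
 * Returns 1 if the request was refused due to a conflicting lock,
 * 0 if it was granted, and -1 on any other error. */
static int exclusive_lock_refused(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            _exit(2);
        if (lock_whole_file(fd, F_WRLCK) == 0)
            _exit(0);                               /* lock granted */
        _exit(errno == EACCES || errno == EAGAIN ? 1 : 2);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 1 ? 1 : (code == 0 ? 0 : -1);
}
```

Per the expected results, `exclusive_lock_refused()` should return 1 while the shared lock on that file is held, and 0 only after it has been released.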
  
Actual results:
The NFS server crashes during add-brick/rebalance.


Expected results:
1. The crash should not happen.
2. The exclusive lock requested in step 6 should not be granted until the lock already held on that file is released.


Additional info:

nfs.log information

[2012-03-12 09:26:56.641761] I [afr-common.c:1850:afr_set_root_inode_on_first_lookup] 0-dist-rep-replicate-0: added root inode
[2012-03-12 09:26:56.642783] I [afr-common.c:1850:afr_set_root_inode_on_first_lookup] 0-dist-rep-replicate-2: added root inode
[2012-03-12 09:26:56.642861] I [afr-common.c:1850:afr_set_root_inode_on_first_lookup] 0-dist-rep-replicate-1: added root inode
[2012-03-12 09:31:32.800115] E [rpc-clnt.c:382:saved_frames_unwind] (-->/root/330/inst/lib/libgfrpc.so.0(+0x135b2) [0x7f8e5b4bc5b2] (-->/root/330/inst/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7f8e5b4ba05d] (-->/root/330/inst/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f8e5b4b9aa3]))) 0-NLM-client: forced unwinding frame type(NLMv4) op(GRANTED(5)) called at 2012-03-12 09:31:32.799004 (xid=0x4x)
[2012-03-12 09:31:32.800206] I [mem-pool.c:578:mem_pool_destroy] 0-nfs-server: size=2236 max=2 total=4
[2012-03-12 09:31:32.800227] I [mem-pool.c:578:mem_pool_destroy] 0-nfs-server: size=124 max=2 total=4
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-03-12 09:31:32
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0qa27
/lib64/libc.so.6[0x3c5ba32900]
/root/330/inst/lib/libgfrpc.so.0(+0xf3c7)[0x7f8e5b4b83c7]
/root/330/inst/lib/libgfrpc.so.0(+0xf41d)[0x7f8e5b4b841d]
/root/330/inst/lib/libgfrpc.so.0(saved_frames_unwind+0x88)[0x7f8e5b4b97ae]
/root/330/inst/lib/libgfrpc.so.0(saved_frames_destroy+0x1f)[0x7f8e5b4b9aa3]
/root/330/inst/lib/libgfrpc.so.0(+0x135d5)[0x7f8e5b4bc5d5]
/root/330/inst/lib/libgfrpc.so.0(rpc_clnt_unref+0x6f)[0x7f8e5b4bc698]
/root/330/inst/lib/glusterfs/3.3.0qa27/xlator/nfs/server.so(nlm_set_rpc_clnt+0x221)[0x7f8e57021bee]
/root/330/inst/lib/glusterfs/3.3.0qa27/xlator/nfs/server.so(nlm4_establish_callback+0x5a0)[0x7f8e57023e6d]
/lib64/libpthread.so.0[0x3c5be077f1]
/lib64/libc.so.6(clone+0x6d)[0x3c5bae592d]
---------

Comment 1 Krishna Srinivas 2012-03-14 10:44:06 UTC

rpc_clnt_connection_cleanup(), called during unref(), invokes saved_frames_destroy(), which in turn does a ref() and an unref() on the rpc_clnt. Because the reference count has already reached zero at that point, this ref()/unref() pair drops it to zero a second time, so the rpc_clnt is destroyed again, causing memory corruption.

I will send a patch that calls rpc_clnt_connection_cleanup() before unref()ing the rpc_clnt. This prevents the memory corruption/crash, but it is still a kludgy approach.
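The double destroy described above can be illustrated with a toy reference-counting model in C. This is a hypothetical sketch, not the GlusterFS rpc_clnt code and not the merged patch. The struct and all function names are invented, and destroy only counts calls instead of freeing memory, so the bug shows up as a count of 2 rather than as heap corruption. The "fixed" variant follows the approach described in this comment: run the connection cleanup while the caller still holds its reference, and let unref() do nothing but destroy.

```c
/* Hypothetical toy model of a ref-counted RPC client (NOT struct rpc_clnt).
 * Memory is never freed; destroy_calls records how often destroy ran. */
struct toy_clnt {
    int refcount;
    int pending_frames;   /* stand-in for the saved_frames list */
    int destroy_calls;
};

static void toy_destroy(struct toy_clnt *c)
{
    c->destroy_calls++;   /* real code would free the client here */
}

/* Unwind every pending frame. As in the bug, the unwind path briefly
 * takes and drops a reference on the client for each frame. */
static void toy_cleanup(struct toy_clnt *c, void (*unref)(struct toy_clnt *))
{
    while (c->pending_frames > 0) {
        c->pending_frames--;
        c->refcount++;    /* ref() */
        unref(c);         /* unref() */
    }
}

/* Buggy pattern: cleanup runs inside the refcount==0 path, so the
 * ref()/unref() pair drops the count to zero again and destroy runs twice. */
static void toy_unref_buggy(struct toy_clnt *c)
{
    if (--c->refcount == 0) {
        toy_cleanup(c, toy_unref_buggy);
        toy_destroy(c);
    }
}

/* Fixed pattern: unref only destroys; cleanup is a separate step. */
static void toy_unref_fixed(struct toy_clnt *c)
{
    if (--c->refcount == 0)
        toy_destroy(c);
}

/* Caller side of the fix: clean up while still holding a reference,
 * then drop that reference. */
static void toy_release(struct toy_clnt *c)
{
    toy_cleanup(c, toy_unref_fixed);
    toy_unref_fixed(c);
}
```

Starting from a client with one reference and one pending frame, `toy_unref_buggy()` runs destroy twice, while `toy_release()` runs it exactly once.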

Comment 2 Shwetha Panduranga 2012-03-19 12:16:16 UTC

*** Bug 804489 has been marked as a duplicate of this bug. ***

Comment 3 Anand Avati 2012-03-19 16:14:46 UTC

CHANGE: http://review.gluster.com/2979 (rpc-clnt: separate out connection_cleanup() from destroy()) merged in master by Vijay Bellur (vijay)