Bug 1298524

Summary: glusterd service crashed on restarting rpcbind
Product: Red Hat Gluster Storage
Component: glusterd
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Severity: high
Priority: unspecified
Status: CLOSED WONTFIX
Reporter: Apeksha <akhakhar>
Assignee: Atin Mukherjee <amukherj>
QA Contact: SATHEESARAN <sasundar>
CC: akhakhar, rhs-bugs, sasundar, sbhaloth, storage-qa-internal, vbellur
Keywords: ZStream
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-01-19 07:20:52 UTC
Attachments: glusterd log file

Description Apeksha 2016-01-14 10:38:32 UTC
Created attachment 1114749 [details]
glusterd log file

Description of problem:
glusterd service crashed on restarting rpcbind

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-16.el7rhgs.x86_64

How reproducible:


Steps to Reproduce:
1. Update glusterfs from the 3.7.5-15 build to the 3.7.5-16 build
2. Start glusterd and start the volume
3. Set up ganesha on 4 nodes
4. Add the NLM port in the ganesha.conf file
5. Restart the rpcbind service on all 4 nodes; glusterd stops on all 4 nodes
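For context, step 4 usually means pinning the NLM port in the NFS-Ganesha configuration so it re-registers with rpcbind predictably. A minimal sketch of the relevant ganesha.conf block; the port value 32803 is an assumption for illustration and is not recorded in this report:

```
# /etc/ganesha/ganesha.conf -- hypothetical fragment; the actual port
# used in this setup is not stated in the bug report.
NFS_Core_Param {
    NLM_Port = 32803;   # fixed NLM port registered with rpcbind
}
```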

/var/log/glusterfs/etc-glusterfs-glusterd.vol.log
5/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162) [0x7fa47375dbd2] -->/usr/lib64/glusterfs/3.7.5/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a) [0x7fa4737fcd2a] ) 0-management: Lock for vol testvol not held
[2016-01-14 17:57:19.545973] W [MSGID: 106118] [glusterd-handler.c:5088:__glusterd_peer_rpc_notify] 0-management: Lock not released for testvol
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2016-01-14 17:57:19
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1

Actual results: glusterd crashed


Expected results: glusterd should remain running after rpcbind is restarted


Additional info:

Comment 4 Atin Mukherjee 2016-01-14 12:02:15 UTC
I got to know that this was a layered installation and the vdsm package was not installed, due to which core_pattern was not set and hence we couldn't find the core file. Without the core we can't analyse the cause of the crash. Please try to reproduce this bug with the vdsm package installed and let us know the behaviour. From the look of it, you may not hit the crash every time, so I'd suggest running the steps multiple times.
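For reference, when vdsm is absent core_pattern can be set by hand so a future crash leaves a usable core. A minimal sysctl fragment as a sketch; the path /var/log/core is an assumption, not the value vdsm would configure:

```
# /etc/sysctl.d/90-core.conf -- hypothetical fragment; vdsm normally
# sets kernel.core_pattern itself on a full installation.
kernel.core_pattern = /var/log/core/core.%e.%p.%t
```

The directory must exist and core-dump size limits must be nonzero (e.g. `ulimit -c unlimited` for the glusterd service environment) for a core to actually be written.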

Comment 5 surabhi 2016-01-19 06:50:42 UTC
While running automation tests on a CIFS mount multiple times, the following crash is seen; here is the backtrace:


(gdb) bt
#0  0x00007f55616460ad in rcu_read_lock_bp () from /lib64/liburcu-bp.so.1
#1  0x00007f5561c15ac0 in __glusterd_peer_rpc_notify (rpc=rpc@entry=0x7f556eeb1530, mydata=mydata@entry=0x7f556ee83850, event=event@entry=RPC_CLNT_DISCONNECT, 
    data=data@entry=0x0) at glusterd-handler.c:5020
#2  0x00007f5561c0bb6c in glusterd_big_locked_notify (rpc=0x7f556eeb1530, mydata=0x7f556ee83850, event=RPC_CLNT_DISCONNECT, data=0x0, 
    notify_fn=0x7f5561c15a70 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:71
#3  0x00007f556ce7fcf0 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f556eeb1560, event=RPC_TRANSPORT_DISCONNECT, data=0x7f556eeb5630) at rpc-clnt.c:874
#4  0x00007f556ce7b913 in rpc_transport_notify (this=this@entry=0x7f556eeb5630, event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x7f556eeb5630)
    at rpc-transport.c:545
#5  0x00007f555f0a5352 in socket_event_poll_err (this=0x7f556eeb5630) at socket.c:1151
#6  socket_event_handler (fd=fd@entry=13, idx=idx@entry=2, data=0x7f556eeb5630, poll_in=1, poll_out=0, poll_err=<optimized out>) at socket.c:2356
#7  0x00007f556d1128ca in event_dispatch_epoll_handler (event=0x7f555d0d2e80, event_pool=0x7f556ee31c90) at event-epoll.c:575
#8  event_dispatch_epoll_worker (data=0x7f556ee44fb0) at event-epoll.c:678
#9  0x00007f556bf19dc5 in start_thread (arg=0x7f555d0d3700) at pthread_create.c:308
#10 0x00007f556b8601cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

I am collecting the sosreports and will upload them soon, along with the core dump.

Comment 7 SATHEESARAN 2016-05-11 13:44:34 UTC
Removing the needinfo, as the bug was already closed