Bug 763927 (GLUSTER-2195)

Summary: Crash in __inode_retire on NFS failover
Product: [Community] GlusterFS
Component: core
Version: 3.1.1
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: urgent
Reporter: Vikas Gorur <vikas>
Assignee: Anand Avati <aavati>
CC: chrisw, gluster-bugs, rahulcs
Doc Type: Bug Fix

Description Vikas Gorur 2010-12-06 20:11:15 UTC
While testing NFS failover with ucarp in a distribute-replicate setup, the following crashes were observed. They all occur in __inode_retire, but with different backtraces. The operations being run on the mountpoint were cp -r and rm -rf while one of the servers went down and came back up. Ucarp was set up to migrate the IP that the client had used to mount the volume (through NFS).

#0  __inode_retire (inode=0xb397dd8) at inode.c:359
359             inode->table->purge_size++;
(gdb) bt
#0  __inode_retire (inode=0xb397dd8) at inode.c:359
#1  0x0000003748028140 in inode_table_prune (table=0xb397dd0) at inode.c:1064
#2  0x000000374802841c in inode_unref (inode=0x2aaaabe9cc28) at inode.c:427
#3  0x00002aaaabbaefed in nfs_entry_loc_fill (itable=0xb397dd0, pargfid=<value optimized out>,
    entry=0xb4509e0 "socketvar.h", loc=0x2aaaad5ea5dc, how=2) at nfs-common.c:386
#4  0x00002aaaabbcc07f in nfs3_fh_resolve_entry_hard (cs=0x2aaaad5ea1d4) at nfs3-helpers.c:3023
#5  0x00002aaaabbcc273 in nfs3_fh_resolve_entry (cs=0x2aaaad5ea1d4) at nfs3-helpers.c:3080
#6  0x00002aaaabbc97eb in nfs3_lookup (req=0x2aaaafd8e014, fh=0x4045ffd0, fhlen=<value optimized out>,
    name=0x4045efd0 "socketvar.h") at nfs3.c:1259
#7  0x00002aaaabbc9a5f in nfs3svc_lookup (req=0x2aaaafd8e014) at nfs3.c:1303
#8  0x00002aaaabbd97f4 in nfs_rpcsvc_handle_rpc_call (conn=0xb3c5410) at ../../../../xlators/nfs/lib/src/rpcsvc.c:1984
#9  0x00002aaaabbd9965 in nfs_rpcsvc_record_update_state (conn=0xb3c5410, dataread=0)
    at ../../../../xlators/nfs/lib/src/rpcsvc.c:2469
#10 0x00002aaaabbd9e68 in nfs_rpcsvc_conn_data_handler (fd=<value optimized out>, idx=11088, data=0xb3c5410, poll_in=1,
    poll_out=188317144, poll_err=0) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2641
#11 0x0000003748037917 in event_dispatch_epoll_handler (event_pool=0xb397b60) at event.c:812
#12 event_dispatch_epoll (event_pool=0xb397b60) at event.c:876
#13 0x00002aaaabbd7332 in nfs_rpcsvc_stage_proc (arg=<value optimized out>) at ../../../../xlators/nfs/lib/src/rpcsvc.c:64
#14 0x00000037c440673d in start_thread () from /lib64/libpthread.so.0
#15 0x00000037c38d3f6d in clone () from /lib64/libc.so.6



#0  __inode_retire (inode=0x19035dd8) at inode.c:359
359             inode->table->purge_size++;
(gdb) bt
#0  __inode_retire (inode=0x19035dd8) at inode.c:359
#1  0x0000003748028140 in inode_table_prune (table=0x19035dd0) at inode.c:1064
#2  0x000000374802841c in inode_unref (inode=0x2aaaabe73bdc) at inode.c:427
#3  0x0000003748017042 in loc_wipe (loc=0x190f8e50) at xlator.c:1152
#4  0x00002aaaaaabb30e in client_local_wipe (local=0x190f8e50) at client-helpers.c:110
#5  0x00002aaaaaacaf80 in client3_1_lookup_cbk (req=<value optimized out>, iov=<value optimized out>,
    count=<value optimized out>, myframe=0x2b11e608d4e8) at client3_1-fops.c:1939
#6  0x000000374880f542 in rpc_clnt_handle_reply (clnt=0x1905aba0, pollin=<value optimized out>) at rpc-clnt.c:752
#7  0x000000374880f73d in rpc_clnt_notify (trans=<value optimized out>, mydata=0x1905abd0, event=<value optimized out>,
    data=0x19035e48) at rpc-clnt.c:865
#8  0x000000374880a95c in rpc_transport_notify (this=0x19035dd8, event=11237, data=0x19035e48) at rpc-transport.c:1148
#9  0x00002aaaad96ef9f in socket_event_poll_in (this=0x1905ad60) at socket.c:1618
#10 0x00002aaaad96f130 in socket_event_handler (fd=<value optimized out>, idx=3, data=0x1905ad60, poll_in=1, poll_out=0,
    poll_err=0) at socket.c:1732
#11 0x0000003748037917 in event_dispatch_epoll_handler (event_pool=0x1901a360) at event.c:812
#12 event_dispatch_epoll (event_pool=0x1901a360) at event.c:876
#13 0x000000000040470b in main (argc=7, argv=0x7fff05290358) at glusterfsd.c:1414


#0  __inode_retire (inode=0x199eadc8) at inode.c:359
359             inode->table->purge_size++;
(gdb) bt
#0  __inode_retire (inode=0x199eadc8) at inode.c:359
#1  0x0000003748028140 in inode_table_prune (table=0x199eadc0) at inode.c:1064
#2  0x000000374802841c in inode_unref (inode=0x2aaaabe85e88) at inode.c:427
#3  0x0000003748017042 in loc_wipe (loc=0x2aaab0007bd0) at xlator.c:1152
#4  0x00002aaaaaabb30e in client_local_wipe (local=0x2aaab0007bd0) at client-helpers.c:110
#5  0x00002aaaaaacaf80 in client3_1_lookup_cbk (req=<value optimized out>, iov=<value optimized out>,
    count=<value optimized out>, myframe=0x2b28373006ec) at client3_1-fops.c:1939
#6  0x000000374880f542 in rpc_clnt_handle_reply (clnt=0x19a13590, pollin=<value optimized out>) at rpc-clnt.c:752
#7  0x000000374880f73d in rpc_clnt_notify (trans=<value optimized out>, mydata=0x19a135c0, event=<value optimized out>,
    data=0x199eae38) at rpc-clnt.c:865
#8  0x000000374880a95c in rpc_transport_notify (this=0x199eadc8, event=5343, data=0x199eae38) at rpc-transport.c:1148
#9  0x00002aaaad96ef9f in socket_event_poll_in (this=0x19a13750) at socket.c:1618
#10 0x00002aaaad96f130 in socket_event_handler (fd=<value optimized out>, idx=8, data=0x19a13750, poll_in=1, poll_out=0,
    poll_err=0) at socket.c:1732
#11 0x0000003748037917 in event_dispatch_epoll_handler (event_pool=0x199cf350) at event.c:812
#12 event_dispatch_epoll (event_pool=0x199cf350) at event.c:876
#13 0x000000000040470b in main (argc=7, argv=0x7fff438a7008) at glusterfsd.c:1414


#0  __inode_retire (inode=0x1e83dc8) at inode.c:359
359             inode->table->purge_size++;
(gdb) bt
#0  __inode_retire (inode=0x1e83dc8) at inode.c:359
#1  0x0000003748028140 in inode_table_prune (table=0x1e83dc0) at inode.c:1064
#2  0x000000374802841c in inode_unref (inode=0x2aaaabe70948) at inode.c:427
#3  0x00002aaaaad16823 in afr_local_cleanup (local=0x2aaab0000f90, this=0x1e79720) at afr-common.c:391
#4  0x00002aaaaad00a02 in afr_self_heal_completion_cbk (bgsh_frame=<value optimized out>, this=<value optimized out>)
    at afr-self-heal-common.c:1540
#5  0x00002aaaaacfe960 in afr_sh_data_done (frame=0x2b1eff670fc4, this=0x1e79720) at afr-self-heal-data.c:82
#6  0x00002aaaaacffec2 in afr_sh_data_flush_cbk (frame=0x2b1eff670fc4, cookie=0x1, this=0x1e79720,
    op_ret=<value optimized out>, op_errno=2) at afr-self-heal-data.c:115
#7  0x00002aaaaacfff69 in afr_sh_data_setattr_cbk (frame=0x1e83dc8, cookie=0x1000015c2, this=0x1e83e38, op_ret=31997496,
    op_errno=31997384, statpre=0x15c2, statpost=0x7fffe994b0f0) at afr-self-heal-data.c:126
#8  0x00002aaaaaacd6ff in client3_1_setattr_cbk (req=<value optimized out>, iov=<value optimized out>,
    count=<value optimized out>, myframe=0x2b1eff4634b8) at client3_1-fops.c:1414
#9  0x000000374880f542 in rpc_clnt_handle_reply (clnt=0x1eaafd0, pollin=<value optimized out>) at rpc-clnt.c:752
#10 0x000000374880f73d in rpc_clnt_notify (trans=<value optimized out>, mydata=0x1eab000, event=<value optimized out>,
    data=0x1e83e38) at rpc-clnt.c:865
#11 0x000000374880a95c in rpc_transport_notify (this=0x1e83dc8, event=5570, data=0x1e83e38) at rpc-transport.c:1148
#12 0x00002aaaad96ef9f in socket_event_poll_in (this=0x1eab190) at socket.c:1618
#13 0x00002aaaad96f130 in socket_event_handler (fd=<value optimized out>, idx=0, data=0x1eab190, poll_in=1, poll_out=0,
    poll_err=0) at socket.c:1732
#14 0x0000003748037917 in event_dispatch_epoll_handler (event_pool=0x1e68350) at event.c:812
#15 event_dispatch_epoll (event_pool=0x1e68350) at event.c:876
#16 0x000000000040470b in main (argc=7, argv=0x7fffe994ba58) at glusterfsd.c:1414


(gdb) fr 0
#0  __inode_retire (inode=0x1e83dc8) at inode.c:359
359             inode->table->purge_size++;
(gdb) p inode
$1 = (inode_t *) 0x1e83dc8
(gdb) p *inode
$2 = {table = 0x1000015c2, gfid = '\000' <repeats 15 times>, lock = 0, nlookup = 14057, ref = 31997824,
  ino = 46912516849700, ia_type = 31987600, fd_list = {next = 0x15f90, prev = 0x2aaaad25a010}, dentry_list = {
    next = 0x2aaaad35b010, prev = 0x2aaaabe709b8}, hash = {next = 0x2aaaabe70094, prev = 0x6}, list = {next = 0x1e83e38,
    prev = 0x1e83e38}, _ctx = 0xfffffffe}
(gdb) p inode->table
$3 = (inode_table_t *) 0x1000015c2
(gdb) p *inode->table
Cannot access memory at address 0x1000015c2
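The dump above shows why the statement at inode.c:359 faults: every field of the inode is garbage and inode->table (0x1000015c2) points at unmapped memory, which is the signature of the inode having been freed or recycled while some caller still held, and unref'd, a stale pointer to it. Below is a minimal sketch of that failure mode. The struct layout and the immediate free() are deliberately simplified and illustrative only, not the actual libglusterfs inode.c (which moves retired inodes to a purge list and destroys them later in inode_table_prune):

/* Sketch of the suspected failure mode: an extra inode_unref() drives
 * the refcount to zero a second time, so __inode_retire() runs against
 * memory that has already been freed and whose ->table is garbage. */
#include <stdlib.h>

struct inode_table { int purge_size; };

struct inode {
        struct inode_table *table;
        int                 ref;
};

static void __inode_retire(struct inode *inode)
{
        inode->table->purge_size++;  /* the faulting line, inode.c:359 */
        free(inode);                 /* stands in for retire + purge   */
}

static void inode_unref(struct inode *inode)
{
        if (--inode->ref == 0)
                __inode_retire(inode);
}

int main(void)
{
        struct inode_table table = { 0 };
        struct inode *inode = malloc(sizeof(*inode));

        inode->table = &table;
        inode->ref = 1;

        inode_unref(inode);  /* legitimate unref: count hits 0, inode freed */
        inode_unref(inode);  /* extra unref through a stale pointer:        */
                             /* use-after-free, ->table is now garbage      */
        return 0;
}

Built with gcc -fsanitize=address, the second unref is flagged as a heap-use-after-free, consistent with the garbage contents of *inode printed above.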

Comment 1 Anand Avati 2010-12-07 09:38:28 UTC
Vikas,
  Please check whether http://patches.gluster.com/patch/5826/ fixes your crash. The patch has been verified to work locally here.

Avati

Comment 2 Anand Avati 2010-12-07 13:14:41 UTC
PATCH: http://patches.gluster.com/patch/5826 in master (nfs: cleanup inode_ref/inode_unref to fix inode leaks and extra unrefs)