| Summary: | Crash in __inode_retire on NFS failover | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Vikas Gorur <vikas> |
| Component: | core | Assignee: | Anand Avati <aavati> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 3.1.1 | CC: | chrisw, gluster-bugs, rahulcs |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Vikas, please check whether http://patches.gluster.com/patch/5826/ fixes your crash. The patch has been verified to work locally here.

Avati

PATCH: http://patches.gluster.com/patch/5826 in master (nfs: cleanup inode_ref/inode_unref to fix inode leaks and extra unrefs)
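For readers unfamiliar with the bug class the patch targets: one stray extra inode_unref() drops the reference count to zero while another owner still holds the pointer, so the table retires and frees the inode, and the owner's later (legitimate) unref then runs on freed memory. Below is a minimal standalone sketch of that failure pattern; ref_inode_t, itable_ref() and itable_unref() are simplified illustrative stand-ins, not GlusterFS's inode_ref()/inode_unref()/__inode_retire().

```c
/* Minimal standalone sketch of the extra-unref bug class (NOT
 * GlusterFS code; names are illustrative stand-ins). */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int ref; /* in real code this is guarded by the table lock */
} ref_inode_t;

static ref_inode_t *itable_ref(ref_inode_t *inode)
{
    inode->ref++;
    return inode;
}

static void itable_unref(ref_inode_t *inode)
{
    if (--inode->ref == 0) {
        /* stands in for __inode_retire(): the inode is moved to the
         * purge list and inode_table_prune() eventually frees it */
        printf("retiring inode %p\n", (void *)inode);
        free(inode);
    }
}

int main(void)
{
    ref_inode_t *inode = calloc(1, sizeof(*inode));
    itable_ref(inode);   /* owner A takes a reference                */
    itable_ref(inode);   /* owner B (e.g. a loc_t) takes a reference */

    itable_unref(inode); /* owner A releases: ref 2 -> 1, fine       */
    itable_unref(inode); /* BUG: stray extra unref: ref 1 -> 0, the
                          * inode is retired and freed while owner B
                          * still holds its pointer                  */

    itable_unref(inode); /* owner B's legitimate release now touches
                          * freed memory: the crash under
                          * __inode_retire seen in the backtraces    */
    return 0;
}
```

The symmetric mistake, a missing unref, never frees the inode at all, which is presumably why the patch summary mentions both "inode leaks and extra unrefs".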
While testing NFS failover with ucarp in a distribute-replicate setup, the following crashes were observed. They are all in __inode_retire, but the backtraces differ. The operations being run on the mountpoint were cp -r and rm -rf while one of the servers went down and came back up. Ucarp was set up to migrate the IP address that the client used to mount the volume (over NFS).

```
#0  __inode_retire (inode=0xb397dd8) at inode.c:359
359         inode->table->purge_size++;
(gdb) bt
#0  __inode_retire (inode=0xb397dd8) at inode.c:359
#1  0x0000003748028140 in inode_table_prune (table=0xb397dd0) at inode.c:1064
#2  0x000000374802841c in inode_unref (inode=0x2aaaabe9cc28) at inode.c:427
#3  0x00002aaaabbaefed in nfs_entry_loc_fill (itable=0xb397dd0, pargfid=<value optimized out>, entry=0xb4509e0 "socketvar.h", loc=0x2aaaad5ea5dc, how=2) at nfs-common.c:386
#4  0x00002aaaabbcc07f in nfs3_fh_resolve_entry_hard (cs=0x2aaaad5ea1d4) at nfs3-helpers.c:3023
#5  0x00002aaaabbcc273 in nfs3_fh_resolve_entry (cs=0x2aaaad5ea1d4) at nfs3-helpers.c:3080
#6  0x00002aaaabbc97eb in nfs3_lookup (req=0x2aaaafd8e014, fh=0x4045ffd0, fhlen=<value optimized out>, name=0x4045efd0 "socketvar.h") at nfs3.c:1259
#7  0x00002aaaabbc9a5f in nfs3svc_lookup (req=0x2aaaafd8e014) at nfs3.c:1303
#8  0x00002aaaabbd97f4 in nfs_rpcsvc_handle_rpc_call (conn=0xb3c5410) at ../../../../xlators/nfs/lib/src/rpcsvc.c:1984
#9  0x00002aaaabbd9965 in nfs_rpcsvc_record_update_state (conn=0xb3c5410, dataread=0) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2469
#10 0x00002aaaabbd9e68 in nfs_rpcsvc_conn_data_handler (fd=<value optimized out>, idx=11088, data=0xb3c5410, poll_in=1, poll_out=188317144, poll_err=0) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2641
#11 0x0000003748037917 in event_dispatch_epoll_handler (event_pool=0xb397b60) at event.c:812
#12 event_dispatch_epoll (event_pool=0xb397b60) at event.c:876
#13 0x00002aaaabbd7332 in nfs_rpcsvc_stage_proc (arg=<value optimized out>) at ../../../../xlators/nfs/lib/src/rpcsvc.c:64
#14 0x00000037c440673d in start_thread () from /lib64/libpthread.so.0
#15 0x00000037c38d3f6d in clone () from /lib64/libc.so.6
```

```
#0  __inode_retire (inode=0x19035dd8) at inode.c:359
359         inode->table->purge_size++;
(gdb) bt
#0  __inode_retire (inode=0x19035dd8) at inode.c:359
#1  0x0000003748028140 in inode_table_prune (table=0x19035dd0) at inode.c:1064
#2  0x000000374802841c in inode_unref (inode=0x2aaaabe73bdc) at inode.c:427
#3  0x0000003748017042 in loc_wipe (loc=0x190f8e50) at xlator.c:1152
#4  0x00002aaaaaabb30e in client_local_wipe (local=0x190f8e50) at client-helpers.c:110
#5  0x00002aaaaaacaf80 in client3_1_lookup_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x2b11e608d4e8) at client3_1-fops.c:1939
#6  0x000000374880f542 in rpc_clnt_handle_reply (clnt=0x1905aba0, pollin=<value optimized out>) at rpc-clnt.c:752
#7  0x000000374880f73d in rpc_clnt_notify (trans=<value optimized out>, mydata=0x1905abd0, event=<value optimized out>, data=0x19035e48) at rpc-clnt.c:865
#8  0x000000374880a95c in rpc_transport_notify (this=0x19035dd8, event=11237, data=0x19035e48) at rpc-transport.c:1148
#9  0x00002aaaad96ef9f in socket_event_poll_in (this=0x1905ad60) at socket.c:1618
#10 0x00002aaaad96f130 in socket_event_handler (fd=<value optimized out>, idx=3, data=0x1905ad60, poll_in=1, poll_out=0, poll_err=0) at socket.c:1732
#11 0x0000003748037917 in event_dispatch_epoll_handler (event_pool=0x1901a360) at event.c:812
#12 event_dispatch_epoll (event_pool=0x1901a360) at event.c:876
#13 0x000000000040470b in main (argc=7, argv=0x7fff05290358) at glusterfsd.c:1414
```

```
#0  __inode_retire (inode=0x199eadc8) at inode.c:359
359         inode->table->purge_size++;
(gdb) bt
#0  __inode_retire (inode=0x199eadc8) at inode.c:359
#1  0x0000003748028140 in inode_table_prune (table=0x199eadc0) at inode.c:1064
#2  0x000000374802841c in inode_unref (inode=0x2aaaabe85e88) at inode.c:427
#3  0x0000003748017042 in loc_wipe (loc=0x2aaab0007bd0) at xlator.c:1152
#4  0x00002aaaaaabb30e in client_local_wipe (local=0x2aaab0007bd0) at client-helpers.c:110
#5  0x00002aaaaaacaf80 in client3_1_lookup_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x2b28373006ec) at client3_1-fops.c:1939
#6  0x000000374880f542 in rpc_clnt_handle_reply (clnt=0x19a13590, pollin=<value optimized out>) at rpc-clnt.c:752
#7  0x000000374880f73d in rpc_clnt_notify (trans=<value optimized out>, mydata=0x19a135c0, event=<value optimized out>, data=0x199eae38) at rpc-clnt.c:865
#8  0x000000374880a95c in rpc_transport_notify (this=0x199eadc8, event=5343, data=0x199eae38) at rpc-transport.c:1148
#9  0x00002aaaad96ef9f in socket_event_poll_in (this=0x19a13750) at socket.c:1618
#10 0x00002aaaad96f130 in socket_event_handler (fd=<value optimized out>, idx=8, data=0x19a13750, poll_in=1, poll_out=0, poll_err=0) at socket.c:1732
#11 0x0000003748037917 in event_dispatch_epoll_handler (event_pool=0x199cf350) at event.c:812
#12 event_dispatch_epoll (event_pool=0x199cf350) at event.c:876
#13 0x000000000040470b in main (argc=7, argv=0x7fff438a7008) at glusterfsd.c:1414
```

```
#0  __inode_retire (inode=0x1e83dc8) at inode.c:359
359         inode->table->purge_size++;
(gdb) bt
#0  __inode_retire (inode=0x1e83dc8) at inode.c:359
#1  0x0000003748028140 in inode_table_prune (table=0x1e83dc0) at inode.c:1064
#2  0x000000374802841c in inode_unref (inode=0x2aaaabe70948) at inode.c:427
#3  0x00002aaaaad16823 in afr_local_cleanup (local=0x2aaab0000f90, this=0x1e79720) at afr-common.c:391
#4  0x00002aaaaad00a02 in afr_self_heal_completion_cbk (bgsh_frame=<value optimized out>, this=<value optimized out>) at afr-self-heal-common.c:1540
#5  0x00002aaaaacfe960 in afr_sh_data_done (frame=0x2b1eff670fc4, this=0x1e79720) at afr-self-heal-data.c:82
#6  0x00002aaaaacffec2 in afr_sh_data_flush_cbk (frame=0x2b1eff670fc4, cookie=0x1, this=0x1e79720, op_ret=<value optimized out>, op_errno=2) at afr-self-heal-data.c:115
#7  0x00002aaaaacfff69 in afr_sh_data_setattr_cbk (frame=0x1e83dc8, cookie=0x1000015c2, this=0x1e83e38, op_ret=31997496, op_errno=31997384, statpre=0x15c2, statpost=0x7fffe994b0f0) at afr-self-heal-data.c:126
#8  0x00002aaaaaacd6ff in client3_1_setattr_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x2b1eff4634b8) at client3_1-fops.c:1414
#9  0x000000374880f542 in rpc_clnt_handle_reply (clnt=0x1eaafd0, pollin=<value optimized out>) at rpc-clnt.c:752
#10 0x000000374880f73d in rpc_clnt_notify (trans=<value optimized out>, mydata=0x1eab000, event=<value optimized out>, data=0x1e83e38) at rpc-clnt.c:865
#11 0x000000374880a95c in rpc_transport_notify (this=0x1e83dc8, event=5570, data=0x1e83e38) at rpc-transport.c:1148
#12 0x00002aaaad96ef9f in socket_event_poll_in (this=0x1eab190) at socket.c:1618
#13 0x00002aaaad96f130 in socket_event_handler (fd=<value optimized out>, idx=0, data=0x1eab190, poll_in=1, poll_out=0, poll_err=0) at socket.c:1732
#14 0x0000003748037917 in event_dispatch_epoll_handler (event_pool=0x1e68350) at event.c:812
#15 event_dispatch_epoll (event_pool=0x1e68350) at event.c:876
#16 0x000000000040470b in main (argc=7, argv=0x7fffe994ba58) at glusterfsd.c:1414
```

```
(gdb) fr 0
#0  __inode_retire (inode=0x1e83dc8) at inode.c:359
359         inode->table->purge_size++;
(gdb) p inode
$1 = (inode_t *) 0x1e83dc8
(gdb) p *inode
$2 = {table = 0x1000015c2, gfid = '\000' <repeats 15 times>, lock = 0, nlookup = 14057, ref = 31997824, ino = 46912516849700, ia_type = 31987600, fd_list = {next = 0x15f90, prev = 0x2aaaad25a010}, dentry_list = {next = 0x2aaaad35b010, prev = 0x2aaaabe709b8}, hash = {next = 0x2aaaabe70094, prev = 0x6}, list = {next = 0x1e83e38, prev = 0x1e83e38}, _ctx = 0xfffffffe}
(gdb) p inode->table
$3 = (inode_table_t *) 0x1000015c2
(gdb) p *inode->table
Cannot access memory at address 0x1000015c2
```
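The dumped inode looks overwritten rather than merely stale: table points at the unreadable address 0x1000015c2, gfid is all zeros, and ref/nlookup hold implausible values, which is consistent with the inode having been freed and its memory reused before this final unref ran. A generic way to catch this class of bug at the stray unref itself, instead of in a later unrelated caller such as loc_wipe() or nfs_entry_loc_fill(), is a magic-number canary that is poisoned on free. The sketch below shows that debugging pattern under illustrative names (dbg_inode_t, INODE_MAGIC); it is not something GlusterFS's inode.c actually implements.

```c
/* Hedged sketch of a magic-number canary for catching use-after-free
 * on refcounted objects early.  Illustrative names; NOT GlusterFS
 * code. */
#include <assert.h>
#include <stdlib.h>

#define INODE_MAGIC 0x1badcafeU
#define INODE_DEAD  0xdeadbeefU

typedef struct {
    unsigned magic; /* canary: INODE_MAGIC while alive */
    int      ref;
} dbg_inode_t;

static dbg_inode_t *dbg_inode_new(void)
{
    dbg_inode_t *inode = calloc(1, sizeof(*inode));
    inode->magic = INODE_MAGIC;
    inode->ref   = 1;
    return inode;
}

static void dbg_inode_unref(dbg_inode_t *inode)
{
    /* a stray unref on a retired inode trips this assert immediately
     * (on most allocators, since the poison value usually survives
     * long enough to be read) instead of corrupting the table */
    assert(inode->magic == INODE_MAGIC);
    if (--inode->ref == 0) {
        inode->magic = INODE_DEAD; /* poison before freeing */
        free(inode);
    }
}

int main(void)
{
    dbg_inode_t *inode = dbg_inode_new();
    dbg_inode_unref(inode); /* legitimate release: ref 1 -> 0, freed */
    /* calling dbg_inode_unref(inode) again here would be exactly the
     * extra-unref bug; with the canary it aborts at the culprit */
    return 0;
}
```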