Hide Forgot
glustershd crashed because of a race present in stack_destroy and unlock in which case by the time the reply has come for the unlock op, the frame would have been destroyed. This is the backtrace. Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /etc/'. Program terminated with signal 11, Segmentation fault. #0 0x00002aaaacf1ff53 in afr_unlock_inodelk_cbk (frame=0x2aadaa8ad854, cookie=0x1, this=0xa608a50, op_ret=0, op_errno=0) at ../../../../../xlators/cluster/afr/src/afr-lk-common.c:571 571 int_lock->inode_locked_nodes[child_index] &= LOCKED_NO; (gdb) bt #0 0x00002aaaacf1ff53 in afr_unlock_inodelk_cbk (frame=0x2aadaa8ad854, cookie=0x1, this=0xa608a50, op_ret=0, op_errno=0) at ../../../../../xlators/cluster/afr/src/afr-lk-common.c:571 #1 0x00002aaaaccb54e0 in client3_1_inodelk_cbk (req=0x2aaaad36b520, iov=0x2aaaad36b560, count=1, myframe=0x2aadaa616070) at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1237 #2 0x00002aada996825e in rpc_clnt_handle_reply (clnt=0xa60ec80, pollin=0x2aaab804f6e0) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:789 #3 0x00002aada9968586 in rpc_clnt_notify (trans=0xa60efb0, mydata=0xa60ecb0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2aaab804f6e0) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:902 #4 0x00002aada99649f3 in rpc_transport_notify (this=0xa60efb0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2aaab804f6e0) at ../../../../rpc/rpc-lib/src/rpc-transport.c:498 #5 0x00002aaaaab5cee3 in socket_event_poll_in (this=0xa60efb0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1675 #6 0x00002aaaaab5d425 in socket_event_handler (fd=31, idx=23, data=0xa60efb0, poll_in=1, poll_out=0, poll_err=0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1790 #7 0x00002aada970fb40 in event_dispatch_epoll_handler (event_pool=0xa5fb960, events=0xa6003f0, i=0) at ../../../libglusterfs/src/event.c:794 #8 0x00002aada970fd45 in event_dispatch_epoll (event_pool=0xa5fb960) at ../../../libglusterfs/src/event.c:856 #9 0x00002aada971009f in event_dispatch (event_pool=0xa5fb960) at ../../../libglusterfs/src/event.c:956 #10 0x000000000040784d in main (argc=11, argv=0x7fffe64c4ef8) at ../../../glusterfsd/src/glusterfsd.c:1592 (gdb) info thr 4 Thread 25842 0x00000030b960e838 in do_sigwait () from /lib64/libpthread.so.0 3 Thread 25843 0x00000030b960aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 2 Thread 25844 0x00000030b8e9a541 in nanosleep () from /lib64/libc.so.6 * 1 Thread 25841 0x00002aaaacf1ff53 in afr_unlock_inodelk_cbk (frame=0x2aadaa8ad854, cookie=0x1, this=0xa608a50, op_ret=0, op_errno=0) at ../../../../../xlators/cluster/afr/src/afr-lk-common.c:571 (gdb) f 0 #0 0x00002aaaacf1ff53 in afr_unlock_inodelk_cbk (frame=0x2aadaa8ad854, cookie=0x1, this=0xa608a50, op_ret=0, op_errno=0) at ../../../../../xlators/cluster/afr/src/afr-lk-common.c:571 571 int_lock->inode_locked_nodes[child_index] &= LOCKED_NO; (gdb) p int_lock $1 = (afr_internal_lock_t *) 0x108 (gdb) p local $2 = (afr_local_t *) 0x0 (gdb) These are the logs: [2011-10-01 07:05:16.831656] I [afr-common.c:1083:afr_detect_self_heal_by_iatt] 0-mirror-replicate-0: size differs for /run24646/fileop_L1_2/f ileop_L1_2_L2_5/fileop_file_2_5_11 [2011-10-01 07:05:16.831684] I [afr-common.c:1234:afr_launch_self_heal] 0-mirror-replicate-0: background meta-data data self-heal triggered. path: /run24646/fileop_L1_2/fileop_L1_2_L2_5/fileop_file_2_5_11, reason: lookup detected pending operations [2011-10-01 07:05:16.836310] I [afr-self-heal-algorithm.c:126:sh_loop_driver_done] 0-mirror-replicate-0: full self-heal completed on /run24646 /fileop_L1_2/fileop_L1_2_L2_5/fileop_file_2_5_8 [2011-10-01 07:05:17.331482] I [afr-self-heal-common.c:2055:afr_self_heal_completion_cbk] 0-mirror-replicate-0: background meta-data data sel f-heal completed on /run24646/fileop_L1_2/fileop_L1_2_L2_5/fileop_file_2_5_8 [2011-10-01 07:05:17.717995] W [socket.c:1510:__socket_proto_state_machine] 0-mirror-client-0: reading from socket failed. Error (Transport en dpoint is not connected), peer (10.1.11.73:24009) [2011-10-01 07:05:17.718481] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x158) [0x2aada9968455] ( -->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x101) [0x2aada9967a03] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1c ) [0x2aada996752f]))) 0-mirror-client-0: forced unwinding frame type(GlusterFS 3.1) op(INODELK(29)) called at 2011-10-01 07:05:17.332284 [2011-10-01 07:05:17.718515] W [client3_1-fops.c:1235:client3_1_inodelk_cbk] 0-mirror-client-0: remote operation failed: Transport endpoint is not connected [2011-10-01 07:05:17.718603] I [client.c:1885:client_rpc_notify] 0-mirror-client-0: disconnected [2011-10-01 07:05:17.724161] W [client3_1-fops.c:4649:client3_1_inodelk] 0-mirror-client-0: failed to send the fop: Transport endpoint is not connected [2011-10-01 07:05:17.744728] W [client3_1-fops.c:4369:client3_1_fxattrop] 0-mirror-client-0: (00000000-0000-0000-0000-000000000000): failed to get fd ctx. EBADFD [2011-10-01 07:05:17.744751] W [client3_1-fops.c:4435:client3_1_fxattrop] 0-mirror-client-0: failed to send the fop: File descriptor in bad st ate [2011-10-01 07:05:17.752388] W [client3_1-fops.c:3702:client3_1_fstat] 0-mirror-client-0: (00000000-0000-0000-0000-000000000000): failed to ge t fd ctx. EBADFD [2011-10-01 07:05:17.752410] W [client3_1-fops.c:3716:client3_1_fstat] 0-mirror-client-0: failed to send the fop: File descriptor in bad state [2011-10-01 07:05:17.754059] I [syncop.c:89:synctask_wake] 0-: SYNCOP: task: 0xa617a40, task->env: 0xa5fc530 [2011-10-01 07:05:17.754098] W [client3_1-fops.c:2761:client3_1_ftruncate] 0-mirror-client-0: (00000000-0000-0000-0000-000000000000): failed t o get fd ctx. EBADFD [2011-10-01 07:05:17.754115] W [client3_1-fops.c:2777:client3_1_ftruncate] 0-mirror-client-0: failed to send the fop: File descriptor in bad s tate [2011-10-01 07:05:17.754130] I [afr-self-heal-data.c:571:afr_sh_data_trim_cbk] 0-mirror-replicate-0: ftruncate of /run24646/fileop_L1_2/fileop _L1_2_L2_5/fileop_file_2_5_11 on subvolume mirror-client-0 failed (File descriptor in bad state) [2011-10-01 07:05:17.754159] W [client3_1-fops.c:4649:client3_1_inodelk] 0-mirror-client-0: failed to send the fop: Transport endpoint is not connected [2011-10-01 07:05:17.754210] I [syncop.c:243:syncenv_processor] 0-: SYNCOP: task: 0xa617a40, env: 0xa5fc530, task->env: 0xa5fc530 [2011-10-01 07:05:17.782468] I [syncop.c:89:synctask_wake] 0-: SYNCOP: task: 0xa617a40, task->env: 0xa5fc530 [2011-10-01 07:05:17.782505] I [syncop.c:243:syncenv_processor] 0-: SYNCOP: task: 0xa617a40, env: 0xa5fc530, task->env: 0xa5fc530 [2011-10-01 07:05:17.782928] W [client3_1-fops.c:4649:client3_1_inodelk] 0-mirror-client-0: failed to send the fop: Transport endpoint is not connected [2011-10-01 07:05:18.304095] I [syncop.c:89:synctask_wake] 0-: SYNCOP: task: 0xa617a40, task->env: 0xa5fc530 [2011-10-01 07:05:18.304137] I [syncop.c:243:syncenv_processor] 0-: SYNCOP: task: 0xa617a40, env: 0xa5fc530, task->env: 0xa5fc530 [2011-10-01 07:05:18.305680] I [syncop.c:89:synctask_wake] 0-: SYNCOP: task: 0xa617a40, task->env: 0xa5fc530 [2011-10-01 07:05:18.305717] I [syncop.c:243:syncenv_processor] 0-: SYNCOP: task: 0xa617a40, env: 0xa5fc530, task->env: 0xa5fc530 [2011-10-01 07:05:18.306006] W [client3_1-fops.c:3572:client3_1_writev] 0-mirror-client-0: (00000000-0000-0000-0000-000000000000): failed to get fd ctx. EBADFD [2011-10-01 07:05:18.306030] W [client3_1-fops.c:3588:client3_1_writev] 0-mirror-client-0: failed to send the fop: File descriptor in bad state [2011-10-01 07:05:18.306045] E [afr-self-heal-algorithm.c:428:sh_loop_write_cbk] 0-mirror-replicate-0: write to /run24646/fileop_L1_2/fileop_L1_2_L2_5/fileop_file_2_5_11 failed on subvolume mirror-client-0 (File descriptor in bad state) [2011-10-01 07:05:18.306085] I [afr-self-heal-algorithm.c:119:sh_loop_driver_done] 0-mirror-replicate-0: self-heal aborting on /run24646/fileop_L1_2/fileop_L1_2_L2_5/fileop_file_2_5_11 [2011-10-01 07:05:18.306122] W [client3_1-fops.c:3612:client3_1_flush] 0-mirror-client-0: (00000000-0000-0000-0000-000000000000): failed to get fd ctx. EBADFD [2011-10-01 07:05:18.306140] W [client3_1-fops.c:3639:client3_1_flush] 0-mirror-client-0: failed to send the fop: File descriptor in bad state [2011-10-01 07:05:18.306154] I [afr-self-heal-data.c:107:afr_sh_data_flush_cbk] 0-mirror-replicate-0: flush failed on /run24646/fileop_L1_2/fileop_L1_2_L2_5/fileop_file_2_5_11 on subvolume mirror-client-0: File descriptor in bad state [2011-10-01 07:05:18.306243] I [syncop.c:89:synctask_wake] 0-: SYNCOP: task: 0xa617a40, task->env: 0xa5fc530 [2011-10-01 07:05:18.306279] I [syncop.c:243:syncenv_processor] 0-: SYNCOP: task: 0xa617a40, env: 0xa5fc530, task->env: 0xa5fc530 [2011-10-01 07:05:18.307465] E [afr-self-heal-common.c:2052:afr_self_heal_completion_cbk] 0-mirror-replicate-0: background meta-data data self-heal failed on /run24646/fileop_L1_2/fileop_L1_2_L2_5/fileop_file_2_5_11 [2011-10-01 07:05:18.307668] I [syncop.c:89:synctask_wake] 0-: SYNCOP: task: 0xa617a40, task->env: 0xa5fc530 pending frames: patchset: git://git.gluster.com/glusterfs.git signal received: 11 time of crash: 2011-10-01 07:05:18 configuration details: argp 1 backtrace 1 dlfcn 1 fdatasync 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 3.3.0qa13 /lib64/libc.so.6[0x30b8e302d0] /usr/local/lib/glusterfs/3.3.0qa13/xlator/cluster/replicate.so[0x2aaaacf1ff53] /usr/local/lib/glusterfs/3.3.0qa13/xlator/protocol/client.so(client3_1_inodelk_cbk+0x27d)[0x2aaaaccb54e0]
CHANGE: http://review.gluster.com/735 (Change-Id: I0d87f06f989b2d4b971967c52d4898331693a801) merged in master by Vijay Bellur (vijay)