Description of problem:
When running the testcase "Test self-heal of 50k files (self-heal-daemon)" there was a crash while creating data. Here is what I saw in the shell:

32768 bytes (33 kB) copied, 0.00226773 s, 14.4 MB/s
1+0 records in
1+0 records out
32768 bytes (33 kB) copied, 0.00233886 s, 14.0 MB/s
dd: opening `/gluster-mount/small/37773.small': Software caused connection abort
dd: opening `/gluster-mount/small/37774.small': Transport endpoint is not connected
dd: opening `/gluster-mount/small/37775.small': Transport endpoint is not connected

And in the gluster mount logs:

client-0 to healtest-client-1, metadata - Pending matrix: [ [ 0 2 ] [ 0 0 ] ], on /small/37757.small
[2014-02-14 18:56:15.169667] I [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 0-healtest-replicate-0: metadata self heal is successfully completed, metadata self heal from source healtest-client-0 to healtest-client-1, metadata - Pending matrix: [ [ 0 2 ] [ 0 0 ] ], on /small/37771.small
[2014-02-14 18:56:15.275690] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-02-14 18:56:15.276117] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-02-14 18:56:15.278740] I [dht-shared.c:311:dht_init_regex] 0-healtest-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2014-02-14 18:56:15.278975] I [glusterfsd-mgmt.c:1379:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2014-02-14 18:56:15.279009] I [glusterfsd-mgmt.c:1379:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-02-14 18:56:15
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.59rhs
/lib64/libc.so.6(+0x32920)[0x7fd0fb464920]
/usr/lib64/glusterfs/3.4.0.59rhs/xlator/cluster/replicate.so(afr_sh_data_lock_rec+0x77)[0x7fd0f53a9a27]
/usr/lib64/glusterfs/3.4.0.59rhs/xlator/cluster/replicate.so(afr_sh_data_open_cbk+0x178)[0x7fd0f53ab398]
/usr/lib64/glusterfs/3.4.0.59rhs/xlator/protocol/client.so(client3_3_open_cbk+0x18b)[0x7fd0f560e82b]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7fd0fc1a7f45]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x147)[0x7fd0fc1a9507]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x7fd0fc1a4d88]
/usr/lib64/glusterfs/3.4.0.59rhs/rpc-transport/socket.so(+0x8d86)[0x7fd0f7a44d86]
/usr/lib64/glusterfs/3.4.0.59rhs/rpc-transport/socket.so(+0xa69d)[0x7fd0f7a4669d]
/usr/lib64/libglusterfs.so.0(+0x61ad7)[0x7fd0fc413ad7]
/usr/sbin/glusterfs(main+0x5f8)[0x4068b8]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7fd0fb450cdd]
/usr/sbin/glusterfs[0x4045c9]
---------

Version-Release number of selected component (if applicable):
glusterfs 3.4.0.59rhs

How reproducible:
I have only seen this crash once out of 2-3 runs of this or very similar testcases.

Steps to Reproduce:
I hit this during a batch run of automated testcases (TCMS 198855, 223406, 226909, 226912, 237832, 238530, 238539). The testcase that saw the crash was 238530.
1. Create a 1x2 replicate volume across 2 nodes.
2. Set the volume option 'self-heal-daemon' to "off" by running "gluster volume set <vol_name> self-heal-daemon off" from one of the storage nodes.
3. Bring all brick processes on one of the nodes offline.
4. Create 50k files from the client mount. In the automation the file count and block size are script arguments ($3 and $4); a self-contained sketch of the whole reproduction is included at the end of this report:

   mkdir -p $MOUNT-POINT/small
   for i in `seq 1 $3`; do
       dd if=/dev/zero of=$MOUNT-POINT/small/$i.small bs=$4 count=1
   done

Actual results:
Crash on the client during file creation.

Expected results:
No crash.

Additional info:
I was only able to get the core file and sosreport from the client before the hosts were reclaimed. I'll attempt to reproduce again for more data.
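For convenience, here is the reproduction as a single self-contained script. This is only a sketch: the volume name, server hostnames, brick paths, mount point, file count, and block size are assumptions filled in for illustration (the original automation passes the count and size in as arguments), not the exact values used by the testcase.

   #!/bin/bash
   # Hypothetical values; adjust to match your environment.
   VOL=healtest
   MOUNT_POINT=/gluster-mount
   COUNT=50000      # number of small files to create
   BS=32768         # per-file size, matching the 32 kB dd transfers above

   # Steps 1-2: run on one of the storage nodes (brick paths are illustrative).
   #   gluster volume create $VOL replica 2 server1:/bricks/b1/$VOL server2:/bricks/b1/$VOL
   #   gluster volume start $VOL
   #   gluster volume set $VOL self-heal-daemon off
   # Step 3: take the brick processes on one node offline, e.g. by killing the
   #   glusterfsd processes serving that node's bricks.

   # Step 4: run on the client, with the volume FUSE-mounted at $MOUNT_POINT.
   mkdir -p "$MOUNT_POINT/small"
   for i in $(seq 1 "$COUNT"); do
       dd if=/dev/zero of="$MOUNT_POINT/small/$i.small" bs="$BS" count=1
   done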
Created attachment 863404 [details] sosreport from client.
Created attachment 863405 [details] core
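For anyone picking up the attached core, a typical way to get a symbolized backtrace is shown below. This assumes an analysis host with the glusterfs build that crashed (3.4.0.59rhs) and its matching debuginfo installed; the core file path is illustrative.

   gdb /usr/sbin/glusterfs /path/to/core
   (gdb) bt                    # backtrace of the crashing thread
   (gdb) thread apply all bt   # backtraces of all threads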
Please review the edited doc text and sign off.
This bug was fixed as part of a rebase for Denali.
Verified on glusterfs-3.6.0.38-1.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0038.html