Bug 1065501 - AFR: Crash on client when creating files for self heal of 50k files testcase.
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: 2.1
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.3
Assigned To: Anuradha
QA Contact: Ben Turner
Keywords: ZStream
Depends On:
Blocks: 1035040 1162694
 
Reported: 2014-02-14 14:39 EST by Ben Turner
Modified: 2016-09-19 22:00 EDT
CC List: 11 users

See Also:
Fixed In Version: glusterfs-3.6.0.28-1
Doc Type: Known Issue
Doc Text:
While self-heal is in progress on a mount, the mount may crash if cluster.data-self-heal is changed from "off" to "on" using the volume set operation. Workaround: Ensure that no self-heals are pending on the volume before changing cluster.data-self-heal.
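A minimal illustration of the workaround, assuming a volume named healtest (the volume name is a placeholder):

    # Check for pending self-heals; the output should list zero entries per brick:
    gluster volume heal healtest info
    # Only once nothing is pending, re-enable data self-heal:
    gluster volume set healtest cluster.data-self-heal on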
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-01-15 08:37:10 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
sosreport from client. (659.68 KB, application/x-xz)
2014-02-14 14:44 EST, Ben Turner
core (112 bytes, text/plain)
2014-02-14 14:44 EST, Ben Turner


External Trackers
Tracker ID: Red Hat Product Errata RHBA-2015:0038
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Storage 3.0 enhancement and bug fix update #3
Last Updated: 2015-01-15 13:35:28 EST

Description Ben Turner 2014-02-14 14:39:14 EST
Description of problem:

When running the testcase "Test self-heal of 50k files (self-heal-daemon)", the client mount crashed while the test was creating data. Here is what I saw in the shell:

32768 bytes (33 kB) copied, 0.00226773 s, 14.4 MB/s
1+0 records in
1+0 records out
32768 bytes (33 kB) copied, 0.00233886 s, 14.0 MB/s
dd: opening `/gluster-mount/small/37773.small': Software caused connection abort
dd: opening `/gluster-mount/small/37774.small': Transport endpoint is not connected
dd: opening `/gluster-mount/small/37775.small': Transport endpoint is not connected

And in the gluster mount logs:

client-0 to healtest-client-1,  metadata - Pending matrix:  [ [ 0 2 ] [ 0 0 ] ], on /small/37757.small
[2014-02-14 18:56:15.169667] I [afr-self-heal-common.c:2906:afr_log_self_heal_completion_status] 0-healtest-replicate-0:  metadata self heal  is successfully completed,   metadata self heal from source healtest-client-0 to healtest-client-1,  metadata - Pending matrix:  [ [ 0 2 ] [ 0 0 ] ], on /small/37771.small
[2014-02-14 18:56:15.275690] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-02-14 18:56:15.276117] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2014-02-14 18:56:15.278740] I [dht-shared.c:311:dht_init_regex] 0-healtest-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2014-02-14 18:56:15.278975] I [glusterfsd-mgmt.c:1379:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2014-02-14 18:56:15.279009] I [glusterfsd-mgmt.c:1379:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-02-14 18:56:15
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.59rhs
/lib64/libc.so.6(+0x32920)[0x7fd0fb464920]
/usr/lib64/glusterfs/3.4.0.59rhs/xlator/cluster/replicate.so(afr_sh_data_lock_rec+0x77)[0x7fd0f53a9a27]
/usr/lib64/glusterfs/3.4.0.59rhs/xlator/cluster/replicate.so(afr_sh_data_open_cbk+0x178)[0x7fd0f53ab398]
/usr/lib64/glusterfs/3.4.0.59rhs/xlator/protocol/client.so(client3_3_open_cbk+0x18b)[0x7fd0f560e82b]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7fd0fc1a7f45]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x147)[0x7fd0fc1a9507]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x7fd0fc1a4d88]
/usr/lib64/glusterfs/3.4.0.59rhs/rpc-transport/socket.so(+0x8d86)[0x7fd0f7a44d86]
/usr/lib64/glusterfs/3.4.0.59rhs/rpc-transport/socket.so(+0xa69d)[0x7fd0f7a4669d]
/usr/lib64/libglusterfs.so.0(+0x61ad7)[0x7fd0fc413ad7]
/usr/sbin/glusterfs(main+0x5f8)[0x4068b8]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7fd0fb450cdd]
/usr/sbin/glusterfs[0x4045c9]
---------


Version-Release number of selected component (if applicable):

glusterfs 3.4.0.59rhs

How reproducible:

I have only seen this crash once out of 2-3 runs of this or very similar testcases. 

Steps to Reproduce:

I hit this during a batch run of automated testcases:

TCMS - 198855 223406 226909 226912 237832 238530 238539

The testcase that hit the crash was 238530. Its steps were:

1.  Create a 1x2 volume across 2 nodes.

2.  Set the volume option 'self-heal-daemon' to "off" using the command "gluster volume set <vol_name> self-heal-daemon off" from one of the storage nodes.
 
3.  Bring all brick processes on one node offline.
 
4.  Create 50k files with the loop below (a consolidated reproduction sketch follows this list):

            mkdir -p $MOUNT_POINT/small
            # $3 and $4 are the test script's positional parameters:
            # the number of files (50k here) and the dd block size.
            for i in `seq 1 $3`; do
                dd if=/dev/zero of=$MOUNT_POINT/small/$i.small bs=$4 count=1
            done
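
For reference, a consolidated sketch of the reproduction under the same assumptions as the testcase: a two-node replica volume named healtest, bricks at /bricks/b1, and a FUSE mount at /gluster-mount (all names are placeholders; the brick PID comes from gluster volume status):

    # 1. Create and start a 1x2 replica volume, then mount it on the client:
    gluster volume create healtest replica 2 node1:/bricks/b1 node2:/bricks/b1
    gluster volume start healtest
    mount -t glusterfs node1:/healtest /gluster-mount

    # 2. Disable the self-heal daemon from one of the storage nodes:
    gluster volume set healtest self-heal-daemon off

    # 3. Take the brick process on one node offline (PID from `gluster volume status`):
    kill -9 <brick-pid>

    # 4. Create 50k 32 KB files from the client (32k matches the dd output above):
    mkdir -p /gluster-mount/small
    for i in `seq 1 50000`; do
        dd if=/dev/zero of=/gluster-mount/small/$i.small bs=32k count=1
    done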


Actual results:

Crash on the client during file creation.

Expected results:

No crash.

Additional info:

I was only able to get the core file and sosreport from the client before the hosts were reclaimed.  I'll attempt to repro again for more data.
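For anyone inspecting the attached core, one way to pull a full backtrace (a sketch; it assumes the matching glusterfs debuginfo packages are installed, and the core path is a placeholder):

    gdb /usr/sbin/glusterfs /path/to/core -batch -ex "thread apply all bt full"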
Comment 1 Ben Turner 2014-02-14 14:44:31 EST
Created attachment 863404 [details]
sosreport from client.
Comment 2 Ben Turner 2014-02-14 14:44:54 EST
Created attachment 863405 [details]
core
Comment 8 Shalaka 2014-02-18 06:20:25 EST
Please review the edited doc text and sign off.
Comment 10 Anuradha 2014-10-27 03:16:04 EDT
This bug was fixed as part of a rebase for Denali.
Comment 11 Ben Turner 2014-12-15 14:07:57 EST
Verified on glusterfs-3.6.0.38-1.
Comment 13 errata-xmlrpc 2015-01-15 08:37:10 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html
