Bug 1058340 - glusterd crashed after stopping and deleting the volume for a number of iterations
Summary: glusterd crashed after stopping and deleting the volume for a number of iterations
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-01-27 15:24 UTC by SATHEESARAN
Modified: 2015-12-03 17:17 UTC (History)
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-03 17:17:22 UTC
Embargoed:


Attachments (Terms of Use)
sosreport from RHSS Node1 (7.96 MB, application/x-xz)
2014-01-27 15:51 UTC, SATHEESARAN
no flags Details
sosreport from RHSS Node2 (9.86 MB, application/x-xz)
2014-01-27 16:01 UTC, SATHEESARAN
no flags Details

Description SATHEESARAN 2014-01-27 15:24:58 UTC
Description of problem:
-----------------------
glusterd crashed after stopping and deleting a volume for a number of iterations


Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHSS Version - RHSS 2.1 Update2 RC1 ( glusterfs-3.4.0.57rhs-1.el6rhs )


How reproducible:
-----------------
Happened 4 times out of 4 tries


Steps to Reproduce:
-------------------

1. Create a volume ( any type )
2. Start the volume
3. Get the volume status
4. Stop the volume
5. Delete the volume
6. Repeat steps 1 to 5 (i.e. create -> start -> get status -> stop -> delete volume) for 10,000 iterations; a one-iteration command sketch follows below, and the full loop used for reproduction is in comment 3
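
For reference, a minimal sketch of what one iteration amounts to on the gluster CLI (the volume name and brick paths below are illustrative placeholders; the exact loop used for reproduction is in comment 3):

    gluster volume create test <node1>:/rhs/brick1/testN <node2>:/rhs/brick1/testN
    gluster volume start test
    gluster volume status
    gluster volume stop test --mode=script      # --mode=script suppresses the y/n confirmation
    gluster volume delete test --mode=script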

Actual results:
--------------
In the first two attempts, the glusterd crash happened at the 231st and the 1431st iteration respectively

Expected results:
-----------------
glusterd should not crash

Additional info:
----------------
This scenario was already verified with 10,000 iterations for RHSS 2.1 (BigBend) -
glusterfs-3.4.0.33rhs-1.el6rhs
Ref - https://bugzilla.redhat.com/show_bug.cgi?id=962621

Points to note here are:
1. Use-case wise, repeatedly stopping and deleting a volume is not a typical scenario in customer deployments, hence the severity is set to MEDIUM even though it is a CRASH
2. It is again a REGRESSION, but the REGRESSION keyword is not being added, as the bug would then have to be proposed as a BLOCKER

Comment 1 SATHEESARAN 2014-01-27 15:27:32 UTC
Crash error message in glusterd log file:

<snip>
[2014-01-27 07:30:18.199134] E [glusterd-utils.c:4007:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/16a505b2447a553ea7e2fa2d26822ee3.socket error: Permission denied
[2014-01-27 07:30:18.199763] I [glusterd-utils.c:4041:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2014-01-27 07:30:18.200118] I [glusterd-utils.c:4046:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
[2014-01-27 07:30:18.200454] I [glusterd-utils.c:4051:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
[2014-01-27 07:30:18.200838] I [glusterd-utils.c:4056:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully
[2014-01-27 07:30:18.201174] I [glusterd-utils.c:4061:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully
[2014-01-27 07:30:18.201550] I [glusterd-utils.c:4066:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully
[2014-01-27 07:30:18.331329] I [glusterd-ping.c:181:glusterd_start_ping] 0-management: defaulting ping-timeout to 10s
[2014-01-27 07:30:19.290937] I [glusterd-ping.c:297:glusterd_ping_cbk] 0-management: defaulting ping-timeout to 10s
[2014-01-27 07:30:20.155917] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /rhs/brick1/test2253 on port 51404
[2014-01-27 07:30:20.157642] I [rpc-clnt.c:1004:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-01-27 07:30:20
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.57gdcrash1
/lib64/libc.so.6(+0x32960)[0x7fa3fec99960]
/usr/lib64/libglusterfs.so.0(+0x5d6bb)[0x7fa3ffc446bb]
/usr/lib64/libglusterfs.so.0(dict_foreach+0x45)[0x7fa3ffbfbe25]
/usr/lib64/libglusterfs.so.0(xlator_options_validate_list+0x2f)[0x7fa3ffc414df]
/usr/lib64/libgfrpc.so.0(rpc_transport_load+0x3a3)[0x7fa3ff9da843]
/usr/lib64/libgfrpc.so.0(rpc_clnt_new+0x174)[0x7fa3ff9de014]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_rpc_create+0x66)[0x7fa3fb403cc6]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_brick_connect+0x1ab)[0x7fa3fb42ae5b]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_volume_start_glusterfs+0x6bb)[0x7fa3fb430dab]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_brick_start+0x119)[0x7fa3fb432389]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_op_start_volume+0xfd)[0x7fa3fb46c7cd]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_op_commit_perform+0x52b)[0x7fa3fb41bc7b]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(gd_commit_op_phase+0xbe)[0x7fa3fb4785fe]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(gd_sync_task_begin+0x2c2)[0x7fa3fb47a272]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x3b)[0x7fa3fb47a3ab]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(__glusterd_handle_cli_start_volume+0x1b6)[0x7fa3fb46e2e6]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7fa3fb40378f]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x7fa3ffc32172]
/lib64/libc.so.6(+0x43bb0)[0x7fa3fecaabb0]
---------

</snip>
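
Reading the backtrace, the segfault (signal 11) happens in dict_foreach() while transport options are being validated (xlator_options_validate_list -> rpc_transport_load -> rpc_clnt_new), as glusterd creates the brick RPC connection during volume start. To map the library+offset frames above to source lines, something along these lines can be used, assuming the matching glusterfs debuginfo packages for this build are installed:

    # install debug symbols for the running build (requires the debuginfo repo to be enabled)
    debuginfo-install glusterfs

    # resolve a library-relative offset from the trace, e.g. libglusterfs.so.0(+0x5d6bb)
    addr2line -f -C -e /usr/lib64/libglusterfs.so.0 0x5d6bb

For frames printed as symbol+offset (e.g. dict_foreach+0x45), gdb with the core file loaded can resolve them, e.g. "list *(dict_foreach+0x45)".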

Comment 2 SATHEESARAN 2014-01-27 15:36:38 UTC
Setup Information
==================
1. Trusted storage pool of 2 RHSS Nodes
[root@rhsauto032 ~]# uname -n
rhsauto032.lab.eng.blr.redhat.com

[root@rhsauto032 ~]# ip addr | grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 10.70.37.7/23 brd 10.70.37.255 scope global eth0

[root@rhsauto034 ~]# uname -n
rhsauto034.lab.eng.blr.redhat.com

[root@rhsauto034 ~]# ip addr | grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 10.70.37.9/23 brd 10.70.37.255 scope global eth0

[root@rhsauto034 ~]# gluster pool list
UUID                                    Hostname        State
a8237b30-10bc-470d-817c-03be87714f33    10.70.37.7      Disconnected 
34b5c29c-a06d-4bd0-952c-152e5d7575fc    localhost       Connected
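
For reference, a two-node trusted storage pool like the one above is normally formed by probing one node from the other, e.g. run on 10.70.37.7:

    gluster peer probe 10.70.37.9
    gluster peer status
    gluster pool list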

Comment 3 SATHEESARAN 2014-01-27 15:39:03 UTC
Script that was used to reproduce this issue:

#!/bin/bash
# Repeatedly create -> start -> status -> stop -> delete a volume.
# CLI output goes to 'log'; failures (plus glusterd service status) are recorded in 'error'.
for i in {1..10000}; do
    echo "iteration $i" >> log
    gluster volume create test 10.70.37.7:/rhs/brick1/test$i 10.70.37.9:/rhs/brick1/test$i >> log
    if [ $? -ne 0 ]; then
        echo "iteration $i" >> error
        echo "error while creating the volume" >> error
        service glusterd status >> error
    fi

    gluster volume start test >> log
    if [ $? -ne 0 ]; then
        echo "iteration $i" >> error
        echo "error while starting the volume" >> error
        service glusterd status >> error
    fi
    sleep 1

    gluster volume status >> log
    if [ $? -ne 0 ]; then
        echo "iteration $i" >> error
        echo "error while getting volume status" >> error
        service glusterd status >> error
    fi

    # --mode=script suppresses the interactive y/n confirmation for stop and delete
    gluster volume stop test --mode=script >> log
    if [ $? -ne 0 ]; then
        echo "iteration $i" >> error
        echo "error while stopping the volume" >> error
        service glusterd status >> error
    fi

    gluster volume delete test --mode=script >> log
    if [ $? -ne 0 ]; then
        echo "iteration $i" >> error
        echo "error while deleting the volume" >> error
        service glusterd status >> error
    fi
done

Comment 4 SATHEESARAN 2014-01-27 15:44:21 UTC
Crash-related information:

1. The script from comment 3 was executed from the machine 10.70.37.7
2. The glusterd crash occurred on the same machine (10.70.37.7)

Comment 5 SATHEESARAN 2014-01-27 15:51:19 UTC
Created attachment 856128 [details]
sosreport from RHSS Node1

sosreport from RHSS Node 10.70.37.7

Comment 6 SATHEESARAN 2014-01-27 16:01:08 UTC
Created attachment 856131 [details]
sosreport from RHSS Node2

sosreport from RHSS Node2 - 10.70.37.9

Comment 8 Vivek Agarwal 2015-12-03 17:17:22 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release for which you requested this review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.

