Bug 1058340

Summary:

glusterd crashed after stopping and deleting the volume for number of iterations

Product:

[Red Hat Storage] Red Hat Gluster Storage

Reporter:

SATHEESARAN <sasundar>

Component:

glusterd

Assignee:

Bug Updates Notification Mailing List <rhs-bugs>

Status:

CLOSED EOL

QA Contact:

SATHEESARAN <sasundar>

Severity:

medium

Docs Contact:

Priority:

unspecified

Version:

2.1

CC:

bturner, nlevinki, vbellur

Target Milestone:

---

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2015-12-03 17:17:22 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
sosreport from RHSS Node1	none
sosreport from RHSS Node2	none

Description SATHEESARAN 2014-01-27 15:24:58 UTC

Description of problem:
-----------------------
glusterd crashed after stopping and deleting volume for number of iterations


Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHSS Version - RHSS 2.1 Update2 RC1 ( glusterfs-3.4.0.57rhs-1.el6rhs )


How reproducible:
-----------------
Happened 4 times out of 4 tries


Steps to Reproduce:
-------------------

1. Create a volume ( any type )
2. Start the volume
3. Get the volume status
4. Stop the volume
5. Delete the volume
6. Repeat the above steps 1 to step 5, 
(i.e) create->start->get status->stop-> delete volume, for 10,000 iterations

Actual results:
--------------
In first two attempts, glusterd crash happened at 231 and 1431th iteration

Expected results:
-----------------
glusterd should not crash

Additional info:
----------------
This bug was already verified with 10,000 iterations for RHSS 2.1 (BigBend) -
glusterfs-3.4.0.33rhs-1.el6rhs
Ref - https://bugzilla.redhat.com/show_bug.cgi?id=962621

Point to make here are,
1. Use case wise - stopping, deleting the volume repeatedly was not ideal scenario at customer deployments, hence made the severity as MEDIUM, although its a CRASH
2. Its again a REGRESSION, but not adding REGRESSION Keyword, as this bug would be moved with BLOCKER? in that case

Comment 1 SATHEESARAN 2014-01-27 15:27:32 UTC

Crash error message in glusterd log file:

<snip>
[2014-01-27 07:30:18.199134] E [glusterd-utils.c:4007:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/16a505b2447a553ea7
e2fa2d26822ee3.socket error: Permission denied
[2014-01-27 07:30:18.199763] I [glusterd-utils.c:4041:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2014-01-27 07:30:18.200118] I [glusterd-utils.c:4046:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
[2014-01-27 07:30:18.200454] I [glusterd-utils.c:4051:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
[2014-01-27 07:30:18.200838] I [glusterd-utils.c:4056:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully
[2014-01-27 07:30:18.201174] I [glusterd-utils.c:4061:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully
[2014-01-27 07:30:18.201550] I [glusterd-utils.c:4066:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully
[2014-01-27 07:30:18.331329] I [glusterd-ping.c:181:glusterd_start_ping] 0-management: defaulting ping-timeout to 10s
[2014-01-27 07:30:19.290937] I [glusterd-ping.c:297:glusterd_ping_cbk] 0-management: defaulting ping-timeout to 10s
[2014-01-27 07:30:20.155917] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /rhs/brick1/test2253 on port 51404
[2014-01-27 07:30:20.157642] I [rpc-clnt.c:1004:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-01-27 07:30:20configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.57gdcrash1
/lib64/libc.so.6(+0x32960)[0x7fa3fec99960]
/usr/lib64/libglusterfs.so.0(+0x5d6bb)[0x7fa3ffc446bb]
/usr/lib64/libglusterfs.so.0(dict_foreach+0x45)[0x7fa3ffbfbe25]
/usr/lib64/libglusterfs.so.0(xlator_options_validate_list+0x2f)[0x7fa3ffc414df]
/usr/lib64/libgfrpc.so.0(rpc_transport_load+0x3a3)[0x7fa3ff9da843]
/usr/lib64/libgfrpc.so.0(rpc_clnt_new+0x174)[0x7fa3ff9de014]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_rpc_create+0x66)[0x7fa3fb403cc6]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_brick_connect+0x1ab)[0x7fa3fb42ae5b]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_volume_start_glusterfs+0x6bb)[0x7fa3fb430dab]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_brick_start+0x119)[0x7fa3fb432389]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_op_start_volume+0xfd)[0x7fa3fb46c7cd]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_op_commit_perform+0x52b)[0x7fa3fb41bc7b]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(gd_commit_op_phase+0xbe)[0x7fa3fb4785fe]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(gd_sync_task_begin+0x2c2)[0x7fa3fb47a272]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x3b)[0x7fa3fb47a3ab]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(__glusterd_handle_cli_start_volume+0x1b6)[0x7fa3fb46e2e6]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7fa3fb40378f]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x7fa3ffc32172]
/lib64/libc.so.6(+0x43bb0)[0x7fa3fecaabb0]
---------

</snip>

Comment 2 SATHEESARAN 2014-01-27 15:36:38 UTC

Setup Information
==================
1. Trusted storage pool of 2 RHSS Nodes
[root@rhsauto032 ~]# uname -n
rhsauto032.lab.eng.blr.redhat.com

[root@rhsauto032 ~]# ip addr | grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 10.70.37.7/23 brd 10.70.37.255 scope global eth0

[root@rhsauto034 ~]# uname -n
rhsauto034.lab.eng.blr.redhat.com

[root@rhsauto034 ~]# ip addr | grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 10.70.37.9/23 brd 10.70.37.255 scope global eth0

[root@rhsauto034 ~]# gluster pool list
UUID                                    Hostname        State
a8237b30-10bc-470d-817c-03be87714f33    10.70.37.7      Disconnected 
34b5c29c-a06d-4bd0-952c-152e5d7575fc    localhost       Connected

Comment 3 SATHEESARAN 2014-01-27 15:39:03 UTC

Script that was used for reproducing this issue :

for i in {1..10000}; 
    do echo "iteration $i" >> log;
    gluster volume create test 10.70.37.7:/rhs/brick1/test$i 10.70.37.9:/rhs/brick1/test$i >> log;
    if [ $? != 0 ]; then
        echo "error while creating the volume"
    fi
    
    gluster volume start test>>log; 
    if [ $? != 0 ]; then
        echo "iteration $i" >> error
        echo "error while starting the volume" >> error
        `service glusterd status` >> error
    fi
    sleep 1;
    
    gluster volume status >> log; 
    if [ $? != 0 ]; then
        echo "iteration $i" >> error
        echo "error while getting volume status"
        `service glusterd status` >> error
    fi
    
    gluster volume stop test --mode=script >>log;
    if [ $? != 0 ]; then
        echo "iteration $i" >> error
        echo "error while stopping the volume"
        `service glusterd status` >> error
    fi
    
    gluster volume delete test --mode=script>>log;
    if [ $? != 0 ]; then
        echo "iteration $i" >> error
        echo "error while deleting the volume"
        `service glusterd status` >> error
    fi
done

Comment 4 SATHEESARAN 2014-01-27 15:44:21 UTC

Crash related information :

1. The script as in comment3 was executed from machine 10.70.37.7
2. glusterd crash was found in the same machine (10.70.37.7)

Comment 5 SATHEESARAN 2014-01-27 15:51:19 UTC

Created attachment 856128 [details]
sosreport from RHSS Node1

sosreport from RHSS Node 10.70.37.7

Comment 6 SATHEESARAN 2014-01-27 16:01:08 UTC

Created attachment 856131 [details]
sosreport from RHSS Node2

sosreport from RHSS Node2 - 10.70.37.9

Comment 8 Vivek Agarwal 2015-12-03 17:17:22 UTC

Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release for which you requested us to review, is now End of Life. Please See https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.