Description of problem:
-----------------------
glusterd crashed after stopping and deleting a volume for a number of iterations.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHSS Version - RHSS 2.1 Update 2 RC1 (glusterfs-3.4.0.57rhs-1.el6rhs)

How reproducible:
-----------------
Happened 4 times out of 4 tries.

Steps to Reproduce:
-------------------
1. Create a volume (any type)
2. Start the volume
3. Get the volume status
4. Stop the volume
5. Delete the volume
6. Repeat steps 1 to 5 (i.e. create -> start -> get status -> stop -> delete volume) for 10,000 iterations

Actual results:
---------------
In the first two attempts, the glusterd crash happened at the 231st and 1431st iterations.

Expected results:
-----------------
glusterd should not crash.

Additional info:
----------------
This bug was already verified with 10,000 iterations for RHSS 2.1 (Big Bend) - glusterfs-3.4.0.33rhs-1.el6rhs.
Ref - https://bugzilla.redhat.com/show_bug.cgi?id=962621

Points to make here:
1. Use-case wise, stopping and deleting a volume repeatedly is not a typical scenario in customer deployments, hence the severity is set to MEDIUM even though it is a CRASH.
2. It is also a REGRESSION, but the REGRESSION keyword is not being added, as this bug would be moved with the BLOCKER flag in that case.
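For a quick manual check, one iteration of the loop boils down to the commands below (a minimal sketch mirroring the full script in comment 3 below; the volume name and brick paths are the ones that script uses):

# One create -> start -> status -> stop -> delete cycle
gluster volume create test 10.70.37.7:/rhs/brick1/test1 10.70.37.9:/rhs/brick1/test1
gluster volume start test
gluster volume status
gluster volume stop test --mode=script
gluster volume delete test --mode=script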
Crash error message in glusterd log file:

<snip>
[2014-01-27 07:30:18.199134] E [glusterd-utils.c:4007:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/16a505b2447a553ea7e2fa2d26822ee3.socket error: Permission denied
[2014-01-27 07:30:18.199763] I [glusterd-utils.c:4041:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2014-01-27 07:30:18.200118] I [glusterd-utils.c:4046:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
[2014-01-27 07:30:18.200454] I [glusterd-utils.c:4051:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
[2014-01-27 07:30:18.200838] I [glusterd-utils.c:4056:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully
[2014-01-27 07:30:18.201174] I [glusterd-utils.c:4061:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully
[2014-01-27 07:30:18.201550] I [glusterd-utils.c:4066:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully
[2014-01-27 07:30:18.331329] I [glusterd-ping.c:181:glusterd_start_ping] 0-management: defaulting ping-timeout to 10s
[2014-01-27 07:30:19.290937] I [glusterd-ping.c:297:glusterd_ping_cbk] 0-management: defaulting ping-timeout to 10s
[2014-01-27 07:30:20.155917] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /rhs/brick1/test2253 on port 51404
[2014-01-27 07:30:20.157642] I [rpc-clnt.c:1004:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
pending frames:
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-01-27 07:30:20
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.57gdcrash1
/lib64/libc.so.6(+0x32960)[0x7fa3fec99960]
/usr/lib64/libglusterfs.so.0(+0x5d6bb)[0x7fa3ffc446bb]
/usr/lib64/libglusterfs.so.0(dict_foreach+0x45)[0x7fa3ffbfbe25]
/usr/lib64/libglusterfs.so.0(xlator_options_validate_list+0x2f)[0x7fa3ffc414df]
/usr/lib64/libgfrpc.so.0(rpc_transport_load+0x3a3)[0x7fa3ff9da843]
/usr/lib64/libgfrpc.so.0(rpc_clnt_new+0x174)[0x7fa3ff9de014]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_rpc_create+0x66)[0x7fa3fb403cc6]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_brick_connect+0x1ab)[0x7fa3fb42ae5b]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_volume_start_glusterfs+0x6bb)[0x7fa3fb430dab]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_brick_start+0x119)[0x7fa3fb432389]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_op_start_volume+0xfd)[0x7fa3fb46c7cd]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_op_commit_perform+0x52b)[0x7fa3fb41bc7b]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(gd_commit_op_phase+0xbe)[0x7fa3fb4785fe]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(gd_sync_task_begin+0x2c2)[0x7fa3fb47a272]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x3b)[0x7fa3fb47a3ab]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(__glusterd_handle_cli_start_volume+0x1b6)[0x7fa3fb46e2e6]
/usr/lib64/glusterfs/3.4.0.57gdcrash1/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7fa3fb40378f]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x7fa3ffc32172]
/lib64/libc.so.6(+0x43bb0)[0x7fa3fecaabb0]
---------
</snip>
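The backtrace above shows the segfault inside dict_foreach, called from xlator_options_validate_list while rpc_transport_load sets up the brick RPC connection during volume start. If a core file was captured, a fully symbolized backtrace can be pulled out with gdb; a sketch, assuming the core landed in / (check /proc/sys/kernel/core_pattern for the actual location) and that the matching glusterfs-debuginfo package is installed:

# <pid> is a placeholder for the crashed glusterd's PID in the core file name
gdb /usr/sbin/glusterd /core.<pid> -batch -ex "bt full" -ex "thread apply all bt"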
Setup Information
=================
1. Trusted storage pool of 2 RHSS Nodes

[root@rhsauto032 ~]# uname -n
rhsauto032.lab.eng.blr.redhat.com

[root@rhsauto032 ~]# ip addr | grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 10.70.37.7/23 brd 10.70.37.255 scope global eth0

[root@rhsauto034 ~]# uname -n
rhsauto034.lab.eng.blr.redhat.com

[root@rhsauto034 ~]# ip addr | grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 10.70.37.9/23 brd 10.70.37.255 scope global eth0

[root@rhsauto034 ~]# gluster pool list
UUID                                    Hostname        State
a8237b30-10bc-470d-817c-03be87714f33    10.70.37.7      Disconnected
34b5c29c-a06d-4bd0-952c-152e5d7575fc    localhost       Connected
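Note that 10.70.37.7 shows as Disconnected in the pool list above, which lines up with glusterd having crashed on that node. A quick way to confirm (a sketch using the same commands that appear elsewhere in this report):

# On the surviving node (10.70.37.9): check how the peer is seen
gluster peer status
# On the affected node (10.70.37.7): check whether the daemon is still up
service glusterd status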
Script that was used for reproducing this issue:

# Loop create -> start -> status -> stop -> delete 10,000 times.
# Normal output goes to ./log; failures and glusterd status go to ./error.
for i in {1..10000}; do
    echo "iteration $i" >> log
    gluster volume create test 10.70.37.7:/rhs/brick1/test$i 10.70.37.9:/rhs/brick1/test$i >> log
    if [ $? != 0 ]; then
        echo "iteration $i" >> error
        echo "error while creating the volume" >> error
        service glusterd status >> error
    fi
    gluster volume start test >> log
    if [ $? != 0 ]; then
        echo "iteration $i" >> error
        echo "error while starting the volume" >> error
        service glusterd status >> error
    fi
    sleep 1
    gluster volume status >> log
    if [ $? != 0 ]; then
        echo "iteration $i" >> error
        echo "error while getting volume status" >> error
        service glusterd status >> error
    fi
    gluster volume stop test --mode=script >> log
    if [ $? != 0 ]; then
        echo "iteration $i" >> error
        echo "error while stopping the volume" >> error
        service glusterd status >> error
    fi
    gluster volume delete test --mode=script >> log
    if [ $? != 0 ]; then
        echo "iteration $i" >> error
        echo "error while deleting the volume" >> error
        service glusterd status >> error
    fi
done
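To catch the exact moment glusterd dies while the loop runs, a small watchdog can be left running alongside it (a sketch; the watchdog.log file name is my own choice):

# Poll glusterd's PID once a second and log whenever it disappears or changes,
# so the crash can be matched against the iteration numbers in ./log.
last_pid=$(pidof glusterd)
while true; do
    pid=$(pidof glusterd)
    if [ "$pid" != "$last_pid" ]; then
        echo "$(date) glusterd PID changed: '$last_pid' -> '$pid'" >> watchdog.log
        last_pid=$pid
    fi
    sleep 1
done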
Crash related information:
1. The script as in comment 3 was executed from machine 10.70.37.7
2. The glusterd crash was seen on the same machine (10.70.37.7)
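To correlate the crash with a specific iteration, the crash banner in the glusterd log can be matched against the iteration numbers in ./log; a sketch, assuming the default RHS glusterd log path:

# Print the crash banner plus some surrounding context
grep -B2 -A5 "signal received: 11" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log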
Created attachment 856128 [details]
sosreport from RHSS Node1

sosreport from RHSS Node 10.70.37.7
Created attachment 856131 [details]
sosreport from RHSS Node2

sosreport from RHSS Node2 - 10.70.37.9
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.