Bug 1218553

Summary: [Bitrot]: glusterd crashed when node was rebooted
Product: [Community] GlusterFS
Reporter: senaik
Component: bitrot
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
Severity: high
Docs Contact: bugs <bugs>
Priority: unspecified
Version: mainline
CC: annair, bugs, rabhat, rjoseph, vshankar
Target Milestone: ---
Keywords: Reopened, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2016-07-19 10:39:49 UTC
Type: Bug
Bug Blocks: 1224242

Description senaik 2015-05-05 09:05:57 UTC
Description of problem:
=======================
Rebooting a node while the scheduler was creating snapshots on a volume with bitrot enabled resulted in a glusterd crash.


Version-Release number of selected component (if applicable):
=============================================================
 gluster --version
glusterfs 3.7.0beta1 built on May  1 2015

How reproducible:
================
1/1


Steps to Reproduce:
===================
1. Create a dist-rep volume, a disperse volume (4 redundant bricks), and a distribute volume.

2. Enable USS, quota, and bitrot on all the volumes.

3. Add a job that creates snapshots every 5 minutes on the volumes.

4. While snapshot creation is in progress, reboot rhs-arch-srv2.lab.eng.blr.redhat.com.

5. Check gluster status after the node comes back.
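
Steps 2 and 3 could be scripted roughly as below. This is only a sketch: it builds and prints the commands without running them. The volume names (vol0, vol1, vol2) and job names are hypothetical, and it assumes the "scheduler" in this report is snap_scheduler.py, which the report does not state. Volume creation (step 1) is left out because the brick layout is not given.

```python
import shlex


def repro_commands(volumes, schedule="*/5 * * * *"):
    """Return the CLI commands for steps 2 and 3 as argument lists.

    Nothing is executed; the caller decides whether to run them.
    Volume and job names are placeholders, not taken from the report.
    """
    commands = []
    for vol in volumes:
        # Step 2: enable USS, quota, and bitrot on the volume.
        commands.append(["gluster", "volume", "set", vol, "features.uss", "enable"])
        commands.append(["gluster", "volume", "quota", vol, "enable"])
        commands.append(["gluster", "volume", "bitrot", vol, "enable"])
        # Step 3: schedule a snapshot job (assumes snap_scheduler.py is
        # already initialised on the node).
        commands.append(["snap_scheduler.py", "add", f"snap-{vol}", schedule, vol])
    return commands


if __name__ == "__main__":
    for argv in repro_commands(["vol0", "vol1", "vol2"]):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before running anything against a real cluster.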

2015-05-05 07:16:54
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.0beta1
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3b7c821e16]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x3b7c83db2f]
/lib64/libc.so.6[0x3a964326a0]
/usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so(glusterd_bitdsvc_manager+0x56)[0x7f31e44b34c6]
/usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so(glusterd_svcs_manager+0xdd)[0x7f31e44b1bed]
/usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so(glusterd_compare_friend_data+0x332)[0x7f31e44276f2]
/usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so(+0x4f9f3)[0x7f31e43fe9f3]
/usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so(glusterd_friend_sm+0x170)[0x7f31e43ff6e0]
/usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so(__glusterd_handle_incoming_friend_req+0x232)[0x7f31e43fdad2]
/usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7f31e43e4ebf]
/usr/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x295)[0x3b7cc09c85]
/usr/lib64/libgfrpc.so.0(rpcsvc_notify+0x103)[0x3b7cc09ec3]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x3b7cc0b7b8]
/usr/lib64/glusterfs/3.7.0beta1/rpc-transport/socket.so(+0x9bcd)[0x7f31e3411bcd]
/usr/lib64/glusterfs/3.7.0beta1/rpc-transport/socket.so(+0xb6fd)[0x7f31e34136fd]
/usr/lib64/libglusterfs.so.0[0x3b7c87d4b0]
/lib64/libpthread.so.0[0x3a968079d1]
/lib64/libc.so.6(clone+0x6d)[0x3a964e89dd]
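
To find the first glusterd frame in a trace like the one above, the lines can be parsed mechanically. The following is a minimal sketch, assuming each frame has the form `module(symbol+offset)[address]` as seen in this log; the helper names `parse_frame` and `first_frame_in` are made up for illustration.

```python
import re

# Frame shapes seen in the log above:
#   /path/lib.so(symbol+0x56)[0x7f31e44b34c6]
#   /path/lib.so(+0x4f9f3)[0x7f31e43fe9f3]     (no symbol name)
#   /lib64/libc.so.6[0x3a964326a0]             (no symbol or offset)
FRAME_RE = re.compile(
    r"^(?P<module>[^\s(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Parse one backtrace line into a dict, or return None if it is not a frame."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return {
        "module": match.group("module"),
        "symbol": match.group("symbol") or None,  # empty symbol becomes None
        "offset": match.group("offset"),
        "address": match.group("address"),
    }


def first_frame_in(lines, module_suffix):
    """Return the first parsed frame whose module path ends with module_suffix."""
    for line in lines:
        frame = parse_frame(line)
        if frame and frame["module"].endswith(module_suffix):
            return frame
    return None


sample = [
    "/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3b7c821e16]",
    "/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x3b7c83db2f]",
    "/lib64/libc.so.6[0x3a964326a0]",
    "/usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so(glusterd_bitdsvc_manager+0x56)[0x7f31e44b34c6]",
]
crash_frame = first_frame_in(sample, "glusterd.so")
print(crash_frame["symbol"], crash_frame["offset"])
```

The frames before it belong to the signal handler and trace printer, so the first glusterd.so frame is the one where the fault occurred, which matches frame #0 in the gdb backtrace.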

bt:
===

Core was generated by `/usr/sbin/glusterd --pid-file=/var/run/glusterd.pid'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f31e44b34c6 in glusterd_bitdsvc_manager () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
Missing separate debuginfos, use: debuginfo-install glusterfs-3.7.0beta1-0.3.git7aeae00.el6.x86_64
(gdb) bt
#0  0x00007f31e44b34c6 in glusterd_bitdsvc_manager () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#1  0x00007f31e44b1bed in glusterd_svcs_manager () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#2  0x00007f31e44276f2 in glusterd_compare_friend_data ()
   from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#3  0x00007f31e43fe9f3 in ?? () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#4  0x00007f31e43ff6e0 in glusterd_friend_sm () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#5  0x00007f31e43fdad2 in __glusterd_handle_incoming_friend_req ()
   from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#6  0x00007f31e43e4ebf in glusterd_big_locked_handler ()
   from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#7  0x0000003b7cc09c85 in rpcsvc_handle_rpc_call () from /usr/lib64/libgfrpc.so.0
#8  0x0000003b7cc09ec3 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
#9  0x0000003b7cc0b7b8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#10 0x00007f31e3411bcd in ?? () from /usr/lib64/glusterfs/3.7.0beta1/rpc-transport/socket.so
#11 0x00007f31e34136fd in ?? () from /usr/lib64/glusterfs/3.7.0beta1/rpc-transport/socket.so
#12 0x0000003b7c87d4b0 in ?? () from /usr/lib64/libglusterfs.so.0
#13 0x0000003a968079d1 in start_thread () from /lib64/libpthread.so.0
#14 0x0000003a964e89dd in clone () from /lib64/libc.so.6




Comment 2 senaik 2015-05-05 11:24:57 UTC
Version :
=======
gluster --version
glusterfs 3.7.0beta1 built on May  1 2015

Hit a similar crash while attaching another node to a cluster whose volumes had bitrot enabled.

[root@inception ~]# gluster peer probe  snapshot11.lab.eng.blr.redhat.com
peer probe: failed: Probe returned with unknown errno -1


(gdb) bt
#0  0x00007f750ca2a4c6 in glusterd_bitdsvc_manager () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#1  0x00007f750ca28bed in glusterd_svcs_manager () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#2  0x00007f750c99e6f2 in glusterd_compare_friend_data () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#3  0x00007f750c9759f3 in ?? () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#4  0x00007f750c9766e0 in glusterd_friend_sm () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#5  0x00007f750c974ad2 in __glusterd_handle_incoming_friend_req () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#6  0x00007f750c95bebf in glusterd_big_locked_handler () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#7  0x00000031f6c09c85 in rpcsvc_handle_rpc_call () from /usr/lib64/libgfrpc.so.0
#8  0x00000031f6c09ec3 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
#9  0x00000031f6c0b7b8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#10 0x00007f750bbd9bcd in ?? () from /usr/lib64/glusterfs/3.7.0beta1/rpc-transport/socket.so
#11 0x00007f750bbdb6fd in ?? () from /usr/lib64/glusterfs/3.7.0beta1/rpc-transport/socket.so
#12 0x00000031f647d4b0 in ?? () from /usr/lib64/libglusterfs.so.0
#13 0x00000032284079d1 in start_thread () from /lib64/libpthread.so.0
#14 0x00000032280e88fd in clone () from /lib64/libc.so.6

Comment 3 Venky Shankar 2015-05-06 03:57:38 UTC
Gaurav,

Mind having a look at this?

Comment 4 Gaurav Kumar Garg 2015-05-18 06:47:26 UTC
Patch http://review.gluster.org/#/c/10664/ should fix this issue. If you hit the issue again, please reopen this bug. Moving the bug status to closed.