Bug 1218553 - [Bitrot]: glusterd crashed when node was rebooted
Summary: [Bitrot]: glusterd crashed when node was rebooted
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: bitrot
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
bugs@gluster.org
URL:
Whiteboard:
Depends On:
Blocks: 1224242
 
Reported: 2015-05-05 09:05 UTC by senaik
Modified: 2016-07-19 10:39 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1224242
Environment:
Last Closed: 2016-07-19 10:39:49 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description senaik 2015-05-05 09:05:57 UTC
Description of problem:
=======================
Rebooting a node while the scheduler was creating snapshots on a volume that has bitrot enabled resulted in a glusterd crash.


Version-Release number of selected component (if applicable):
=============================================================
 gluster --version
glusterfs 3.7.0beta1 built on May  1 2015

How reproducible:
================
1/1


Steps to Reproduce:
===================
1. Create a dist-rep volume, a disperse volume (4 redundant bricks), and a distribute volume.

2. Enable USS, quota, and bitrot on all the volumes.

3. Add a job that creates snapshots every 5 minutes on the volumes.

4. While the job is in progress, reboot rhs-arch-srv2.lab.eng.blr.redhat.com.

5. Check gluster status after the node comes back.
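
For anyone trying to reproduce this, steps 2, 3 and 5 above can be sketched roughly as the script below. This is only an illustration, not the exact commands used in the original test run: the volume name `testvol` and the job name are placeholders, volume creation (step 1) is left out because the brick layout is not given in this report, and the snapshot scheduler is assumed to be already initialised and enabled. By default the script only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Dry-run sketch of reproduction steps 2, 3 and 5.
# VOL and the job name are placeholders, not values from this report.
VOL="${VOL:-testvol}"

# Print the command unless RUN=1 is set, in which case execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Step 2: enable USS, quota and bitrot on the volume.
run gluster volume set "$VOL" features.uss enable
run gluster volume quota "$VOL" enable
run gluster volume bitrot "$VOL" enable

# Step 3: schedule a snapshot every 5 minutes (cron-style schedule).
run snap_scheduler.py add "snap_5min_$VOL" "*/5 * * * *" "$VOL"

# Step 5: run these once the rebooted node is back up.
run gluster peer status
run gluster volume status "$VOL"
```

Repeat with VOL set to each of the three volumes. Step 4 (the reboot) is deliberately not scripted; it has to be done on the peer while a scheduled snapshot is being taken.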

2015-05-05 07:16:54
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.0beta1
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3b7c821e16]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x3b7c83db2f]
/lib64/libc.so.6[0x3a964326a0]
/usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so(glusterd_bitdsvc_manager+0x56)[0x7f31e44b34c6]
/usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so(glusterd_svcs_manager+0xdd)[0x7f31e44b1bed]
/usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so(glusterd_compare_friend_data+0x332)[0x7f31e44276f2]
/usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so(+0x4f9f3)[0x7f31e43fe9f3]
/usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so(glusterd_friend_sm+0x170)[0x7f31e43ff6e0]
/usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so(__glusterd_handle_incoming_friend_req+0x232)[0x7f31e43fdad2]
/usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x3f)[0x7f31e43e4ebf]
/usr/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x295)[0x3b7cc09c85]
/usr/lib64/libgfrpc.so.0(rpcsvc_notify+0x103)[0x3b7cc09ec3]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x3b7cc0b7b8]
/usr/lib64/glusterfs/3.7.0beta1/rpc-transport/socket.so(+0x9bcd)[0x7f31e3411bcd]
/usr/lib64/glusterfs/3.7.0beta1/rpc-transport/socket.so(+0xb6fd)[0x7f31e34136fd]
/usr/lib64/libglusterfs.so.0[0x3b7c87d4b0]
/lib64/libpthread.so.0[0x3a968079d1]
/lib64/libc.so.6(clone+0x6d)[0x3a964e89dd]

bt:
===

Core was generated by `/usr/sbin/glusterd --pid-file=/var/run/glusterd.pid'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f31e44b34c6 in glusterd_bitdsvc_manager () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
Missing separate debuginfos, use: debuginfo-install glusterfs-3.7.0beta1-0.3.git7aeae00.el6.x86_64
(gdb) bt
#0  0x00007f31e44b34c6 in glusterd_bitdsvc_manager () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#1  0x00007f31e44b1bed in glusterd_svcs_manager () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#2  0x00007f31e44276f2 in glusterd_compare_friend_data ()
   from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#3  0x00007f31e43fe9f3 in ?? () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#4  0x00007f31e43ff6e0 in glusterd_friend_sm () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#5  0x00007f31e43fdad2 in __glusterd_handle_incoming_friend_req ()
   from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#6  0x00007f31e43e4ebf in glusterd_big_locked_handler ()
   from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#7  0x0000003b7cc09c85 in rpcsvc_handle_rpc_call () from /usr/lib64/libgfrpc.so.0
#8  0x0000003b7cc09ec3 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
#9  0x0000003b7cc0b7b8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#10 0x00007f31e3411bcd in ?? () from /usr/lib64/glusterfs/3.7.0beta1/rpc-transport/socket.so
#11 0x00007f31e34136fd in ?? () from /usr/lib64/glusterfs/3.7.0beta1/rpc-transport/socket.so
#12 0x0000003b7c87d4b0 in ?? () from /usr/lib64/libglusterfs.so.0
#13 0x0000003a968079d1 in start_thread () from /lib64/libpthread.so.0
#14 0x0000003a964e89dd in clone () from /lib64/libc.so.6



Actual results:


Expected results:


Additional info:

Comment 2 senaik 2015-05-05 11:24:57 UTC
Version :
=======
gluster --version
glusterfs 3.7.0beta1 built on May  1 2015

Faced a similar crash while attaching another node to a cluster whose volumes had bitrot enabled.

[root@inception ~]# gluster peer probe  snapshot11.lab.eng.blr.redhat.com
peer probe: failed: Probe returned with unknown errno -1


(gdb) bt
#0  0x00007f750ca2a4c6 in glusterd_bitdsvc_manager () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#1  0x00007f750ca28bed in glusterd_svcs_manager () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#2  0x00007f750c99e6f2 in glusterd_compare_friend_data () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#3  0x00007f750c9759f3 in ?? () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#4  0x00007f750c9766e0 in glusterd_friend_sm () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#5  0x00007f750c974ad2 in __glusterd_handle_incoming_friend_req () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#6  0x00007f750c95bebf in glusterd_big_locked_handler () from /usr/lib64/glusterfs/3.7.0beta1/xlator/mgmt/glusterd.so
#7  0x00000031f6c09c85 in rpcsvc_handle_rpc_call () from /usr/lib64/libgfrpc.so.0
#8  0x00000031f6c09ec3 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
#9  0x00000031f6c0b7b8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#10 0x00007f750bbd9bcd in ?? () from /usr/lib64/glusterfs/3.7.0beta1/rpc-transport/socket.so
#11 0x00007f750bbdb6fd in ?? () from /usr/lib64/glusterfs/3.7.0beta1/rpc-transport/socket.so
#12 0x00000031f647d4b0 in ?? () from /usr/lib64/libglusterfs.so.0
#13 0x00000032284079d1 in start_thread () from /lib64/libpthread.so.0
#14 0x00000032280e88fd in clone () from /lib64/libc.so.6

Comment 3 Venky Shankar 2015-05-06 03:57:38 UTC
Gaurav,

Mind having a look at this?

Comment 4 Gaurav Kumar Garg 2015-05-18 06:47:26 UTC
Patch http://review.gluster.org/#/c/10664/ should fix this issue. If you hit the issue again, please reopen this bug. Hence closing this bug.

