Bug 1226254

Summary: Glusterd crash
Product: [Community] GlusterFS Reporter: Felix <felix.delelisdd>
Component: glusterd Assignee: Atin Mukherjee <amukherj>
Status: CLOSED WORKSFORME QA Contact:
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.6.1 CC: amukherj, bugs, felix.delelisdd, gluster-bugs
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-08-12 05:11:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description       Flags
sosreport         none
File1             none
File2             none
File 3            none
File 4            none
File 5            none
File 6            none
File 7            none
File 8            none
File 9            none
File 10           none
File 11           none
File 12           none
File 13           none
File 14           none
File 15           none
Glusterd log      none
Cli log           none
Glustershd log    none
cmd history       none

Description Felix 2015-05-29 10:04:33 UTC
Description of problem:

Hi,

I have a three-node cluster in pre-production. Yesterday, one node went down. This is the error I saw:


[2015-05-28 19:04:27.305560] E [glusterd-syncop.c:1578:gd_sync_task_begin] 0-management: Unable to acquire lock for cfe-gv1
The message "I [MSGID: 106006] [glusterd-handler.c:4257:__glusterd_nodesvc_rpc_notify] 0-management: nfs has disconnected from glusterd." repeated 5 times between [2015-05-28 19:04:09.346088] and [2015-05-28 19:04:24.349191]
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2015-05-28 19:04:27
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.1
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7fd86e2f1232]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x32d)[0x7fd86e30871d]
/usr/lib64/libc.so.6(+0x35640)[0x7fd86d30c640]
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_remove_pending_entry+0x2c)[0x7fd85f52450c]
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(+0x5ae28)[0x7fd85f511e28]
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_op_sm+0x237)[0x7fd85f50f027]
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(__glusterd_brick_op_cbk+0x2fe)[0x7fd85f53be5e]
/usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_big_locked_cbk+0x4c)[0x7fd85f53d48c]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7fd86e0c50b0]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x171)[0x7fd86e0c5321]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fd86e0c1273]
/usr/lib64/glusterfs/3.6.1/rpc-transport/socket.so(+0x8530)[0x7fd85d17d530]
/usr/lib64/glusterfs/3.6.1/rpc-transport/socket.so(+0xace4)[0x7fd85d17fce4]
/usr/lib64/libglusterfs.so.0(+0x76322)[0x7fd86e346322]
/usr/sbin/glusterd(main+0x502)[0x7fd86e79afb2]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd86d2f8af5]
/usr/sbin/glusterd(+0x6351)[0x7fd86e79b351]
---------

Version-Release number of selected component (if applicable):

3.6.1

How reproducible:



Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Felix 2015-05-29 10:30:56 UTC
Created attachment 1031880
sosreport

Comment 2 Felix 2015-05-29 10:50:24 UTC
Created attachment 1031897
File1

Comment 3 Felix 2015-05-29 10:59:38 UTC
Created attachment 1031909
File2

Comment 4 Felix 2015-05-29 11:04:35 UTC
Created attachment 1031921
File 3

Comment 5 Felix 2015-05-29 11:10:22 UTC
Created attachment 1031935
File 4

Comment 6 Felix 2015-05-29 11:17:15 UTC
Created attachment 1031949
File 5

Comment 7 Felix 2015-05-29 11:18:56 UTC
Created attachment 1031952
File 6

Comment 8 Felix 2015-05-29 11:20:58 UTC
Created attachment 1031953
File 7

Comment 9 Felix 2015-05-29 11:25:57 UTC
Created attachment 1031982
File 8

Comment 10 Felix 2015-05-29 11:30:35 UTC
Created attachment 1032015
File 9

Comment 11 Felix 2015-05-29 11:35:28 UTC
Created attachment 1032017
File 10

Comment 12 Felix 2015-05-29 11:39:59 UTC
Created attachment 1032018
File 11

Comment 13 Felix 2015-05-29 11:43:22 UTC
Created attachment 1032019
File 12

Comment 14 Felix 2015-05-29 11:46:59 UTC
Created attachment 1032022
File 13

Comment 15 Felix 2015-05-29 11:54:07 UTC
Created attachment 1032034
File 14

Comment 16 Felix 2015-05-29 11:58:58 UTC
Created attachment 1032036
File 15

Comment 17 Felix 2015-06-01 09:34:49 UTC
Created attachment 1033229
Glusterd log

Comment 18 Felix 2015-06-01 09:37:34 UTC
Created attachment 1033231
Cli log

Comment 19 Felix 2015-06-01 10:25:26 UTC
Created attachment 1033249
Glustershd log

Comment 20 Atin Mukherjee 2015-06-01 10:28:33 UTC
Please attach the core file and describe the steps you performed that led to the crash.

Comment 21 Felix 2015-06-01 10:30:08 UTC
Created attachment 1033252
cmd history

Comment 22 Atin Mukherjee 2015-06-01 11:48:35 UTC
The problem I see here is that multiple volume status transactions were run concurrently. 3.6.1 is missing some fixes for issues identified along these lines. If you upgrade your cluster to the 3.6.3 beta, the problem will go away. However, 3.6.3 still lacks one more fix, http://review.gluster.org/#/c/10023/, which will be released in 3.6.4.

Please upgrade your cluster to at least 3.6.3, if not 3.7.

Comment 23 Atin Mukherjee 2015-06-02 04:06:53 UTC
Could you upgrade your cluster and check whether the problem goes away? If it does, would you mind closing this bug?

Comment 24 Atin Mukherjee 2015-08-12 05:11:44 UTC
Since the reporter hasn't responded with updates, I'm closing this bug. Feel free to reopen it if the problem persists.

Comment 25 Red Hat Bugzilla 2023-09-14 02:59:51 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days