Created attachment 997003 [details]
corefile of the crashed machine

Description of problem:
======================
glusterd crashed after a peer probe.

Steps that I followed (sketched as commands at the end of this comment):
1. Removed all the existing gluster packages using yum.
2. Installed with "yum localinstall" the packages downloaded from:
   http://download.gluster.org/pub/gluster/glusterfs/nightly/glusterfs/epel-6-x86_64/glusterfs-3.7dev-0.577.gitf18a3f3.autobuild/
3. Tried to create a disperse 1x(8+4) volume; it failed with an error message that the host is not connected.
4. Peer probed the partner node and checked peer status.
5. Started glusterd and checked peer status again.
6. Created the volume with the force option; this succeeded.

Version-Release number of selected component (if applicable):
=============================================================
[root@vertigo /]# gluster --version
glusterfs 3.7dev built on Mar 1 2015 01:03:38
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@vertigo /]#

How reproducible:
==================
Tried only once.

Steps to Reproduce:
===================
As mentioned in the description.

Actual results:
===============
glusterd crashed.

Expected results:
=================
No crash should be seen.

Additional info:
================
Attaching the core file and sosreports of both nodes.
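For reference, a rough command sequence for steps 3-6 above; the volume name, host names (node1/node2) and brick paths are made up for illustration, since the exact bricks used are not recorded in this report:

    gluster volume create testvol disperse 12 redundancy 4 \
            node1:/bricks/b1 node1:/bricks/b2 ... node2:/bricks/b12    # step 3: failed, host not connected
    gluster peer probe node2                                           # step 4
    gluster peer status
    service glusterd start                                             # step 5
    gluster peer status
    gluster volume create testvol disperse 12 redundancy 4 \
            node1:/bricks/b1 ... node2:/bricks/b12 force               # step 6: succeeded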
Created attachment 997004 [details] sosreport of Node1
Created attachment 997005 [details] sosreport of Node2
The RPM link above is from 8 Feb 2015, but the glusterd version shows a build date of 01 Mar 2015:
============================================
[root@vertigo /]# gluster --version
glusterfs 3.7dev built on Mar 1 2015 01:03:38

The RPMs for 01-Mar-2015 are at:
http://download.gluster.org/pub/gluster/glusterfs/nightly/glusterfs/epel-6-x86_64/glusterfs-3.7dev-0.627.git32dd227.autobuild/

Backtrace from the core dump:

Loaded symbols for /lib64/libnss_dns-2.12.so
Core was generated by `/usr/sbin/glusterd --pid-file=/var/run/glusterd.pid'.
Program terminated with signal 7, Bus error.
#0  __gf_free (free_ptr=0x7fb4c8000d90) at mem-pool.c:261
261             if (!xl->mem_acct.rec) {
(gdb) bt
#0  __gf_free (free_ptr=0x7fb4c8000d90) at mem-pool.c:261
#1  0x00007fb4e31553a5 in data_destroy (data=0x7fb4e180fbe0) at dict.c:148
#2  0x00007fb4e315561e in dict_get_str (this=<value optimized out>, key=<value optimized out>, str=0x7fb4c4203178) at dict.c:2097
#3  0x00007fb4d8e7465a in glusterd_xfer_cli_probe_resp (req=0x19fe35c, op_ret=-1, op_errno=0, op_errstr=0x0, hostname=0x7fb4c8000d50 "", port=24007, dict=0x7fb4e19f1538) at glusterd-handler.c:3455
#4  0x00007fb4d8e75442 in __glusterd_handle_cli_probe (req=0x19fe35c) at glusterd-handler.c:1056
#5  0x00007fb4d8e6064f in glusterd_big_locked_handler (req=0x19fe35c, actor_fn=0x7fb4d8e75090 <__glusterd_handle_cli_probe>) at glusterd-handler.c:82
#6  0x00007fb4e3199502 in synctask_wrap (old_task=<value optimized out>) at syncop.c:375
#7  0x0000003a38c438f0 in ?? () from /lib64/libc-2.12.so
#8  0x0000000000000000 in ?? ()
(gdb) list
256             memcpy (&xl, ptr, sizeof(xlator_t *));
257
258             //gf_free expects xl to be available
259             GF_ASSERT (xl != NULL);
260
261             if (!xl->mem_acct.rec) {
262                     ptr = (char *)free_ptr - GF_MEM_HEADER_SIZE;
263                     goto free;
264             }
265

I am not able to reproduce this bug with the steps mentioned above. Could you explain when it happened, or try to reproduce it? Based on the core and the sosreports, glusterd was killed by a SIGBUS while accessing the dictionary. I am not able to figure out the root cause, but it appears to be a memory alignment issue and not related to peer probe. As discussed with Bhaskar, it happened once while he was working on an NFS-related bug (https://bugzilla.redhat.com/show_bug.cgi?id=1196546). Need more info to analyse this bug.
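For context on frame #0: __gf_free expects every pointer handed to it to have come from GlusterFS's accounting allocators, which stash a hidden header (including the owning xlator_t pointer) just before the user pointer. Below is a minimal sketch of that idea; GF_MEM_HEADER_SIZE, the structs, and the header layout here are simplified stand-ins, not the real mem-pool.c definitions. It illustrates why a missing or corrupted header makes the xl->mem_acct.rec load at mem-pool.c:261 fault:

    #include <stdlib.h>
    #include <string.h>

    #define GF_MEM_HEADER_SIZE 40            /* illustrative, not the real value */

    struct mem_acct { void *rec; };          /* stand-in for the real structs    */
    typedef struct { struct mem_acct mem_acct; } xlator_t;

    void
    sketch_gf_free (void *free_ptr)
    {
            xlator_t *xl  = NULL;
            char     *ptr = (char *)free_ptr - GF_MEM_HEADER_SIZE;

            /* Equivalent of mem-pool.c:256: pull the owning xlator pointer
             * back out of the hidden header (here assumed at its start). */
            memcpy (&xl, ptr, sizeof (xlator_t *));

            /* Equivalent of mem-pool.c:261: if free_ptr did not come from
             * the accounting allocator, or the header was overwritten, xl
             * is garbage -- this dereference is the load that faulted.   */
            if (!xl->mem_acct.rec)
                    free (ptr);              /* the "goto free" path; the real
                                                code also updates counters  */
    }

Under that model, a fault at this line is consistent with xl being read from a header that was never written or was trashed, which matches the alignment/corruption theory above rather than anything specific to the peer-probe path.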
Bhaskarakiran,

From the analysis it seems the executables and the RPMs referred to are not the same. We wouldn't be able to analyse this in the current state. If this doesn't get reproduced, could you please close this bug as an incorrect setup?

Thanks,
Atin
Atin,

I had pointed to the wrong nightly builds by mistake; the correct ones are mentioned by Anand. I will try to reproduce it; if I can't, the bug can be closed as not reproducible.

Thanks,
Bhaskarakiran.
Closing this bug as it's not reproducible. Kindly re-open if you hit the same problem again.