Bug 852979

Summary:	Glusterfsd process crashed due to memory corruption.
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Vijaykumar Koppad <vkoppad>
Component:	glusterd	Assignee:	Raghavendra Bhat <rabhat>
Status:	CLOSED WORKSFORME	QA Contact:	Vijaykumar Koppad <vkoppad>
Severity:	medium	Docs Contact:
Priority:	high
Version:	2.0	CC:	amarts, bbandari, rhs-bugs, vbellur
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	glusterfs-3.4.0qa4	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:
Clones:	858490		Environment:
Last Closed:	2012-12-04 09:45:57 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	858490

Description Vijaykumar Koppad 2012-08-30 07:30:54 UTC

Description of problem: After sending kill -HUP <brick PID> to a brick process, the glusterfsd process dumped core. The log rotation itself succeeded; the process crashed afterwards.


Version-Release number of selected component (if applicable): RHS-2.0.z 


How reproducible: Not reliably; this may be a corner case.


Steps to Reproduce:
These are the steps I followed.
1. Create a distributed-replicate volume.
2. Run file creation and deletion on it for about 2 weeks.
3. Send kill -HUP to one of the brick processes (do this while the heavy test is running).
4. If you are lucky, you might get a core.
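
The steps above can be roughly sketched as a small shell helper. This is only an illustration, not the exact commands used for this report: the host names, brick paths, and the run, create_volume, churn_files, and hup_brick helpers are hypothetical, and the volume name "master" is only inferred from the 0-master-marker prefix in the log. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical reproduction helpers. Host names, brick paths, and the
# default volume name are placeholders, not values taken from this report.

VOLNAME=${VOLNAME:-master}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: create and start a 2x2 distributed-replicate volume.
create_volume() {
    run gluster volume create "$VOLNAME" replica 2 \
        host1:/bricks/b1 host2:/bricks/b1 \
        host1:/bricks/b2 host2:/bricks/b2
    run gluster volume start "$VOLNAME"
}

# Step 2: create and delete COUNT files under DIR (a directory on the mount).
churn_files() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        run touch "$dir/file.$i"
        run rm -f "$dir/file.$i"
        i=$((i + 1))
    done
}

# Step 3: send SIGHUP to one brick process while the churn is running.
hup_brick() {
    run kill -HUP "$1"
}
```

For example, DRY_RUN=1 create_volume prints the two gluster commands without touching the cluster. The PID of a brick process can be looked up with pgrep glusterfsd before calling hup_brick.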
  
Actual results: The process crashed.


Expected results: The process shouldn't crash.


Additional info:
###############################################################
Backtrace in the log file:
###############################################################
This is what was left in the log file after the log was rotated.

[2012-08-30 06:58:46.901737] I [glusterfsd-mgmt.c:1564:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2012-08-30 06:58:46.050422] W [marker-quota.c:2047:mq_inspect_directory_xattr] 0-master-marker: cannot add a new contribution node
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2012-08-30 06:58:46
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0rhs
/lib64/libc.so.6[0x3d9fe32900]
/lib64/libc.so.6(gsignal+0x35)[0x3d9fe32885]
/lib64/libc.so.6(abort+0x175)[0x3d9fe34065]
/lib64/libc.so.6[0x3d9fe6f977]
/lib64/libc.so.6[0x3d9fe75296]
/usr/lib64/libglusterfs.so.0(_gf_log+0x3c1)[0x3d7b419581]
/usr/lib64/glusterfs/3.3.0rhs/xlator/features/marker.so(mq_inspect_directory_xattr+0x312)[0x7f3ae27b76


###############################################################
Backtrace from the core 
###############################################################

#0  0x0000003d9fe32885 in raise () from /lib64/libc.so.6
#1  0x0000003d9fe34065 in abort () from /lib64/libc.so.6
#2  0x0000003d9fe6f977 in __libc_message () from /lib64/libc.so.6
#3  0x0000003d9fe75296 in malloc_printerr () from /lib64/libc.so.6
#4  0x0000003d7b419581 in _gf_log (domain=<value optimized out>, 
    file=0x7f3ae27b9af1 "marker-quota.c", function=<value optimized out>, line=2047, 
    level=GF_LOG_WARNING, fmt=0x7f3ae27ba3a8 "cannot add a new contribution node") at logging.c:597
#5  0x00007f3ae27b7632 in mq_inspect_directory_xattr (this=0x25ef5e0, loc=0x25f6330, 
    dict=0x7f3ae62397cc, buf=...) at marker-quota.c:2046
#6  0x00007f3ae27b78c5 in mq_xattr_state (this=<value optimized out>, loc=<value optimized out>, 
    dict=<value optimized out>, buf=...) at marker-quota.c:2184
#7  0x00007f3ae27a9881 in marker_lookup_cbk (frame=0x7f3ae66081c0, cookie=<value optimized out>, 
    this=0x25ef5e0, op_ret=0, op_errno=22, inode=0x7f3ae0e79bac, buf=0x7f3ae1c2bd70, 
    dict=0x7f3ae62397cc, postparent=0x7f3ae1c2bd00) at marker.c:2224
#8  0x00007f3ae29c519c in index_lookup_wrapper (frame=0x7f3ae65d43fc, this=<value optimized out>, 
    loc=0x7f3ae6290530, xattr_req=<value optimized out>) at index.c:817
#9  0x0000003d7b42eebf in call_resume_wind (stub=0x7f3ae62904f0) at call-stub.c:2689
#10 call_resume (stub=0x7f3ae62904f0) at call-stub.c:4151
#11 0x00007f3ae29c39e2 in index_worker (data=<value optimized out>) at index.c:89
#12 0x0000003da06077f1 in start_thread () from /lib64/libpthread.so.0
#13 0x0000003d9fee5ccd in clone () from /lib64/libc.so.6
(gdb) f 3 
#3  0x0000003d9fe75296 in malloc_printerr () from /lib64/libc.so.6
(gdb) f 4 
#4  0x0000003d7b419581 in _gf_log (domain=<value optimized out>, 
    file=0x7f3ae27b9af1 "marker-quota.c", function=<value optimized out>, line=2047, 
    level=GF_LOG_WARNING, fmt=0x7f3ae27ba3a8 "cannot add a new contribution node") at logging.c:597
597	                GF_FREE (msg);
(gdb) f 5 
#5  0x00007f3ae27b7632 in mq_inspect_directory_xattr (this=0x25ef5e0, loc=0x25f6330, 
    dict=0x7f3ae62397cc, buf=...) at marker-quota.c:2046
2046	                        gf_log (this->name, GF_LOG_WARNING,

Comment 2 Raghavendra Bhat 2012-12-04 09:45:57 UTC

This no longer happens on the master branch: similar tests have been running on the longevity test bed for more than 2 weeks and the issue has not been seen. Marking it as WORKSFORME (with Fixed In Version set to 3.4.0qa4); please feel free to reopen if it is seen again.