Bug 856934

Summary: Glusterfsd process crashed due to memory corruption on unlink.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Vijaykumar Koppad <vkoppad>
Component: glusterdAssignee: Raghavendra Bhat <rabhat>
Status: CLOSED WORKSFORME QA Contact: Vijaykumar Koppad <vkoppad>
Severity: high Docs Contact:
Priority: high    
Version: 2.0CC: amarts, bbandari, rhs-bugs, shaines, social, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.4.0qa4 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-12-04 09:48:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Vijaykumar Koppad 2012-09-13 07:41:10 UTC
Description of problem: The Glusterfsd process was crashed while doing heavy operations. 


Version-Release number of selected component (if applicable):RHS-2.0.z


How reproducible: It might hard to reproduce.


Steps to Reproduce:
1.Create distribut-replicate volume.
2.Run creating and deletion of file for some 2 weeks.
3.Run kernel untar on some clients parallely, 
   Eg: while : ; do mkdir client1 ; tar -xvf linux-2.6.39.tar.gz -C client1 ; rm -rf client1; done 
 Like this run on multiple fuse clients parallely.
4.On one of the other client run fs_mark to create file with -F option in loop. 
   Eg: while :; do rm -rvf $MNT_PT/*; ./fs_mark -s 100000 -D 1000 -N 1000 -L 10        -n 10 -t 50 -F -d $MNT_PT; rm -rvf $MNT_PT/*; done
5. This is most probable steps.

Actual results:The glusterfsd process crashed.


Expected results: No process should crash. 


Additional info:

###############################################################
Back trace in the log file :
###############################################################

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-09-07 07:25:09
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0rhs
/lib64/libc.so.6[0x3d9fe32900]
/lib64/libc.so.6(cfree+0x1c)[0x3d9fe7a5ac]
/lib64/libc.so.6(_IO_free_backup_area+0x19)[0x3d9fe73979]
/lib64/libc.so.6(_IO_file_overflow+0x20a)[0x3d9fe718da]
/lib64/libc.so.6(_IO_file_xsputn+0x6d)[0x3d9fe70b8d]
/lib64/libc.so.6(_IO_vfprintf+0x418e)[0x3d9fe47e7e]
/lib64/libc.so.6(__fprintf_chk+0xeb)[0x3d9fefd1ab]
/usr/lib64/libglusterfs.so.0(_gf_log+0x393)[0x3d7b419553]
/usr/lib64/glusterfs/3.3.0rhs/xlator/protocol/server.so(server_unlink_cbk+0x22b)[0x7f5e87bd5c2b]
/usr/lib64/glusterfs/3.3.0rhs/xlator/debug/io-stats.so(io_stats_unlink_cbk+0xec)[0x7f5e87df744c]
/usr/lib64/glusterfs/3.3.0rhs/xlator/features/marker.so(marker_unlink_stat_cbk+0x200)[0x7f5e8c0846f0]
/usr/lib64/libglusterfs.so.0(default_stat_cbk+0xad)[0x3d7b4265fd]
/usr/lib64/glusterfs/3.3.0rhs/xlator/performance/io-threads.so(iot_stat_cbk+0xad)[0x7f5e8c4a7aed]
/usr/lib64/libglusterfs.so.0(default_stat_cbk+0xad)[0x3d7b4265fd]


###############################################################
Backtrace from the core 
###############################################################


Program terminated with signal 11, Segmentation fault.
#0  0x0000003d9fe7a5ac in free () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.12.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6_2.5.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt 
#0  0x0000003d9fe7a5ac in free () from /lib64/libc.so.6
#1  0x0000003d9fe73979 in _IO_free_backup_area_internal () from /lib64/libc.so.6
#2  0x0000003d9fe718da in _IO_new_file_overflow () from /lib64/libc.so.6
#3  0x0000003d9fe70b8d in _IO_new_file_xsputn () from /lib64/libc.so.6
#4  0x0000003d9fe47e7e in vfprintf () from /lib64/libc.so.6
#5  0x0000003d9fefd1ab in __fprintf_chk () from /lib64/libc.so.6
#6  0x0000003d7b419553 in fprintf (domain=<value optimized out>, 
    file=0x7f5e87bdebff "server3_1-fops.c", function=<value optimized out>, line=1085, 
    level=GF_LOG_INFO, fmt=0x7f5e87bdeae0 "%ld: UNLINK %s (%s) ==> %d (%s)")
    at /usr/include/bits/stdio2.h:98
#7  _gf_log (domain=<value optimized out>, file=0x7f5e87bdebff "server3_1-fops.c", 
    function=<value optimized out>, line=1085, level=GF_LOG_INFO, 
    fmt=0x7f5e87bdeae0 "%ld: UNLINK %s (%s) ==> %d (%s)") at logging.c:579
#8  0x00007f5e87bd5c2b in server_unlink_cbk (frame=0x7f5e8fcb5f34, cookie=<value optimized out>, 
    this=0x194cd20, op_ret=-1, op_errno=2, preparent=<value optimized out>, postparent=0x0, 
    xdata=0x0) at server3_1-fops.c:1081
#9  0x00007f5e87df744c in io_stats_unlink_cbk (frame=0x7f5e8fef625c, cookie=<value optimized out>, 
    this=<value optimized out>, op_ret=-1, op_errno=2, preparent=0x0, postparent=0x0, xdata=0x0)
    at io-stats.c:1430
#10 0x00007f5e8c0846f0 in marker_unlink_stat_cbk (frame=0x7f5e8ff290ac, 
    cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, 
    op_errno=<value optimized out>, buf=<value optimized out>, xdata=0x0) at marker.c:833
#11 0x0000003d7b4265fd in default_stat_cbk (frame=0x7f5e8feabc78, cookie=<value optimized out>, 
    this=<value optimized out>, op_ret=-1, op_errno=<value optimized out>, 
    buf=<value optimized out>, xdata=0x0) at defaults.c:47
#12 0x00007f5e8c4a7aed in iot_stat_cbk (frame=0x7f5e8ff33344, cookie=<value optimized out>, 
    this=<value optimized out>, op_ret=-1, op_errno=<value optimized out>, 
    buf=<value optimized out>, xdata=0x0) at io-threads.c:1170
#13 0x0000003d7b4265fd in default_stat_cbk (frame=0x7f5e8fed437c, cookie=<value optimized out>, 
    this=<value optimized out>, op_ret=-1, op_errno=<value optimized out>, 
    buf=<value optimized out>, xdata=0x0) at defaults.c:47
#14 0x0000003d7b4265fd in default_stat_cbk (frame=0x7f5e8ff30ba0, cookie=<value optimized out>, 
    this=<value optimized out>, op_ret=-1, op_errno=<value optimized out>, 
    buf=<value optimized out>, xdata=0x0) at defaults.c:47
#15 0x00007f5e8cae64e0 in posix_stat (frame=0x7f5e8fec7ff8, this=<value optimized out>, 
    loc=<value optimized out>, xdata=<value optimized out>) at posix.c:231
#16 0x0000003d7b41b254 in default_stat (frame=<value optimized out>, this=0x1946680, 
    loc=0x7f5e8fba37c8, xdata=0x0) at defaults.c:1231
#17 0x0000003d7b41b254 in default_stat (frame=<value optimized out>, this=0x1947740, 
    loc=0x7f5e8fba37c8, xdata=0x0) at defaults.c:1231
#18 0x00007f5e8c4ab754 in iot_stat_wrapper (frame=<value optimized out>, this=0x19487e0, 
    loc=0x7f5e8fba37c8, xdata=0x0) at io-threads.c:1178
#19 0x0000003d7b42e5ca in call_resume_wind (stub=0x7f5e8fba3788) at call-stub.c:2394
#20 call_resume (stub=0x7f5e8fba3788) at call-stub.c:4151
#21 0x00007f5e8c4b1f1b in iot_worker (data=0x195a9a0) at io-threads.c:131
#22 0x0000003da06077f1 in start_thread () from /lib64/libpthread.so.0
#23 0x0000003d9fee5ccd in clone () from /lib64/libc.so.6
(gdb) f 0
#0  0x0000003d9fe7a5ac in free () from /lib64/libc.so.6
(gdb) f 1
#1  0x0000003d9fe73979 in _IO_free_backup_area_internal () from /lib64/libc.so.6
(gdb) f 8
#8  0x00007f5e87bd5c2b in server_unlink_cbk (frame=0x7f5e8fcb5f34, cookie=<value optimized out>, 
    this=0x194cd20, op_ret=-1, op_errno=2, preparent=<value optimized out>, postparent=0x0, 
    xdata=0x0) at server3_1-fops.c:1081
1081	                gf_log (this->name, GF_LOG_INFO,
(gdb) 
#8  0x00007f5e87bd5c2b in server_unlink_cbk (frame=0x7f5e8fcb5f34, cookie=<value optimized out>, 
    this=0x194cd20, op_ret=-1, op_errno=2, preparent=<value optimized out>, postparent=0x0, 
    xdata=0x0) at server3_1-fops.c:1081
1081	                gf_log (this->name, GF_LOG_INFO,
(gdb) l
1076	
1077	                gf_stat_from_iatt (&rsp.preparent, preparent);
1078	                gf_stat_from_iatt (&rsp.postparent, postparent);
1079	
1080	        } else {
1081	                gf_log (this->name, GF_LOG_INFO,
1082	                        "%"PRId64": UNLINK %s (%s) ==> %"PRId32" (%s)",
1083	                        frame->root->unique, state->loc.path,
1084	                        state->loc.inode ? uuid_utoa (state->loc.inode->gfid) :
1085	                        "--", op_ret, strerror (op_errno));
(gdb) f 9
#9  0x00007f5e87df744c in io_stats_unlink_cbk (frame=0x7f5e8fef625c, cookie=<value optimized out>, 
    this=<value optimized out>, op_ret=-1, op_errno=2, preparent=0x0, postparent=0x0, xdata=0x0)
    at io-stats.c:1430
1430	        STACK_UNWIND_STRICT (unlink, frame, op_ret, op_errno,
(gdb) f 10
#10 0x00007f5e8c0846f0 in marker_unlink_stat_cbk (frame=0x7f5e8ff290ac, 
    cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, 
    op_errno=<value optimized out>, buf=<value optimized out>, xdata=0x0) at marker.c:833
833	        STACK_UNWIND_STRICT (unlink, frame, -1, op_errno, NULL, NULL, NULL);

Comment 2 Raghavendra Bhat 2012-12-04 09:48:18 UTC
With master branch not seen this happening anymore, running the similar type of tests in longevity test-bed for more than 2weeks and this issue is not seen. Marking it as WORKSFORME (with Fixed in version as 3.4.0qa4), please feel free to reopen if seen again.

Comment 3 Lukas Bezdicka 2013-06-30 11:21:47 UTC
I wonder if this one is related, it's release-3.3 git build:

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2013-06-30 11:41:46
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.1
/lib64/libc.so.6[0x3932a32920]
/lib64/libc.so.6(gsignal+0x35)[0x3932a328a5]
/lib64/libc.so.6(abort+0x175)[0x3932a34085]
/lib64/libc.so.6[0x3932a707b7]
/lib64/libc.so.6[0x3932a760e6]
/lib64/libc.so.6(_IO_free_backup_area+0x19)[0x3932a747c9]
/lib64/libc.so.6(_IO_file_overflow+0x20a)[0x3932a7271a]
/lib64/libc.so.6(_IO_file_xsputn+0x6d)[0x3932a719cd]
/lib64/libc.so.6(_IO_vfprintf+0x41c0)[0x3932a48410]
/lib64/libc.so.6(__fprintf_chk+0xeb)[0x3932affdcb]
/usr/lib64/libglusterfs.so.0(_gf_log+0x393)[0x3350c19553]
/usr/lib64/glusterfs/3.3.1/xlator/storage/posix.so(posix_create+0x7d0)[0x7f7a2eaae650]
/usr/lib64/glusterfs/3.3.1/xlator/features/access-control.so(posix_acl_create+0x291)[0x7f7a2e89acd1]
/usr/lib64/glusterfs/3.3.1/xlator/features/locks.so(pl_create+0x14a)[0x7f7a2e6802da]
/usr/lib64/glusterfs/3.3.1/xlator/performance/io-threads.so(iot_create_wrapper+0x13c)[0x7f7a2e46d16c]
/usr/lib64/libglusterfs.so.0(call_resume+0x84f)[0x3350c2ebef]
/usr/lib64/glusterfs/3.3.1/xlator/performance/io-threads.so(iot_worker+0x15b)[0x7f7a2e472f1b]
/lib64/libpthread.so.0[0x3933207851]
/lib64/libc.so.6(clone+0x6d)[0x3932ae890d]

again the crash is in _gf_log, actualy this one was exactly at the time of log rotation.