Bug 761916 (GLUSTER-184) - [ glusterfs 2.0.6rc2 ] - Client Segfault while running fs-perf-test
Summary: [ glusterfs 2.0.6rc2 ] - Client Segfault while running fs-perf-test
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-184
Product: GlusterFS
Classification: Community
Component: replicate
Version: 2.0.5
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Vikas Gorur
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-08-03 12:01 UTC by Gururaj K
Modified: 2009-09-04 06:43 UTC (History)
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Gururaj K 2009-08-03 12:01:31 UTC
* 4-server Distribute over Replicate setup
* Write-behind on the client side
* io-threads (IOT) on the server side

The client segfaulted while running fs-perf-test; it had been running fine under stress for more than 3 days.

I also noticed that the server had segfaulted a few minutes before the client (possibly causing buf to become NULL on the error path, a case that replicate does not appear to handle).


# /opt/glusterfs/2.0.6rc2/sbin/glusterfs -V
glusterfs 2.0.6rc2 built on Jul 30 2009 22:50:35
Repository revision: v2.0.5-13-g5e3ca25


# /share/guru/analyse_core.sh /core.8991
..
Core was generated by `/opt/glusterfs/2.0.6rc2/sbin/glusterfs -f /share/pavan/volfiles/afr_dht/client.'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002b02825ecf19 in afr_readv_cbk (frame=0x2aaaac030300, cookie=0x1, this=0x1951a800, op_ret=-1, op_errno=77, vector=0x0, count=0, buf=0x0, iobref=0x2b028236d000)
    at ../../../../../xlators/cluster/afr/src/afr-inode-read.c:736
736                     buf->st_ino = local->cont.readv.ino;
(gdb) bt
#0  0x00002b02825ecf19 in afr_readv_cbk (frame=0x2aaaac030300, cookie=0x1, this=0x1951a800, op_ret=-1, op_errno=77, vector=0x0, count=0, buf=0x0, iobref=0x2b028236d000)
    at ../../../../../xlators/cluster/afr/src/afr-inode-read.c:736
#1  0x00002b02823bdb6f in client_readv (frame=0x2aaaad6f5610, this=0x19518e50, fd=0x2aaab94a3830, size=8192, offset=0)
    at ../../../../../xlators/protocol/client/src/client-protocol.c:1625
#2  0x00002b02825ecf00 in afr_readv_cbk (frame=0x2aaaac030300, cookie=0x1, this=0x1951a800, op_ret=-1, op_errno=107, vector=0x7fff29205fb0, count=1, buf=0x7fff29205f20, 
    iobref=0x0) at ../../../../../xlators/cluster/afr/src/afr-inode-read.c:726
#3  0x00002b02823c64d7 in client_readv_cbk (frame=0x2aaaad542320, hdr=0x7fff29206050, hdrlen=32, iobuf=0x0)
    at ../../../../../xlators/protocol/client/src/client-protocol.c:4324
#4  0x00002b02823cbca0 in saved_frames_unwind (this=0x19519a20, saved_frames=0x1951fbb0, head=0x1951fbb8, gf_ops=0x2b02825d1a80, gf_op_list=0x2b0281af7c80)
    at ../../../../../xlators/protocol/client/src/saved-frames.c:174
#5  0x00002b02823cbd29 in saved_frames_destroy (this=0x19519a20, frames=0x1951fbb0, gf_fops=0x2b02825d1a80, gf_mops=0x2b02825d1be0, gf_cbks=0x2b02825d1c10)
    at ../../../../../xlators/protocol/client/src/saved-frames.c:186
#6  0x00002b02823c9dcd in protocol_client_cleanup (trans=0x1951f6e0) at ../../../../../xlators/protocol/client/src/client-protocol.c:5713
#7  0x00002b02823caf56 in notify (this=0x19519a20, event=4, data=0x1951f6e0) at ../../../../../xlators/protocol/client/src/client-protocol.c:6221
#8  0x00002aaaaaaae1d8 in socket_event_poll_err (this=0x1951f6e0) at ../../../../transport/socket/src/socket.c:420
#9  0x00002aaaaaaaeeba in socket_event_handler (fd=11, idx=2, data=0x1951f6e0, poll_in=1, poll_out=0, poll_err=0) at ../../../../transport/socket/src/socket.c:818
#10 0x00002b02818d70dd in event_dispatch_epoll_handler (event_pool=0x19514680, events=0x19523ed0, i=0) at ../../../libglusterfs/src/event.c:804
#11 0x00002b02818d72b2 in event_dispatch_epoll (event_pool=0x19514680) at ../../../libglusterfs/src/event.c:867
#12 0x00002b02818d75c8 in event_dispatch (event_pool=0x19514680) at ../../../libglusterfs/src/event.c:975
#13 0x00000000004053a9 in main (argc=6, argv=0x7fff29206ad8) at ../../../glusterfsd/src/glusterfsd.c:1226
(gdb) p buf
$1 = (struct stat *) 0x0
(gdb) l
731                                        local->cont.readv.offset);
732             }
733
734     out:
735             if (unwind) {
736                     buf->st_ino = local->cont.readv.ino;
737
738                     AFR_STACK_UNWIND (frame, op_ret, op_errno, vector, count, buf,
739                                       iobref);
740             }

Comment 1 Anand Avati 2009-08-03 18:02:02 UTC
> (gdb) l
> 731                                        local->cont.readv.offset);
> 732             }
> 733
> 734     out:
> 735             if (unwind) {
> 736                     buf->st_ino = local->cont.readv.ino;
> 737
> 738             AFR_STACK_UNWIND (frame, op_ret, op_errno, vector, count, buf,
> 739                               iobref);
> 740             }

This is a regression caused by a recent commit -

http://git.gluster.com/?p=glusterfs.git;a=commitdiff;h=ccd93eb64c0f2f73f83e025d3efae794803aaa4c

The bug is actually serious and can happen even without a 3-day stress run, whenever the server disconnects in the middle of a read call. Please fix this on priority.

Avati

Comment 2 Anand Avati 2009-08-04 16:47:44 UTC
PATCH: http://patches.gluster.com/patch/851 in master (cluster/afr: inode-read: Check stat buf for NULL before attempting to set inode number.)

Comment 3 Anand Avati 2009-08-04 16:47:48 UTC
PATCH: http://patches.gluster.com/patch/852 in release-2.0 (cluster/afr: inode-read: Check stat buf for NULL before attempting to set inode number.)

