* 4 server Distribute over Replicate setup
* Write-behind on the client side
* IOT (io-threads) on the server side

The client segfaulted after I started fs-perf-test; it had been running fine under stress for more than 3 days. I also noticed that the server had segfaulted a few minutes before the client (possibly causing buf to become NULL, which does not seem to be handled in replicate).

# /opt/glusterfs/2.0.6rc2/sbin/glusterfs -V
glusterfs 2.0.6rc2 built on Jul 30 2009 22:50:35
Repository revision: v2.0.5-13-g5e3ca25

# /share/guru/analyse_core.sh /core.8991
..
Core was generated by `/opt/glusterfs/2.0.6rc2/sbin/glusterfs -f /share/pavan/volfiles/afr_dht/client.'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002b02825ecf19 in afr_readv_cbk (frame=0x2aaaac030300, cookie=0x1, this=0x1951a800, op_ret=-1, op_errno=77, vector=0x0, count=0, buf=0x0, iobref=0x2b028236d000)
    at ../../../../../xlators/cluster/afr/src/afr-inode-read.c:736
736             buf->st_ino = local->cont.readv.ino;

(gdb) bt
#0  0x00002b02825ecf19 in afr_readv_cbk (frame=0x2aaaac030300, cookie=0x1, this=0x1951a800, op_ret=-1, op_errno=77, vector=0x0, count=0, buf=0x0, iobref=0x2b028236d000)
    at ../../../../../xlators/cluster/afr/src/afr-inode-read.c:736
#1  0x00002b02823bdb6f in client_readv (frame=0x2aaaad6f5610, this=0x19518e50, fd=0x2aaab94a3830, size=8192, offset=0)
    at ../../../../../xlators/protocol/client/src/client-protocol.c:1625
#2  0x00002b02825ecf00 in afr_readv_cbk (frame=0x2aaaac030300, cookie=0x1, this=0x1951a800, op_ret=-1, op_errno=107, vector=0x7fff29205fb0, count=1, buf=0x7fff29205f20, iobref=0x0)
    at ../../../../../xlators/cluster/afr/src/afr-inode-read.c:726
#3  0x00002b02823c64d7 in client_readv_cbk (frame=0x2aaaad542320, hdr=0x7fff29206050, hdrlen=32, iobuf=0x0)
    at ../../../../../xlators/protocol/client/src/client-protocol.c:4324
#4  0x00002b02823cbca0 in saved_frames_unwind (this=0x19519a20, saved_frames=0x1951fbb0, head=0x1951fbb8, gf_ops=0x2b02825d1a80, gf_op_list=0x2b0281af7c80)
    at ../../../../../xlators/protocol/client/src/saved-frames.c:174
#5  0x00002b02823cbd29 in saved_frames_destroy (this=0x19519a20, frames=0x1951fbb0, gf_fops=0x2b02825d1a80, gf_mops=0x2b02825d1be0, gf_cbks=0x2b02825d1c10)
    at ../../../../../xlators/protocol/client/src/saved-frames.c:186
#6  0x00002b02823c9dcd in protocol_client_cleanup (trans=0x1951f6e0)
    at ../../../../../xlators/protocol/client/src/client-protocol.c:5713
#7  0x00002b02823caf56 in notify (this=0x19519a20, event=4, data=0x1951f6e0)
    at ../../../../../xlators/protocol/client/src/client-protocol.c:6221
#8  0x00002aaaaaaae1d8 in socket_event_poll_err (this=0x1951f6e0)
    at ../../../../transport/socket/src/socket.c:420
#9  0x00002aaaaaaaeeba in socket_event_handler (fd=11, idx=2, data=0x1951f6e0, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../transport/socket/src/socket.c:818
#10 0x00002b02818d70dd in event_dispatch_epoll_handler (event_pool=0x19514680, events=0x19523ed0, i=0)
    at ../../../libglusterfs/src/event.c:804
#11 0x00002b02818d72b2 in event_dispatch_epoll (event_pool=0x19514680)
    at ../../../libglusterfs/src/event.c:867
#12 0x00002b02818d75c8 in event_dispatch (event_pool=0x19514680)
    at ../../../libglusterfs/src/event.c:975
#13 0x00000000004053a9 in main (argc=6, argv=0x7fff29206ad8)
    at ../../../glusterfsd/src/glusterfsd.c:1226

(gdb) p buf
$1 = (struct stat *) 0x0

(gdb) l
731                                             local->cont.readv.offset);
732             }
733
734     out:
735             if (unwind) {
736                     buf->st_ino = local->cont.readv.ino;
737
738                     AFR_STACK_UNWIND (frame, op_ret, op_errno, vector, count, buf,
739                                       iobref);
740             }
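[Editor's note, not part of the original report: frames #4-#7 show what triggers the NULL buf. When the transport to a server dies, protocol/client force-unwinds every outstanding call (and every retry attempted against the dead connection) with op_ret = -1 and NULL out-parameters; op_errno 107 is ENOTCONN. A hedged sketch of that pattern, using a hypothetical helper name rather than the actual saved-frames code:]

        /* Sketch only, not the real GlusterFS code: when a connection is
         * lost, each pending frame is unwound with an error and no data,
         * so every callback in the stack, including afr_readv_cbk, must
         * tolerate buf == NULL. */
        static void
        fail_pending_readv (call_frame_t *frame, int op_errno)  /* hypothetical */
        {
                /* no reply ever arrived, so there is no data to hand up */
                STACK_UNWIND (frame, -1 /* op_ret */, op_errno,
                              NULL /* vector */, 0 /* count */,
                              NULL /* buf */, NULL /* iobref */);
        }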
> (gdb) l
> 731                                             local->cont.readv.offset);
> 732             }
> 733
> 734     out:
> 735             if (unwind) {
> 736                     buf->st_ino = local->cont.readv.ino;
> 737
> 738                     AFR_STACK_UNWIND (frame, op_ret, op_errno, vector, count, buf,
> 739                                       iobref);
> 740             }

This is a regression caused by a recent commit:
http://git.gluster.com/?p=glusterfs.git;a=commitdiff;h=ccd93eb64c0f2f73f83e025d3efae794803aaa4c

The bug is actually serious and can happen even without a 3-day stress run: it is enough for the server to disconnect in the middle of a read call. Please fix this as a priority.

Avati
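[Editor's note: judging from the summary of the patches below ("Check stat buf for NULL before attempting to set inode number"), the fix presumably just guards the dereference at line 736. A minimal sketch of that guard; the authoritative change is in the patch URLs below:]

        out:
                if (unwind) {
                        /* buf is NULL when the call failed (op_ret == -1),
                         * e.g. because the server disconnected mid-read;
                         * only fix up the inode number when a stat buf
                         * actually came back */
                        if (buf)
                                buf->st_ino = local->cont.readv.ino;

                        AFR_STACK_UNWIND (frame, op_ret, op_errno, vector,
                                          count, buf, iobref);
                }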
PATCH: http://patches.gluster.com/patch/851 in master (cluster/afr: inode-read: Check stat buf for NULL before attempting to set inode number.)
PATCH: http://patches.gluster.com/patch/852 in release-2.0 (cluster/afr: inode-read: Check stat buf for NULL before attempting to set inode number.)