In a 12-hour test case, the glusterfs process crashed with an 800M+ core file on a system that had just 1G of memory. The fop being processed failed in qr_readv because of ENOMEM. The stack trace was:

Program terminated with signal 11, Segmentation fault.
#0 nfs_fop_readv_cbk (frame=0x7f7ccb2dc904, cookie=0x1acd140, this=0x1ad1b40, op_ret=-1, op_errno=22, vector=0x0, count=-1, stbuf=0x7f7cc9473c70, iobref=0x0) at nfs-fops.c:1273
1273 nfs-fops.c: No such file or directory.
in nfs-fops.c
(gdb) bt
#0 nfs_fop_readv_cbk (frame=0x7f7ccb2dc904, cookie=0x1acd140, this=0x1ad1b40, op_ret=-1, op_errno=22, vector=0x0, count=-1, stbuf=0x7f7cc9473c70, iobref=0x0) at nfs-fops.c:1273
#1 0x00007f7cc995c77f in io_stats_readv_cbk (frame=0x7f7ccb4f9570, cookie=<value optimized out>, this=<value optimized out>, op_ret=-1, op_errno=<value optimized out>, vector=0x0, count=-1, buf=0x7f7cc9473c70, iobref=0x0) at io-stats.c:516
#2 0x00007f7cc9b6a028 in qr_readv (frame=<value optimized out>, this=<value optimized out>, fd=<value optimized out>, size=<value optimized out>, offset=<value optimized out>) at quick-read.c:1064
#3 0x00007f7cc9956eb4 in io_stats_readv (frame=<value optimized out>, this=0x1acd140, fd=0x7f7cc666b6b4, size=65536, offset=262144) at io-stats.c:1200
#4 0x00007f7cc94e6d04 in nfs_fop_read (nfsx=<value optimized out>, xl=0x1acd140, nfu=<value optimized out>, fd=0x7f7cc666b6b4, size=65536, offset=0, cbk=0x7f7cc94f83e0 <nfs3svc_read_cbk>, local=0x7f7cc3842c5c) at nfs-fops.c:1298
#5 0x00007f7cc94f8337 in nfs3_read_fd_resume (carg=0x7f7cc3842c5c) at nfs3.c:1669
#6 0x00007f7cc9504184 in nfs3_file_open_and_resume (cs=0x7f7cc3842c5c, resume=<value optimized out>) at nfs3-helpers.c:2223
#7 0x00007f7cc94f8218 in nfs3_read_resume (carg=0x7f7cc9473c70) at nfs3.c:1697
#8 0x00007f7cc950273f in nfs3_fh_resolve_inode_done (cs=0x7f7cc3842c5c, inode=<value optimized out>) at nfs3-helpers.c:2508
#9 0x00007f7cc95027c3 in nfs3_fh_resolve_inode (cs=0x7f7cc3842c5c) at nfs3-helpers.c:3064
#10 0x00007f7cc94fb13c in nfs3_read (req=0x7f7cbc010b68, fh=0x7f7cc9473f90, offset=262144, count=65536) at nfs3.c:1736
#11 0x00007f7cc94fb416 in nfs3svc_read (req=0x7f7cbc010b68) at nfs3.c:1770
#12 0x00007f7cc950cb0c in nfs_rpcsvc_handle_rpc_call (conn=0x7f7c90b1b850) at ../../../../xlators/nfs/lib/src/rpcsvc.c:1984
#13 0x00007f7cc950d2fd in nfs_rpcsvc_record_update_state (conn=0x7f7c90b1b850, dataread=0) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2469
#14 0x00007f7cc950d648 in nfs_rpcsvc_conn_data_handler (fd=<value optimized out>, idx=28102976, data=0x7f7c90b1b850, poll_in=-1, poll_out=22, poll_err=0) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2641
#15 0x00007f7ccc3c8b92 in event_dispatch_epoll_handler (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:812
#16 event_dispatch_epoll (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:876
#17 0x00007f7cc950eb72 in nfs_rpcsvc_stage_proc (arg=<value optimized out>) at ../../../../xlators/nfs/lib/src/rpcsvc.c:64
#18 0x0000003871c0685a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#19 0x00000038710de22d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#20 0x0000000000000000 in ?? ()

quick-read returns an error because memory allocation failed in its readv. It returns an op_ret of -1, with the other arguments pointing to undefined addresses. In this case, NFS touches @stbuf even when op_ret is -1:

nfs_fop_readv_cbk (frame=0x7f7ccb2dc904, cookie=0x1acd140, this=0x1ad1b40, op_ret=-1, op_errno=22, vector=0x0, count=-1, stbuf=0x7f7cc9473c70, iobref=0x0) at nfs-fops.c:1273
PATCH: http://patches.gluster.com/patch/5915 in master (nfs: Do not touch iatt on failed fops)
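
For illustration only, the idea in the patch title (do not touch the iatt when the fop failed) might look roughly like the sketch below. This is not the actual patch or GlusterFS code: struct demo_iatt, struct demo_read_result and demo_readv_cbk are hypothetical stand-ins, and the real callback has a different signature. The only point shown is that stbuf is dereferenced only when op_ret is not -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for GlusterFS's struct iatt; only one field is modelled. */
struct demo_iatt {
    unsigned long long ia_size;
};

/* Hypothetical per-call state that a read callback fills in for its caller. */
struct demo_read_result {
    int op_ret;
    int op_errno;
    int have_attrs;              /* 1 only when attrs holds valid data */
    struct demo_iatt attrs;
};

/*
 * Read callback sketch. On failure (op_ret == -1) the lower translator may
 * pass an stbuf that points at undefined memory, so stbuf is dereferenced
 * only after the fop is known to have succeeded.
 */
void demo_readv_cbk(struct demo_read_result *out, int op_ret, int op_errno,
                    const struct demo_iatt *stbuf)
{
    assert(out != NULL);
    memset(out, 0, sizeof(*out));
    out->op_ret = op_ret;
    out->op_errno = op_errno;

    if (op_ret == -1)
        return;                  /* failed fop: do not touch stbuf */

    if (stbuf != NULL) {
        out->attrs = *stbuf;     /* safe: the fop succeeded */
        out->have_attrs = 1;
    }
}
```

With a guard of this shape, a failed readv such as the one in frame #0 (op_ret=-1, op_errno=22) would pass the error back up without reading stbuf.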