Bug 763901 (GLUSTER-2169)

Summary: NFS crash in nfs-fops due to failed fop from subvolume
Product: [Community] GlusterFS
Reporter: Shehjar Tikoo <shehjart>
Component: nfs
Assignee: Shehjar Tikoo <shehjart>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Version: mainline
CC: anush, gluster-bugs
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Regression: RTP
Mount Type: nfs
Documentation: DNR

Description Shehjar Tikoo 2010-12-01 06:07:28 UTC
In a 12-hour test case, the glusterfs process crashed with an 800M+ core file on a system that had just 1G of memory. The fop being processed failed in qr_readv because of ENOMEM. The stack trace was:

Program terminated with signal 11, Segmentation fault.
#0  nfs_fop_readv_cbk (frame=0x7f7ccb2dc904, cookie=0x1acd140, this=0x1ad1b40, op_ret=-1, op_errno=22, vector=0x0, count=-1, stbuf=0x7f7cc9473c70, iobref=0x0)
    at nfs-fops.c:1273
1273    nfs-fops.c: No such file or directory.
        in nfs-fops.c
(gdb) bt
#0  nfs_fop_readv_cbk (frame=0x7f7ccb2dc904, cookie=0x1acd140, this=0x1ad1b40, op_ret=-1, op_errno=22, vector=0x0, count=-1, stbuf=0x7f7cc9473c70, iobref=0x0)
    at nfs-fops.c:1273
#1  0x00007f7cc995c77f in io_stats_readv_cbk (frame=0x7f7ccb4f9570, cookie=<value optimized out>, this=<value optimized out>, op_ret=-1, op_errno=<value optimized out>,
    vector=0x0, count=-1, buf=0x7f7cc9473c70, iobref=0x0) at io-stats.c:516
#2  0x00007f7cc9b6a028 in qr_readv (frame=<value optimized out>, this=<value optimized out>, fd=<value optimized out>, size=<value optimized out>,
    offset=<value optimized out>) at quick-read.c:1064
#3  0x00007f7cc9956eb4 in io_stats_readv (frame=<value optimized out>, this=0x1acd140, fd=0x7f7cc666b6b4, size=65536, offset=262144) at io-stats.c:1200
#4  0x00007f7cc94e6d04 in nfs_fop_read (nfsx=<value optimized out>, xl=0x1acd140, nfu=<value optimized out>, fd=0x7f7cc666b6b4, size=65536, offset=0,
    cbk=0x7f7cc94f83e0 <nfs3svc_read_cbk>, local=0x7f7cc3842c5c) at nfs-fops.c:1298
#5  0x00007f7cc94f8337 in nfs3_read_fd_resume (carg=0x7f7cc3842c5c) at nfs3.c:1669
#6  0x00007f7cc9504184 in nfs3_file_open_and_resume (cs=0x7f7cc3842c5c, resume=<value optimized out>) at nfs3-helpers.c:2223
#7  0x00007f7cc94f8218 in nfs3_read_resume (carg=0x7f7cc9473c70) at nfs3.c:1697
#8  0x00007f7cc950273f in nfs3_fh_resolve_inode_done (cs=0x7f7cc3842c5c, inode=<value optimized out>) at nfs3-helpers.c:2508
#9  0x00007f7cc95027c3 in nfs3_fh_resolve_inode (cs=0x7f7cc3842c5c) at nfs3-helpers.c:3064
#10 0x00007f7cc94fb13c in nfs3_read (req=0x7f7cbc010b68, fh=0x7f7cc9473f90, offset=262144, count=65536) at nfs3.c:1736
#11 0x00007f7cc94fb416 in nfs3svc_read (req=0x7f7cbc010b68) at nfs3.c:1770
#12 0x00007f7cc950cb0c in nfs_rpcsvc_handle_rpc_call (conn=0x7f7c90b1b850) at ../../../../xlators/nfs/lib/src/rpcsvc.c:1984
#13 0x00007f7cc950d2fd in nfs_rpcsvc_record_update_state (conn=0x7f7c90b1b850, dataread=0) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2469
#14 0x00007f7cc950d648 in nfs_rpcsvc_conn_data_handler (fd=<value optimized out>, idx=28102976, data=0x7f7c90b1b850, poll_in=-1, poll_out=22, poll_err=0)
    at ../../../../xlators/nfs/lib/src/rpcsvc.c:2641
#15 0x00007f7ccc3c8b92 in event_dispatch_epoll_handler (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:812
#16 event_dispatch_epoll (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:876
#17 0x00007f7cc950eb72 in nfs_rpcsvc_stage_proc (arg=<value optimized out>) at ../../../../xlators/nfs/lib/src/rpcsvc.c:64
#18 0x0000003871c0685a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#19 0x00000038710de22d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#20 0x0000000000000000 in ?? ()


quick-read returns an error because memory allocation failed in its readv. It returns an op_ret of -1, with the other arguments pointing to undefined addresses. In this case, NFS touches @stbuf even when op_ret is -1.

nfs_fop_readv_cbk (frame=0x7f7ccb2dc904, cookie=0x1acd140, this=0x1ad1b40, op_ret=-1, op_errno=22, vector=0x0, count=-1, stbuf=0x7f7cc9473c70, iobref=0x0)
    at nfs-fops.c:1273
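
A minimal sketch of the guard this implies, assuming a simplified callback. The names demo_iatt, demo_local and demo_readv_cbk are hypothetical stand-ins for illustration only; this is not the actual nfs-fops.c code or the real GlusterFS callback signature. The idea is simply that the callback checks op_ret before reading anything through stbuf.

```c
#include <stddef.h>

/* Hypothetical stand-in for the attribute structure passed to the
 * callback; not the real GlusterFS struct iatt. */
struct demo_iatt {
    unsigned long long ia_ino;
    long long          ia_size;
};

/* Hypothetical per-request state kept by the caller. */
struct demo_local {
    struct demo_iatt cached;     /* last attributes seen on success */
    int              have_attrs; /* 1 only if 'cached' is valid */
};

/* Simplified readv callback (assumed shape, for illustration).
 * On failure (op_ret == -1) the subvolume may pass pointers to
 * undefined memory, so stbuf is never dereferenced on that path. */
static int
demo_readv_cbk(struct demo_local *local, int op_ret, int op_errno,
               const struct demo_iatt *stbuf)
{
    if (op_ret == -1) {
        /* Failed fop: ignore stbuf entirely and report the error. */
        local->have_attrs = 0;
        return op_errno;
    }

    /* Successful fop: only now is it safe to copy the attributes. */
    if (stbuf != NULL) {
        local->cached = *stbuf;
        local->have_attrs = 1;
    }
    return 0;
}
```

With a guard like this, a failed read that arrives with a garbage stbuf is reported back through op_errno without the pointer ever being read.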

Comment 1 Anand Avati 2010-12-28 01:51:03 UTC
PATCH: http://patches.gluster.com/patch/5915 in master (nfs: Do not touch iatt on failed fops)