Bug 763871 (GLUSTER-2139)

Summary: [3.1.1qa9] After failover, NFS writes to certain files succeed but hang for a few.
Product: [Community] GlusterFS
Reporter: Harshavardhana <fharshav>
Component: nfs
Assignee: Shehjar Tikoo <shehjart>
Status: CLOSED WONTFIX
Severity: high
Priority: urgent
Version: 3.1.1
CC: cww, gluster-bugs, vijay
Hardware: All
OS: Linux
Doc Type: Bug Fix
Documentation: DNR

Description Harshavardhana 2010-11-22 17:09:36 EST
sr/libexec/getconf. holes=1 overlaps=0
[2010-11-23 04:09:34.278595] E [rpcsvc.c:1693:nfs_rpcsvc_submit_generic] nfsrpc: Failed to submit message
[2010-11-23 04:09:34.278639] E [nfs3.c:473:nfs3svc_submit_reply] nfs-nfsv3: Reply submission failed
[2010-11-23 04:09:34.279233] E [rpcsvc.c:1693:nfs_rpcsvc_submit_generic] nfsrpc: Failed to submit message
[2010-11-23 04:09:34.279259] E [nfs3.c:473:nfs3svc_submit_reply] nfs-nfsv3: Reply submission failed
[2010-11-23 04:09:34.279604] E [rpcsvc.c:1693:nfs_rpcsvc_submit_generic] nfsrpc: Failed to submit message
[2010-11-23 04:09:34.279629] E [nfs3.c:473:nfs3svc_submit_reply] nfs-nfsv3: Reply submission failed
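
To check whether failures like these arrive in a single burst or recur over time, the log can be tallied per reporting function. The following is only a minimal sketch: it assumes every line of interest has the shape shown in the excerpt above (`[timestamp] LEVEL [file:line:function] domain: message`), and the names `parse_line` and `count_errors_by_function` are hypothetical helpers, not part of GlusterFS.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above:
# [timestamp] LEVEL [file:line:function] domain: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>[^:]+):\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_errors_by_function(lines):
    """Count error-level ('E') entries, keyed by the function that logged them."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["level"] == "E":
            counts[record["func"]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[2010-11-23 04:09:34.278595] E [rpcsvc.c:1693:nfs_rpcsvc_submit_generic] nfsrpc: Failed to submit message",
        "[2010-11-23 04:09:34.278639] E [nfs3.c:473:nfs3svc_submit_reply] nfs-nfsv3: Reply submission failed",
    ]
    for func, n in count_errors_by_function(sample).items():
        print(func, n)
```

Lines that do not match the assumed shape, such as the truncated first line of the excerpt, are skipped rather than guessed at.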


The kernel backtrace for the hung task, as reported on the client side, is:


Call Trace:
 [<ffffffffa02a1424>] ? __rpc_execute+0xc8/0x232 [sunrpc]
 [<ffffffff81016939>] ? read_tsc+0x9/0x1b
 [<ffffffff81063e2b>] ? clocksource_read+0xf/0x11
 [<ffffffff810acaa6>] ? sync_page+0x0/0x4a
 [<ffffffff813d994c>] schedule+0xe/0x22
 [<ffffffff813d9991>] io_schedule+0x31/0x42
 [<ffffffff810acaec>] sync_page+0x46/0x4a
 [<ffffffff813d9ea8>] __wait_on_bit+0x48/0x7b
 [<ffffffff810accc0>] wait_on_page_bit+0x72/0x79
 [<ffffffff8105d730>] ? wake_bit_function+0x0/0x33
 [<ffffffff810b4f35>] ? pagevec_lookup_tag+0x25/0x2e
 [<ffffffff810ad561>] wait_on_page_writeback_range+0x9c/0x15d
 [<ffffffff810ad647>] filemap_fdatawait+0x25/0x27
 [<ffffffff810adea9>] filemap_write_and_wait+0x2c/0x38
 [<ffffffffa0313cc7>] nfs_setattr+0xbd/0x125 [nfs]
 [<ffffffff810ae0e6>] ? generic_file_aio_write+0x75/0xca
 [<ffffffffa0311c2a>] ? nfs_file_write+0x104/0x180 [nfs]
 [<ffffffff8104db5b>] ? current_fs_time+0x27/0x2e
 [<ffffffff810f612c>] notify_change+0x1a1/0x305
 [<ffffffff81100863>] utimes_common+0x12b/0x152
 [<ffffffff8110090e>] do_utimes+0x84/0xd7
 [<ffffffff810eb48b>] ? path_put+0x22/0x26
 [<ffffffff81100a7e>] sys_utimensat+0x63/0x6c
 [<ffffffff81010c82>] system_call_fastpath+0x16/0x1b


The terminal then hangs and does not respond to Ctrl-C or Ctrl-Z:


`/usr/lib64/libgstvideo-0.10.so.0.18.0' -> `./usr/lib64/libgstvideo-0.10.so.0.18.0'
`/usr/lib64/libkio.so.5.3.0' -> `./usr/lib64/libkio.so.5.3.0'
removed `./usr/lib64/libpoppler.so.4'
`/usr/lib64/libpoppler.so.4' -> `./usr/lib64/libpoppler.so.4'
`/usr/lib64/libxcb-xvmc.so.0.0.0' -> `./usr/lib64/libxcb-xvmc.so.0.0.0'
`/usr/lib64/libkdeinit4_klauncher.so' -> `./usr/lib64/libkdeinit4_klauncher.so'
`/usr/lib64/libgstinterfaces-0.10.so.0' -> `./usr/lib64/libgstinterfaces-0.10.so.0'
`/usr/lib64/libffi.so.5.0.6' -> `./usr/lib64/libffi.so.5.0.6'
removed `./usr/lib64/libXxf86dga.so.1'
`/usr/lib64/libXxf86dga.so.1' -> `./usr/lib64/libXxf86dga.so.1'
`/usr/lib64/libXcursor.so.1.0.2' -> `./usr/lib64/libXcursor.so.1.0.2'
`/usr/lib64/libpcrecpp.so.0.0.0' -> `./usr/lib64/libpcrecpp.so.0.0.0'
removed `./usr/lib64/libsnmp.so.15'
`/usr/lib64/libsnmp.so.15' -> `./usr/lib64/libsnmp.so.15'
`/usr/lib64/libQtCLucene.so.4.5.3' -> `./usr/lib64/libQtCLucene.so.4.5.3'


^C^C

^C^C^C^C^C^C
^C^Z

^C^C^Z

^C^C^C^Z^Z
Comment 1 Shehjar Tikoo 2010-11-26 00:52:02 EST
It looks like the reply failure happens because the connection was closed by the NFS client. Normally, the NFS client would reconnect and retransmit the request, but that does not seem to be the case here.

Can you get me the trace log for NFS? With it, I can pinpoint whether this is really due to a disconnection or whether there is a bug in the connection handling code. Thanks.
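
One way to see from the client side whether requests are being retransmitted is to watch the RPC counters. The sketch below is only an illustration under stated assumptions: it assumes a Linux client that exposes `/proc/net/rpc/nfs`, whose `rpc` line lists total calls followed by retransmissions, and `parse_rpc_counters` and `retrans_delta` are hypothetical helper names.

```python
import os

RPC_STATS_PATH = "/proc/net/rpc/nfs"  # assumed to exist on the Linux NFS client


def parse_rpc_counters(text):
    """Return (calls, retrans) from the 'rpc' line of the stats text, or None."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "rpc":
            return int(fields[1]), int(fields[2])
    return None


def retrans_delta(before, after):
    """Return how many retransmissions occurred between two (calls, retrans) samples."""
    return after[1] - before[1]


if __name__ == "__main__":
    if os.path.exists(RPC_STATS_PATH):
        with open(RPC_STATS_PATH) as stats:
            print(parse_rpc_counters(stats.read()))
    else:
        print("no NFS client statistics found at", RPC_STATS_PATH)
```

Sampling the counters before and during the hang and passing both samples to `retrans_delta` would show whether the client is retransmitting at all; a delta of zero would be consistent with the stalled request never being resent.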
Comment 2 Shehjar Tikoo 2010-12-20 21:46:28 EST
Harsha confirms that this bug is seen very rarely, so no information is available for debugging it. He agrees with closing the bug.