Bug 763871 - (GLUSTER-2139) [3.1.1qa9] After failover NFS writes to certain files but hangs for few.
Product: GlusterFS
Classification: Community
Component: nfs
Hardware: All Linux
Priority: urgent, Severity: high
Assigned To: Shehjar Tikoo
Reported: 2010-11-22 17:09 EST by Harshavardhana
Modified: 2015-12-01 11:45 EST
Doc Type: Bug Fix
Documentation: DNR

Description Harshavardhana 2010-11-22 17:09:36 EST
sr/libexec/getconf. holes=1 overlaps=0
[2010-11-23 04:09:34.278595] E [rpcsvc.c:1693:nfs_rpcsvc_submit_generic] nfsrpc: Failed to submit message
[2010-11-23 04:09:34.278639] E [nfs3.c:473:nfs3svc_submit_reply] nfs-nfsv3: Reply submission failed
[2010-11-23 04:09:34.279233] E [rpcsvc.c:1693:nfs_rpcsvc_submit_generic] nfsrpc: Failed to submit message
[2010-11-23 04:09:34.279259] E [nfs3.c:473:nfs3svc_submit_reply] nfs-nfsv3: Reply submission failed
[2010-11-23 04:09:34.279604] E [rpcsvc.c:1693:nfs_rpcsvc_submit_generic] nfsrpc: Failed to submit message
[2010-11-23 04:09:34.279629] E [nfs3.c:473:nfs3svc_submit_reply] nfs-nfsv3: Reply submission failed
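For triage, the repeated error pairs above can be tallied per function with a short script. This is only a sketch: the line layout (`[timestamp] LEVEL [file:line:function] domain: message`) is inferred from the entries quoted in this report, and the helper names `parse_line` and `count_errors` are hypothetical, not part of GlusterFS.

```python
import re
from collections import Counter

# Layout inferred from the log lines in this report (an assumption):
# [timestamp] LEVEL [file:line:function] domain: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>[^:]+):\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for a matching log line, else None."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_errors(lines):
    """Count error-level ('E') entries per reporting function."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["level"] == "E":
            counts[record["func"]] += 1
    return counts


# Two entries copied from the description above.
SAMPLE = [
    "[2010-11-23 04:09:34.278595] E [rpcsvc.c:1693:nfs_rpcsvc_submit_generic] "
    "nfsrpc: Failed to submit message",
    "[2010-11-23 04:09:34.278639] E [nfs3.c:473:nfs3svc_submit_reply] "
    "nfs-nfsv3: Reply submission failed",
]

if __name__ == "__main__":
    for func, n in count_errors(SAMPLE).items():
        print(func, n)
```

Lines that do not fit the inferred layout (such as the truncated first line of the description) are skipped rather than treated as errors.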

The kernel backtrace for the hung task, as reported on the client side:

Call Trace:
 [<ffffffffa02a1424>] ? __rpc_execute+0xc8/0x232 [sunrpc]
 [<ffffffff81016939>] ? read_tsc+0x9/0x1b
 [<ffffffff81063e2b>] ? clocksource_read+0xf/0x11
 [<ffffffff810acaa6>] ? sync_page+0x0/0x4a
 [<ffffffff813d994c>] schedule+0xe/0x22
 [<ffffffff813d9991>] io_schedule+0x31/0x42
 [<ffffffff810acaec>] sync_page+0x46/0x4a
 [<ffffffff813d9ea8>] __wait_on_bit+0x48/0x7b
 [<ffffffff810accc0>] wait_on_page_bit+0x72/0x79
 [<ffffffff8105d730>] ? wake_bit_function+0x0/0x33
 [<ffffffff810b4f35>] ? pagevec_lookup_tag+0x25/0x2e
 [<ffffffff810ad561>] wait_on_page_writeback_range+0x9c/0x15d
 [<ffffffff810ad647>] filemap_fdatawait+0x25/0x27
 [<ffffffff810adea9>] filemap_write_and_wait+0x2c/0x38
 [<ffffffffa0313cc7>] nfs_setattr+0xbd/0x125 [nfs]
 [<ffffffff810ae0e6>] ? generic_file_aio_write+0x75/0xca
 [<ffffffffa0311c2a>] ? nfs_file_write+0x104/0x180 [nfs]
 [<ffffffff8104db5b>] ? current_fs_time+0x27/0x2e
 [<ffffffff810f612c>] notify_change+0x1a1/0x305
 [<ffffffff81100863>] utimes_common+0x12b/0x152
 [<ffffffff8110090e>] do_utimes+0x84/0xd7
 [<ffffffff810eb48b>] ? path_put+0x22/0x26
 [<ffffffff81100a7e>] sys_utimensat+0x63/0x6c
 [<ffffffff81010c82>] system_call_fastpath+0x16/0x1b

Terminal output at the point of the hang:

`/usr/lib64/libgstvideo-0.10.so.0.18.0' -> `./usr/lib64/libgstvideo-0.10.so.0.18.0'
`/usr/lib64/libkio.so.5.3.0' -> `./usr/lib64/libkio.so.5.3.0'
removed `./usr/lib64/libpoppler.so.4'
`/usr/lib64/libpoppler.so.4' -> `./usr/lib64/libpoppler.so.4'
`/usr/lib64/libxcb-xvmc.so.0.0.0' -> `./usr/lib64/libxcb-xvmc.so.0.0.0'
`/usr/lib64/libkdeinit4_klauncher.so' -> `./usr/lib64/libkdeinit4_klauncher.so'
`/usr/lib64/libgstinterfaces-0.10.so.0' -> `./usr/lib64/libgstinterfaces-0.10.so.0'
`/usr/lib64/libffi.so.5.0.6' -> `./usr/lib64/libffi.so.5.0.6'
removed `./usr/lib64/libXxf86dga.so.1'
`/usr/lib64/libXxf86dga.so.1' -> `./usr/lib64/libXxf86dga.so.1'
`/usr/lib64/libXcursor.so.1.0.2' -> `./usr/lib64/libXcursor.so.1.0.2'
`/usr/lib64/libpcrecpp.so.0.0.0' -> `./usr/lib64/libpcrecpp.so.0.0.0'
removed `./usr/lib64/libsnmp.so.15'
`/usr/lib64/libsnmp.so.15' -> `./usr/lib64/libsnmp.so.15'
`/usr/lib64/libQtCLucene.so.4.5.3' -> `./usr/lib64/libQtCLucene.so.4.5.3'




Comment 1 Shehjar Tikoo 2010-11-26 00:52:02 EST
It looks like the reply failure happens because the connection was closed by the NFS client. Normally, the NFS client would reconnect and retransmit the request, but that does not seem to be the case here.

Can you get me the trace log for NFS? With it I can pinpoint whether this is really due to a disconnection or whether there is a bug in the connection-handling code. Thanks.
Comment 2 Shehjar Tikoo 2010-12-20 21:46:28 EST
Harsha confirms that this bug is seen very rarely, so no information is available for debugging it. He agrees with closing the bug.
