Bug 763871 (GLUSTER-2139) - [3.1.1qa9] After failover NFS writes to certain files but hangs for few.
Summary: [3.1.1qa9] After failover NFS writes to certain files but hangs for few.
Keywords:
Status: CLOSED WONTFIX
Alias: GLUSTER-2139
Product: GlusterFS
Classification: Community
Component: nfs
Version: 3.1.1
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ---
Assignee: Shehjar Tikoo
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-11-22 22:09 UTC by Harshavardhana
Modified: 2015-12-01 16:45 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: DNR
CRM:
Verified Versions:



Description Harshavardhana 2010-11-22 22:09:36 UTC
sr/libexec/getconf. holes=1 overlaps=0
[2010-11-23 04:09:34.278595] E [rpcsvc.c:1693:nfs_rpcsvc_submit_generic] nfsrpc: Failed to submit message
[2010-11-23 04:09:34.278639] E [nfs3.c:473:nfs3svc_submit_reply] nfs-nfsv3: Reply submission failed
[2010-11-23 04:09:34.279233] E [rpcsvc.c:1693:nfs_rpcsvc_submit_generic] nfsrpc: Failed to submit message
[2010-11-23 04:09:34.279259] E [nfs3.c:473:nfs3svc_submit_reply] nfs-nfsv3: Reply submission failed
[2010-11-23 04:09:34.279604] E [rpcsvc.c:1693:nfs_rpcsvc_submit_generic] nfsrpc: Failed to submit message
[2010-11-23 04:09:34.279629] E [nfs3.c:473:nfs3svc_submit_reply] nfs-nfsv3: Reply submission failed
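
The error lines above share a regular shape, so a short script can tally which functions report failures. This is only a sketch: the layout `[timestamp] LEVEL [file:line:function] component: message` is inferred from the lines quoted in this report, not from any GlusterFS documentation, and the names `LOG_LINE`, `parse_line`, and `count_errors_by_function` are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the log lines quoted in this report:
# [timestamp] LEVEL [file:line:function] component: message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<function>[^\]]+)\] "
    r"(?P<component>[^:]+): (?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_errors_by_function(lines):
    """Count error-level ("E") entries per reporting function."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] == "E":
            counts[entry["function"]] += 1
    return counts


# Two lines copied verbatim from the description above.
SAMPLE = [
    "[2010-11-23 04:09:34.278595] E [rpcsvc.c:1693:nfs_rpcsvc_submit_generic] nfsrpc: Failed to submit message",
    "[2010-11-23 04:09:34.278639] E [nfs3.c:473:nfs3svc_submit_reply] nfs-nfsv3: Reply submission failed",
]

if __name__ == "__main__":
    for function, count in count_errors_by_function(SAMPLE).most_common():
        print(function, count)
```

Lines that do not fit the assumed layout, such as the truncated first line of the description, are skipped rather than raising an error.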


The kernel backtrace for the hung task on the client side is as follows:


Call Trace:
 [<ffffffffa02a1424>] ? __rpc_execute+0xc8/0x232 [sunrpc]
 [<ffffffff81016939>] ? read_tsc+0x9/0x1b
 [<ffffffff81063e2b>] ? clocksource_read+0xf/0x11
 [<ffffffff810acaa6>] ? sync_page+0x0/0x4a
 [<ffffffff813d994c>] schedule+0xe/0x22
 [<ffffffff813d9991>] io_schedule+0x31/0x42
 [<ffffffff810acaec>] sync_page+0x46/0x4a
 [<ffffffff813d9ea8>] __wait_on_bit+0x48/0x7b
 [<ffffffff810accc0>] wait_on_page_bit+0x72/0x79
 [<ffffffff8105d730>] ? wake_bit_function+0x0/0x33
 [<ffffffff810b4f35>] ? pagevec_lookup_tag+0x25/0x2e
 [<ffffffff810ad561>] wait_on_page_writeback_range+0x9c/0x15d
 [<ffffffff810ad647>] filemap_fdatawait+0x25/0x27
 [<ffffffff810adea9>] filemap_write_and_wait+0x2c/0x38
 [<ffffffffa0313cc7>] nfs_setattr+0xbd/0x125 [nfs]
 [<ffffffff810ae0e6>] ? generic_file_aio_write+0x75/0xca
 [<ffffffffa0311c2a>] ? nfs_file_write+0x104/0x180 [nfs]
 [<ffffffff8104db5b>] ? current_fs_time+0x27/0x2e
 [<ffffffff810f612c>] notify_change+0x1a1/0x305
 [<ffffffff81100863>] utimes_common+0x12b/0x152
 [<ffffffff8110090e>] do_utimes+0x84/0xd7
 [<ffffffff810eb48b>] ? path_put+0x22/0x26
 [<ffffffff81100a7e>] sys_utimensat+0x63/0x6c
 [<ffffffff81010c82>] system_call_fastpath+0x16/0x1b


The hang as seen on the terminal:


`/usr/lib64/libgstvideo-0.10.so.0.18.0' -> `./usr/lib64/libgstvideo-0.10.so.0.18.0'
`/usr/lib64/libkio.so.5.3.0' -> `./usr/lib64/libkio.so.5.3.0'
removed `./usr/lib64/libpoppler.so.4'
`/usr/lib64/libpoppler.so.4' -> `./usr/lib64/libpoppler.so.4'
`/usr/lib64/libxcb-xvmc.so.0.0.0' -> `./usr/lib64/libxcb-xvmc.so.0.0.0'
`/usr/lib64/libkdeinit4_klauncher.so' -> `./usr/lib64/libkdeinit4_klauncher.so'
`/usr/lib64/libgstinterfaces-0.10.so.0' -> `./usr/lib64/libgstinterfaces-0.10.so.0'
`/usr/lib64/libffi.so.5.0.6' -> `./usr/lib64/libffi.so.5.0.6'
removed `./usr/lib64/libXxf86dga.so.1'
`/usr/lib64/libXxf86dga.so.1' -> `./usr/lib64/libXxf86dga.so.1'
`/usr/lib64/libXcursor.so.1.0.2' -> `./usr/lib64/libXcursor.so.1.0.2'
`/usr/lib64/libpcrecpp.so.0.0.0' -> `./usr/lib64/libpcrecpp.so.0.0.0'
removed `./usr/lib64/libsnmp.so.15'
`/usr/lib64/libsnmp.so.15' -> `./usr/lib64/libsnmp.so.15'
`/usr/lib64/libQtCLucene.so.4.5.3' -> `./usr/lib64/libQtCLucene.so.4.5.3'


^C^C

^C^C^C^C^C^C
^C^Z

^C^C^Z

^C^C^C^Z^Z

Comment 1 Shehjar Tikoo 2010-11-26 05:52:02 UTC
It looks like the reply failure happens because the connection was closed by the NFS client. Normally, the NFS client would reconnect and retransmit the request, but that does not seem to be the case here.

Can you get me the trace log for NFS? With it, I can pinpoint whether this is really due to a disconnection or whether there is a bug in the connection handling code. Thanks.

Comment 2 Shehjar Tikoo 2010-12-21 02:46:28 UTC
Harsha confirms that this bug is seen very rarely, so no information is available for debugging it. He agrees with closing the bug.

