Red Hat Bugzilla – Full Text Bug Listing
| Field | Value | Field | Value |
|---|---|---|---|
| Summary: | Solaris client hangs on file read operations | | |
| Product: | [Community] GlusterFS | Reporter: | Shehjar Tikoo <shehjart> |
| Component: | nfs | Assignee: | Raghavendra G <raghavendra> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Version: | 3.1.1 | CC: | amarts, gluster-bugs, saurabh, vijay, vs |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Description Shehjar Tikoo 2010-12-20 06:20:53 EST
Even a cat on a 12-byte file hangs the Solaris client. The problem is somewhere in the translator stack on the read callback path: some translator is not propagating the op_errno=ENOENT that posix sets when it sees EOF. NFS uses op_errno=ENOENT to let clients know that the end of file was reached while reading a file. This has worked until now because the Linux client never reads beyond the file size given in the file's attributes. The Solaris client, on the other hand, keeps reading until the eof flag is set in the read reply. For example, the first NFS read request looks like:

    nfs-nfsv3: XID: cea10fa0, READ: args: FH: hashcount 2, exportid c99db2fc-ab91-406d-a3a5-acc7c2b672d8, gfid 8c862c6b-00d7-4753-b76a-21544ed94363, offset: 0, count: 4096

i.e. a request to read 4 KB starting at offset 0, which the NFS server answers with the correct data:

    XID: cea10fa0, READ: NFS: 0(Call completed successfully.), POSIX: -1(Unknown error 18446744073709551615), count: 12, is_eof: 0, vector: count: 1, len: 12

But the EOF bit is not set even though the file is only 12 bytes long, so Solaris sends another read request, this time starting at offset 12:

    XID: cfa10fa0, READ: args: FH: hashcount 2, exportid c99db2fc-ab91-406d-a3a5-acc7c2b672d8, gfid 8c862c6b-00d7-4753-b76a-21544ed94363, offset: 12, count: 4084

to which the NFS server replies:

    XID: cfa10fa0, READ: NFS: 0(Call completed successfully.), POSIX: -1(Unknown error 18446744073709551615), count: 0, is_eof: 0

i.e. it returns no data and still does not set EOF, so the client never learns that it has reached the end of the file.
Comment 1 Shehjar Tikoo 2010-12-21 00:44:23 EST
The bug is somewhere in io-cache, where it fails to propagate the op_errno from its subvolume to its parent. Still figuring out if there can be a quick fix.
Comment 2 Shehjar Tikoo 2010-12-21 01:28:43 EST
I think the bug is somewhere in ioc_fault_cbk, where we need to copy the op_errno so that it gets propagated to the parent xlator.
Comment 3 Shehjar Tikoo 2010-12-21 01:34:13 EST
(In reply to comment #1)
> The bug is somewhere in io-cache, where it fails to propagate the op_errno from
> its subvolume to its parent. Still figuring out if there can be a quick fix.

Confirmed this by removing other translators one by one: adding io-cache introduces the bug.
Comment 4 Anand Avati 2011-02-18 23:32:41 EST
PATCH: http://patches.gluster.com/patch/6115 in master (performance/quick-read: disable caching for fds opened with GF_OPEN_NOWB flags.)
Comment 5 Amar Tumballi 2011-04-13 01:08:58 EDT
The issue is fixed, and hence we don't need to document this bug as a known issue.
Comment 6 Saurabh 2011-04-15 02:26:35 EDT
    bash-3.00# ls -li f.3
    6590356411317675395 -rw-r--r-- 1 root root 12 Apr 15 14:54 f.3
    bash-3.00# cat f.3
    ddd
    aaa
    ggg
    bash-3.00# mount | grep nfs-test
    /mnt/nfs-test on nfs://10.1.12.134:38467/dist4 remote/read/write/setuid/devices/proto=tcp/vers=3/xattr/dev=4b40002 on Fri Apr 15 14:58:55 2011
    bash-3.00#

cat on the 12-byte file didn't hang.