Red Hat Bugzilla – Bug 763972
Solaris client hangs on file read operations
Last modified: 2015-12-01 11:45:32 EST
Even a cat on a 12 byte file hangs the Solaris client. The problem is somewhere in the translator stack on the read callback path.
Some translator is not propagating op_errno = ENOENT, which posix sets on seeing EOF. NFS uses op_errno = ENOENT to let clients know that end of file was reached while reading a file. This has worked until now because the Linux client never reads beyond the file size given in the file's attributes. The Solaris client, on the other hand, keeps reading until the eof flag is set in the read reply.
For example, the first NFS read request looks like:
nfs-nfsv3: XID: cea10fa0, READ: args: FH: hashcount 2, exportid c99db2fc-ab91-406d-a3a5-acc7c2b672d8, gfid 8c862c6b-00d7-4753-b76a-21544ed94363, offset: 0, count: 4096
i.e. a request to read 4 KB starting at offset 0, to which the NFS server replies with the correct 12 bytes of data:
XID: cea10fa0, READ: NFS: 0(Call completed successfully.), POSIX: -1(Unknown error 18446744073709551615), count: 12, is_eof: 0, vector: count: 1, len: 12
But the EOF bit is not set even though the file is only 12 bytes long, so Solaris sends another read request:
XID: cfa10fa0, READ: args: FH: hashcount 2, exportid c99db2fc-ab91-406d-a3a5-acc7c2b672d8, gfid 8c862c6b-00d7-4753-b76a-21544ed94363, offset: 12, count: 4084
This time the read starts at offset 12, to which the NFS server replies:
XID: cfa10fa0, READ: NFS: 0(Call completed successfully.), POSIX: -1(Unknown error 18446744073709551615), count: 0, is_eof: 0
i.e. it returns no data and still does not set EOF.
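The exchange above can be modelled with a small simulation. This is only a sketch under assumptions: serve_read and read_until_eof are hypothetical names, the real Solaris client logic is not shown in this report, and the request cap exists only so the loop terminates here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FILE_SIZE 12u /* size of the test file in this report */

struct read_reply {
    size_t count;  /* bytes returned */
    bool   is_eof; /* eof flag in the READ reply */
};

/* Hypothetical server: returns data, sets eof only if propagate_eof is true. */
static struct read_reply serve_read(size_t offset, size_t count, bool propagate_eof)
{
    struct read_reply reply = { 0, false };
    size_t remaining = offset < FILE_SIZE ? FILE_SIZE - offset : 0;

    reply.count = count < remaining ? count : remaining;
    if (propagate_eof && offset + reply.count >= FILE_SIZE)
        reply.is_eof = true;
    return reply;
}

/* Hypothetical client: keeps reading until eof is set.
 * Returns the number of requests sent, or -1 if the cap was reached. */
static int read_until_eof(bool propagate_eof, int max_requests)
{
    size_t offset = 0;
    size_t want = 4096;

    assert(max_requests > 0);
    for (int sent = 1; sent <= max_requests; sent++) {
        struct read_reply r = serve_read(offset, want, propagate_eof);
        if (r.is_eof)
            return sent;
        offset += r.count; /* 0 -> 12, then stuck at 12 */
        want -= r.count;   /* 4096 -> 4084, then stuck at 4084 */
    }
    return -1; /* never saw eof: the client would keep retrying */
}
```

With propagate_eof set, the single 4096-byte request is enough. Without it, the client repeats the request at offset 12 with count 4084 indefinitely, which matches the hang seen on Solaris.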
The bug is somewhere in io-cache, where it fails to propagate the op_errno from its subvolume to its parent. Still figuring out if there can be a quick fix.
I think the bug is somewhere in ioc_fault_cbk, where we need to copy the op_errno so that it gets propagated to the parent xlator.
(In reply to comment #1)
> The bug is somewhere in io-cache, where it fails to propagate the op_errno from
> its subvolume to its parent. Still figuring out if there can be a quick fix.
Confirmed that by removing other translators one by one. Adding io-cache introduces the bug.
PATCH: http://patches.gluster.com/patch/6115 in master (performance/quick-read: disable caching for fds opened with GF_OPEN_NOWB flags.)
The issue is fixed, so this bug does not need to be documented as a known issue.
bash-3.00# ls -li f.3
6590356411317675395 -rw-r--r-- 1 root root 12 Apr 15 14:54 f.3
bash-3.00# cat f.3
bash-3.00# mount | grep nfs-test
/mnt/nfs-test on nfs://10.1.12.134:38467/dist4 remote/read/write/setuid/devices/proto=tcp/vers=3/xattr/dev=4b40002 on Fri Apr 15 14:58:55 2011
cat on the 12-byte file didn't hang.