Red Hat Bugzilla – Bug 764298
NFS read hangs when arequal-checksum script is run
Last modified: 2013-12-08 20:23:31 EST
When an NFS open call fails (seen here because of a bug in quick-read's open), the FH gets queued.
Subsequent open calls on that FH are queued as well, so the read call is blocked.
arequal script hangs because the read replies are not being sent by the NFS server running on Solaris.
NFS, on seeing the first read call, opens the file and then handles the read request in a second step. If the open call fails, the error is supposed to be returned to the NFS client as part of the READ reply. This is not happening and is a bug. A patch has been sent.
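For illustration, here is a minimal sketch of the behaviour the nfs3 patch title describes (flushing queued file I/O call states on open failure). This is not the actual GlusterFS code: the types `fh_state` and `pending_call` and the functions `fh_enqueue`, `fh_flush_on_open_failure` and `store_errno_reply` are hypothetical names made up for this example. The point is only that every call waiting on the open must get an error reply when the open fails, so that no client is left waiting.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical types for illustration; not GlusterFS's real structures. */
typedef void (*reply_fn)(void *req, int op_errno);

struct pending_call {
    void *req;                  /* opaque request, e.g. a queued READ */
    reply_fn reply;             /* sends the reply back to the client */
    struct pending_call *next;
};

struct fh_state {
    struct pending_call *head;  /* calls waiting for the open to finish */
    struct pending_call *tail;
};

/* Queue a call that arrived while the open on this FH is still pending. */
void fh_enqueue(struct fh_state *fh, struct pending_call *call)
{
    call->next = NULL;
    if (fh->tail)
        fh->tail->next = call;
    else
        fh->head = call;
    fh->tail = call;
}

/* On open failure, reply to every queued call with the error instead of
 * leaving it in the queue. Returns the number of replies sent. */
int fh_flush_on_open_failure(struct fh_state *fh, int op_errno)
{
    int flushed = 0;
    struct pending_call *call = fh->head;

    fh->head = fh->tail = NULL;         /* the queue is empty afterwards */
    while (call) {
        struct pending_call *next = call->next;
        call->next = NULL;
        call->reply(call->req, op_errno);
        flushed++;
        call = next;
    }
    return flushed;
}

/* Stand-in for "send the READ reply": records the errno in *req. */
void store_errno_reply(void *req, int op_errno)
{
    *(int *)req = op_errno;
}
```

In this sketch a failed open with `EIO` would call `fh_flush_on_open_failure(&fh, EIO)`, and each queued read gets its error reply right away rather than hanging.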
The second part of the problem is in quick-read, where the use of O_DIRECTORY in open is not portable. Solaris does not have O_DIRECTORY, so our build scripts define it to 0. The zero value causes the open fop to fail in quick-read, which prevents the open fop from even reaching the brick. This is the second bug and is pending a fix.
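As a hedged illustration of why a zero-valued flag is dangerous, the sketch below shows one way such a definition can break a flag test. It is not the quick-read code, and the exact failing check there may differ. The function names and the nonzero constant `EXAMPLE_DIR_FLAG` are hypothetical and chosen only for this example.

```c
#include <fcntl.h>

/* Hypothetical nonzero flag bit, purely for illustration. */
#define EXAMPLE_DIR_FLAG 0x1000000

/* A common way to test a flag. With dir_flag == 0 this is always true,
 * because (flags & 0) == 0 for every value of flags. */
int wants_directory(int flags, int dir_flag)
{
    return (flags & dir_flag) == dir_flag;
}

/* A defensive variant: treat a zero-valued flag as "not supported here",
 * so an ordinary file open is never mistaken for a directory open. */
int wants_directory_safe(int flags, int dir_flag)
{
    return dir_flag != 0 && (flags & dir_flag) == dir_flag;
}
```

With the flag defined to 0, `wants_directory(O_RDONLY, 0)` reports a directory open for a plain read-only open, while `wants_directory_safe(O_RDONLY, 0)` does not. Giving the flag a proper nonzero value, as the Solaris patch title suggests, removes the problem at its source.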
*** Bug 2545 has been marked as a duplicate of this bug. ***
PATCH: http://patches.gluster.com/patch/6556 in release-3.1 (Solaris: redefine O_DIRECTORY flag to the correct value)
PATCH: http://patches.gluster.com/patch/6531 in release-3.1 (nfs3: Flush file I/O call states on open failure)
PATCH: http://patches.gluster.com/patch/6532 in master (nfs3: Flush file I/O call states on open failure)