Red Hat Bugzilla – Full Text Bug Listing
|Summary:||iozone hangs during random read throughput test|
|Product:||[Community] GlusterFS||Reporter:||Shehjar Tikoo <shehjart>|
|Component:||nfs||Assignee:||Shehjar Tikoo <shehjart>|
|Status:||CLOSED CURRENTRELEASE||QA Contact:|
|Fixed In Version:||Doc Type:||Bug Fix|
Description Shehjar Tikoo 2010-05-07 03:12:28 EDT
A user reported that iozone hangs. I've isolated the problem to nfsx behaving badly when io-cache is loaded underneath it and the workload consists of random reads from iozone. The only visible symptom is that iozone generally takes longer to finish the test. A more useful sign is in the syslog on the NFS client:

NFS: server cheating in read reply: count 65536 > recvd 12288
NFS: server cheating in read reply: count 65536 > recvd 12288
NFS: server cheating in read reply: count 65536 > recvd 61440

From the above, we can see that the server tells the client that the read reply contains 64k bytes, but the actual data part carries fewer bytes.

The problem is that with random reads, io-cache or read-ahead may return a read buffer that straddles two separate pages in ioc or ra. In that situation both translators return two iovecs, each pointing into a different page in ioc or ra. The current nfs3 read reply does not return as many vectors as it received from the subvolume; instead it returns only the first iovec to the NFS client. This leads to a short read for the NFS client. Although the spec expects NFS clients to handle short reads from the server, the test takes a bit longer to finish because the client has to send a few more requests to get the pending bytes.
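The mismatch in the syslog lines can be illustrated with a minimal C sketch. It is not the nfs3 or rpcsvc code; the helper names `first_vec_bytes` and `all_vec_bytes` are hypothetical and only contrast the byte count a reply carries when the first iovec alone is sent with the count when every iovec is sent.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical helper: payload size if only the first iovec is submitted,
 * which is the behaviour described in this report. */
static size_t first_vec_bytes(const struct iovec *vec, int count)
{
    return count > 0 ? vec[0].iov_len : 0;
}

/* Hypothetical helper: payload size if every iovec from the subvolume
 * is submitted, which is what the reply's count field promises. */
static size_t all_vec_bytes(const struct iovec *vec, int count)
{
    size_t total = 0;

    for (int i = 0; i < count; i++)
        total += vec[i].iov_len;
    return total;
}
```

For a 64k read split into iovecs of 12288 and 53248 bytes, `all_vec_bytes` gives 65536 while `first_vec_bytes` gives 12288, the same pair of numbers as in the first "server cheating" message.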
Comment 1 Anand Avati 2010-05-08 06:37:41 EDT
PATCH: http://patches.gluster.com/patch/3238 in master (rpcsvc: Support multiple vectors during reply submission)
Comment 2 Anand Avati 2010-05-08 06:37:45 EDT
PATCH: http://patches.gluster.com/patch/3239 in master (nfs3: Submit multiple vectors received in read callback)
Comment 3 Anand Avati 2010-05-10 02:18:03 EDT
PATCH: http://patches.gluster.com/patch/3244 in master (rpcsvc: Move xdr round up functions to rpc code)
Comment 4 Anand Avati 2010-05-10 02:18:07 EDT
PATCH: http://patches.gluster.com/patch/3245 in master (nfs3: Round-up read reply bytes of multi-vector reply)
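For context on the round-up patches: XDR encodes opaque data in 4-byte units, so a payload whose length is not a multiple of 4 is followed by zero padding. The sketch below assumes the round-up applies to the total length across all iovecs of a reply; the contents of the patches are not shown in this bug, and the names `xdr_roundup` and `xdr_pad_bytes` are hypothetical.

```c
#include <stddef.h>
#include <sys/uio.h>

#define XDR_UNIT 4 /* XDR items are multiples of 4 bytes */

/* Round a length up to the next multiple of XDR_UNIT. */
static size_t xdr_roundup(size_t len)
{
    return (len + XDR_UNIT - 1) & ~(size_t)(XDR_UNIT - 1);
}

/* Assumption: padding is computed once, from the total length of all
 * iovecs, rather than from the length of each iovec separately. */
static size_t xdr_pad_bytes(const struct iovec *vec, int count)
{
    size_t total = 0;

    for (int i = 0; i < count; i++)
        total += vec[i].iov_len;
    return xdr_roundup(total) - total;
}
```

With iovecs of 3 and 2 bytes the total is 5, which rounds up to 8, so 3 padding bytes follow the data.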
Comment 5 Anand Avati 2010-05-11 10:10:00 EDT
PATCH: http://patches.gluster.com/patch/3247 in master (nfs: fix warning on 32 bit)
Comment 6 Shehjar Tikoo 2010-06-01 23:05:56 EDT
Regression Test:
The problem occurs when the NFS client sends reads for random offsets that end up straddling multiple pages in io-cache. io-cache handles this by returning two iovecs, one from each of the two pages.

Test Case:
1. A simple posix+io-threads configuration will do for this test, as long as io-cache sits between io-threads and nfs so that reads go through io-cache.
2. Run the iozone read test and ensure that the performance of the random read test is the same with and without io-cache. If the performance drops with io-cache, we have a regression.
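The condition this test depends on, a read that crosses a cache page boundary, can be written down as a small C check. This is only a sketch: `read_straddles_pages` is a hypothetical helper, and the page size is passed in by the caller because the io-cache page size in use is not stated in this bug.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the read [offset, offset + size) touches more than one
 * cache page of page_size bytes, 0 otherwise. */
static int read_straddles_pages(uint64_t offset, size_t size, size_t page_size)
{
    if (size == 0 || page_size == 0)
        return 0;
    return (offset / page_size) != ((offset + size - 1) / page_size);
}
```

Assuming a 128k page for illustration, a 64k read at offset 118784 covers the last 12288 bytes of one page and the first 53248 bytes of the next, so the check returns 1; the same read at offset 0 fits in one page and returns 0.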
Comment 7 Shehjar Tikoo 2010-06-01 23:07:54 EDT
For the record, this bug in nfsx also points to a bug in io-cache. io-cache needs to disable caching when it receives file opens with the GF_OPEN_NOWB flag, but it does not, and hence serves the random reads from NFS out of the cache. Although it is good to have nfsx support multi-vector reads, we need to fix io-cache as well. That fix is coming soon.
Comment 8 Shehjar Tikoo 2010-06-08 04:29:30 EDT
(In reply to comment #7)
> For the record, this bug in nfsx also points to a bug in io-cache. IO-cache
> needs to disable caching when it receives file opens with the GF_OPEN_NOWB flag
> but does not and hence serves the random reads from NFS from the cache.

That comment on the io-cache bug is NFS-specific and does not come into play when io-cache is used with FUSE.
Comment 9 Shehjar Tikoo 2010-07-14 04:50:34 EDT
The patches submitted in comments 3 and 4 are a fix for bug 762588. To reproduce 902, use the commit before all of the patches below. To reproduce 856, use the commit before the patches in comments 3 and 4.