Description of problem:
While testing a patch for bz 245024, the QE test program genesis crashed after detecting corrupted data.
The setup consists of:
- one GFS server (running 2.6.18-201.el5) exported via NFS.
- one client (also running 2.6.18-201.el5) mounting the NFS filesystem
Running genesis (which I modified to do only append operations) on the client inside the NFS mount with the following command line:
/root/sts-rhel5/src/genesis/genesis -i 300s -p5 -n5 -S 5555 -d 1 -k -L fcntl 2> /tmp/genlog
does the following:
genesis creates 5 files in 1 directory and forks 5 processes to do the following continuously for 300s.
1. open any one of the 5 files at random
2. lock the file using locking specified (fcntl here)
3. seek to the end of the file
4. write a random number of bytes (up to 1024)
5. seek back to the offset where the write started
6. read the number of bytes written in step 4
7. verify that the bytes written and bytes read back are identical
8. unlock file
9. close file
Eventually one of the processes fails to verify the write (it reads nulls in step 6), which crashes the test.
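For anyone without access to genesis, one iteration of the steps above can be approximated with the sketch below. This is not the genesis source; the function name `append_and_verify` and its `max_len` parameter are made up for illustration, and `fcntl.lockf` is used on the assumption that it matches the `-L fcntl` locking mode.

```python
import fcntl
import os
import random


def append_and_verify(path, max_len=1024):
    """Run one lock/append/read-back/verify cycle; return True if the data matches.

    Hypothetical stand-in for a single genesis iteration, not the real tool.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)       # step 1: open the file
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)                      # step 2: whole-file fcntl lock
        start = os.lseek(fd, 0, os.SEEK_END)                # step 3: seek to end of file
        payload = os.urandom(random.randint(1, max_len))
        written = os.write(fd, payload)                     # step 4: append random bytes
        os.lseek(fd, start, os.SEEK_SET)                    # step 5: seek back to write start
        readback = os.read(fd, written)                     # step 6: read back what was written
        ok = readback == payload[:written]                  # step 7: verify
        fcntl.lockf(fd, fcntl.LOCK_UN)                      # step 8: unlock
        return ok
    finally:
        os.close(fd)                                        # step 9: close
```

To approximate the failing setup, several processes would call this in a loop against a small set of files on the NFS mount; a `False` return corresponds to the verification failure described above.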
Created attachment 422268
relevant portion of tcpdump on the server when the crash occurs
Viewed in wireshark, this tcpdump shows that a GETATTR reply (for inode 79) went out of order: it was sent after a WRITE operation but carries the pre-write size. The WRITE wrote 1024 bytes at offset 694092, making isize=695116, but the out-of-order GETATTR reply reports isize as 694092. As a result, the client cannot read to the actual end of the file.
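The arithmetic from the trace can be checked with a small sketch. The helper `readable_bytes` is hypothetical and models only the size clamp; it assumes a reader that trusts the cached isize and ignores everything else the client read path does.

```python
def readable_bytes(cached_isize, offset, length):
    """Bytes of [offset, offset + length) that lie below the cached file size."""
    return max(0, min(offset + length, cached_isize) - offset)


write_offset = 694092                     # from the tcpdump
write_len = 1024                          # from the tcpdump
true_isize = write_offset + write_len     # 695116, the size after the WRITE
stale_isize = 694092                      # size carried by the out-of-order GETATTR reply

print(readable_bytes(true_isize, write_offset, write_len))    # 1024
print(readable_bytes(stale_isize, write_offset, write_len))   # 0
```

With the stale size, none of the bytes just written fall inside what the client believes is the file, which is consistent with the read-back in step 6 not returning the written data.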
Created attachment 422269
nfs log snippet on client when crash occurred
It looks like upstream commit 47c287183705bdbe3c10bf1d57636589d1f336b3 applies cleanly to RHEL5. I think it's the right fix for this. I'm adding it to my test kernels and plan to do some regression testing with it.
I think it's reasonable that we can make 5.6 with this too (PM willing), so I'll go ahead and propose it for that.
Created attachment 422625
patchset1 -- 4 patches from upstream that should correct problem
The only problem with this patch is that it might mean that we occasionally lose attribute updates that are valid. There are a few other patches that went in after this one that help recover some of that. This set reflects the backport of those. I think we'll want all 4.
A nice bonus to this is that it seems to reduce some of the false cache invalidations we see without this patch. Whenever these calls get reordered like this, we not only get the metadata wrong, but we have to invalidate the cache for the inode. By discarding stale updates like this, we minimize this effect to a large degree. That helps performance.
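To illustrate the idea of discarding stale updates, and the tradeoff of occasionally dropping a valid one, here is a toy model. It is not taken from the patches: the `AttrCache` class, its generation counter, and the method names are all assumptions made for the example. Each RPC is stamped when it is issued, and a reply is ignored if it was issued before the most recent attribute update was applied.

```python
class AttrCache:
    """Toy per-inode attribute cache that drops replies older than the last update.

    Hypothetical model for illustration only; not the kernel implementation.
    """

    def __init__(self):
        self.counter = 0     # global generation counter
        self.attr_gen = 0    # generation recorded at the last applied update
        self.isize = 0

    def _next_gen(self):
        self.counter += 1
        return self.counter

    def start_rpc(self):
        """Stamp an outgoing RPC with the current generation."""
        return self._next_gen()

    def apply_reply(self, rpc_gen, isize):
        """Apply attributes from a reply unless the RPC predates the last update."""
        if rpc_gen <= self.attr_gen:
            return False     # stale (or merely late): discard, keep cached attributes
        self.isize = isize
        self.attr_gen = self._next_gen()
        return True


cache = AttrCache()
getattr_gen = cache.start_rpc()            # GETATTR issued first
write_gen = cache.start_rpc()              # WRITE issued second
cache.apply_reply(write_gen, 695116)       # WRITE reply arrives first and is applied
cache.apply_reply(getattr_gen, 694092)     # late GETATTR reply is discarded
print(cache.isize)                         # 695116
```

Note that the check only looks at issue order, so a late reply that happened to carry newer attributes would be dropped too. That is the kind of lost valid update mentioned above, which the follow-up patches are meant to help recover from.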
*** Bug 438676 has been marked as a duplicate of this bug. ***
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
release.
*** Bug 245024 has been marked as a duplicate of this bug. ***
You can download this test kernel from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.