Bug 470267
| Summary: | cifs: data corruption due to interleaved partial writes timing out | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Tomas Smetana <tsmetana> |
| Component: | kernel | Assignee: | Jeff Layton <jlayton> |
| Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 5.2 | CC: | bugproxy, jjarvis, jlayton, jtluka, mgahagan, qcai, riek, rwheeler, shirishp, staubach, steved, syeghiay, tao |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | powerpc | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-01-20 19:38:30 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Tomas Smetana
2008-11-06 14:36:53 UTC
Created attachment 322726 [details]
Proposed patch

Three patches for RHEL 5.3:

http://git.kernel.org/?p=linux/kernel/git/sfrench/cifs-2.6.git;a=commitdiff;h=edf1ae403896cb7750800508b14996ba6be39a53
http://git.kernel.org/?p=linux/kernel/git/sfrench/cifs-2.6.git;a=commitdiff;h=61de800d33af585cb7e6f27b5cdd51029c6855cb
http://git.kernel.org/?p=linux/kernel/git/sfrench/cifs-2.6.git;a=commitdiff;h=c527c8a7ffa18400c2c1488f7ab5aff5e83f3c8e

Created attachment 322730 [details]
Same as the previous one, but with the timeout changed from 7 seconds to 3 seconds in two places in ipv6_connect in connect.c
So what's the reason for the divergence from upstream here? Why should we be using a 7s timeout instead of a 3s one? Upstream appears to do this (ugh, I hate that `noblocksnd` double-negative):

```c
(*csocket)->sk->sk_rcvtimeo = 7 * HZ;
if (!noblocksnd)
	(*csocket)->sk->sk_sndtimeo = 3 * HZ;
```
Created attachment 322739 [details]
backport of all 3 patches
The timeouts don't look right in ipv6_connect in that last patch. It looks like we're not actually setting sndtimeo in the normal case (noblocksnd unset).
I think this one looks like it does the right thing...
Changing flags to "blocker?" since this bug may lead to data corruption.

Created attachment 322772 [details]
backport of all 3 patches -- fix missing sk_rcvtimeo set too
My last patch removed the setting of sk_rcvtimeo in ipv6_connect. This one should be right.
Email from Shirish about a reproducer for this. I haven't attempted his method yet, but we may be able to speed this along by combining the test below with a netem qdisc filter to simulate heavy packet loss:
----------------[snip]--------------
Anyone who needs to recreate this without using IBM test tools can do so with the following setup:

- Two systems (one client, one server)
- On the server, create 6 or more shares (the more threads, the quicker the repro)
- From the client, mount each of the shares using CIFS
- Create the following script named "copy" and run it against each mount point on the client:
```bash
#!/bin/bash
# Copy $1 to mount point $2, then keep re-copying for as long as the
# copy still matches the source; the loop exits once corruption occurs.
dd if="$1" of="$2/$1" bs=1k 2>/dev/null
while diff "$1" "$2/$1" >/dev/null
do
	dd if="$1" of="$2/$1" bs=1k 2>/dev/null
done
```
I used a file named source.txt that was about 1.5M (any file will do) with mounts named sharen through shares, so in six terminals:

```bash
./copy source.txt /mnt/sharen
./copy source.txt /mnt/shareo
```

etc. etc. for all six mounts.
Start a "tail -f /var/log/messages" in another window and after a couple of hours you should see:

"CIFS VFS: No writeable handles for inode"

This will be accompanied by file truncation on one of the sessions, even though if you remove the >/dev/null on the dd inside the while loop you will note that dd thinks it successfully wrote the data to the drive.

We are still working on a script to more closely mirror the IBM test tool, which results in a block of 0's in the middle of the file. However, this script causes unreported data loss, so it should be enough to show that there is an issue.
----------------[snip]--------------
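The reproducer above, combined with the netem packet-loss idea discussed in this bug, can be sketched as a small driver script. This is a hypothetical sketch, not part of the attached patches: the interface name `eth0` is an assumption, the `copy` script, `source.txt`, and the six mounts must already exist, and `tc` requires root.

```shell
#!/bin/bash
# Add ~10% packet loss on the interface facing the server to make
# partial/timed-out writes more likely (eth0 is an assumption;
# adjust to your environment).
tc qdisc add dev eth0 root netem loss 10%

# Run the "copy" loop against all six mounts in parallel,
# one background job per share (sharen through shares).
for s in n o p q r s; do
    ./copy source.txt "/mnt/share$s" &
done
wait

# Clean up the qdisc when done.
tc qdisc del dev eth0 root
```

While this runs, watching /var/log/messages (or dmesg) for the "CIFS VFS: No writeable handles for inode" message matches the failure signature described above.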
Testing the above reproducer now. I've also added a netem qdisc that simulates 10% packet loss. I'm hoping that will help spur the failure earlier. So far it seems to be running w/o any problems and I haven't seen the message pop in the ring buffer.

Still running and no errors yet. As a side note, CIFS over IPv6 isn't implemented in RHEL5, so we don't need to obsess over getting that codepath right. It doesn't hurt to do so, however.

The reproducer finally hit an error around 7pm last night. I'm rerunning it now with the patch. If it hasn't failed by tomorrow morning, I'll declare victory and post the patch for review tomorrow.

The reproducer ran overnight, so it looks like the patch generally fixes it. That said, it looks like there is a problem with this patch. If we don't send all of the frame in smb_send2, then we're marking the socket CifsNeedReconnect, but what about smb_send? Granted, that's generally for sending smaller frames and the problem is less likely to occur there (and probably won't result in data corruption), but it seems like we should be doing the same thing there.

Ugh, fixing *that* is non-trivial. The function interfaces here are a mess... smb_send doesn't take a TCP_Server_Info arg, it takes a socket. So we don't have an easy way to mark the socket as needing to be reconnected. We could change it to take a server arg, but then we'll have to fix the ipv4_connect code to do something different. To fix this the right way will mean a number of cascading changes. We may just have to take the patch as-is for now, and try to fix it the right way later. I'll wait until Shirish chimes in before I post it though...

(In reply to comment #14) I think we should take these patches as they are. They fix the data integrity problem and, in the testing done so far, they have not introduced any new bugs/problems.
There are definitely places and ways these patches could and should be improved, but for now I think we should incorporate the patches as they are.

Agreed, but mainly because we're running out of time for 5.3. Hopefully you can get these problems fixed upstream so that we can pull those fixes in for 5.4.

in kernel-2.6.18-124.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5

This is failing QA in IBM's tests of snapshots 4 & 5. Can we get any details on the IBM testing and the observed failures?

From the LTC bugzilla:

Comment #220 From Shirish S. Pargaonkar 2008-12-16 14:16:56 EST [reply]
Ran Consoleapp test on RHEL 5.3 snapshot 5 and the DI error occurred within a few hours.
Linux dende-04 2.6.18-125.el5PAE #1 SMP Mon Dec 1 17:56:48 EST 2008 i686 i686 i386 GNU/Linux
Will need to test on RHEL 5.3 snapshot 6.

Created attachment 327232 [details]
patch -- prevent cifs_writepages from skipping pages
Shirish emailed me late last night and said that the problem he was seeing was because RHEL5.3 was missing this patch.
Working on getting confirmation from him now. I never saw a problem with the provided reproducer without this patch, so there may be something particular about the testcase that IBM is using that helps trigger the problem.
in kernel-2.6.18-128.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5

The reproducer I have survived overnight. That's a good sign, but I'll note that it did that without the latest patch as well. We'll have to wait for results from IBM's proprietary reproducer for final results.

Shirish's reproducer also survived last night on his modified RHEL5 kernel. That kernel had the patch that we applied to -128.el5. He's now testing the actual -128.el5. We should have final results tomorrow, but I expect that it will also survive. The CIFS code he tested last night should be virtually identical to the code in -128.el5.

Email from Shirish this morning:
-------------------[snip]-------------------
The test is passing (as expected) on the .128 kernel. Thanks for all the help to get all the fixes in. It would have been unfortunate if we had missed Shaggy's patch.
-------------------[snip]-------------------

Looks like it's good.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html