Firmware level PHYP, FSP, Etc: JS22 = 01EA330_031_031 Machine Type: JS22 (POWER) OS: RHEL 5.2 Steps to recreate the problem: Setup is JS22 running VIOS 815B with a partition running i5/OS and another partition running RHEL 5.2. The RHEL 5.2 partition is using CIFS mounts to write data through the MPx test tool to multiple SMB servers on other blades running various OSes. While writing the data to the shares, MPx shows input/output errors and these messages show up in the /var/log/messages (note that the CIFS module code has been modified, these might not be standard messages) kernel: CIFS VFS: smb_send2 on sock c00000003bb1bc80 stuck for 15 seconds kernel: CIFS VFS: Error -11 sending data on socket to server kernel: CIFS VFS: Write2 ret -11, wrote 0 There was a miscomapre, and this time we have a tcpdump output. Unfortunately, lot of packets were dropped, but nonetheless will look at the output once I am able to access the file. b165s1:/tcp # tcpdump -s 256 -i eth4 -w DataMisCompare-trial01-part1 tcpdump: listening on eth4, link-type EN10MB (Ethernet), capture size 256 bytes 5431443 packets captured 20861671 packets received by filter 9998785 packets dropped by kernel b165s1:/usr/src/linux # grep "00000000 " /var/log/messages | more b165s1:/usr/src/linux # grep "11111111 " /var/log/messages | more b165s1:/usr/src/linux # grep "6db66db6 " /var/log/messages | more Jul 1 23:59:50 b165s1 kernel: 6db66db6 6db66db6 m ¶ m ¶ m ¶ m ¶ Jul 1 23:59:50 b165s1 kernel: 6db66db6 6db66db6 m ¶ m ¶ m ¶ m ¶ Jul 1 23:59:50 b165s1 kernel: 6db66db6 6db66db6 m ¶ m ¶ m ¶ m ¶ Jul 1 23:59:50 b165s1 kernel: 6db66db6 6db66db6 m ¶ m ¶ m ¶ m ¶ Jul 1 23:59:50 b165s1 kernel: 6db66db6 6db66db6 m ¶ m ¶ m ¶ m ¶ The cifs printfs do not show any pages written with 0s.
Created attachment 322726 [details] Proposed patch Three patches, http://git.kernel.org/?p=linux/kernel/git/sfrench/cifs-2.6.git;a=commitdiff;h=edf1ae403896cb7750800508b14996ba6be39a53 http://git.kernel.org/?p=linux/kernel/git/sfrench/cifs-2.6.git;a=commitdiff;h=61de800d33af585cb7e6f27b5cdd51029c6855cb http://git.kernel.org/?p=linux/kernel/git/sfrench/cifs-2.6.git;a=commitdiff;h=c527c8a7ffa18400c2c1488f7ab5aff5e83f3c8e to RHEL 5.3.
Created attachment 322730 [details] Same as the previous one, but changed 3 seconds from 7 seconds in two places in connect.c in ipv6_connect
So what's the reason for the divergence from upstream here? Why should we be using a 7s instead of a 3s one?
Looks like upstream does this? (ugh, hate that double-negative) (*csocket)->sk->sk_rcvtimeo = 7 * HZ; if (!noblocksnd) (*csocket)->sk->sk_sndtimeo = 3 * HZ;
Created attachment 322739 [details] backport of all 3 patches The timeouts don't look right, in ipv6_connect on that last patch. It looks like we're not actually setting sndtimeo in the normal case (noblocksnd unset). I think this one looks like it does the right thing...
Changing flags to "blocker?" since this bug may lead to data corruption.
Created attachment 322772 [details] backport of all 3 patches -- fix missing sk_rcvtimeo set too My last patch removed the set of sk_rcvtimeo in ipv6_connect. This one should be right
Email from Shirish about a reproducer for this. I haven't attempted his method yet, but we may be able to speed this along by combining the test below with a netem qdisc filter to simulate heavy packet loss: ----------------[snip]-------------- Anyone who needs to recreate this without using IBM test tools can do so with the following setup: Two systems (one client, one server) On the server create 6 or more shares (the more threads the quicker the repro) From the client mount each of the shares using CIFS Create the following script named "copy" and run it against each mount point on the client: #!/bin/bash dd if=$1 of=$2/$1 bs=1k 2>/dev/null while `diff $1 $2/$1 >dev/null` do dd if=$1 of=$2/$1 bs=1k 2>/dev/null done I used a file named source.txt that was about 1.5M (any file will do) with mounts named sharen - shares - so in six terminals: ./copy source.txt /mnt/sharen ./copy source.txt /mnt/shareo etc. etc. for all six mounts start a "tail -f /var/log/messages" in another windows and after a couple of hours you should see: "CIFS VFS: No writeable handles for inode" This will be accompanied by file truncation on one of the sessions, even though if you remove the >/dev/null on the dd inside the while loop you will note that dd thinks it successfully wrote the data to the drive. We are still working on a script to more closely mirror the IBM test tool which results in a block of 0's in the middle of the file. However, this script causes unreported data loss, so it should be enough to show that there is an issue.
Testing the above reproducer now. I've also added a netem qdisc that simulates 10% packet loss. I'm hoping that will help spur the failure earlier. So far it seems to be running w/o any problems and I haven't seen the message pop in the ring buffer.
Still running and no errors yet. As a side note that cifs over ipv6 isn't implemented in RHEL5, so we don't need to obsess over getting that codepath right. It doesn't hurt to do so, however.
The reproducer finally hit an error around 7pm last night. I'm rerunning it now with the patch. If it hasn't failed by tomorrow morning, I'll declare victory and post the patch for review tomorrow.
The reproducer ran overnight so it looks like it generally fixes the reproducer. That said, it looks like there is a problem with this patch. If we don't send all of the frame in smb_send2, then we're marking the socket CifsNeedReconnect, but what about smb_send? Granted, that's generally for sending smaller frames and the problem is less likely to occur there (and probably won't result in data corruption), but it seems like we should be doing the same thing there.
Ugh, fixing *that* is non-trivial. The function interfaces here are a mess... smb_send doesn't take a TCP_Server_Info arg, it takes a socket. So we don't have an easy way to mark the socket as needing to be reconnected. We could change it to take a server arg, but then we'll have to fix the ipv4_connect code to do something different. To fix this the right way will mean a number of cascading changes. We may just have to take the patch as-is for now, and try to fix it the right way later. I'll wait until Shirish chimes in before I post it though...
(In reply to comment #14) I think we should take these patches as it is. They fix the data integrity problem and in the testing done so far, they have not introduced any newer bugs/problems. There are definitely places and ways these patches could and should be improved but for now, I think we should incorporate the patches as they are.
Agreed, but mainly because we're running out of time for 5.3. Hopefully you can get these problems fixed upstream so that we can pull those fixes in for 5.4.
in kernel-2.6.18-124.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5
This is failing QA in IBM's tests of snapshot 4 & 5.
Can we get any details on the IBM testing and the observed failures?
From the LTC bugzilla: Comment #220 From Shirish S. Pargaonkar 2008-12-16 14:16:56 EST [reply] ------- Ran Consoleapp test on RHEL 5.3 snapshot 5 and the DI error occurred within few hours. Linux dende-04 2.6.18-125.el5PAE #1 SMP Mon Dec 1 17:56:48 EST 2008 i686 i686 i386 GNU/Linux Will need to test on RHEL 5.3 snapshot 6.
Created attachment 327232 [details] patch -- prevent cifs_writepages from skipping pages Shirish emailed me late last night and said that the problem he was seeing was because RHEL5.3 was missing this patch. Working on getting confirmation from him now. I never saw a problem with the provided reproducer without this patch, so there may be something particular about the testcase that IBM is using that helps trigger the problem.
in kernel-2.6.18-128.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5
The reproducer I have survived overnight. That's a good sign, but I'll note that it did that without the latest patch as well. We'll have to wait for results from IBM's proprietary reproducer for final results.
Shirish's reproducer also survived last night on his modified RHEL5 kernel. That kernel had the patch that we applied to -128.el5. He's now testing the actual -128.el5. We should have final results tomorrow, but I expect that it will also survive. The CIFS code he tested last night should be virtually identical to the code in -128.el5.
Email from Shirish this morning: -------------------[snip]------------------- The test is passing (as expected) on the .128 kernel. Thanks for all the help to get all the fixes in. It would have been unfortunate if we had missed Shaggy's patch. -------------------[snip]------------------- Looks like it's good.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-0225.html