Bug 81925
Summary: | File reads on ext3 filesystems corrupt | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Simon Matter <simon.matter> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED NOTABUG | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.2 | CC: | sct |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2003-02-10 15:41:10 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Simon Matter
2003-01-15 08:49:08 UTC
OK, there are way too many variables in this so far. First job is to isolate some of them. The XFS thing is interesting, but we know for a fact that ext3 and XFS stress both the disks and the VM in different ways, and there have been plenty of times in the past where only one fs tickled a problem but the fault turned out to be elsewhere, so that doesn't pin things down yet. This could be a disk fault, a driver fault, a filesystem fault in ext3, a filesystem fault in NFS, or a fault in the NFS server itself.

First question --- what does the data corruption look like? Could you attach an example of the actual diff between the two files which mis-compare? (cmp -l will do that.) If you repeat the diff several times, do you always get the same diffs? (i.e. can we eliminate read problems during the final diff as the fault here?)

Next, you were copying from NFS to two different filesystems. If there is an NFS problem here, then NFS could produce different answers each time, and ext3 would faithfully record that. What happens if, instead, you copy to local disk once, and then copy _that_ to the multiple different destinations?

I know about the many variables, which already made it difficult for me to isolate things. The other problem is that it takes me way too much time to run those tests. Anyway, here we go:

XFS <-> ext3: I know they are very different. The only thing I know for sure is that XFS never failed (I mean in this scenario), while ext3 fails with recent 2.4.18-based Red Hat kernels but not with the 2.4.9-34 version.

> This could be a disk fault, a driver fault, a filesystem fault in ext3, a
> filesystem fault in NFS, or a fault in the NFS server itself.

Disk fault: could be, but I have pushed them very hard. I have already diffed terabytes on them with XFS without a single error, so I guess they are fine. Driver fault: that looks like the same case as the disk fault; at least the driver is always the same.
And I have been using this controller on my main server for two years. Filesystem fault in ext3: maybe, but as you said, it may be difficult to prove. Filesystem fault in NFS, or a fault in the NFS server itself: no. I'm copying the NFS data 5 times onto the local volume, which is 145 GB altogether. I run only one diff against the NFS server; the four others are between the copies on disk.

> First question --- what does the data corruption look like? Could you attach an
> example of the actual diff between the two files which mis-compare? (cmp -l
> will do that.)

I didn't do that yet, see below.

> If you repeat the diff several times, do you always get the same diffs? (ie.
> can we eliminate read problems during the final diff as the fault here?)

I get one or more different diff errors with every run. I know they are read errors because I compare 1 with 2, 2 with 3, 3 with 4 and so on. How can 2 and 3 be different while 1=2, 3=4 and 4=1?

> Next, you were copying from NFS to two different filesystems. If there is an
> NFS problem here, then NFS could produce different answers each time, and ext3
> would faithfully record that. What happens if, instead, you copy to local disk
> once, and then copy _that_ to the multiple different destinations?

Sorry, I wasn't clear. I was always using the same disks, with the same partitions, with the same RAID5 volume. The filesystem was 88% full because I first ran this test to test the disk drives. Another diff has just finished and shows this now:

    Binary files /mnt/nfs/mp3/scorpions-face_the_heat.track04.mp3 and /nfs1/mp3/scorpions-face_the_heat.track04.mp3 differ
    Binary files /nfs3/ISO/Beta/phoebe-i386-disc1.iso and /nfs4/ISO/Beta/phoebe-i386-disc1.iso differ

Now I run 'cmp -l /nfs3/ISO/Beta/phoebe-i386-disc1.iso /nfs4/ISO/Beta/phoebe-i386-disc1.iso': no error. I run 'cmp -l /mnt/nfs/mp3/scorpions-face_the_heat.track04.mp3 /nfs1/mp3/scorpions-face_the_heat.track04.mp3': no error. What information do you need?
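The pairwise comparison logic described above (1 vs 2, 2 vs 3, 3 vs 4, ...) can be sketched as a small shell function. This is not the reporter's actual script, just a minimal reconstruction; the directory names in the example comment are placeholders for the copies on disk.

```shell
# compare_copies: compare the directory trees given as arguments pairwise,
# copy1 vs copy2, copy2 vs copy3, and so on. If copy2 vs copy3 mismatches
# while copy1==copy2 and copy3==copy4, the on-disk data is consistent and
# the mismatch must have been a transient read error during the compare.
compare_copies() {
    prev=""
    for dir in "$@"; do
        if [ -n "$prev" ]; then
            if diff -r "$prev" "$dir" >/dev/null 2>&1; then
                echo "OK:       $prev == $dir"
            else
                echo "MISMATCH: $prev != $dir"
            fi
        fi
        prev="$dir"
    done
}

# Example (paths are placeholders): compare_copies /nfs1 /nfs2 /nfs3 /nfs4
```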
I keep this box available for testing as long as possible.

NFS could _easily_ be a problem. You have copied from NFS to 5 different local filesystems. If NFS gave the wrong data one of those times, then afterwards you'd find that the local filesystems don't match each other. That's why copying from local disk to local disk after the first copy from NFS would help to isolate things. However, given the other data here, that doesn't look like the problem --- you do indeed look as if there are read errors. I've *never* seen that caused by a filesystem fault --- it is almost always a hardware problem, and occasionally a software VM problem.

The next thing is definitely to capture a snapshot of just what the corruption looks like. For example, one pattern of corruption I've seen (again, only on reads) on Promise controllers in the past was a missing run of 4 bytes from the middle of a file. If this is hardware, it might well show up clearly like that. One way to catch the corruption might be to repeatedly copy the tree from disk to disk, so that hopefully at least one iteration will end up reading the original tree wrongly and recording the corruption pattern permanently during the copy.

NFS is _not_ the problem here. I can easily reproduce the problem without any NFS involved. I was unable to reproduce the problem using cmp -l; no differences showed up, even when running the diffs in parallel. I have not been able to reproduce this problem with kernel 2.4.9-34 on ext3 and XFS, or with 2.4.18-18 on XFS. The only way I can reproduce this is with 2.4.18-18 and later on ext3, but as you correctly stated, that doesn't mean it is the filesystem's fault. I have since removed the software RAID5 and created 4 independent ext3 filesystems on the disks. Running some diffs in parallel between those 4 filesystems generates the same error, but even quicker and with less data involved. I have now copied the 4 directories to the root filesystem, which is on U160 SCSI with software RAID5.
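The copy-and-verify loop suggested above (repeatedly copy the tree so that a transient read error gets recorded permanently in the copy) could look roughly like this. The function name and paths are illustrative, not from the original report.

```shell
# capture_corruption: repeatedly copy a reference tree and verify each
# copy immediately. A transient read error that happens during the copy
# becomes permanent in the destination, so the exact corrupted bytes can
# later be inspected with `cmp -l` against the reference.
capture_corruption() {
    src="$1"; dst="$2"; iterations="$3"
    i=0
    while [ "$i" -lt "$iterations" ]; do
        rm -rf "$dst"
        cp -a "$src" "$dst"
        if ! diff -r "$src" "$dst" >/dev/null 2>&1; then
            echo "corruption captured in $dst on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no corruption after $iterations iterations"
    return 0
}

# Example (paths are placeholders): capture_corruption /nfs1/ISO /nfs2/ISO.copy 50
```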
I have tried my diff script several times on the SCSI disks alone with no error. I have then started the diff processes on the SCSI and IDE filesystems in parallel, and voilà, the SCSI filesystems reported errors too. The only thing I can try now is to reproduce the problem on completely different hardware, right?

This is definitely looking like an IDE-level problem. So far it's unclear whether the problem is hardware or a driver fault, though. Do you have any indications of IDE errors occurring in the logs? If you're still in UDMA-100 mode, then the controller should be detecting CRC failures on the cable and the driver will retry them, but if the driver has had problems staying in UDMA mode then it may have backed down to a non-CRC-capable I/O mode. I strongly suspect that the reason 2.4.9-34 worked OK is that it had far earlier IDE drivers, and simply was not capable of driving your PDC20268 to the fullest extent, thus leaving the card in a slower but less demanding mode. As for the XFS kernel, it's not clear whether the difference is just in the way XFS is driving the hardware, or whether you built the XFS-enabled kernel with different IDE config options, so again it could be that the most advanced driver options were not enabled there. Beyond this, we're going to have to look at either booting with an argument to force the IDE driver into a slower mode, or using a later kernel with updated IDE drivers from work done upstream since 2.4.18. Would you be willing to test a Phoebe (current Red Hat Linux beta) kernel on this hardware? That is based on a kernel closer to 2.4.20, and has a number of IDE improvements as a result.

There are no IDE errors showing up in the logs.
All four disks look like this:

    Model=CI530L06VARE700- , FwRev=REO64AA6, SerialNo= S PZZT8T4F34
    Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
    RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=40
    BuffType=DualPortCache, BuffSize=1916kB, MaxMultSect=16, MultSect=16
    CurCHS=16383/16/63, CurSects=-66060037, LBA=yes, LBAsects=120103200
    IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
    PIO modes: pio0 pio1 pio2 pio3 pio4
    DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
    AdvancedPM=yes: disabled (255)
    Drive Supports : ATA/ATAPI-5 T13 1321D revision 1 : ATA-2 ATA-3 ATA-4 ATA-5

    multcount    = 16 (on)
    I/O support  = 0 (default 16-bit)
    unmaskirq    = 0 (off)
    using_dma    = 1 (on)
    keepsettings = 0 (off)
    nowerr       = 0 (off)
    readonly     = 0 (off)
    readahead    = 8 (on)
    geometry     = 119150/16/63, sectors = 120103200, start = 0

My XFS kernels are compiled from the Red Hat source RPM with only the XFS bits added; everything else is the same! I have since put a root filesystem onto the IDE disks and booted the original Red Hat kernel to verify that it's not the XFS kernel's fault. Now I have gone back to my true root filesystem, which is XFS. I'd like to try the Phoebe kernels but can't do it quickly because Phoebe still doesn't have XFS included; let me know if I'm wrong.

To correct things: I have finally been able to create corruption on XFS too. If I find the time, I'll try with a newer kernel from Red Hat Phoebe.

I have tried 2.4.20-2.2 from Phoebe now and it's all the same. I have also switched off DMA and used pio4 (hdparm -d0 -X12 /dev/hd[e-h]) without success; that way the corruption got even worse. It really seems to be a problem in the new IDE or PDC code here. Kernel 2.4.9-34 works perfectly.

Finished my final tests right now. A simple bonnie run shows significant speed improvements between RH 2.4.9-34 and RH 2.4.[18,20] of up to 100%. Unfortunately, something broke, at least on the hardware I use. ext3 and XFS (on the XFS-enhanced RH kernel) show almost the same performance here.
The really big problem is that people won't notice the corruption until they do some serious integrity check. I have just tried with the newest errata kernel 2.4.18-24 and the problem still exists.

Interesting update: I have moved the Promise controller and the four disks to my small production server, which is an AMD K6-400 with a VIA MVP3 chipset and exactly the same Red Hat installation. I can't reproduce the problem here. So I'm starting to think it's a problem between the Serverworks chipset and the Promise card on the other server.

Did you have any luck trying to catch a snapshot of what the corruption actually looks like? If you copy the source tree multiple times off disk, then you're hopefully going to hit a read error once in a while which will end up being copied to the target dir, so that when you get a verification error you'll be able to do a "cmp -l" to find the exact corruption later. I think we'll ultimately have to close this as NOTABUG, though.

What I said this morning was unfortunately wrong. I am now running my test on my production server, with another Promise Ultra100TX2, other cables and the same disks. I've been able to reproduce corruption with kernel 2.4.18-24 _and_ 2.4.9-34, so it's not limited to the newer kernels. My next - and I hope last - steps will be to 1) upgrade the microcode (firmware) on the IBM disks. When this doesn't help, I'll 2) connect the disks to the onboard VIA IDE controller. If this corrects the problem, then it's definitely the Promise controller hardware or the PDC kernel driver. I was unable to verify the error with "cmp -l" yet but will try it again.

Okay, here is a corruption pattern:

    [root@crash FreeBSD]# pwd
    /home/XXL/x/FreeBSD
    [root@crash FreeBSD]# cmp -l 4.7-disc2.iso /home/XXL/backup/FreeBSD/4.7-disc2.iso
    614860720 341 241

I have to run this ~10 times to get the error once.
The next error after several tests was:

    [root@crash FreeBSD]# cmp -l 4.7-disc2.iso /home/XXL/backup/FreeBSD/4.7-disc2.iso
    62316588 323 123

In hex, that's:

    24A607B0: E1 A1
    03B6E02C: D3 53

Single bit-flip errors, in different bits. This is most definitely looking like a chipset problem. Good luck in trying to hunt it down!

To finish this thread: it turned out that the Promise and VIA controllers as well as the cables are okay. The only problem comes from the hard disks, which are IBM IC35L060AVER07-0. I can put them in any box I want and am able to reproduce those single-bit errors.

Are all of the disks behaving badly, or just one? It could be bad memory in the disk's internal cache, for example, but that excuse fades if they all show the same problem (and you wouldn't really expect different bits to flip each time if it's just bad memory).

Okay, I ran some more tests. The error rate differs between the drives, but they all produce errors. And before you ask, they are not from the same batch: two of them were produced in Thailand, two in the Philippines. The manufacturing dates differ by some months, and the firmware levels are different too. Looks really bad. Two of the drives were already replaced by IBM because they started to report bad sectors within the first month of operation. I'm very sure I'm not alone, but most people won't find out what their disks are doing.

No, I don't believe it! I have just installed a Maxtor drive and I can reproduce the same errors. I have also disabled whatever I can think of with hdparm. It looks like this now:

    /dev/hde:
    multcount    = 0 (off)
    I/O support  = 0 (default 16-bit)
    unmaskirq    = 0 (off)
    using_dma    = 0 (off)
    keepsettings = 0 (off)
    nowerr       = 0 (off)
    readonly     = 0 (off)
    readahead    = 0 (off)
    geometry     = 77504/16/63, sectors = 78125000, start = 0

The errors are still here. It seems that whenever I have four IDE disks installed and access them all together, I get wrong data in read operations, not write operations.
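The octal-to-hex conversion done above by hand can be automated. `cmp -l` prints a 1-based decimal byte offset followed by the two differing byte values in octal; the sketch below (function name is mine, not from the report) converts a line of that output to hex and checks whether the two bytes differ in exactly one bit, i.e. whether their XOR is a power of two.

```shell
# analyze_cmp_line OFFSET OCT1 OCT2: take one line of `cmp -l` output
# (decimal offset, two octal byte values), print it in hex, and report
# whether the difference is a single flipped bit.
analyze_cmp_line() {
    offset=$1
    a=$((0$2)) b=$((0$3))        # a leading 0 makes the shell parse octal
    xor=$((a ^ b))
    if [ "$xor" -ne 0 ] && [ $((xor & (xor - 1))) -eq 0 ]; then
        flip=yes                 # XOR is a power of two: exactly one bit differs
    else
        flip=no
    fi
    printf '%08X: %02X %02X (single-bit flip: %s)\n' "$offset" "$a" "$b" "$flip"
}

analyze_cmp_line 614860720 341 241   # prints 24A607B0: E1 A1 (single-bit flip: yes)
analyze_cmp_line 62316588 323 123    # prints 03B6E02C: D3 53 (single-bit flip: yes)
```

For both errors in the report the XOR is a power of two (0xE1^0xA1 = 0x40, 0xD3^0x53 = 0x80), which is what supports the single-bit-flip diagnosis.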
This happens with two different mainboard chipsets, two different CPUs, two different IDE controllers, two different IDE cable sets and two different disk types. It never happens with SCSI disks on the same systems. I will try the following tonight:

- Do a fresh install of Red Hat 7.2.
- Update everything to the latest errata level.
- Try to reproduce the bug.
- Move the disks to the other server.
- Repeat the tests.

Let's see what comes out. Maybe I have to see my doctor soon...

To finish this thread, here is what came out: corrupt IDE read operations with 1-bit errors every ~5 GB were discovered with the following hardware, independent of whether (U)DMA was enabled or disabled, when stressing four disks simultaneously:

- Serverworks LE / Promise Ultra100TX2 (PDC20268)
- VIA MVP3 / Promise Ultra100TX2 (PDC20268)
- VIA MVP3 / VIA MVP3 integrated IDE

No corruption exists with:

- Intel 440BX / Promise Ultra100TX2 (PDC20268)
- Intel 440BX / Intel 440BX integrated IDE