Bug 135017

Summary:	Data corruption in memory mapped file on SATA drive
Product:	Red Hat Enterprise Linux 3	Reporter:	Eric J Korpela <korpela>
Component:	kernel	Assignee:	Jeff Garzik <jgarzik>
Status:	CLOSED NOTABUG	QA Contact:
Severity:	high	Docs Contact:
Priority:	medium
Version:	3.0	CC:	adolfo, jgarzik, korpela, nfaerber, peterm, petrides, ppokorny, riel
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2005-03-02 20:06:01 UTC	Type:	---

Description Eric J Korpela 2004-10-08 01:06:05 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6)
Gecko/20040206 Firefox/0.8

Description of problem:

When using a large memory mapped file for high rate access on a SATA
drive occasionally a cache-line (64 byte) sized chunk of data is
written into the wrong page.


Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-20.ELsmp

How reproducible:
Sometimes

Steps to Reproduce:
1. Compile and run the following program on an AMD64 system with a
Serial ATA drive.
#define _LARGEFILE_SOURCE
#define _FILE_OFFSET_BITS 64

#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <assert.h>


#define FILESIZE ((off_t)(2LL*1024*1024*1024*sizeof(off_t)))

int main(int argc,char *argv[]) {
  int fd;
  off_t i;
  off_t *p;

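  fd=open("erase.me",O_CREAT|O_RDWR,0777);
  if (fd<0) { perror("open"); return 1; }
  /* Extend the file to FILESIZE (16 GB) so the whole range can be mapped. */
  if (ftruncate(fd,FILESIZE)!=0) { perror("ftruncate"); return 1; }
  p=mmap((void *)0,FILESIZE,PROT_READ|PROT_WRITE,MAP_NORESERVE|MAP_SHARED,fd,(off_t)0);
  if (p==MAP_FAILED) { perror("mmap"); return 1; }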
  for (i=0;i<FILESIZE/sizeof(off_t);i++) {
    p[i]=i;
  }
  for (i=0;i<FILESIZE/sizeof(off_t);i++) {
    if (p[i] != i) {
      fprintf(stderr,"Error at offset %lld != %lld\n",(long long)i,(long long)p[i]);
    }
  }
  return 0;
}

    

Actual Results:
The program produces output similar to:
Error at offset 641248800 != 641248288
Error at offset 641248801 != 641248289
Error at offset 641248802 != 641248290
Error at offset 641248803 != 641248291
Error at offset 641248804 != 641248292
Error at offset 641248805 != 641248293
Error at offset 641248806 != 641248294
Error at offset 641248807 != 641248295
Error at offset 718280736 != 722050080
Error at offset 718280737 != 722050081
Error at offset 718280738 != 722050082
Error at offset 718280739 != 722050083
Error at offset 718280740 != 722050084
Error at offset 718280741 != 722050085
Error at offset 718280742 != 722050086
Error at offset 718280743 != 722050087



Expected Results:  The program should produce no output.  On a SCSI
raid array attached to the same system the program produces no output.

Additional info:


System is a dual Opteron 246 with 6 GB RAM.

SATA drive syslog entries:
Oct  7 10:58:59 zork kernel: ata1: SATA max UDMA/100 cmd 0xFFFFFF0000021080 ctl 0xFFFFFF000002108A bmdma 0xFFFFFF0000021000 irq 25
Oct  7 10:58:59 zork kernel: ata2: SATA max UDMA/100 cmd 0xFFFFFF00000210C0 ctl 0xFFFFFF00000210CA bmdma 0xFFFFFF0000021008 irq 25
Oct  7 10:58:59 zork kernel: ata3: SATA max UDMA/100 cmd 0xFFFFFF0000021280 ctl 0xFFFFFF000002128A bmdma 0xFFFFFF0000021200 irq 25
Oct  7 10:58:59 zork kernel: ata4: SATA max UDMA/100 cmd 0xFFFFFF00000212C0 ctl 0xFFFFFF00000212CA bmdma 0xFFFFFF0000021208 irq 25
Oct  7 10:58:59 zork kernel: ata1: dev 0 ATA, max UDMA/133, 488397168 sectors: lba48
Oct  7 10:58:59 zork kernel: ata1: dev 0 configured for UDMA/100
Oct  7 10:58:59 zork kernel: ata2: no device found (phy stat 00000000)
Oct  7 10:58:59 zork kernel: ata3: no device found (phy stat 00000000)
Oct  7 10:58:59 zork kernel: ata4: no device found (phy stat 00000000)
Oct  7 10:58:59 zork kernel: scsi1 : sata_sil
Oct  7 10:58:59 zork kernel: scsi2 : sata_sil
Oct  7 10:58:59 zork kernel: scsi3 : sata_sil
Oct  7 10:58:59 zork kernel: scsi4 : sata_sil
Oct  7 10:58:59 zork kernel:   Vendor: ATA       Model: WDC WD2500SD-01K  Rev: 08.0
Oct  7 10:58:59 zork kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Oct  7 10:58:59 zork kernel: Attached scsi disk sdg at scsi1, channel 0, id 0, lun 0
Oct  7 10:58:59 zork kernel: SCSI device sdg: 488397168 512-byte hdwr sectors (250059 MB)
Oct  7 10:58:59 zork kernel:  sdg: sdg1 sdg2 sdg3

Comment 1 Jim Paradis 2004-10-08 01:40:26 UTC

Adding jgarzik to cc: list

Jeff - does this tickle any memories or suggest any obvious places to
go look?  Whole pages getting corrupted I can understand, but cache
lines are just a bit odd to me...

Comment 2 Jeff Garzik 2004-10-08 01:43:44 UTC

Has this been reproduced on more than one machine?

I ask because it smells like bad RAM or bad cache RAM to me.

Comment 3 Eric J Korpela 2004-10-08 06:14:35 UTC

memtest86 reports no problems after several runs.  Standalone disk
tests ran overnight with no errors.  The problem cannot be reproduced
on SCSI disks on the same machine.  It definitely appears to be
related to a page being written to disk.  The only thing I can think
of is that the L1 or L2 cache is not being fully flushed to main RAM
before a page is written to disk.  I don't know enough about Linux
device drivers and the SMP kernel to know if this is possible.

Comment 4 Eric J Korpela 2004-10-08 06:17:53 UTC

I only have one machine to test on at present.  Any suggestions as to
where I could get access to a similar machine?

Comment 5 Jim Paradis 2004-10-08 19:17:26 UTC

Do you get the same results if you limit the memory to 4G (i.e. boot
with "mem=4G")?  This would suggest whether it may be an IOMMU issue...

Comment 6 Jeff Garzik 2004-10-08 21:38:02 UTC

Actually mem=1G would probably be better test (but in general, I agree
w/ Jim's comment #5)

Comment 7 Eric J Korpela 2004-10-09 00:44:11 UTC

I'm out of the office today, but I will try to get back in to test it
ASAP.

Comment 8 Eric J Korpela 2004-10-11 17:22:12 UTC

Using mem=1G did in fact prevent the problem from occurring.  One thing
I now notice is that even though IOMMU is enabled in the BIOS, I
get messages like the following in the boot log.

Oct 10 14:57:49 zork kernel: Checking aperture...
Oct 10 14:57:49 zork kernel: CPU 0: aperture @ 0 size 32768 KB
Oct 10 14:57:49 zork kernel: Your BIOS doesn't leave a aperture memory hole
Oct 10 14:57:49 zork kernel: Please enable the IOMMU option in the BIOS setup
Oct 10 14:57:49 zork kernel: Mapping aperture over 65536 KB of RAM @ 8000000

and elsewhere

Oct 10 14:57:50 zork kernel: PCI-DMA: aperture base @ 8000000 size 65536 KB
Oct 10 14:57:50 zork kernel: PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture

Comment 9 Eric J Korpela 2004-10-18 18:09:30 UTC

Since the last report, the drive has become corrupted enough that I
needed to reformat and reinstall.

Additional potential hints...

mem=4G causes a kernel panic for the SMP kernel, but is OK with non-SMP.

mem=4G-64M is OK.

iommu=off causes a kernel panic.
iommu=merge causes no change in errors
iommu=fullflush also causes no change

Comment 10 Nate Faerber 2004-12-24 00:33:24 UTC

I have seen an extreme case of this that I believe is related.  With
Western Digital drives and more than 4GB of RAM, the system will
suffer extreme file corruption.  In most cases, the RHEL 3 Update 3
install will appear to succeed but upon reboot, a lot of file
corruption occurs and eventually renders the system useless.

I have seen the problem on three different configs that had only two
things in common: Tyan S2885 w/ onboard SiI3114 SATA and a Western
Digital drive.  I have checked with Eric and his drive is also Western
Digital.

All my RAM configurations passed memtest86+:
8x 1GB ATP
8x 1GB Corsair
8x 2GB ATP (and 4x of the same 2GB ATP)

We have used three different S2885 motherboards each with a different
video card.  One system had an add-in sound card.  One had an add-in
3ware SATA RAID card.  One had no add-in PCI cards.

Other drives (Seagate non-blacklist and Maxtor) do not suffer from the
extreme (can't reboot after install) case of this problem.  We are
working to determine whether these drives suffer from the more subtle
case exhibited by Eric's C program.

The same Western Digital drive will not show the extreme case when
attached to a 3ware RAID card.

The add-in SIIG 3114 does not suffer from the extreme (can't reboot
after install) case of this issue.  We are checking to see if it
passes Eric's program.

The Western Digital drives that suffer extreme failure:
WD360GD-00FNA0 (WD360 Raptor)
WD2500JD-55HBB0
WD2500JD-00HBB0
WD1600JD-00HBB0

Comment 11 Philip Pokorny 2004-12-24 17:25:19 UTC

The on-board Silicon Image 3114 controller is on the 32-bit/33MHz PCI
bus from the AMD-8111 south bridge.  Add-in 3Ware and SIIG controllers
were probably plugged into a 64-bit slot on the AMD-8131 PCI-X bridge.

Could that difference be important?

Comment 12 Eric J Korpela 2005-01-28 16:33:43 UTC

I concur with the comment that this is restricted to Western Digital
drives.  Replacement of the WD drive with a Seagate of equivalent
capacity has solved the problem on the server where it was initially
reported.

Comment 13 Jeff Garzik 2005-02-18 06:43:21 UTC

This is the "SATA 4GB boundary corruption" problem, which was recently
fixed.

Comment 14 Philip Pokorny 2005-02-18 07:00:31 UTC

Can you provide more information on this "4GB boundary corruption"? 
Is there another bugzilla tracking that problem?

Western Digital, Tyan and Silicon Image were able to reproduce the
problem and WD reported that Silicon Image said there was an issue
with the 3114 chip and memory accesses.

Silicon Image and Tyan released a new BIOS for the motherboard (with
new 3114 BIOS code) that solved the problem in the test system.

Comment 15 Jeff Garzik 2005-02-18 07:13:28 UTC

On x86-64 (EM64T only) and >= 4GB of memory, memory corruption would
occur.  However, looking at the bug report again, I see that it's
AMD64 not EM64T.

Nonetheless, you say a new BIOS fixed things, so I'll leave it closed.

Comment 16 Ernie Petrides 2005-03-02 20:06:01 UTC

Since this seems to have been a BIOS/firmware issue,
I'm closing it as NOTABUG (not a kernel bug, that is).

Comment 17 Philip Pokorny 2005-03-03 05:47:53 UTC

For closure, the specific version of Silicon Image Option ROM BIOS code needed is:

Silicon Image Oprom v5.0.48

Tyan released a new BIOS (Feb. 2005) for the S2885 and S4882 with that version of
the Option ROM.  The S2882 motherboard has a BIOS with 5.0.44 which may also be OK?