Bug 157342 - large xfer size data corruption on x86
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Ben Marzinski
GFS Bugs
Reported: 2005-05-10 15:08 EDT by Corey Marthaler
Modified: 2007-11-30 17:07 EST (History)
Last Closed: 2006-09-11 11:42:24 EDT
Attachments:
Test program to reproduce the bug (3.58 KB, text/plain)
2005-05-23 16:26 EDT, Ben Marzinski
Description Corey Marthaler 2005-05-10 15:08:00 EDT
Description of problem:
We ran the standard GFS full regression tests on x86_64, ia64, and x86 and hit
this problem only on x86. The 64-bit archs tested fine. We then narrowed down
the test case to just two doio/iogen cmdlines, and from there narrowed it down
to just large (300000 - 400000 block, or 154 - 204 MB) transfer sizes. We ran with
the doio debugging and found that we saw this only with the write and writev
syscalls. We supposedly write a pattern, but the pattern read back is
off by a couple of characters. We then tried these same cmdlines on ext3 (x86)
and did not see the issue. We then ran rhel3-u4 GFS (x86) binaries on a
u4 kernel and rhel3-u5 GFS binaries on a u4 kernel and saw it in both of those
cases, so it is possible this has been around for a while. This was seen on all
x86 kernel variants (up, smp, hugemem).


iogen -f buffered -i 0 -m reverse -s read,write,readv,writev -t 300000b -T 400000b 400000b:rwrevbuflarge | doio -avD
iogen -f buffered -i 0 -m random -s read,write,readv,writev -t 300000b -T 400000b 400000b:rwranbuflarge -S 9746 | doio -avD
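
For reference, the "b" suffix on the iogen sizes appears to mean 512-byte blocks. That is an assumption, but it matches the 154 - 204 MB figure quoted above, and in the reverse-mode logs below r_offset + r_nbytes always comes to 204800000. A small Python sketch of the conversion (blocks_to_bytes is a hypothetical helper, not part of iogen):

```python
BLOCK_SIZE = 512  # assumed size of one iogen "b" block, in bytes


def blocks_to_bytes(spec: str) -> int:
    """Convert a size such as '300000b' to bytes; bare numbers are taken as bytes."""
    if spec.endswith("b"):
        return int(spec[:-1]) * BLOCK_SIZE
    return int(spec)


min_xfer = blocks_to_bytes("300000b")  # smallest transfer requested via -t
max_xfer = blocks_to_bytes("400000b")  # largest transfer (-T) and the file size
print(min_xfer, max_xfer)

# Cross-check against one reverse-mode write() request from the LINK-12 log:
# the request should end exactly at the end of the file.
print(38088511 + 166711489 == max_xfer)
```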


LINK-10 (RHEL3-U5):
doio (17921) 10:10:29
---------------------
r_type: 2 (write())
r_magic: 0x1c9d81 (valid)
r_file: /mnt/single_nominor/link-12.lab.msp.redhat.com/rwranbuflarge
r_oflags: 1
r_offset: 65298640
r_nbytes: 52130959
r_pattern: J
r_uflags: 0


doio (17921) 10:10:30
---------------------
r_type: 103 (writev())
r_magic: 0x1c9d81 (valid)
r_file: /mnt/single_nominor/link-12.lab.msp.redhat.com/rwranbuflarge
r_oflags: 1
r_offset: 8301467
r_nbytes: 162898181
r_pattern: F
r_uflags: 0


doio (17921) 10:11:04
---------------------
*** DATA COMPARISON ERROR ***
check_file(/mnt/single_nominor/link-12.lab.msp.redhat.com/rwranbuflarge,
8301467, 162898181, F:17921:link-12:doio*, 21, 0) failed

Comparison fd is 4, with open flags 0
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 164732928
    1st 32 expected bytes:  921:link-12:doio*F:17921:link-12
    1st 32 actual bytes:    17921:link-12:doio*X:17921:link-

Request number 173
          fd 3 is file
/mnt/single_nominor/link-12.lab.msp.redhat.com/rwranbuflarge - open flags are 01
O_WRONLY,
          write done at file offset 8301467 - pattern is F (0106)
          number of requests is 1, strides per request is 1
          i/o byte count = 162898181
          memory alignment is unaligned

syscall:  writev(3, (iov on stack), 1)


LINK-12 (RHEL3-U5):
doio (18070) 10:09:25
---------------------
r_type: 103 (writev())
r_magic: 0x1c9d81 (valid)
r_file: /mnt/single_nominor/link-10.lab.msp.redhat.com/rwrevbuflarge
r_oflags: 1
r_offset: 242
r_nbytes: 875
r_pattern: R
r_uflags: 0


doio (18070) 10:09:25
---------------------
r_type: 2 (write())
r_magic: 0x1c9d81 (valid)
r_file: /mnt/single_nominor/link-10.lab.msp.redhat.com/rwrevbuflarge
r_oflags: 1
r_offset: 38088511
r_nbytes: 166711489
r_pattern: Q
r_uflags: 0


doio (18070) 10:10:10
---------------------
*** DATA COMPARISON ERROR ***
check_file(/mnt/single_nominor/link-10.lab.msp.redhat.com/rwrevbuflarge,
38088511, 166711489, Q:18070:link-10:doio*, 21, 0) failed

Comparison fd is 3, with open flags 0
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 139128832
    1st 32 expected bytes:  io*Q:18070:link-10:doio*Q:18070:
    1st 32 actual bytes:    nk-10:doio*O:9736:link-10:doio*O
Request number 46
syscall:  write(4, 025512320010, 166711489)
          fd 4 is file
/mnt/single_nominor/link-10.lab.msp.redhat.com/rwrevbuflarge - open flags are 01
          write done at file offset 38088511 - pattern is Q:18070:link-10:doio*




2ND TIME ON LINK-12 (RHEL3-U5):
doio (18009) 10:51:45
---------------------
r_type: 103 (writev())
r_magic: 0x1c9d81 (valid)
r_file: /mnt/single_nominor/link-12.lab.msp.redhat.com/rwrevbuflarge
r_oflags: 1
r_offset: 35207907
r_nbytes: 169592093
r_pattern: G
r_uflags: 0


doio (18009) 10:51:51
---------------------
r_type: 2 (write())
r_magic: 0x1c9d81 (valid)
r_file: /mnt/single_nominor/link-12.lab.msp.redhat.com/rwrevbuflarge
r_oflags: 1
r_offset: 18960034
r_nbytes: 185839966
r_pattern: S
r_uflags: 0


doio (18009) 10:52:11
---------------------
*** DATA COMPARISON ERROR ***
check_file(/mnt/single_nominor/link-12.lab.msp.redhat.com/rwrevbuflarge,
18960034, 185839966, S:18009:link-12:doio*, 21, 0) failed

Comparison fd is 3, with open flags 0
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 106422272
    1st 32 expected bytes:  nk-12:doio*S:18009:link-12:doio*
    1st 32 actual bytes:    09:link-12:doio*G:18009:link-12:
Request number 5
syscall:  write(4, 025417400010, 185839966)
          fd 4 is file
/mnt/single_nominor/link-12.lab.msp.redhat.com/rwrevbuflarge - open flags are 01
          write done at file offset 18960034 - pattern is S:18009:link-12:doio*


LINK-12 (RHEL3-U4 (U5 GFS))

doio ( 5359) 13:36:20
---------------------
r_type: 102 (readv())
r_magic: 0x1c9d81 (valid)
r_file: /mnt/gfs1/link-12.lab.msp.redhat.com/rwrevbuflarge
r_oflags: 0
r_offset: 2814900
r_nbytes: 201985100
r_pattern: *
r_uflags: 0


doio ( 5359) 13:37:49
---------------------
r_type: 103 (writev())
r_magic: 0x1c9d81 (valid)
r_file: /mnt/gfs1/link-12.lab.msp.redhat.com/rwrevbuflarge
r_oflags: 1
r_offset: 48290178
r_nbytes: 156509822
r_pattern: D
r_uflags: 0


doio ( 5359) 13:40:28
---------------------
*** DATA COMPARISON ERROR ***
check_file(/mnt/gfs1/link-12.lab.msp.redhat.com/rwrevbuflarge, 48290178,
156509822, D:5359:link-12:doio*, 20, 0) failed

Comparison fd is 4, with open flags 0
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 147787776
    1st 32 expected bytes:  o*D:5359:link-12:doio*D:5359:lin
    1st 32 actual bytes:    nk-12:doio*U:5359:link-12:doio*U

Request number 268
          fd 3 is file /mnt/gfs1/link-12.lab.msp.redhat.com/rwrevbuflarge - open
flags are 01 O_WRONLY,
          write done at file offset 48290178 - pattern is D (0104)
          number of requests is 1, strides per request is 1
          i/o byte count = 156509822
          memory alignment is unaligned

syscall:  writev(3, (iov on stack), 1)
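
The "expected bytes" lines in the logs above can be reproduced by hand, assuming doio fills each write with its pattern string repeated from the start of the write. That assumption is inferred from the logs, not taken from the doio source. A Python sketch (expected_bytes is a hypothetical helper):

```python
def expected_bytes(pattern: str, write_offset: int, file_offset: int, count: int = 32) -> str:
    """Return `count` pattern bytes as they should appear at `file_offset`,
    assuming the pattern repeats starting at `write_offset`."""
    start = (file_offset - write_offset) % len(pattern)
    return "".join(pattern[(start + i) % len(pattern)] for i in range(count))


# LINK-10 failure: writev() at offset 8301467, corruption reported at 164732928
link10 = expected_bytes("F:17921:link-12:doio*", 8301467, 164732928)

# First LINK-12 failure: write() at offset 38088511, corruption reported at 139128832
link12 = expected_bytes("Q:18070:link-10:doio*", 38088511, 139128832)

print(link10)
print(link12)
```

Both results agree with the "1st 32 expected bytes" lines in the corresponding reports, which suggests the checker is computing its expectations correctly and the mismatch is in what was read back.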
Comment 1 Ben Marzinski 2005-05-18 13:30:09 EDT
Um... does doio do some sort of locking to make sure that no one else is writing
to the file?  If I run this command on separate files for each node -- i.e.

on nodeA
iogen -f buffered -i 0 -m reverse -s read,write,readv,writev -t 300000b -T 400000b 400000b:fileA | doio -avD

on nodeB
iogen -f buffered -i 0 -m reverse -s read,write,readv,writev -t 300000b -T 400000b 400000b:fileB | doio -avD

-- then everything works fine.

If I run them on the same file, then they get corruption.  If there isn't
supposed to be locking done, then I think that there is a problem with the test,
not gfs (link-10 is getting corruption when trying to work on a
file in the link-12 directory)

If there is supposed to be locking to protect against this, then this is
pretty obviously a locking problem.
Comment 2 Corey Marthaler 2005-05-18 18:20:50 EDT
Update:
Reproduced on my link-10/link-12 machines with both the 19 and 20 U5 GFS rpms.
doio/iogen in this case are doing no locking, which is why they are all running
to separate files. The data compare errors show that the same PID is being used
in the file, so that eliminates the possibility of two processes clobbering each other.
Also, when one process dies due to this issue, the others continue to write/read
the filesystem; some of those processes also eventually end up hitting this
issue and dying.
Comment 3 Ben Marzinski 2005-05-19 11:44:59 EDT
o.k. I still cannot see this bug. Just to make sure that I'm on the same page,
I'm going to list information about my setup.

I'm running on cypher-01 and cypher-03. Both machines have one cpu and 502280 kB
of RAM. Both machines use the default qla2300 kernel module. For storage, I'm
using a 108559206 kB (/dev/sda2) partition on a Tornado.

Both machines were imaged with the load-rhel3.master image
Then the following RPMS were installed:

perl-Net-Telnet-3.03-2.noarch.rpm
initscripts-7.31.22.EL-2.i386.rpm
kernel-smp-2.4.21-32.EL.i686.rpm
GFS-6.0.2.20-1.i686.rpm
GFS-debuginfo-6.0.2.20-1.i686.rpm
GFS-devel-6.0.2.20-1.i686.rpm
GFS-modules-smp-6.0.2.20-1.i686.rpm

I am using a ccs file archive, with the following ccs files
cluster.ccs:
cluster {
        name = "cypher1"
        lock_gulm {
                servers = [ "cypher-01.lab.msp.redhat.com" ]
        }
}

fence.ccs:
fence_devices {
        apc {
                agent = "fence_apc"
                ipaddr = "10.15.87.25"
                login = "apc"
                passwd = "apc"
        }
}

nodes.ccs:
nodes {
        cypher-01.lab.msp.redhat.com {
                ip_interfaces {
                        eth0 = "10.15.84.121"
                }
                fence {
                        power {
                                apc {
                                        port = 1
                                }
                        }
                }
        }
        cypher-03.lab.msp.redhat.com {
                ip_interfaces {
                        eth0 = "10.15.84.123"
                }
                fence {
                        power {
                                apc {
                                        port = 3
                                }
                        }
                }
        }
}

The file system is on a pool device with the following label:
poolname gfs2
subpools 1
subpool 0       0       1       gfs_data
pooldevice      0       0       /dev/sda2       0

The filesystem was created with:
gfs_mkfs -p lock_gulm -t cypher1:gfs2 -j2 /dev/pool/gfs2

It was mounted with:
mount -t gfs /dev/pool/gfs2 /mnt/gfs2

in the filesystem, I created two directories, cypher-01 & cypher-03

On cypher-01, in the /mnt/gfs2/cypher-01 directory, I ran:
iogen -f buffered -i 0 -m reverse -s read,write,readv,writev -t 300000b -T 400000b 400000b:rwrevbuflarge | doio -avD
iogen -f buffered -i 0 -m random -s read,write,readv,writev -t 300000b -T 400000b 400000b:rwranbuflarge -S 9746 | doio -avD

On cypher-03, in the /mnt/gfs2/cypher-03 directory, I ran:
iogen -f buffered -i 0 -m reverse -s read,write,readv,writev -t 300000b -T 400000b 400000b:rwrevbuflarge | doio -avD
iogen -f buffered -i 0 -m random -s read,write,readv,writev -t 300000b -T 400000b 400000b:rwranbuflarge -S 9746 | doio -avD

I've been running for hours now, with no corruption.
Comment 4 Ben Marzinski 2005-05-19 12:57:03 EDT
Since you've said that this can happen without all 4 processes up and running,
it would be nice to see if there is some set of them that need to be running.
i.e.

Can you reliably hit this with a clean filesystem, and only one process running?
If so, which was it, reverse or random?

If not, it sounds like you should be able to hit this with only two processes.
Do they need to run on the same machine, or separate machines?

If one or two processes can hit this running on the same machine, can you hit
this running lock_nolock? How about without running on top of a pool? It would
be nice to remove all the unnecessary components, so we know that the problem
isn't there.
Comment 5 Ben Marzinski 2005-05-23 16:20:26 EDT
This is not a GFS bug.  I can reliably hit this running on top of partitions. No
pool or filesystem.  I have also hit this running three separate testing
programs, one of which is a stripped down program I wrote myself.  This is
pretty definitely not a problem with the test.  So far, I have only seen it
while using an MSA 1000 for my storage.  Also, I can't reproduce it without two
machines running the test (to separate files or partitions, of course).  I can't
rule out it being storage related, but metadata is never corrupted, just data.
Also there are no SCSI errors that would lead me to believe that there is a
storage problem.

What I actually see is a hole.  I do a write with some pattern. Then I do
another overlapping write with a different pattern.  The write says that it
completed successfully, but there is a hole in the middle of the write, where
the new pattern is not written, and the old pattern is viewable.

Interestingly enough, when I'm writing to a GFS file system, with a blocksize of
4K, the hole is always located on a 4K boundary, and is 32 4K blocks big.
When I write directly to the partitions, the hole is always on a 1K boundary,
and is 32 1K blocks big.
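
A check along these lines can locate such a hole and report its alignment and size in blocks. This is a minimal Python sketch working on in-memory buffers, not the attached test program; find_holes and the simulated data are illustrative assumptions:

```python
def find_holes(expected: bytes, actual: bytes, block_size: int):
    """Return (offset, length) for each run of consecutive mismatching blocks."""
    holes = []
    run_start = None
    nblocks = (len(expected) + block_size - 1) // block_size
    for blk in range(nblocks):
        lo = blk * block_size
        hi = lo + block_size
        bad = expected[lo:hi] != actual[lo:hi]
        if bad and run_start is None:
            run_start = lo                      # a mismatching run begins here
        elif not bad and run_start is not None:
            holes.append((run_start, lo - run_start))
            run_start = None
    if run_start is not None:                   # run extends to end of buffer
        holes.append((run_start, len(expected) - run_start))
    return holes


BLOCK = 1024                                    # 1K blocks, as on the bare partitions
old = b"A" * (256 * BLOCK)                      # first write
new = b"B" * (256 * BLOCK)                      # overlapping second write

# Simulate the symptom: 32 blocks in the middle still show the old pattern.
damaged = bytearray(new)
damaged[100 * BLOCK:132 * BLOCK] = old[100 * BLOCK:132 * BLOCK]

holes = find_holes(new, bytes(damaged), BLOCK)
for offset, length in holes:
    print(offset, offset % BLOCK == 0, length // BLOCK)

# The corrupt file offsets reported in the GFS runs in the description
# are all multiples of 4096, consistent with the 4K-boundary observation.
reported = [164732928, 139128832, 106422272, 147787776]
remainders = [off % 4096 for off in reported]
print(remainders)
```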
Comment 6 Ben Marzinski 2005-05-23 16:26:27 EDT
Created attachment 114742 [details]
Test program to reproduce the bug

compile with
# gcc -o writer writer.c

To reproduce the bug, I've been running:
(on link-10)
writer /dev/sda1 400000b 300000b 400000b
writer /dev/sda2 400000b 300000b 400000b

(on link-12)
writer /dev/sda3 400000b 300000b 400000b
writer /dev/sda4 400000b 300000b 400000b

Usually 2 processes will eventually die (one on each node).
Comment 7 Ben Marzinski 2005-05-24 11:33:00 EDT
I have now seen this bug with only one machine running tests. It still takes
two processes running writer to cause the bug.
Comment 8 Ben Marzinski 2005-05-24 11:37:57 EDT
Actually, I take that last comment back. It still does need multiple machines to
reproduce
Comment 9 Kiersten (Kerri) Anderson 2005-05-24 14:45:06 EDT
Adding Tom to the CC list
Comment 10 Tom Coughlan 2005-05-24 15:03:38 EDT
What is the adapter type, and the storage type, and how is it hooked up (a FC
switch or multiple ports on the storage device)? or does it matter?
Comment 11 Dean Jansa 2005-05-24 15:30:12 EDT

QLogic QLA2300 PCI to Fibre Channel Host Adapter: bus 1 device 0 irq 17
Firmware version:  3.03.01, Driver version 7.01.01-RH1

Connected to a McData FC switch, which in turn the MSA100 is connected to
(via optical as well)
Comment 12 Tom Coughlan 2005-05-24 16:23:59 EDT
Is that MSA1000? Does it have dual controllers?

Each MSA controller has two host ports. How are these connected to the McData FC
switch?

I don't have any MSA storage, so I'll try with what I have...
Comment 13 Dean Jansa 2005-05-24 17:17:05 EDT
Single controller MSA.
(And it has only one host port, 2Gb SFP optical)
It is connected to the McData, zoned so the nodes involved can see that storage.


Comment 14 Tom Coughlan 2005-05-25 12:13:39 EDT
The test is running on shared storage. How long does it typically take to fail?

As an aside, while I was waiting for the RHEL 3 install to finish, I ran the
test on a single node with two adapters connected to the same storage (sda and
sdb are the same disk): 

./writer /dev/sda1 400000b 300000b 400000b
./writer /dev/sda2 400000b 300000b 400000b
./writer /dev/sda3 400000b 300000b 400000b

./writer /dev/sdb5 400000b 300000b 400000b
./writer /dev/sdb6 400000b 300000b 400000b

It ran for about 2 hours with no failure. 
Comment 15 Tom Coughlan 2005-05-25 13:27:13 EDT
It would be interesting to know whether you can reproduce the corruption if you
substitute /dev/raw devices for /dev/sd devices. Using raw eliminates the buffer
cache and VM effects from the picture. 
Comment 16 Tom Coughlan 2005-05-27 10:42:50 EDT
The test has been running for two days with no corruption. How long does it
usually take? 

I am running with one Qlogic and one Emulex HBA. Shall I switch to all QLogic?

Shall I run with more than two "writer"s per system?

Will you try the test with /dev/raw?

Here are the details on my configuration:

xeon1 and xeon2, each with 8GB, connected p-to-p to separate ports on a DotHill
storage box. Both HBAs are running at 2 Gbps.

Linux xeon1.lab.boston.redhat.com 2.4.21-32.ELsmp #1 SMP Fri Apr 15 21:17:59 EDT
2005 i686 i686 i386 GNU/Linux

scsi2 : QLogic QLA2300 PCI to Fibre Channel Host Adapter: bus 5 device 5 irq 15
        Firmware version:  3.03.01, Driver version 7.01.01-RH1

scsi(2): Topology - (N_Port-to-N_Port), Host Loop address 0x1
blk: queue f6edea18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
  Vendor: DotHill   Model: SANnetII          Rev: 327K
  Type:   Direct-Access                      ANSI SCSI revision: 03

Attached scsi disk sda at scsi2, channel 0, id 0, lun 0
SCSI device sda: 423014400 512-byte hdwr sectors (216583 MB)
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 >

./writer /dev/sda1 400000b 300000b 400000b
./writer /dev/sda2 400000b 300000b 400000b

Linux xeon2.lab.boston.redhat.com 2.4.21-32.ELhugemem #1 SMP Fri Apr 15 21:04:31
EDT 2005 i686 i686 i386 GNU/Linux

Emulex LightPulse FC SCSI 7.1.14

scsi2 : Emulex LightPulse LP9002 2 Gigabit PCI Fibre Channel Adapter on PCI bus
05 device 30 irq 31
scsi3 : Emulex LightPulse LP8000 1 Gigabit PCI Fibre Channel Adapter on PCI bus
05 device 08 irq 20
blk: queue 39fbfa18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
  Vendor: DotHill   Model: SANnetII          Rev: 327K
  Type:   Direct-Access                      ANSI SCSI revision: 03
blk: queue 39fbf618, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
Attached scsi disk sda at scsi2, channel 0, id 0, lun 0
SCSI device sda: 423014400 512-byte hdwr sectors (216583 MB)
Partition check:
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
blk: queue 39f3aa18, I/O limit 4294967295Mb (mask 0xffffffffffffffff)

./writer /dev/sda3 400000b 300000b 400000b
./writer /dev/sda5 400000b 300000b 400000b
Comment 17 Ben Marzinski 2005-05-27 11:42:47 EDT
I think that if I was going to see it, it always happened within 5 hours, 
usually in under 2 hours.

I was only ever able to reproduce this error on the setup that QA used.  I have
two machines that are basically the same hardware as the QA machines. I imaged
them with the same images, and loaded up the same software, and ran the tests
the same way, and never saw a thing.  The only difference was switch and
storage. I was using a Brocade and a Tornado. They were using a McData and a MSA.

The only odd thing is that when I was writing to files, it was only ever data
that got corrupted, not metadata. (Of course, the vast majority of what I was
writing was data, not metadata.) Also, QA said that they couldn't reproduce it
with RHEL4, or with other architectures (I believe writing to the same storage)
Comment 18 Stephen Tweedie 2005-05-27 11:49:10 EDT
Metadata is usually written in small chunks; it's only data that streams out to
disk in very large units.  If there's a problem with large transfers, then you
would indeed expect to see that manifest in data, not metadata, in general. 
(Though that's not a hard and fast guarantee: there are ways in which the
elevator can merge IOs that could lead to certain types of metadata ending up in
the middle of a large IO.)
Comment 19 Tom Coughlan 2005-05-27 11:53:56 EDT
Can someone start the test on the QA machines using /dev/raw for the long weekend?

And make sure that MSA has the latest firmware?
Comment 20 Corey Marthaler 2005-05-27 15:32:19 EDT
The writer test has been started to run over the weekend on the QA hardware.  
6 raw partitions: link-10 writing to /dev/raw/100 - 103 and link-12 writing 
to /dev/raw/104 - 106. The raw devices are bound to /dev/sda1 - /dev/sdg1. 
 
[root@link-10 root]# raw -qa 
/dev/raw/raw100:        bound to major 8, minor 1 
/dev/raw/raw101:        bound to major 8, minor 17 
/dev/raw/raw102:        bound to major 8, minor 33 
/dev/raw/raw103:        bound to major 8, minor 49 
/dev/raw/raw104:        bound to major 8, minor 65 
/dev/raw/raw105:        bound to major 8, minor 81 
/dev/raw/raw106:        bound to major 8, minor 97 
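
The major/minor pairs above can be mapped back to sd device names, since SCSI disk major 8 allots 16 minors per disk (minor 0 is the whole disk, 1 - 15 are partitions). A short Python sketch (sd_name is a hypothetical helper):

```python
import string


def sd_name(major: int, minor: int) -> str:
    """Map a (major, minor) pair on SCSI disk major 8 to its sdXN name."""
    if major != 8:
        raise ValueError("only the first SCSI disk major (8) is handled")
    disk, part = divmod(minor, 16)              # 16 minors per disk
    name = "sd" + string.ascii_lowercase[disk]
    return name + (str(part) if part else "")  # partition 0 is the whole disk


minors = [1, 17, 33, 49, 65, 81, 97]            # from the raw -qa output
names = [sd_name(8, m) for m in minors]
print(names)
```

This gives the first partition on each of sda through sdg, matching the binding described in the comment.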
 
Comment 21 Corey Marthaler 2005-05-31 10:27:59 EDT
The above tests to the raw devices ran all weekend without any issues.
Comment 22 Tom Coughlan 2005-05-31 11:05:56 EDT
That suggests that the hardware is okay, or the problem is very specific to the
I/O pattern. 

My test (running against /dev/sda) continues to run without error. 

I'd like to know how much memory these systems have. It would be best to post a
sysreport (the sysreport rpm comes with RHEL, just type "sysreport"). This will
capture all the info we are likely to need in the future.
Comment 23 Corey Marthaler 2005-05-31 11:53:01 EDT
requested sysreport info is located in:
/home/msp/cmarthal/pub/bugs/157342
Comment 24 Corey Marthaler 2005-06-03 15:27:43 EDT
This week we ran with both the GFS-6.0.2.20-1 (2.4.21-32.EL) and the
GFS-6.0.2.20-2 (2.4.21-32.0.1.ELsmp) rpms, using the same MSA storage, with a
Brocade and a McData switch. We could not get the corruption to occur when using
the Brocade switch with either kernel/GFS version, and could see it with both
kernel/GFS versions using the McData switch. We now know that the McData has to
be in the mix for this issue to happen, but since we were unable to see the
corruption when running to the raw devices, it doesn't look like it's directly
the McData's fault.
Comment 25 Kiersten (Kerri) Anderson 2005-07-27 15:29:29 EDT
Moving this defect from GFS to the kernel list. We are able to show the problem
without GFS configured or used on the system.
Comment 27 Ben Marzinski 2005-08-08 14:36:41 EDT
For how we are reproducing it, see comment #6. As far as I know, no one has been
able to reproduce this bug without using the specific hardware on which it was
originally seen (see comments #17 and #24)
Comment 28 Corey Marthaler 2005-08-08 14:38:59 EDT
We were unable to see this issue while running the writer test on the raw device
(which proves it is not a hardware issue) but we were able to see this on the
block device (which means it is not GFS). 
Comment 29 Stephen Tweedie 2005-08-08 16:45:34 EDT
"We were unable to see this issue while running the writer test on the raw
device (which proves it is not a hardware issue)"

That does not prove that the hardware is good, unfortunately.  The raw device
will typically place *much* less load on the IO subsystem, as it serialises all
IOs.  Raw io performs no write coalescing and no IO pipelining.  

The "writer" program attached performs no synchronous IO, so on a buffered block
device it will queue up a deep disk queue of many outstanding writes at once. 
Using it on a raw device will implicitly synchronise the IO so that there is
only one IO outstanding at once. 

It is entirely possible that the hardware is having trouble with multiple
concurrent IOs but still works correctly on the much simpler, lighter load
conditions that raw IO produces.  The fact that the program works on raw does
not eliminate the possibility of a hardware fault.  And the fact that it only
fails on one particular switch still implies that hardware may be the root cause.
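
One rough way to see the difference in user space is to open the target with O_SYNC, so that each write() returns only after the data has been pushed to the device, leaving one write outstanding at a time. This is only an approximation of raw-device behaviour, offered as a sketch; write_chunks is a hypothetical helper and the demo writes to a temporary file rather than a block device:

```python
import os
import tempfile


def write_chunks(path, chunks, synchronous):
    """Write each chunk in order; with synchronous=True the file is opened O_SYNC."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if synchronous:
        flags |= getattr(os, "O_SYNC", 0)       # not available on every platform
    fd = os.open(path, flags, 0o600)
    try:
        total = 0
        for chunk in chunks:
            view = memoryview(chunk)
            while len(view) > 0:                # handle short writes
                n = os.write(fd, view)
                view = view[n:]
                total += n
        return total
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sync_test")
    written = write_chunks(path, [b"Q" * 4096] * 8, synchronous=True)
    with open(path, "rb") as f:
        data = f.read()

print(written, data == b"Q" * (8 * 4096))
```

Running the same workload with synchronous=False lets the kernel buffer and merge the writes, which is the heavier, deeper-queue case described above.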
Comment 30 Ben Marzinski 2006-09-08 11:42:18 EDT
If we ever get a test set up for this again, and we can reproduce this let me
know. Or you can just close this bug for all I care.
Comment 31 Corey Marthaler 2006-09-11 11:42:24 EDT
Haven't seen this bug in well over a year, closing...
