Bug 56335 - 2.4.9-13smp: 3ware 7xxx, raid0 fails under bonnie++ (but is ok under 2.4.7-10smp)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2001-11-15 18:44 UTC by Matt Ryan
Modified: 2007-04-18 16:38 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-06-07 22:25:31 UTC
Embargoed:



Description Matt Ryan 2001-11-15 18:44:43 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.78 [en] (X11; U; Linux 2.4.13-0.5 i686)

Description of problem:
The summary pretty much says it all. I am seeing the problem on the two
machines described below, with every filesystem I've tried: ext3, reiser,
and XFS (using the SGI-modified kernel kernel-smp-2.4.9-13SGI_XFS_PR1).  With
the 2.4.7-10 kernel, I can't reproduce it.  Excerpts from /var/log/messages
are under 'Additional info'.

Two Intel STL2 dual PIII machines with 1 GB RAM each, one with a 3ware 7410,
one with a 7810, each flashed with the latest firmware (7.3.2), in a 3-drive
raid0 configuration (one with all Maxtor drives, the other all IBM; all
7200 rpm, 40 GB).

bonnie++ version 1.02a

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1.  bonnie++ -s 2g:64k -x 10 -d <path to 3ware partition>

Actual Results:  During the first or second bonnie++ iteration, things start
going south: the 3w-xxxx driver logs errors, followed by I/O errors on the
device.

Additional info:

These are excerpted from the run with the XFS kernel/filesystem; however,
the 3w-xxxx and SCSI-related messages are virtually identical with the
normal 2.4.9-13 kernel and reiser or ext3.

details: 

Nov 15 12:01:15 d3 kernel: 3ware Storage Controller device driver for Linux v1.02.00.008.
Nov 15 12:01:16 d3 kernel: scsi0 : Found a 3ware Storage Controller at 0x5490, IRQ: 21, P-chip: 1.3
Nov 15 12:01:16 d3 kernel: scsi0 : 3ware Storage Controller
Nov 15 12:01:16 d3 kernel:   Vendor: 3ware     Model: 3w-xxxx          Rev: 1.0
Nov 15 12:01:16 d3 kernel:   Type:   Direct-Access                     ANSI SCSI revision: 00
Nov 15 12:01:37 d3 kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Nov 15 12:01:37 d3 kernel: SCSI device sda: 241248385 512-byte hdwr sectors (123519 MB)
Nov 15 12:01:37 d3 kernel:  sda: sda1
Nov 15 12:01:43 d3 kernel: XFS mounting filesystem sd(8,1)

<snip>

Nov 15 12:06:23 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x131070e2).
Nov 15 12:06:23 d3 kernel: 3w-xxxx: tw_post_command_packet(): Unexpected bits.
Nov 15 12:06:23 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x131060e2).
Nov 15 12:06:23 d3 kernel: 3w-xxxx: tw_post_command_packet(): Unexpected bits.
Nov 15 12:06:23 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x131060e2).
Nov 15 12:06:23 d3 kernel: 3w-xxxx: tw_post_command_packet(): Unexpected bits.
Nov 15 12:06:23 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x131060e2).
Nov 15 12:06:23 d3 kernel: 3w-xxxx: tw_post_command_packet(): Unexpected bits.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_scsi_eh_abort(): Abort failed for unknown Scsi_Cmnd 0xf6bfb400, resetting card 0.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x13173002).
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_aen_drain_queue(): Unexpected bits.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_reset_sequence(): No attention interrupt for card 0.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x13173002).
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_aen_drain_queue(): Unexpected bits.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_reset_sequence(): No attention interrupt for card 0.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x13173002).
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_aen_drain_queue(): Unexpected bits.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_reset_sequence(): No attention interrupt for card 0.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_reset_sequence(): Controller error or no attention interrupt: giving up for card 0.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_reset_device_extension(): Reset sequence failed for card 0.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_scsi_eh_abort(): Reset failed for card 0.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x13133002).
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_aen_read_queue(): Unexpected bits.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_interrupt(): Error reading aen queue.
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_check_bits(): Found unexpected bits (0x13107002).
Nov 15 12:07:24 d3 kernel: 3w-xxxx: tw_interrupt(): Unexpected bits.
Nov 15 12:07:25 d3 kernel: 3w-xxxx: tw_scsi_eh_abort(): Abort failed for unknown Scsi_Cmnd 0xf6bfb600, resetting card 0.

<snip, much repetition >

Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4047088
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4047096
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4047216
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4047344
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4047472
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4047600
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4047728
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4047856
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4047984
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4048112
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4048240
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4048368
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4048496
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4048624
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4048752
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4048880
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4049008
Nov 15 12:10:30 d3 kernel: : dev 08:01, sector 4098032
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4098160
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4098288
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4098416
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4098544
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4098672
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4098800
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4098928
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4099056
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4099184
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4099312
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4099440
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4099568
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4099696
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4099824
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4099952
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4100080
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4100208
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4100336
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4100464
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4100592
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4100720
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4100848
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4100976
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4101104
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4101232
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4101360
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4101488
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4101616
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4101744
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4101872
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4102000
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4102128
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4102256
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4102384
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4102512
Nov 15 12:10:30 d3 kernel:  I/O error: dev 08:01, sector 4102640


<snip, much repetition >

Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4046960
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4046968
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4046832
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4046840
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4046064
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4046072
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4045936
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4045944
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 284264
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 284272
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 284008
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 284016
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 284232
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 284240
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 284008
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4046704
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4046712
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4046576
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4046584
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4046448
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4046456
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4046320
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4046328
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 284152
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 284160
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 284024
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 117440928
Nov 15 12:10:31 d3 kernel: I/O error in filesystem ("sd(8,1)") meta-data dev 0x801 block 0x70001a0
Nov 15 12:10:31 d3 kernel:        ("xlog_iodone") error 5 buf count 1536
Nov 15 12:10:31 d3 kernel: xfs_force_shutdown(sd(8,1),0x2) called from line 939 of file xfs_log.c.  Return address = 0xc01bde3e
Nov 15 12:10:31 d3 kernel: Log I/O Error Detected.  Shutting down filesystem: sd(8,1)
Nov 15 12:10:31 d3 kernel: Please umount the filesystem, and rectify the problem(s)
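
For readers decoding the excerpts above, here is a minimal Python sketch that
pulls the device numbers and the SCSI return code out of a few sample lines
(wrapped lines joined). It rests on two assumptions not stated in the report:
that "dev 08:01" is a hexadecimal major:minor pair, and that the return code
is a hexadecimal value laid out as driver, host, message and status bytes, as
in the 2.4-era SCSI layer. The helper names decode_dev and split_result are
made up for this illustration.

```python
import re

# Sample lines taken from the log above (wrapped lines joined).
LOG = """\
Nov 15 12:10:31 d3 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4046960
Nov 15 12:10:31 d3 kernel:  I/O error: dev 08:01, sector 4046968
"""

IO_ERROR = re.compile(r"I/O error: dev ([0-9a-f]{2}):([0-9a-f]{2}), sector (\d+)")
RETURN_CODE = re.compile(r"return code = ([0-9a-f]+)")


def decode_dev(major_hex, minor_hex):
    """Assumption: the kernel prints major:minor as two hex bytes."""
    return int(major_hex, 16), int(minor_hex, 16)


def split_result(code_hex):
    """Assumption: result = driver<<24 | host<<16 | msg<<8 | status."""
    value = int(code_hex, 16)
    return {
        "driver": (value >> 24) & 0xFF,
        "host": (value >> 16) & 0xFF,
        "msg": (value >> 8) & 0xFF,
        "status": value & 0xFF,
    }


# One ((major, minor), sector) tuple per I/O error line.
io_errors = [
    (decode_dev(m.group(1), m.group(2)), int(m.group(3)))
    for m in IO_ERROR.finditer(LOG)
]
# One decoded dictionary per "return code" line.
codes = [split_result(m.group(1)) for m in RETURN_CODE.finditer(LOG)]

for dev, sector in io_errors:
    print("device major/minor", dev, "sector", sector)
for code in codes:
    print("result bytes", code)
```

Under these assumptions the device decodes to major 8, minor 1, which matches
the "sd(8,1)" and sda1 entries in the log, and only the driver byte of the
return code is non-zero (value 6). If the 2.4 SCSI header numbering is as I
remember it, that value is the driver timeout code, which would fit the
aborts and resets logged just before, but treat that reading as unconfirmed.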

Comment 1 Arjan van de Ven 2001-11-15 18:50:20 UTC
To be clear:
Does this error show up in the Red Hat kernel as well, not just in the SGI
kernel?

How much memory do you have?

Comment 2 Matt Ryan 2001-11-15 19:09:25 UTC
Yes, I am testing with the *redhat* 2.4.9-13smp (rh7.2 update) and 2.4.7-10smp
(rh7.2) kernels.  I use these kernels when using just reiser or ext3.  The XFS
kernels are those exact same RH kernels, patched and tested with XFS.

Sorry for the confusion with XFS; it was just the last set of error logs I had
easy access to (really, aside from fs-specific messages, the logs look the same
every time).

Each machine has 1 GB of RAM (as mentioned above).

Comment 3 Arjan van de Ven 2001-11-15 19:11:35 UTC
OK, can you try adding "mem=800M" to the kernel boot options in boot/grub and
see if it still fails then?

Comment 4 Matt Ryan 2001-11-15 19:59:51 UTC
OK, with 800M it has now passed three bonnie++ iterations successfully, with no
errors logged.  I don't think it ever got this far before.  (Using 2.4.9-13 and
reiser right now.)

Comment 5 Matt Ryan 2001-11-15 21:52:49 UTC
OK, nearly 20 iterations have now run on the two machines with 800 MB, and no
problems.

Comment 6 Matt Ryan 2001-11-16 18:20:17 UTC
It turns out that the Adaptec 2400A IDE RAID card I am testing suffers from the
same problem; it is just harder to trigger.  That is, with the exact same
hardware and software configuration (except for the card), it fails
occasionally, but not when using just 800 MB of RAM.

So, is this a highmem problem?  Any prospects of a fix? :)
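
The arithmetic behind the highmem question can be sketched in Python as
follows. This assumes the default 3G/1G address-space split on 32-bit x86,
under which roughly the first 896 MB of RAM is directly mapped by the kernel
("lowmem") and anything above it is highmem. The 896 MB figure and the
split_memory helper are assumptions for illustration, not something stated in
this report, and the usable RAM on a real 1 GB machine is slightly less than
1024 MB.

```python
# Assumption: default 3G/1G split on 32-bit x86, ~896 MB of directly
# mapped low memory. Not taken from the bug report itself.
LOWMEM_LIMIT_MB = 896


def split_memory(total_mb, lowmem_limit_mb=LOWMEM_LIMIT_MB):
    """Return (lowmem_mb, highmem_mb) for a given amount of RAM."""
    low = min(total_mb, lowmem_limit_mb)
    high = total_mb - low
    return low, high


for total in (1024, 800):
    low, high = split_memory(total)
    print(f"{total} MB RAM: {low} MB lowmem, {high} MB highmem")
```

Under that assumption, a 1 GB machine has about 128 MB of highmem, while
booting with mem=800M leaves none, which is consistent with the failures
disappearing at 800 MB.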

Comment 7 Gigs 2001-12-20 00:16:26 UTC
This could possibly be a hardware error.  I saw errors similar to the first
part of your log on 2.4.13smp with a Supermicro P3TDE6 with dual PIIIs and a
ServerWorks chipset.  Adam Radford at 3ware has said that the error is a "PCI
reset" error.  I would email the log to linux.  Switching to an AMD tigerMP
fixed it for me, but was not an ideal solution.  The error also only happened
for me when multiple 3ware cards were installed.

I have never seen errors similar to the second part of your log, which could
be because I didn't use XFS.  I don't know if this helps or confuses the
issue, since reducing the RAM to 800 seemed to fix it.

